Transformer-based structuring of free-text radiology report databases.

Bibliographic Details
Title: Transformer-based structuring of free-text radiology report databases.
Authors: Nowak, S.¹ (sebastian.nowak@ukbonn.de), Biesner, D.², Layer, Y. C.¹, Theis, M.¹, Schneider, H.², Block, W.¹, Wulff, B.², Attenberger, U. I.¹, Sifa, R.², Sprinkart, A. M.¹
Source: European Radiology. Jun 2023, Vol. 33, Issue 6, p4228-4236. 9p. 1 Diagram, 3 Charts, 1 Graph.
Subjects: Natural language processing, Intensive care units, Medical databases, Databases, Radiology
Abstract: Objectives: To provide insights for the on-site development of transformer-based structuring of free-text report databases by investigating different labeling and pre-training strategies.
Methods: A total of 93,368 German chest X-ray reports from 20,912 intensive care unit (ICU) patients were included. Two labeling strategies were investigated to tag six findings of the attending radiologist. First, a system based on human-defined rules was applied to annotate all reports (termed "silver labels"). Second, 18,000 reports were manually annotated in 197 h (termed "gold labels"), of which 10% were used for testing. An on-site model pre-trained with masked-language modeling (MLM), T_mlm, was compared with a public, medically pre-trained model, T_med. Both models were fine-tuned for text classification on silver labels only, on gold labels only, and first on silver and then on gold labels (hybrid training), using varying numbers of gold labels (N: 500, 1000, 2000, 3500, 7000, 14,580). Macro-averaged F1-scores (MAF1) in percent were calculated with 95% confidence intervals (CI).
Results: T_mlm,gold (95.5 [94.5–96.3]) showed significantly higher MAF1 than T_med,silver (75.0 [73.4–76.5]) and T_mlm,silver (75.2 [73.6–76.7]), but not significantly higher MAF1 than T_med,gold (94.7 [93.6–95.6]), T_med,hybrid (94.9 [93.9–95.8]), or T_mlm,hybrid (95.2 [94.3–96.0]). When 7000 or fewer gold-labeled reports were used, T_mlm,gold (N: 7000, 94.7 [93.5–95.7]) showed significantly higher MAF1 than T_med,gold (N: 7000, 91.5 [90.0–92.8]). With at least 2000 gold-labeled reports, additionally using silver labels did not significantly improve T_mlm,hybrid (N: 2000, 91.8 [90.4–93.2]) over T_mlm,gold (N: 2000, 91.4 [89.9–92.8]).
Conclusions: Custom pre-training of transformers followed by fine-tuning on manual annotations promises to be an efficient strategy for unlocking report databases for data-driven medicine.
Key Points:
• On-site development of natural language processing methods that retrospectively unlock the free-text databases of radiology clinics for data-driven medicine is of great interest.
• For clinics seeking to develop on-site methods for retrospectively structuring the report database of a given department, it remains unclear which of the previously proposed strategies for labeling reports and pre-training models is most appropriate in the context of, e.g., available annotator time.
• Using a custom pre-trained transformer model together with a modest annotation effort promises to be an efficient way to retrospectively structure radiological databases, even when millions of reports are not available for pre-training. [ABSTRACT FROM AUTHOR]
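The abstract reports results as macro-averaged F1-scores (MAF1) in percent with 95% CIs, but it does not state how the metric was aggregated over the six findings or how the intervals were derived. The Python sketch below shows macro-F1 for a single multi-class label plus a percentile-bootstrap interval. The bootstrap procedure, the function names `macro_f1` and `bootstrap_ci`, and the toy labels are illustrative assumptions, not the authors' method.

```python
import random
from typing import List, Sequence, Tuple


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Macro-averaged F1 in percent: the unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class: List[float] = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        per_class.append(2 * tp / denom if denom else 0.0)
    return 100.0 * sum(per_class) / len(per_class)


def bootstrap_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap (1 - alpha) interval for macro_f1 (assumed CI method)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores: List[float] = []
    for _ in range(n_boot):
        # Resample the test reports with replacement and rescore each resample.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles, clamped to valid indices.
    lo_idx = max(0, int(n_boot * alpha / 2))
    hi_idx = min(n_boot - 1, int(n_boot * (1 - alpha / 2)) - 1)
    return scores[lo_idx], scores[hi_idx]


if __name__ == "__main__":
    # Toy labels for one hypothetical finding; class IDs 0, 1, 2 are placeholders.
    truth = [0, 0, 1, 1, 2, 2, 0, 1]
    preds = [0, 1, 1, 1, 2, 0, 0, 1]
    print(f"MAF1 = {macro_f1(truth, preds):.1f}")
    print("95% CI =", bootstrap_ci(truth, preds))
```

Because macro-averaging weights every class equally, rare classes count as much as frequent ones in MAF1.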
Copyright of European Radiology is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Accession Number: 163727665
Publication Type: Academic Journal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=163727665
DOI: 10.1007/s00330-023-09526-y
ISSN (print): 0938-7994
Language: English