Automatic Speech Recognition for Intelligibility Assessment in Children With Dysarthria.

Bibliographic Details
Title: Automatic Speech Recognition for Intelligibility Assessment in Children With Dysarthria.
Authors: Choi, Jiyoung [1] (jyc2173@tc.columbia.edu); Moya-Galé, Gemma [1]; Hwang, KyungHae [2]; Hirschberg, Julia [3]; Levya, Erika S. [1]
Source: Journal of Speech, Language & Hearing Research. Apr 2026, Vol. 69, Issue 4, pp. 1438-1454 (17 pp.).
Subject Terms: *Dysarthria, *Data analysis, *Intelligibility of speech, *Listening, *Speech evaluation, *Research, *Speech perception, *Children, Automatic speech recognition, Cerebral palsy, Descriptive statistics, Statistics, Judgment (Psychology), Data analysis software, Regression analysis, Disease complications
Abstract: Purpose: Accurate assessment of speech intelligibility is critical for children with dysarthria secondary to cerebral palsy. Traditional assessment methods, such as human listeners' orthographic transcription and perceptual ratings (e.g., of ease of understanding [EoU]), are time consuming or subjective. Automatic speech recognition (ASR) may provide a more efficient, objective alternative, but its use for assessing intelligibility in this population is unexamined. This study evaluated the potential of ASR for intelligibility assessment in children with dysarthria and identified the most appropriate ASR systems for approximating human listeners' judgments. Method: Five ASR systems transcribed speech samples from 20 children with dysarthria. Additionally, 168 adult listeners provided orthographic transcriptions and EoU ratings. Word recognition rate (WRR) was used as the metric for calculating ASR and human listeners' transcription accuracy. Spearman correlations were used to assess the relationship between ASR WRR and human WRR, as well as between ASR WRR and human EoU ratings. Results: The WRR yielded by four ASR systems (WhisperX-small, WhisperX-medium, WhisperX-large, and Google Cloud) showed strong correlations with human WRR, with WhisperX-medium demonstrating the strongest correlation. These four systems' WRRs also exhibited moderate-to-strong correlations with EoU ratings, with Google Cloud ASR showing the strongest correlation. In contrast, the WRR of Wav2Vec2 demonstrated a weak correlation with both human WRR and EoU ratings. Conclusions: ASR shows promise for use in intelligibility assessment in children with dysarthria. Of the tested ASR systems, WhisperX-medium appears most promising for approximating human transcription accuracy, whereas Google Cloud ASR aligns best with perceptual ratings. Such differences in ASR performance highlight the need for careful system selection in clinical applications. 
Supplemental Material: https://doi.org/10.23641/asha.31397457 [ABSTRACT FROM AUTHOR]
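The abstract names two computations: word recognition rate (WRR) as the transcription-accuracy metric, and Spearman correlations between ASR WRR and the human measures. The sketch below illustrates both under stated assumptions and is not the authors' implementation. The abstract does not give the WRR formula, so the code assumes WRR is the percentage of target words that are matched, in order, by the transcription; the alignment via Python's `difflib.SequenceMatcher` and the function names `word_recognition_rate`, `average_ranks`, and `spearman` are illustrative choices, not taken from the article.

```python
from difflib import SequenceMatcher
from statistics import mean


def word_recognition_rate(target: str, transcript: str) -> float:
    """Assumed WRR: percentage of target words matched, in order, by the transcript."""
    ref = target.lower().split()
    hyp = transcript.lower().split()
    if not ref:
        raise ValueError("target must contain at least one word")
    # Align the two word sequences; autojunk is disabled so no word is ignored.
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(ref)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    # Undefined (division by zero) if either variable is constant.
    return cov / (var_x * var_y) ** 0.5
```

In a workflow of this shape, one would compute a WRR per child for each ASR system and for the human listeners, then pass the two per-child lists to `spearman`. The same function would apply to ASR WRR against mean EoU ratings.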
Copyright of Journal of Speech, Language & Hearing Research is the property of American Speech-Language-Hearing Association and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Education Research Complete
Header DbId: ehh
DbLabel: Education Research Complete
An: 192982170
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=ehh&AN=192982170
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1044/2025_JSLHR-25-00562
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 17
        StartPage: 1438
    Subjects:
      – SubjectFull: Dysarthria
        Type: general
      – SubjectFull: Data analysis
        Type: general
      – SubjectFull: Intelligibility of speech
        Type: general
      – SubjectFull: Listening
        Type: general
      – SubjectFull: Speech evaluation
        Type: general
      – SubjectFull: Research
        Type: general
      – SubjectFull: Speech perception
        Type: general
      – SubjectFull: Children
        Type: general
      – SubjectFull: Automatic speech recognition
        Type: general
      – SubjectFull: Cerebral palsy
        Type: general
      – SubjectFull: Descriptive statistics
        Type: general
      – SubjectFull: Statistics
        Type: general
      – SubjectFull: Judgment (Psychology)
        Type: general
      – SubjectFull: Data analysis software
        Type: general
      – SubjectFull: Regression analysis
        Type: general
      – SubjectFull: Disease complications
        Type: general
    Titles:
      – TitleFull: Automatic Speech Recognition for Intelligibility Assessment in Children With Dysarthria.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Choi, Jiyoung
      – PersonEntity:
          Name:
            NameFull: Moya-Galé, Gemma
      – PersonEntity:
          Name:
            NameFull: Hwang, KyungHae
      – PersonEntity:
          Name:
            NameFull: Hirschberg, Julia
      – PersonEntity:
          Name:
            NameFull: Levya, Erika S.
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 04
              Text: Apr2026
              Type: published
              Y: 2026
          Identifiers:
            – Type: issn-print
              Value: 1092-4388
          Numbering:
            – Type: volume
              Value: 69
            – Type: issue
              Value: 4
          Titles:
            – TitleFull: Journal of Speech, Language & Hearing Research
              Type: main