Automatic Speech Recognition for Intelligibility Assessment in Children With Dysarthria.
| Title: | Automatic Speech Recognition for Intelligibility Assessment in Children With Dysarthria. |
|---|---|
| Authors: | Choi, Jiyoung¹ (jyc2173@tc.columbia.edu); Moya-Galé, Gemma¹; Hwang, KyungHae²; Hirschberg, Julia³; Levy, Erika S.¹ |
| Source: | Journal of Speech, Language & Hearing Research. Apr. 2026, Vol. 69, Issue 4, pp. 1438-1454 (17 pp.). |
| Subject Terms: | *Dysarthria, *Data analysis, *Intelligibility of speech, *Listening, *Speech evaluation, *Research, *Speech perception, *Children, Automatic speech recognition, Cerebral palsy, Descriptive statistics, Statistics, Judgment (Psychology), Data analysis software, Regression analysis, Disease complications |
| Abstract: | Purpose: Accurate assessment of speech intelligibility is critical for children with dysarthria secondary to cerebral palsy. Traditional assessment methods, such as human listeners' orthographic transcription and perceptual ratings (e.g., of ease of understanding [EoU]), are time consuming or subjective. Automatic speech recognition (ASR) may provide a more efficient, objective alternative, but its use for assessing intelligibility in this population is unexamined. This study evaluated the potential of ASR for intelligibility assessment in children with dysarthria and identified the most appropriate ASR systems for approximating human listeners' judgments. Method: Five ASR systems transcribed speech samples from 20 children with dysarthria. Additionally, 168 adult listeners provided orthographic transcriptions and EoU ratings. Word recognition rate (WRR) was used as the metric for calculating ASR and human listeners' transcription accuracy. Spearman correlations were used to assess the relationship between ASR WRR and human WRR, as well as between ASR WRR and human EoU ratings. Results: The WRR yielded by four ASR systems (WhisperX-small, WhisperX-medium, WhisperX-large, and Google Cloud) showed strong correlations with human WRR, with WhisperX-medium demonstrating the strongest correlation. These four systems' WRRs also exhibited moderate-to-strong correlations with EoU ratings, with Google Cloud ASR showing the strongest correlation. In contrast, the WRR of Wav2Vec2 demonstrated a weak correlation with both human WRR and EoU ratings. Conclusions: ASR shows promise for use in intelligibility assessment in children with dysarthria. Of the tested ASR systems, WhisperX-medium appears most promising for approximating human transcription accuracy, whereas Google Cloud ASR aligns best with perceptual ratings. Such differences in ASR performance highlight the need for careful system selection in clinical applications. Supplemental Material: https://doi.org/10.23641/asha.31397457 [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Journal of Speech, Language & Hearing Research is the property of American Speech-Language-Hearing Association and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Education Research Complete |
| Publication Type: | Academic Journal |
| Language: | English |
| DOI: | 10.1044/2025_JSLHR-25-00562 |
| ISSN: | 1092-4388 (print) |
| Accession Number: | 192982170 |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=ehh&AN=192982170 |
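
The abstract names two computations: a word recognition rate (WRR) for each transcript, and Spearman correlations between ASR WRR and human scores. The record does not give the study's exact WRR formula or its software, so the following is only a minimal sketch under stated assumptions: WRR is taken here as the percentage of target words recovered in order (via a longest-common-subsequence alignment), and Spearman's rho is computed as the Pearson correlation of tie-averaged ranks. The function names and the numbers in the example are hypothetical and are not data from the study.

```python
from math import sqrt


def word_recognition_rate(reference: str, hypothesis: str) -> float:
    """Percentage of reference words recovered, in order, by the hypothesis.

    ASSUMPTION: the record does not define WRR; this uses the length of the
    longest common subsequence of words divided by the reference length.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    prev = [0] * (len(hyp) + 1)  # LCS lengths for the previous reference word
    for r in ref:
        curr = [0]
        for j, h in enumerate(hyp, start=1):
            curr.append(prev[j - 1] + 1 if r == h else max(prev[j], curr[j - 1]))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical per-speaker scores, for illustration only.
    asr_wrr = [20.0, 45.0, 60.0, 80.0, 95.0]
    human_wrr = [35.0, 50.0, 55.0, 90.0, 85.0]
    print(word_recognition_rate("the dog ran fast", "the dog can fast"))
    print(spearman_rho(asr_wrr, human_wrr))
```

In practice one score per speaker (or per utterance) from an ASR system would be paired with the matching human WRR or EoU rating before computing rho; a rank-based correlation is a natural fit because it only assumes a monotonic relationship between the two measures.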