Testing Sentence-in-Noise Recognition With Synthetic Speech and Automatic Speech Recognition.
| Title: | Testing Sentence-in-Noise Recognition With Synthetic Speech and Automatic Speech Recognition. |
|---|---|
| Authors: | Calandruccio, Lauren (lauren.calandruccio@case.edu); Weidman, Dani; Leatherwood, Aja; Buss, Emily |
| Source: | Journal of Speech, Language & Hearing Research. Dec 2025, Vol. 68, Issue 12, pp. 6114-6128 (15 pp.). |
| Subject Terms: | *Auditory perception testing, *Data analysis, *Artificial intelligence, *Speech perception, *Auditory perception, *Comparative studies, Automatic speech recognition, Noise, Research funding, Statistical sampling, Descriptive statistics, Statistics, Acoustic stimulation |
| Abstract: | Purpose: Characterizing speech-in-noise recognition is fundamental to both clinical audiology and hearing research. Current methods rely on human speech recordings and human testers. However, modern artificial intelligence tools could automate both stimulus generation and scoring. This report evaluated masked-sentence recognition with synthetic and human speech productions and human and machine scoring methods. Methods: Participants were young adults with normal hearing who were native speakers of the test language (English). Participants completed a speech-in-noise recognition task for open-set sentences at -6 dB signal-to-noise ratio for 10 different target talkers (five human and five synthetic). Automatic speech recognition was used in addition to human scoring to determine listener performance. Participants also provided perceptual ratings using a Likert rating scale to determine if they could identify which talkers were human and which were synthetic. Results: Speech recognition scores varied across the 10 talkers, with a trend for greater intelligibility for synthetic than human talkers and greater variability across human than synthetic talkers. However, the pattern of individual differences in recognition across participants was similar for human and synthetic speech. Agreement between scores produced by human testers and automatic speech recognition was high (~98% agreement). Perceptual ratings indicate that some synthetic talkers sounded more human than others, but ratings did not predict recognition accuracy. Conclusions: Speech-in-noise recognition varied for different human and synthetic talkers, with some indication of greater consistency in intelligibility for synthetic speech. This variability did not seem to be related to perceived human likeness. Human scoring was more accurate than automatic machine scoring for open-set sentences, but results were in close agreement for both methods. These results provide tentative support for the use of synthetic speech and machine scoring when evaluating masked-sentence recognition. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Journal of Speech, Language & Hearing Research is the property of American Speech-Language-Hearing Association and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Education Research Complete |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=ehh&AN=190171430 |
| DOI: | 10.1044/2025_JSLHR-24-00893 |
| ISSN: | 1092-4388 (print) |
| Language: | English |