Testing Sentence-in-Noise Recognition With Synthetic Speech and Automatic Speech Recognition.

Bibliographic Details
Title: Testing Sentence-in-Noise Recognition With Synthetic Speech and Automatic Speech Recognition.
Authors: Calandruccio, Lauren (1; lauren.calandruccio@case.edu); Weidman, Dani (2); Leatherwood, Aja (1); Buss, Emily (3)
Source: Journal of Speech, Language & Hearing Research. Dec2025, Vol. 68 Issue 12, p6114-6128. 15p.
Subject Terms: *Auditory perception testing, *Data analysis, *Artificial intelligence, *Speech perception, *Auditory perception, *Comparative studies, Automatic speech recognition, Noise, Research funding, Statistical sampling, Descriptive statistics, Statistics, Acoustic stimulation
Abstract: Purpose: Characterizing speech-in-noise recognition is fundamental to both clinical audiology and hearing research. Current methods rely on human speech recordings and human testers. However, modern artificial intelligence tools could automate both stimulus generation and scoring. This report evaluated masked-sentence recognition with synthetic and human speech productions and human and machine scoring methods. Methods: Participants were young adults with normal hearing who were native speakers of the test language (English). Participants completed a speech-in-noise recognition task for open-set sentences at -6 dB signal-to-noise ratio for 10 different target talkers (five human and five synthetic). Automatic speech recognition was used in addition to human scoring to determine listener performance. Participants also provided perceptual ratings using a Likert rating scale to determine if they could identify which talkers were human and which were synthetic. Results: Speech recognition scores varied across the 10 talkers, with a trend for greater intelligibility for synthetic than human talkers and greater variability across human than synthetic talkers. However, the pattern of individual differences in recognition across participants was similar for human and synthetic speech. Agreement between scores produced by human testers and automatic speech recognition was high (~98% agreement). Perceptual ratings indicate that some synthetic talkers sounded more human than others, but ratings did not predict recognition accuracy. Conclusions: Speech-in-noise recognition varied for different human and synthetic talkers, with some indication of greater consistency in intelligibility for synthetic speech. This variability did not seem to be related to perceived human likeness. Human scoring was more accurate than automatic machine scoring for open-set sentences, but results were in close agreement for both methods. These results provide tentative support for the use of synthetic speech and machine scoring when evaluating masked-sentence recognition. [ABSTRACT FROM AUTHOR]
Copyright of Journal of Speech, Language & Hearing Research is the property of American Speech-Language-Hearing Association and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Education Research Complete
Header DbId: ehh
DbLabel: Education Research Complete
An: 190171430
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=ehh&AN=190171430
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1044/2025_JSLHR-24-00893
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 15
        StartPage: 6114
    Subjects:
      – SubjectFull: Auditory perception testing
        Type: general
      – SubjectFull: Data analysis
        Type: general
      – SubjectFull: Artificial intelligence
        Type: general
      – SubjectFull: Speech perception
        Type: general
      – SubjectFull: Auditory perception
        Type: general
      – SubjectFull: Comparative studies
        Type: general
      – SubjectFull: Automatic speech recognition
        Type: general
      – SubjectFull: Noise
        Type: general
      – SubjectFull: Research funding
        Type: general
      – SubjectFull: Statistical sampling
        Type: general
      – SubjectFull: Descriptive statistics
        Type: general
      – SubjectFull: Statistics
        Type: general
      – SubjectFull: Acoustic stimulation
        Type: general
    Titles:
      – TitleFull: Testing Sentence-in-Noise Recognition With Synthetic Speech and Automatic Speech Recognition.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Calandruccio, Lauren
      – PersonEntity:
          Name:
            NameFull: Weidman, Dani
      – PersonEntity:
          Name:
            NameFull: Leatherwood, Aja
      – PersonEntity:
          Name:
            NameFull: Buss, Emily
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 12
              Text: Dec2025
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 10924388
          Numbering:
            – Type: volume
              Value: 68
            – Type: issue
              Value: 12
          Titles:
            – TitleFull: Journal of Speech, Language & Hearing Research
              Type: main