Automatically Calculated Context-Sensitive Features of Connected Speech Improve Prediction of Impairment in Alzheimer's Disease.

Bibliographic Details
Title: Automatically Calculated Context-Sensitive Features of Connected Speech Improve Prediction of Impairment in Alzheimer's Disease.
Authors: Flick, Graham (graham.flick@nyu.edu); Ostrand, Rachel
Source: Journal of Speech, Language & Hearing Research. Nov 2025, Vol. 68, Issue 11, pp. 5341-5362 (22 pages).
Subject Terms: *Speech evaluation, *Alzheimer's disease, *Data analysis, *Discourse analysis, *Research, Statistical models, Pearson correlation (Statistics), Research funding, Secondary analysis, Receiver operating characteristic curves, Multiple regression analysis, Logistic regression analysis, Descriptive statistics, Natural language processing, Linguistics, Statistics
Abstract:
Purpose: Early detection is critical for effective management of Alzheimer's disease (AD) and other dementias. One promising approach for predicting AD status is to automatically calculate linguistic features from open-ended connected speech. Past work has focused on individual word-level features such as part of speech counts, total word production, and lexical richness, with less emphasis on measuring the relationship between words and the context in which they are produced. Here, we assessed whether linguistic features that take into account where a word was produced in the discourse context improved the ability to predict AD patients' Mini-Mental State Examination (MMSE) scores and classify AD patients from healthy control participants.
Method: Seventeen linguistic features were automatically computed from transcriptions of spoken picture descriptions from individuals with probable or possible AD (n = 176 transcripts). This included 12 word-level features (e.g., part of speech counts) and five features capturing contextual word choices (linguistic surprisal, computed from a computational large language model, and properties of words produced following filled pauses). We examined whether (a) the full set jointly predicted MMSE scores, (b) the addition of contextual features improved prediction, and (c) linguistic features could classify AD patients (n = 130) versus healthy participants (n = 93).
Results: Linguistic features accurately predicted MMSE scores in individuals with probable or possible AD and successfully identified up to 87% of AD participants versus healthy controls. Statistical models that contained linguistic surprisal (a contextual feature) performed better than those that included only word-level and demographic features. Overall, AD patients with lower MMSE scores produced more empty words, fewer nouns and definite articles, and words that were higher frequency yet more surprising given the previous context.
Conclusion: These results provide novel evidence that metrics related to contextualized word choices, particularly the surprisal of an individual's words, capture variance in degree of cognitive decline in AD. [ABSTRACT FROM AUTHOR]
Database: Education Research Complete
ISSN: 1092-4388
DOI: 10.1044/2025_JSLHR-24-00297