Using Synthetic Data in Communication Sciences and Disorders to Promote Computational Reproducibility and Transparency.
| Title: | Using Synthetic Data in Communication Sciences and Disorders to Promote Computational Reproducibility and Transparency. |
|---|---|
| Authors: | Borders, James C. (affiliation 1; bordersj@bu.edu); Thompson, Austin (affiliation 2); Kearney, Elaine (affiliations 3, 4) |
| Source: | Journal of Speech, Language & Hearing Research, December 2025, Vol. 68, Issue 12, pp. 5854-5869 (16 pages). |
| Subject Terms: | *Data analysis, *Communicative disorders, Pearson correlation (Statistics), T-test (Statistics), Research evaluation, Statistical sampling, Data analytics, Descriptive statistics, Chi-squared test, Odds ratio, Statistics, Analysis of variance, Data analysis software, Regression analysis |
| Abstract: | Purpose: Reproducibility is a core principle of science, and access to a study's data is essential to reproduce its findings. However, data sharing is uncommon in the discipline of communication sciences and disorders (CSD), often due to concerns related to privacy and disclosure risks. Synthetic data offer a potential solution to this barrier by generating artificial data sets that do not represent real individuals yet retain statistical properties and relationships from the original data. This study aimed to explore the feasibility and preliminary utility of synthetic data to promote transparency and reproducibility in the discipline of CSD. Method: Ten open data sets were obtained from previously published research within the American Speech-Language-Hearing Association "Big Nine" domains (articulation, cognition, communication, fluency, hearing, language, social communication, voice and resonance, and swallowing) across a range of study outcomes and designs. Synthetic data sets were generated with the synthpop R package. General utility was assessed visually and with the standardized ratio of the propensity mean squared error (S_pMSE). Specific utility assessed whether inferential relationships from the original data were preserved in the synthetic data set by comparing model fit indices, coefficients, and p values. Results: All synthetic data sets showed strong general utility, maintaining univariate and bivariate distributions. Six of nine synthetic data sets that used inferential statistics showed strong specific utility, maintaining inferential relationships from the original analysis. Specific utility was low in three data sets with hierarchical structures. Conclusions: Findings suggest that synthetic data can effectively maintain statistical properties and relationships across a wide range of nonhierarchical data commonly seen in the discipline of CSD. Other approaches for hierarchical data need to be explored in future work. Researchers who use synthetic data should assess its utility in preserving their results for their own data and use-case. Open Science Form: https://doi.org/10.23641/asha.30569957 [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Journal of Speech, Language & Hearing Research is the property of American Speech-Language-Hearing Association and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Education Research Complete |
| Publication Type: | Academic Journal |
| Accession Number: | 190171415 |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=ehh&AN=190171415 |
| DOI: | 10.1044/2025_JSLHR-24-00736 |
| ISSN (print): | 1092-4388 |
| Language: | English |
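
The abstract names the standardized ratio of the propensity mean squared error (S_pMSE) as its general-utility measure. The underlying idea can be sketched as follows. Stack the original and synthetic rows, label each row by its source, and fit a propensity model that tries to predict the label. Then compute pMSE = (1/N) Σ (p_i - c)^2, where p_i is a fitted propensity and c is the share of synthetic rows. Dividing by the pMSE expected when labels carry no information gives a ratio near 1 when the model cannot tell the data sets apart. This is a minimal sketch, not the synthpop implementation. The main-effects logistic propensity model, the permutation estimate of the null expectation, the function names (`pmse`, `pmse_ratio`, `simulate`), and the toy data are all assumptions made for illustration. synthpop's `utility.gen()` makes its own choices of propensity model and null estimate.

```python
import math
import random


def sigmoid(eta):
    # Clamp the linear predictor to avoid overflow in math.exp.
    eta = max(-30.0, min(30.0, eta))
    return 1.0 / (1.0 + math.exp(-eta))


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small k x k system.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, max_iter=25, tol=1e-8):
    # Newton-Raphson for a main-effects logistic regression; X includes an intercept column.
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for m in range(k):
                    hess[a][m] += w * xi[a] * xi[m]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def pmse(rows, labels):
    # pMSE: mean squared distance between fitted propensities and the synthetic share c.
    X = [[1.0, *row] for row in rows]
    beta = fit_logistic(X, labels)
    c = sum(labels) / len(labels)
    return sum((sigmoid(sum(b * v for b, v in zip(beta, xi))) - c) ** 2 for xi in X) / len(X)


def pmse_ratio(original, synthetic, n_perm=20, seed=1):
    # Observed pMSE divided by its mean under randomly permuted labels (the null).
    rows = original + synthetic
    labels = [0] * len(original) + [1] * len(synthetic)
    observed = pmse(rows, labels)
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null.append(pmse(rows, shuffled))
    return observed / (sum(null) / n_perm)


def simulate(n, shift, rng):
    # Hypothetical toy data: x2 depends linearly on x1; `shift` moves the mean of x1.
    rows = []
    for _ in range(n):
        x1 = rng.gauss(shift, 1.0)
        rows.append((x1, 0.5 * x1 + rng.gauss(0.0, 1.0)))
    return rows


rng = random.Random(42)
original = simulate(200, 0.0, rng)
good_synthetic = simulate(200, 0.0, rng)  # same generating process as the original
poor_synthetic = simulate(200, 1.0, rng)  # mean of x1 shifted by 1 SD
ratio_good = pmse_ratio(original, good_synthetic)
ratio_poor = pmse_ratio(original, poor_synthetic)
print(f"good: {ratio_good:.2f}  poor: {ratio_poor:.2f}")
```

With these seeds, the first ratio should fall near 1 and the second far above it. The exact values depend on the simulated draws, so treat them as illustrative rather than as thresholds. A permutation null was chosen here because it needs no closed-form expectation for the propensity model.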