Using Synthetic Data in Communication Sciences and Disorders to Promote Computational Reproducibility and Transparency.

Bibliographic Details
Title: Using Synthetic Data in Communication Sciences and Disorders to Promote Computational Reproducibility and Transparency.
Authors: Borders, James C. (1) bordersj@bu.edu; Thompson, Austin (2); Kearney, Elaine (3, 4)
Source: Journal of Speech, Language & Hearing Research. Dec 2025, Vol. 68, Issue 12, pp. 5854-5869. 16 p.
Subject Terms: *Data analysis, *Communicative disorders, Pearson correlation (Statistics), T-test (Statistics), Research evaluation, Statistical sampling, Data analytics, Descriptive statistics, Chi-squared test, Odds ratio, Statistics, Analysis of variance, Data analysis software, Regression analysis
Abstract: Purpose: Reproducibility is a core principle of science, and access to a study's data is essential to reproduce its findings. However, data sharing is uncommon in the discipline of communication sciences and disorders (CSD), often due to concerns related to privacy and disclosure risks. Synthetic data offer a potential solution to this barrier by generating artificial data sets that do not represent real individuals yet retain statistical properties and relationships from the original data. This study aimed to explore the feasibility and preliminary utility of synthetic data to promote transparency and reproducibility in the discipline of CSD. Method: Ten open data sets were obtained from previously published research within the American Speech-Language-Hearing Association "Big Nine" domains (articulation, cognition, communication, fluency, hearing, language, social communication, voice and resonance, and swallowing) across a range of study outcomes and designs. Synthetic data sets were generated with the synthpop R package. General utility was assessed visually and with the standardized ratio of the propensity mean squared error (S_pMSE). Specific utility assessed whether inferential relationships from the original data were preserved in the synthetic data set by comparing model fit indices, coefficients, and p values. Results: All synthetic data sets showed strong general utility, maintaining univariate and bivariate distributions. Six of nine synthetic data sets that used inferential statistics showed strong specific utility, maintaining inferential relationships from the original analysis. Specific utility was low in three data sets with hierarchical structures. Conclusions: Findings suggest that synthetic data can effectively maintain statistical properties and relationships across a wide range of nonhierarchical data commonly seen in the discipline of CSD. Other approaches for hierarchical data need to be explored in future work. Researchers who use synthetic data should assess its utility in preserving their results for their own data and use-case. Open Science Form: https://doi.org/10.23641/asha.30569957 [ABSTRACT FROM AUTHOR]
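Note on S_pMSE: The abstract names the standardized propensity mean squared error ratio but does not give its formula. The sketch below shows one common way such a ratio is computed, and it is not taken from the article. It assumes the propensity scores (each row's estimated probability of being synthetic) have already been produced by a logistic model fitted to the stacked original and synthetic rows, and it assumes the usual null expectation for that model, (k - 1)(1 - c)^2 c / N, where k counts coefficients including the intercept, c is the synthetic share, and N is the total number of rows. The function names are hypothetical and are not part of synthpop.

```python
def pmse(propensity_scores, n_synthetic):
    """Mean squared gap between each row's estimated probability of being
    synthetic and the overall synthetic share c."""
    n_total = len(propensity_scores)
    c = n_synthetic / n_total
    return sum((p - c) ** 2 for p in propensity_scores) / n_total


def expected_null_pmse(n_params, n_synthetic, n_total):
    """Expected pMSE when original and synthetic rows are indistinguishable.
    Assumption of this sketch: scores come from a logistic model with
    n_params coefficients, intercept included."""
    c = n_synthetic / n_total
    return (n_params - 1) * (1 - c) ** 2 * c / n_total


def standardized_pmse_ratio(propensity_scores, n_synthetic, n_params):
    """Observed pMSE divided by its null expectation; smaller values mean
    the synthetic rows are harder to tell apart from the original rows."""
    n_total = len(propensity_scores)
    null_value = expected_null_pmse(n_params, n_synthetic, n_total)
    return pmse(propensity_scores, n_synthetic) / null_value


# Illustrative scores only (not data from the study): 4 original and
# 4 synthetic rows, scored by a hypothetical 3-coefficient model.
scores = [0.4, 0.6] * 4
ratio = standardized_pmse_ratio(scores, n_synthetic=4, n_params=3)
```

Scores that sit exactly at the synthetic share (0.5 here) give a ratio of 0, because the model cannot separate the two sources at all.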
Copyright of Journal of Speech, Language & Hearing Research is the property of American Speech-Language-Hearing Association and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Education Research Complete
Header DbId: ehh
DbLabel: Education Research Complete
An: 190171415
PubType: Academic Journal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=ehh&AN=190171415
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1044/2025_JSLHR-24-00736
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 16
        StartPage: 5854
  BibRelationships:
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 12
              Text: Dec2025
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 10924388
          Numbering:
            – Type: volume
              Value: 68
            – Type: issue
              Value: 12
          Titles:
            – TitleFull: Journal of Speech, Language & Hearing Research
              Type: main