Clinical Correlates of Errors in Machine-Learning Diagnostic Model of Autism Spectrum Disorder: Impact of Sample Cohorts

Bibliographic Details
Title:	Clinical Correlates of Errors in Machine-Learning Diagnostic Model of Autism Spectrum Disorder: Impact of Sample Cohorts
Language:	English
Authors:	Yen-Chin Wang (ORCID 0000-0002-3420-5042), Chung-Yuan Cheng (ORCID 0000-0003-1931-458X), Chi-Shin Wu, Chi-Chun Lee, Susan Shur-Fen Gau (ORCID 0000-0002-2718-8221)
Source:	Autism: The International Journal of Research and Practice. 2025 29(12):3083-3099.
Availability:	SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: https://sagepub.com
Peer Reviewed:	Y
Page Count:	17
Publication Date:	2025
Document Type:	Journal Articles Reports - Research
Descriptors:	Artificial Intelligence, Autism Spectrum Disorders, Clinical Diagnosis, Error Patterns, Models, Classification, Sex, Age, Intelligence Quotient, Symptoms (Individual Disorders), Mental Disorders, Behavior Problems, Attention Deficit Hyperactivity Disorder, Aggression, Attention, Foreign Countries, Diagnostic Tests
Geographic Terms:	Taiwan
Assessment and Survey Identifiers:	Social Responsiveness Scale, Child Behavior Checklist, Autism Diagnostic Observation Schedule
DOI:	10.1177/13623613251360271
ISSN:	1362-3613 1461-7005
Abstract:	Machine-learning models can assist in diagnosing autism but have biases. We examine the correlates of misclassifications and how training data affect model generalizability. The Social Responsiveness Scale data were collected from two cohorts in Taiwan: the clinical cohort comprised 1203 autistic participants and 1182 non-autistic comparisons, and the community cohort consisted of 35 autistic participants and 3297 non-autistic comparisons. Classification models were trained, and the misclassification cases were investigated regarding their associations with sex, age, intelligence quotient (IQ), symptoms from the child behavioral checklist (CBCL), and co-occurring psychiatric diagnosis. Models showed high within-cohort accuracy (clinical: sensitivity 0.91-0.95, specificity 0.93-0.94; community: sensitivity 0.91-1.00, specificity 0.89-0.96), but generalizability across cohorts was limited. When the community-trained model was applied to the clinical cohort, performance declined (sensitivity 0.65, specificity 0.95). In both models, non-autistic individuals misclassified as autistic showed elevated behavioral symptoms and attention-deficit hyperactivity disorder (ADHD) prevalence. Conversely, autistic individuals who were misclassified tended to show fewer behavioral symptoms and, in the community model, higher IQ and more aggressive behavior but fewer social and attention problems. Error patterns of machine-learning models and the impact of training data warrant careful consideration in future research.
Abstractor:	As Provided
Entry Date:	2025
Accession Number:	EJ1489398
Database:	ERIC

FullText	<anid>AN0189325696;f9d01dec.25;2025Nov18.00:28;v2.2.500</anid> <title id="AN0189325696-1">Clinical correlates of errors in machine-learning diagnostic model of autism spectrum disorder: Impact of sample cohorts </title> <p>Machine-learning models can assist in diagnosing autism but have biases. We examine the correlates of misclassifications and how training data affect model generalizability. The Social Responsiveness Scale data were collected from two cohorts in Taiwan: the clinical cohort comprised 1203 autistic participants and 1182 non-autistic comparisons, and the community cohort consisted of 35 autistic participants and 3297 non-autistic comparisons. Classification models were trained, and the misclassification cases were investigated regarding their associations with sex, age, intelligence quotient (IQ), symptoms from the child behavioral checklist (CBCL), and co-occurring psychiatric diagnosis. Models showed high within-cohort accuracy (clinical: sensitivity 0.91–0.95, specificity 0.93–0.94; community: sensitivity 0.91–1.00, specificity 0.89–0.96), but generalizability across cohorts was limited. When the community-trained model was applied to the clinical cohort, performance declined (sensitivity 0.65, specificity 0.95). In both models, non-autistic individuals misclassified as autistic showed elevated behavioral symptoms and attention-deficit hyperactivity disorder (ADHD) prevalence. 
Conversely, autistic individuals who were misclassified tended to show fewer behavioral symptoms and, in the community model, higher IQ and more aggressive behavior but fewer social and attention problems. Error patterns of machine-learning models and the impact of training data warrant careful consideration in future research. Machine learning is a type of computer modeling that can help identify patterns in data and make predictions. In autism research, these models may support earlier or more accurate identification of autistic individuals. But to be useful, they need to make reliable predictions across different groups of people. In this study, we explored when and why these models might make mistakes, and how the kind of data used to train them affects their accuracy. Training models means using information to teach the computer model how to tell the difference between autistic and non-autistic individuals. We used the information from the Social Responsiveness Scale (SRS), which is a questionnaire that measures autistic features. We tested these models on two different groups: one from clinical settings and one from the general community. The models worked well when tested within the same type of group they were trained on. However, a model trained on the community group did not perform as accurately when tested on the clinical group. Sometimes, the model got it wrong. For example, in the clinical group, some autistic individuals were mistakenly identified as non-autistic. These individuals tended to have fewer emotional or behavioral difficulties. In the community group, autistic individuals who were mistakenly identified as non-autistic had higher IQs and showed more aggressive behaviors but fewer attention or social problems. Conversely, some non-autistic people were incorrectly identified as autistic. These people had more emotional or behavioral challenges and were more likely to have attention-deficit hyperactivity disorder (ADHD). 
These findings highlight that machine-learning models are sensitive to the type of data they are trained on. To build fair and accurate models for predicting autism, it is essential to consider where the training data come from and whether they represent the full diversity of individuals. Understanding these patterns of error can help improve future tools used in both research and clinical care.</p> <p>Keywords: autism spectrum disorder; diagnostic models; error analysis; machine-learning</p> <hd id="AN0189325696-2">Introduction</hd> <p>Machine-learning models have been increasingly used in research to assist in diagnosing autism spectrum disorder (ASD) ([<reflink idref="bib25" id="ref1">25</reflink>]; [<reflink idref="bib34" id="ref2">34</reflink>]; [<reflink idref="bib61" id="ref3">61</reflink>]). Different data modalities have been included in various studies to train classification models ([<reflink idref="bib61" id="ref4">61</reflink>]), including behavioral data from questionnaires ([<reflink idref="bib23" id="ref5">23</reflink>]) or standardized diagnostic interviews ([<reflink idref="bib54" id="ref6">54</reflink>]), social interactions or stereotyped movements from video recordings ([<reflink idref="bib38" id="ref7">38</reflink>]), speech characteristics from audio recordings ([<reflink idref="bib11" id="ref8">11</reflink>]; [<reflink idref="bib43" id="ref9">43</reflink>]), and brain imaging ([<reflink idref="bib17" id="ref10">17</reflink>]; [<reflink idref="bib50" id="ref11">50</reflink>]; [<reflink idref="bib63" id="ref12">63</reflink>]) or genetic data ([<reflink idref="bib26" id="ref13">26</reflink>]). Behavioral data are among the most commonly used modalities. Data from standardized interviews and observations, such as the Autism Diagnostic Interview, Revised (ADI-R), and the Autism Diagnostic Observation Schedule (ADOS), may be initially considered for their well-established diagnostic validity ([<reflink idref="bib61" id="ref14">61</reflink>]). 
However, these diagnostic processes are highly time-consuming and require trained interviewers to conduct them. Questionnaires that can be self-administered, such as the Social Responsiveness Scale (SRS), are more convenient for clinical deployment. [<reflink idref="bib23" id="ref15">23</reflink>] utilized crowd-sourced data from the 65-item original SRS to establish a model with an area under the receiver operating characteristic curve (AUC) &gt; 0.96. They later validated it in another data set with a model trained using only 15 items, achieving an AUC of 0.89 ([<reflink idref="bib22" id="ref16">22</reflink>]). Compared to the original 65-item scale with a single cutoff point ([<reflink idref="bib19" id="ref17">19</reflink>]), machine-learning models can either improve performance (e.g. AUC increased from 0.85 to 0.96) or reduce the number of required items (from 65 to 15) while maintaining similar accuracy. These advantages suggest that machine-learning models may enhance the practicality and efficiency of autism screening in real-world clinical settings.</p> <p>It is crucial to remember that even the most advanced machine-learning models are not immune to bias and can occasionally falter ([<reflink idref="bib52" id="ref18">52</reflink>]; [<reflink idref="bib62" id="ref19">62</reflink>]). The diversity in training data backgrounds can influence these models' generalizability and hinder their practical application ([<reflink idref="bib5" id="ref20">5</reflink>]; [<reflink idref="bib52" id="ref21">52</reflink>]). Recent advocacy for clinical audits for machine-learning algorithms ([<reflink idref="bib44" id="ref22">44</reflink>]) underscores the necessity to scrutinize potential biases and errors in classification models. 
The inclusion of bias assessment and exploratory error analysis in the latest reporting guideline ([<reflink idref="bib44" id="ref23">44</reflink>]) and appraisal tool ([<reflink idref="bib40" id="ref24">40</reflink>]) for machine-learning research is a significant step toward ensuring the reliability of these models. Past studies have shown that co-occurring conditions could skew the machine-learning model and result in misclassification ([<reflink idref="bib54" id="ref25">54</reflink>]). Furthermore, gender bias has been a concern in diagnosing ASD ([<reflink idref="bib24" id="ref26">24</reflink>]; [<reflink idref="bib37" id="ref27">37</reflink>]), and while different norms have been identified between genders in the SRS ([<reflink idref="bib19" id="ref28">19</reflink>]; [<reflink idref="bib21" id="ref29">21</reflink>]; [<reflink idref="bib33" id="ref30">33</reflink>]), it remains uncertain whether this bias extends to machine-learning models developed using SRS data.</p> <p>Moreover, these data sets may reflect biases toward specific populations or exhibit imbalances between cases and controls. For example, clinical studies typically recruit participants with a certain degree of severity while excluding those with significant co-occurring conditions to avoid confounding factors ([<reflink idref="bib49" id="ref31">49</reflink>]). Distinct comorbid patterns may thereby impact the error pattern of classification. In contrast, cases in community-derived data usually present less prominently but are confounded by multiple factors, and the number of cases is limited when studying conditions with low prevalence ([<reflink idref="bib31" id="ref32">31</reflink>]), resulting in an imbalanced cohort. 
Thoroughly examining the impact of various factors on training data and comparing error patterns is crucial for enhancing the robustness of machine-learning models for clinical use ([<reflink idref="bib5" id="ref33">5</reflink>]; [<reflink idref="bib52" id="ref34">52</reflink>]).</p> <p>This study aims to (<reflink idref="bib1" id="ref35">1</reflink>) develop machine-learning classification models for ASD using two different data sets, one from a clinical study and another one from a national epidemiological study; (<reflink idref="bib2" id="ref36">2</reflink>) analyze the error patterns of these machine-learning models by exploring the clinical correlates of misclassified cases; and (<reflink idref="bib3" id="ref37">3</reflink>) compare the error patterns between models trained on different data sets. We hypothesize that both data sets can be used to establish effective machine-learning models and that the error patterns would be closely correlated with general emotional and behavioral problems. However, we also anticipated that specific characteristics of the training data—such as the differing ratios between autistic and non-autistic participants, the variability in autistic traits across data sets, and the prevalence of co-occurring conditions—might influence both the model's classification performance and its error patterns.</p> <hd id="AN0189325696-3">Methods</hd> <p></p> <hd id="AN0189325696-4">Data description and preprocessing</hd> <p></p> <hd id="AN0189325696-5">The clinical cohort from the case–control study</hd> <p>Autistic participants were recruited from the child psychiatric department at a medical center in Northern Taiwan. 
The Chinese versions of ADI-R ([<reflink idref="bib32" id="ref38">32</reflink>]) and ADOS ([<reflink idref="bib7" id="ref39">7</reflink>]) were employed as standardized diagnostic tools to confirm their diagnosis, while the Kiddie Epidemiologic version of the Schedule for Affective Disorders and Schizophrenia (K-SADS-E) ([<reflink idref="bib14" id="ref40">14</reflink>]) was utilized to make the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (<emph>DSM</emph>-5) psychiatric diagnosis, excluding participants with other neurological disorders, psychotic disorders, mood disorders, learning disabilities, or substance use disorders from the study. Age- and sex-matched non-autistic comparisons (NACs) were recruited from similar neighborhoods, with exclusion criteria including ASD, attention-deficit hyperactivity disorder (ADHD), developmental delay, and the major psychiatric and neurological diagnoses mentioned above. The child behavioral checklist (CBCL) was utilized to measure emotional and behavioral problems during the study. There were 1280 autistic participants and 1240 NACs in the primary data set. The clinical cohort demonstrated excellent information completeness, with less than 1% of SRS items missing. We employed listwise exclusion of missing data, using only cases with complete information to develop classification models. After excluding cases with missing data, a total of 1203 autistic participants and 1182 NACs were included in the final analysis.</p> <hd id="AN0189325696-6">Community cohort from the epidemiological study</hd> <p>The nationwide, population-based representative community cohort comprised participants from Taiwan's national epidemiology survey of child mental disorders ([<reflink idref="bib13" id="ref41">13</reflink>]). This survey employed a stratified school-based clustering sampling design from 2015 to 2017, encompassing 45 elementary and 24 junior high schools. 
We included all the participants who completed the K-SADS-E for the <emph>DSM</emph>-5 ([<reflink idref="bib14" id="ref42">14</reflink>]) interview to screen for potential ASD diagnosis. Those without a potential ASD diagnosis were categorized as non-ASD and served as the comparison group in model development. The SRS was also administered during clinical interviews with their parents. Detailed information, including background, study size determination, sampling method, and study design, has been documented elsewhere ([<reflink idref="bib13" id="ref43">13</reflink>]). There were 52 autistic participants and 4764 NAC in the primary data set. The community cohort had a higher missing rate of SRS items, ranging from 14.9% to 16.4%, with item 44 exhibiting a notably high missing rate of 40.8%. To optimize the use of available data from this cohort, we excluded item 44 from the analysis. After listwise exclusion, 35 autistic participants and 3297 NAC from the community cohort were included in the model development.</p> <hd id="AN0189325696-7">Ethics approval</hd> <p>The Research Ethics Committee of the study hospital approved the data collection of clinical samples (approval number: 201201006RIB, ClinicalTrials.gov number: NCT01582256) and epidemiological samples (approval number: 201411056RIN, ClinicalTrials.gov number: NCT02707848) before recruitment, and written informed consent was obtained from both participants and their parents. Complete confidentiality was ensured throughout the study. 
The NTUH Research Ethics Committee approved this work before data analysis (approval number: 202303089RINB, 202002086RIND; ClinicalTrials.gov number: NCT04873674).</p> <hd id="AN0189325696-8">Measures of autistic traits and emotional/behavioral problems</hd> <p></p> <hd id="AN0189325696-9">The Chinese version of the SRS</hd> <p>The SRS is a 65-item self-administered questionnaire used by individuals or their caregivers to assess autistic traits ([<reflink idref="bib20" id="ref44">20</reflink>]). The Taiwan Autism Study Group translated the Chinese version with permission and approval from Dr. Constantino and Western Psychological Services ([<reflink idref="bib29" id="ref45">29</reflink>]). It has been extensively used for measuring autistic traits (e.g. [<reflink idref="bib16" id="ref46">16</reflink>]; [<reflink idref="bib15" id="ref47">15</reflink>]). Items were rated on a 4-point Likert-type scale from "0" (<emph>not true</emph>) to "3" (<emph>almost always true</emph>). The four subscales (socio-communication, unique mannerisms, social awareness, and social emotion), identified in a previous psychometric study in Taiwan ([<reflink idref="bib29" id="ref48">29</reflink>]), were used to examine the distribution of autistic traits across diagnostic statuses and in the error-case analysis.</p> <hd id="AN0189325696-10">Chinese version of the CBCL</hd> <p>The CBCL ([<reflink idref="bib1" id="ref49">1</reflink>]; [<reflink idref="bib55" id="ref50">55</reflink>]), a parent-reported questionnaire for children aged 4–18 years, identifies eight emotional or behavioral problems from its 118 items: attention problems, anxiety/depression, aggression, delinquency, social problems, somatic symptoms, thought problems, and withdrawal. Responses are rated on a scale of 0 (<emph>not true</emph>), 1 (<emph>somewhat or sometimes true</emph>), and 2 (<emph>very true or often true</emph>). 
The Chinese version of the CBCL demonstrates strong reliability and validity and is extensively utilized in child research in Taiwan (e.g. [<reflink idref="bib15" id="ref51">15</reflink>]; [<reflink idref="bib58" id="ref52">58</reflink>]). We used the eight subscales to investigate the clinical associations of misclassified cases.</p> <hd id="AN0189325696-11">Measures for assisting diagnoses of ASD and other psychiatric disorders</hd> <p></p> <hd id="AN0189325696-12">The Chinese version of ADI-R and ADOS</hd> <p>The ADI-R is a standardized, semi-structured interview scale that qualified interviewers administer to the primary caregivers of people over 18 months old. The interview covers most developmental and behavioral aspects of ASD, and the diagnostic algorithm corresponds to the core ASD symptoms ([<reflink idref="bib27" id="ref53">27</reflink>]; [<reflink idref="bib46" id="ref54">46</reflink>]). The ADOS is a thorough, investigator-administered tool designed to observe children in natural social settings that elicit particular social and communicative responses ([<reflink idref="bib7" id="ref55">7</reflink>]; [<reflink idref="bib45" id="ref56">45</reflink>]). We prepared the Chinese ADI-R and ADOS, approved by the Western Psychological Services in June 2007 and April 2008, respectively, for use in this study. Both instruments have been widely used in Taiwan in clinical and research settings (e.g. [<reflink idref="bib16" id="ref57">16</reflink>]; [<reflink idref="bib15" id="ref58">15</reflink>]). 
We conducted ADOS and ADI-R interviews with autistic participants (only a proportion) and their parents in the clinical study before recruitment to confirm the clinical diagnosis of ASD made by board-certified child psychiatrists.</p> <hd id="AN0189325696-13">Kiddie epidemiologic version of the Schedule for Affective Disorders and Schizophrenia (K-SAD...</hd> <p>The K-SADS-E is a semi-structured interview tool used to reliably evaluate both past and present <emph>DSM</emph>-IV psychiatric disorders in children and adolescents. The Chinese adaptation was translated by Taiwan's Child Psychiatry Research Group, incorporating cultural modifications specific to the Taiwanese context and additional adjustments to align with <emph>DSM</emph>-IV ([<reflink idref="bib28" id="ref59">28</reflink>]) and <emph>DSM</emph>-5 ([<reflink idref="bib14" id="ref60">14</reflink>]) diagnostic criteria. The Chinese versions showed good sensitivity and specificity and have been extensively used in research (e.g. [<reflink idref="bib9" id="ref61">9</reflink>], [<reflink idref="bib8" id="ref62">8</reflink>]; [<reflink idref="bib15" id="ref63">15</reflink>]; [<reflink idref="bib16" id="ref64">16</reflink>]; [<reflink idref="bib42" id="ref65">42</reflink>]; [<reflink idref="bib56" id="ref66">56</reflink>]; [<reflink idref="bib60" id="ref67">60</reflink>]). All the participants and their parents were interviewed to make the psychiatric diagnoses.</p> <hd id="AN0189325696-14">Machine-learning models development, assessment, and error analysis</hd> <p></p> <hd id="AN0189325696-15">Develop a classification model within each cohort</hd> <p>In this study, we adopted a flexible, data-driven approach to classification using machine-learning methods. We conceptualize machine-learning as a framework that extends beyond traditional regression methods by enabling the discovery of complex patterns within multidimensional data. 
Rather than assuming a predefined relationship between SRS items and ASD classification, machine-learning algorithms can identify subtle, multivariate patterns across items that the cutoff-based methods may overlook. This openness to algorithmic diversity reflects our acknowledgment of the complexity and heterogeneity inherent in behavioral data and supports a more exploratory approach to prediction ([<reflink idref="bib6" id="ref68">6</reflink>]).</p> <p>The entire flow chart of the study process is provided in Figure 1. To facilitate performance comparison across different models, 40% of the clinical cohort was separated as an independent test data set. Owing to the small number of autistic participants in the community cohort, we generated a training set by oversampling autistic participants fivefold and under-sampling NACs to 10% to achieve an ASD:NAC ratio of approximately 1:2. Classification models including support vector machine ([<reflink idref="bib30" id="ref69">30</reflink>]), random forest ([<reflink idref="bib6" id="ref70">6</reflink>]), K-nearest neighbor ([<reflink idref="bib57" id="ref71">57</reflink>]), linear discriminant ([<reflink idref="bib3" id="ref72">3</reflink>]), neural network ([<reflink idref="bib53" id="ref73">53</reflink>]), and decision tree ([<reflink idref="bib39" id="ref74">39</reflink>]) were trained and compared. We summarized the default parameter settings used in the machine-learning models implemented in JASP for this study in the Supplementary Materials. As the primary aim was exploratory analysis of misclassification patterns, no manual parameter tuning was performed. All 65 items from parents' reports were utilized to establish the classification models for the clinical cohort. For the community cohort model, item 44 was excluded due to its notably high missing rate. 
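As a rough illustration of the resampling arithmetic described above, the following Python sketch draws row indices for fivefold random oversampling (with replacement) and 10% random under-sampling. It assumes, for illustration only, that resampling is applied to the final community counts (35 autistic participants, 3297 NACs); the study's actual training split and software procedure are not reproduced here, and the function name `resample_indices` is hypothetical.

```python
import random


def resample_indices(n_asd, n_nac, oversample_factor=5, nac_fraction=0.10, seed=0):
    """Return (asd_idx, nac_idx): row indices after random over- and under-sampling.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    # Oversampling with replication: draw factor * n_asd indices with replacement.
    asd_idx = [rng.randrange(n_asd) for _ in range(oversample_factor * n_asd)]
    # Under-sampling: keep a random fraction of comparisons, without replacement.
    nac_idx = rng.sample(range(n_nac), round(nac_fraction * n_nac))
    return asd_idx, nac_idx


# Assumed inputs: the final community cohort counts reported in the text.
asd_idx, nac_idx = resample_indices(n_asd=35, n_nac=3297)
ratio = len(nac_idx) / len(asd_idx)
print(len(asd_idx), len(nac_idx), round(ratio, 2))  # 175 330 1.89
```

Under these assumptions the resampled set holds 175 autistic rows and 330 NAC rows, close to the 1:2 ratio stated in the text. Because oversampled rows are replicates, any hold-out split would need to be made before resampling to keep copies of the same participant out of both sets.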
Model development was conducted using the machine-learning module of JASP (Version 0.18.3) ([<reflink idref="bib35" id="ref75">35</reflink>]).</p> <p>Graph: Figure 1. Flow chart for developing the machine-learning model from each cohort and examining error patterns.</p> <hd id="AN0189325696-16">Model assessment within each cohort and with the independent test data set</hd> <p>We chose AUC, sensitivity (recall), and specificity (true negative rate) as primary evaluation metrics to assess model performance, aligning with clinical intuition and minimizing confounding from the case prevalence of the data set. Precision (positive predictive value) and F1 score (harmonic mean of precision and recall) were also reported, as is standard practice in machine-learning studies. Initially, we compared the performance metrics across different models within each cohort. We then tested the performance of the best-performing model from the community cohort by applying it to the independent test data set. Finally, the best-performing models from the clinical and community cohorts were applied to the independent test set for subsequent error pattern analysis (Figure 1).</p> <hd id="AN0189325696-17">Clinical correlates of errors of machine-learning models</hd> <p>The misclassified cases in the independent test data set were further investigated for associations with sex, age, intelligence quotient (IQ), emotional and behavioral symptoms from the CBCL, and co-occurring conditions assessed by the Mandarin version of K-SADS-E ([<reflink idref="bib14" id="ref76">14</reflink>]). We used Student's <emph>t</emph>-tests for continuous variables and chi-square tests for categorical variables, considering a <emph>p</emph> value &lt; 0.05 statistically significant. For exploratory analysis, we analyzed the association between the classification outcomes and both the total scores and subscales of the SRS. 
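To make the categorical comparison concrete, the Python sketch below computes a Pearson chi-square statistic for a 2 x 2 table and its p-value with one degree of freedom. This is a minimal sketch, not the JASP procedure used in the study: the counts are hypothetical, no continuity correction is applied, and the function names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts (not study data):
# rows = misclassified / correctly classified, columns = ADHD present / absent.
stat = chi_square_2x2(12, 18, 40, 230)
print(round(stat, 3), round(p_value_df1(stat), 4))

# Consistency check against a pair reported in the Results:
# chi-square = 2.029 corresponds to p of about 0.154 at one degree of freedom.
print(round(p_value_df1(2.029), 3))  # 0.154
```

The last line only confirms that the statistic and p-value reported for the sex-ratio comparison are mutually consistent at one degree of freedom; it does not recompute that statistic from cohort counts.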
All statistical analyses and data visualizations were performed using JASP (Version 0.18.3).</p> <hd id="AN0189325696-18">Evaluation of alternative sampling methods and advanced model</hd> <p>In the primary analysis, we opted for simple random oversampling (with replication) of autistic participants and random under-sampling of non-autistic participants to achieve an approximate 1:2 ratio for model training. Likewise, we selected a set of classic machine-learning classifiers (e.g. LDA, SVM, RF) that are well-established and supported by the JASP platform, which ensures workflow reproducibility and consistency.</p> <p>To further explore the impact of alternative techniques, we conducted additional analyses using the Synthetic Minority Oversampling Technique (SMOTE) ([<reflink idref="bib10" id="ref77">10</reflink>]) and the XGBoost algorithm ([<reflink idref="bib12" id="ref78">12</reflink>]) in Python to assess model performance and generalizability under advanced sampling and classification methods.</p> <hd id="AN0189325696-19">Additional investigation about the errors of prediction in community cohorts by clinical mode...</hd> <p>To further explore the impact of the sampling strategy, we conducted an additional analysis using the original, unmodified community data set as an evaluation set. This data set included all participants prior to under- and oversampling (52 participants with ASD and 4764 NACs). We applied the model trained on the community cohort (after sampling) back to this full data set to assess overall performance and examine whether the misclassification patterns remained consistent despite changes in class distribution.</p> <hd id="AN0189325696-20">Exploratory analysis of the classification condition and SRS sub-scores</hd> <p>The SRS has been utilized to measure autistic traits across clinical and general populations. 
However, the cut-off between autistic and non-autistic populations varied across different samples depending on the study setting or comparison groups ([<reflink idref="bib19" id="ref79">19</reflink>]). We performed an exploratory analysis of the classification condition and SRS total and sub-scores to see how the machine-learning model performed better than a single cut-off point and whether each SRS sub-score plays a similar role in differentiating autistic participants from comparisons.</p> <hd id="AN0189325696-21">Results</hd> <p></p> <hd id="AN0189325696-22">Characteristics of different data sets</hd> <p>Table 1 illustrates the characteristics of the two cohorts. The final clinical cohort comprised 1203 autistic participants (87.4% male, mean age ± standard deviation: 10.0 ± 4.7 years old) and 1182 NACs (57.8% male, 10.9 ± 4.9 years old). The final population-based community cohort comprised 35 autistic participants (77.1% male, 11.7 ± 1.7 years old) and 3297 NACs (50.6% male, 11.2 ± 1.8 years old). The age of autistic participants was slightly younger in the clinical cohort compared with the community cohort (<emph>t</emph> = 2.45, <emph>p</emph> = 0.0145). The sex ratio of autistic participants was similar across the two cohorts (chi-square = 2.029, <emph>p</emph> = 0.154). Still, autistic participants in the community cohort exhibited lower symptom severity in the SRS sub-scores (all <emph>p</emph> &lt; 0.001) except for the social awareness sub-score (<emph>p</emph> = 0.831).</p> <p>Table 1. Basic characteristics of clinical cohort and community cohort.</p> <p>Graph</p> <p> <ephtml> &lt;table&gt;&lt;colgroup&gt;&lt;col align="left" /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." 
/&gt;&lt;/colgroup&gt;&lt;thead&gt;&lt;tr&gt;&lt;th /&gt;&lt;th align="left" colspan="2"&gt;Clinical cohort&lt;/th&gt;&lt;th align="left" colspan="2"&gt;Community cohort&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th /&gt;&lt;th align="left"&gt;ASD&lt;/th&gt;&lt;th align="left"&gt;NAC&lt;/th&gt;&lt;th align="left"&gt;ASD&lt;/th&gt;&lt;th align="left"&gt;NAC&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;N&lt;/td&gt;&lt;td&gt;1203&lt;/td&gt;&lt;td&gt;1182&lt;/td&gt;&lt;td&gt;35&lt;/td&gt;&lt;td&gt;3297&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Sex&lt;/td&gt;&lt;td&gt;Male: 87.4%; Female: 12.6%&lt;/td&gt;&lt;td&gt;Male: 57.7%; Female: 42.3%&lt;/td&gt;&lt;td&gt;Male: 77.1%; Female: 22.9%&lt;/td&gt;&lt;td&gt;Male: 50.6%; Female: 49.4%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Age (years)&lt;/td&gt;&lt;td&gt;10.0 &amp;#177; 4.7&lt;/td&gt;&lt;td&gt;10.9 &amp;#177; 4.9&lt;/td&gt;&lt;td&gt;11.7 &amp;#177; 1.7&lt;/td&gt;&lt;td&gt;11.2 &amp;#177; 1.8&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="5"&gt;SRS&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Social communication&lt;/td&gt;&lt;td&gt;37.52 &amp;#177; 14.18&lt;/td&gt;&lt;td&gt;10.09 &amp;#177; 7.67&lt;/td&gt;&lt;td&gt;19.31 &amp;#177; 12.51&lt;/td&gt;&lt;td&gt;9.24 &amp;#177; 7.53&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Stereotyped behavior&lt;/td&gt;&lt;td&gt;18.99 &amp;#177; 7.23&lt;/td&gt;&lt;td&gt;3.80 &amp;#177; 4.42&lt;/td&gt;&lt;td&gt;11.14 &amp;#177; 7.56&lt;/td&gt;&lt;td&gt;4.59 &amp;#177; 4.18&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Social awareness&lt;/td&gt;&lt;td&gt;21.72 &amp;#177; 5.10&lt;/td&gt;&lt;td&gt;11.34 &amp;#177; 6.18&lt;/td&gt;&lt;td&gt;21.14 &amp;#177; 4.71&lt;/td&gt;&lt;td&gt;15.15 &amp;#177; 7.50&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Social emotion&lt;/td&gt;&lt;td&gt;11.77 &amp;#177; 4.73&lt;/td&gt;&lt;td&gt;3.50 &amp;#177; 3.24&lt;/td&gt;&lt;td&gt;5.50 &amp;#177; 3.32&lt;/td&gt;&lt;td&gt;3.42 &amp;#177; 2.82&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>1 ASD: autism 
spectrum disorder, SRS: Social Responsiveness Scale, NAC: non-autistic comparison.</p> <p>Forty percent of the clinical cohort was set aside as an independent test data set; the demographic distribution was similar between the training and test data sets (Supplementary Table 1). All <emph>p</emph> values, except for the sex ratio among ASD participants (chi-square test <emph>p</emph> = 0.032), were greater than 0.05.</p> <hd id="AN0189325696-23">Model performance of different modalities and different training data sets</hd> <p>Machine-learning models trained using different methods achieved similarly high performance within the sample (Table 2, clinical cohort: sensitivity 0.91–0.95, specificity 0.93–0.94; community cohort: sensitivity 0.91–1.00, specificity 0.89–0.96). We selected the models with the highest AUC for the subsequent analysis: the linear discriminant model from the clinical cohort (AUC = 0.982) and the random forest model from the community cohort (AUC = 0.989). These were applied to the validation cohort to examine the cross-cohort effect and to conduct the subsequent error pattern analysis. The model trained on the community cohort performed less favorably when applied to the independent test set derived from the clinical cohort (sensitivity 0.65, specificity 0.95, Table 4). Likewise, when we changed the test data set to a community-based one, the decrease in model performance and the clinical correlates remained similar. We examined the overlap of misclassified cases in the clinical test set between models. Of the autistic participants misclassified by the clinical model (<emph>n</emph> = 23), most were also misclassified by the community model (<emph>n</emph> = 20). In contrast, overlap among misclassified non-autistic participants was lower: only 9 were misclassified by both models, out of 23 misclassified by the community model and 30 by the clinical model. These findings are illustrated in Supplementary Figure 1.</p> <p>Table 2. 
Performance metrics of different models trained by clinical and community cohort.</p> <p>Graph</p> <p> <ephtml> &lt;table&gt;&lt;colgroup&gt;&lt;col align="left" /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;/colgroup&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" colspan="6"&gt;Trained by clinical cohort&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th /&gt;&lt;th align="left"&gt;AUC&lt;/th&gt;&lt;th align="left"&gt;Recall (sensitivity)&lt;/th&gt;&lt;th align="left"&gt;True negative rate (specificity)&lt;/th&gt;&lt;th align="left"&gt;Precision (positive predictive value)&lt;/th&gt;&lt;th align="left"&gt;F1 Score&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;SVM&lt;/td&gt;&lt;td&gt;0.94&lt;/td&gt;&lt;td&gt;0.947&lt;/td&gt;&lt;td&gt;0.944&lt;/td&gt;&lt;td&gt;0.937&lt;/td&gt;&lt;td&gt;0.942&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Random forest&lt;/td&gt;&lt;td&gt;0.981&lt;/td&gt;&lt;td&gt;0.952&lt;/td&gt;&lt;td&gt;0.929&lt;/td&gt;&lt;td&gt;0.939&lt;/td&gt;&lt;td&gt;0.945&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;KNN&lt;/td&gt;&lt;td&gt;0.967&lt;/td&gt;&lt;td&gt;0.927&lt;/td&gt;&lt;td&gt;0.929&lt;/td&gt;&lt;td&gt;0.937&lt;/td&gt;&lt;td&gt;0.932&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Linear discriminant&lt;/td&gt;&lt;td&gt;0.982&lt;/td&gt;&lt;td&gt;0.923&lt;/td&gt;&lt;td&gt;0.939&lt;/td&gt;&lt;td&gt;0.945&lt;/td&gt;&lt;td&gt;0.934&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Neural network&lt;/td&gt;&lt;td&gt;0.936&lt;/td&gt;&lt;td&gt;0.931&lt;/td&gt;&lt;td&gt;0.941&lt;/td&gt;&lt;td&gt;0.947&lt;/td&gt;&lt;td&gt;0.939&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Decision tree&lt;/td&gt;&lt;td&gt;0.915&lt;/td&gt;&lt;td&gt;0.906&lt;/td&gt;&lt;td&gt;0.924&lt;/td&gt;&lt;td&gt;0.932&lt;/td&gt;&lt;td&gt;0.919&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="left" colspan="6"&gt;Trained by community cohort (with under-sampling of NAC and 
over-sampling of ASD)&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th /&gt;&lt;th align="left"&gt;AUC&lt;/th&gt;&lt;th align="left"&gt;Recall (sensitivity)&lt;/th&gt;&lt;th align="left"&gt;True negative rate (specificity)&lt;/th&gt;&lt;th align="left"&gt;Precision (positive predictive value)&lt;/th&gt;&lt;th align="left"&gt;F1 Score&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;SVM&lt;/td&gt;&lt;td&gt;0.971&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0.942&lt;/td&gt;&lt;td&gt;0.894&lt;/td&gt;&lt;td&gt;0.944&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Random forest&lt;/td&gt;&lt;td&gt;0.989&lt;/td&gt;&lt;td&gt;0.919&lt;/td&gt;&lt;td&gt;0.963&lt;/td&gt;&lt;td&gt;0.941&lt;/td&gt;&lt;td&gt;0.925&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;KNN&lt;/td&gt;&lt;td&gt;0.941&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0.882&lt;/td&gt;&lt;td&gt;0.809&lt;/td&gt;&lt;td&gt;0.895&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Linear discriminant&lt;/td&gt;&lt;td&gt;0.973&lt;/td&gt;&lt;td&gt;0.949&lt;/td&gt;&lt;td&gt;0.887&lt;/td&gt;&lt;td&gt;0.801&lt;/td&gt;&lt;td&gt;0.865&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Neural network&lt;/td&gt;&lt;td&gt;0.904&lt;/td&gt;&lt;td&gt;0.909&lt;/td&gt;&lt;td&gt;0.897&lt;/td&gt;&lt;td&gt;0.816&lt;/td&gt;&lt;td&gt;0.861&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Decision tree&lt;/td&gt;&lt;td&gt;0.935&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0.871&lt;/td&gt;&lt;td&gt;0.791&lt;/td&gt;&lt;td&gt;0.883&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>2 ASD: autism spectrum disorder, AUC: area under the receiver operating characteristic curve, KNN: k-nearest neighbor, SVM: support vector machine, NAC: non-autistic comparison.</p> <hd id="AN0189325696-24">Clinical correlates of errors from different models</hd> <p>Among the misclassifications made by the SRS-based model trained on the clinical cohort, autistic participants misclassified as non-autistic exhibited fewer emotional and behavioral symptoms across all CBCL subdomains. 
Conversely, non-autistic participants misclassified as autistic by the SRS-based model displayed more symptoms across all CBCL domains and a higher rate of current ADHD diagnosis (Table 3). For misclassifications by the community model, autistic participants misclassified as non-autistic showed higher full-scale IQ and more aggressive behavior but fewer attention problems, social problems, thought problems, and withdrawal symptoms. In contrast, non-autistic participants misclassified as autistic by the community model exhibited the same pattern as those misclassified by the clinical model, with more CBCL symptoms and higher rates of ADHD (Table 4).</p> <p>Table 3. Misclassifications by the model trained on the clinical cohort.</p> <p>Graph</p> <p> <ephtml> &lt;table&gt;&lt;colgroup&gt;&lt;col align="left" /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;/colgroup&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" colspan="4"&gt;Misclassified ASD as non-ASD&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th /&gt;&lt;th align="left"&gt;Wrong cases (false negative)&lt;/th&gt;&lt;th align="left"&gt;Correct cases (true positive)&lt;/th&gt;&lt;th align="left"&gt;p&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;N&lt;/td&gt;&lt;td&gt;23&lt;/td&gt;&lt;td&gt;458&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Sex&lt;/td&gt;&lt;td&gt;87% male (M: 20, F: 3)&lt;/td&gt;&lt;td&gt;85% male (M: 390, F: 68)&lt;/td&gt;&lt;td&gt;0.082&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Age&lt;/td&gt;&lt;td&gt;9.64 &amp;#177; 3.97&lt;/td&gt;&lt;td&gt;10.02 &amp;#177; 4.57&lt;/td&gt;&lt;td&gt;0.698&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;FIQ&lt;/td&gt;&lt;td&gt;95.158 &amp;#177; 19.1&lt;/td&gt;&lt;td&gt;94.788 &amp;#177; 23.35&lt;/td&gt;&lt;td&gt;0.946&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="4"&gt;CBCL&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Aggressive behavior&lt;/td&gt;&lt;td&gt;5.478 &amp;#177; 
5.991&lt;/td&gt;&lt;td&gt;10.849 &amp;#177; 7.418&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Anxiety/depression&lt;/td&gt;&lt;td&gt;2.087 &amp;#177; 2.065&lt;/td&gt;&lt;td&gt;8.499 &amp;#177; 6.247&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Attention problem&lt;/td&gt;&lt;td&gt;4.783 &amp;#177; 3.397&lt;/td&gt;&lt;td&gt;10.768 &amp;#177; 4.141&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Delinquent behavior&lt;/td&gt;&lt;td&gt;1.609 &amp;#177; 1.373&lt;/td&gt;&lt;td&gt;3.547 &amp;#177; 2.779&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Social problem&lt;/td&gt;&lt;td&gt;2.652 &amp;#177; 1.668&lt;/td&gt;&lt;td&gt;7.115 &amp;#177; 2.904&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Somatic problem&lt;/td&gt;&lt;td&gt;0.87 &amp;#177; 1.842&lt;/td&gt;&lt;td&gt;2.344 &amp;#177; 3.282&lt;/td&gt;&lt;td&gt;0.033&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Thought problem&lt;/td&gt;&lt;td&gt;0.652 &amp;#177; 0.714&lt;/td&gt;&lt;td&gt;3.815 &amp;#177; 2.563&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Withdrawal&lt;/td&gt;&lt;td&gt;1.783 &amp;#177; 1.757&lt;/td&gt;&lt;td&gt;6.216 &amp;#177; 3.609&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Current diagnosis of ADHD&lt;xref ref-type="table-fn" rid="tfn4"&gt;&amp;#42;&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;73% (&lt;italic&gt;N&lt;/italic&gt; = 11/15)&lt;/td&gt;&lt;td&gt;64% (&lt;italic&gt;N&lt;/italic&gt; = 193/302)&lt;/td&gt;&lt;td&gt;0.458&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Current diagnosis of ODD&lt;xref ref-type="table-fn" rid="tfn4"&gt;&amp;#42;&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;20% (&lt;italic&gt;N&lt;/italic&gt; = 3/15)&lt;/td&gt;&lt;td&gt;21% (&lt;italic&gt;N&lt;/italic&gt; = 63/302)&lt;/td&gt;&lt;td&gt;0.906&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="left" colspan="4"&gt;Misclassified non-ASD as 
ASD&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th /&gt;&lt;th align="left"&gt;Wrong cases (false positive)&lt;/th&gt;&lt;th align="left"&gt;Correct cases (true negative)&lt;/th&gt;&lt;th align="left"&gt;p&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;N&lt;/td&gt;&lt;td&gt;30&lt;/td&gt;&lt;td&gt;393&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Sex&lt;/td&gt;&lt;td&gt;70% male(M:21, F: 9)&lt;/td&gt;&lt;td&gt;58% male(M: 227, F: 166)&lt;/td&gt;&lt;td&gt;0.19&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Age&lt;/td&gt;&lt;td&gt;11.60 &amp;#177; 4.88&lt;/td&gt;&lt;td&gt;11.29 &amp;#177; 4.98&lt;/td&gt;&lt;td&gt;0.744&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;FIQ&lt;/td&gt;&lt;td&gt;110.607 &amp;#177; 12.133&lt;/td&gt;&lt;td&gt;109.217 &amp;#177; 12.281&lt;/td&gt;&lt;td&gt;0.564&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="4"&gt;CBCL&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Aggressive behavior&lt;/td&gt;&lt;td&gt;10.241 &amp;#177; 6.796&lt;/td&gt;&lt;td&gt;4.032 &amp;#177; 4.544&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Anxiety/depression&lt;/td&gt;&lt;td&gt;8.621 &amp;#177; 5.955&lt;/td&gt;&lt;td&gt;3.304 &amp;#177; 3.911&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Attention problem&lt;/td&gt;&lt;td&gt;8 &amp;#177; 4.528&lt;/td&gt;&lt;td&gt;2.544 &amp;#177; 2.861&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Delinquent behavior&lt;/td&gt;&lt;td&gt;2.69 &amp;#177; 2.855&lt;/td&gt;&lt;td&gt;1.258 &amp;#177; 1.692&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Social problem&lt;/td&gt;&lt;td&gt;5.759 &amp;#177; 3.398&lt;/td&gt;&lt;td&gt;1.353 &amp;#177; 1.625&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Somatic problem&lt;/td&gt;&lt;td&gt;3.183 &amp;#177; 3.632&lt;/td&gt;&lt;td&gt;1.092 &amp;#177; 2.29&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Thought problem&lt;/td&gt;&lt;td&gt;2.862 
&amp;#177; 2.615&lt;/td&gt;&lt;td&gt;0.613 &amp;#177; 1.068&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Withdrawal&lt;/td&gt;&lt;td&gt;6.241 &amp;#177; 3.398&lt;/td&gt;&lt;td&gt;1.883 &amp;#177; 2.181&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Current diagnosis of ADHD&lt;xref ref-type="table-fn" rid="tfn4"&gt;&amp;#42;&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;29% (&lt;italic&gt;N&lt;/italic&gt; = 6/21)&lt;/td&gt;&lt;td&gt;12% (&lt;italic&gt;N&lt;/italic&gt; = 25/209)&lt;/td&gt;&lt;td&gt;0.034&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Current diagnosis of ODD&lt;xref ref-type="table-fn" rid="tfn4"&gt;&amp;#42;&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;14% (&lt;italic&gt;N&lt;/italic&gt; = 3/21)&lt;/td&gt;&lt;td&gt;6% (&lt;italic&gt;N&lt;/italic&gt; = 13/209)&lt;/td&gt;&lt;td&gt;0.17&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <ulist> <item>3 CBCL: child behavioral checklist; ADHD: attention deficit/hyperactivity disorder; ODD: oppositional defiant disorder; <emph>p</emph> values were calculated using t-tests for continuous variables and chi-square tests for categorical variables.</item> <item>4 The diagnostic rates of ADHD and ODD were calculated by dividing the number of diagnosed cases by the number of participants with complete K-SADS-E information.</item> </ulist> <p>Table 4. Misclassifications by the model trained on the community cohort.</p> <p>Graph</p> <p> <ephtml> &lt;table&gt;&lt;colgroup&gt;&lt;col align="left" /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." /&gt;&lt;col align="char" char="." 
/&gt;&lt;/colgroup&gt;&lt;thead&gt;&lt;tr&gt;&lt;th align="left" colspan="4"&gt;Misclassified ASD as non-ASD&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th /&gt;&lt;th align="left"&gt;Wrong cases (false negative)&lt;/th&gt;&lt;th align="left"&gt;Correct cases (true positive)&lt;/th&gt;&lt;th align="left"&gt;p&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;N&lt;/td&gt;&lt;td&gt;166&lt;/td&gt;&lt;td&gt;315&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;sex&lt;/td&gt;&lt;td&gt;86% male (M:143, F: 23)&lt;/td&gt;&lt;td&gt;85% male (M: 267, F: 48)&lt;/td&gt;&lt;td&gt;0.685&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Age&lt;/td&gt;&lt;td&gt;10.055 &amp;#177; 4.347&lt;/td&gt;&lt;td&gt;9.969 &amp;#177; 4.638&lt;/td&gt;&lt;td&gt;0.843&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;FIQ&lt;/td&gt;&lt;td&gt;101.083 &amp;#177; 20.92&lt;/td&gt;&lt;td&gt;91.195 &amp;#177; 23.6&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="4"&gt;CBCL&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Aggressive behavior&lt;/td&gt;&lt;td&gt;11.939 &amp;#177; 7.74&lt;/td&gt;&lt;td&gt;9.841 &amp;#177; 7.176&lt;/td&gt;&lt;td&gt;0.004&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Anxiety/depression&lt;/td&gt;&lt;td&gt;7.534 &amp;#177; 5.59&lt;/td&gt;&lt;td&gt;8.535 &amp;#177; 6.584&lt;/td&gt;&lt;td&gt;0.1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Attention problem&lt;/td&gt;&lt;td&gt;9.266 &amp;#177; 4.147&lt;/td&gt;&lt;td&gt;11.13 &amp;#177; 4.254&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Delinquent behavior&lt;/td&gt;&lt;td&gt;3.61 &amp;#177; 2.851&lt;/td&gt;&lt;td&gt;3.364 &amp;#177; 2.706&lt;/td&gt;&lt;td&gt;0.359&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Social problem&lt;/td&gt;&lt;td&gt;6.125 &amp;#177; 3.051&lt;/td&gt;&lt;td&gt;7.315 &amp;#177; 2.915&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Somatic problem&lt;/td&gt;&lt;td&gt;2.474 &amp;#177; 3.588&lt;/td&gt;&lt;td&gt;2.16 &amp;#177; 
3.033&lt;/td&gt;&lt;td&gt;0.319&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Thought problem&lt;/td&gt;&lt;td&gt;3.28 &amp;#177; 2.587&lt;/td&gt;&lt;td&gt;3.864 &amp;#177; 2.582&lt;/td&gt;&lt;td&gt;0.02&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Withdrawal&lt;/td&gt;&lt;td&gt;5.155 &amp;#177; 3.291&lt;/td&gt;&lt;td&gt;6.456 &amp;#177; 3.786&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Current diagnosis of ADHD&lt;xref ref-type="table-fn" rid="tfn6"&gt;&amp;#42;&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;64% (&lt;italic&gt;N&lt;/italic&gt; = 74/115)&lt;/td&gt;&lt;td&gt;64% (&lt;italic&gt;N&lt;/italic&gt; = 130/202)&lt;/td&gt;&lt;td&gt;0.999&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Current diagnosis of ODD&lt;xref ref-type="table-fn" rid="tfn6"&gt;&amp;#42;&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;23% (&lt;italic&gt;N&lt;/italic&gt; = 26/115)&lt;/td&gt;&lt;td&gt;20% (&lt;italic&gt;N&lt;/italic&gt; = 40/202)&lt;/td&gt;&lt;td&gt;0.521&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th align="left" colspan="4"&gt;Misclassified non-ASD as ASD&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;th /&gt;&lt;th align="left"&gt;Wrong cases (false positive)&lt;/th&gt;&lt;th align="left"&gt;Correct cases (true negative)&lt;/th&gt;&lt;th align="left"&gt;p&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;N&lt;/td&gt;&lt;td&gt;23&lt;/td&gt;&lt;td&gt;400&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;sex&lt;/td&gt;&lt;td&gt;57% male (M: 13, F: 10)&lt;/td&gt;&lt;td&gt;59% male (M: 235, F: 165)&lt;/td&gt;&lt;td&gt;0.833&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Age&lt;/td&gt;&lt;td&gt;10.456 &amp;#177; 4.041&lt;/td&gt;&lt;td&gt;11.358 &amp;#177; 5.014&lt;/td&gt;&lt;td&gt;0.398&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;FIQ&lt;/td&gt;&lt;td&gt;105.714 &amp;#177; 14.547&lt;/td&gt;&lt;td&gt;109.523 &amp;#177; 12.107&lt;/td&gt;&lt;td&gt;0.166&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="4"&gt;CBCL&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Aggressive behavior&lt;/td&gt;&lt;td&gt;7.773 &amp;#177; 
6.546&lt;/td&gt;&lt;td&gt;4.284 &amp;#177; 4.83&lt;/td&gt;&lt;td&gt;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Anxiety/depression&lt;/td&gt;&lt;td&gt;8.5 &amp;#177; 6.653&lt;/td&gt;&lt;td&gt;3.407 &amp;#177; 3.969&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Attention problem&lt;/td&gt;&lt;td&gt;6.405 &amp;#177; 5.008&lt;/td&gt;&lt;td&gt;2.734 &amp;#177; 3.084&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Delinquent behavior&lt;/td&gt;&lt;td&gt;2.409 &amp;#177; 2.501&lt;/td&gt;&lt;td&gt;1.3 &amp;#177; 1.771&lt;/td&gt;&lt;td&gt;0.006&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Social problem&lt;/td&gt;&lt;td&gt;4.636 &amp;#177; 4.054&lt;/td&gt;&lt;td&gt;1.496 &amp;#177; 1.833&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Somatic problem&lt;/td&gt;&lt;td&gt;2.318 &amp;#177; 3.708&lt;/td&gt;&lt;td&gt;1.179 &amp;#177; 2.365&lt;/td&gt;&lt;td&gt;0.035&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Thought problem&lt;/td&gt;&lt;td&gt;2.045 &amp;#177; 2.236&lt;/td&gt;&lt;td&gt;0.7 &amp;#177; 1.265&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Withdrawal&lt;/td&gt;&lt;td&gt;5.136 &amp;#177; 3.796&lt;/td&gt;&lt;td&gt;2.024 &amp;#177; 2.352&lt;/td&gt;&lt;td&gt;&amp;#60;0.001&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Current diagnosis of ADHD&lt;xref ref-type="table-fn" rid="tfn6"&gt;&amp;#42;&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;35% (&lt;italic&gt;N&lt;/italic&gt; = 5/14)&lt;/td&gt;&lt;td&gt;12% (&lt;italic&gt;N&lt;/italic&gt; = 26/216)&lt;/td&gt;&lt;td&gt;0.012&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Current diagnosis of ODD&lt;xref ref-type="table-fn" rid="tfn6"&gt;&amp;#42;&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;7% (&lt;italic&gt;N&lt;/italic&gt; = 1/14)&lt;/td&gt;&lt;td&gt;7% (&lt;italic&gt;N&lt;/italic&gt; = 15/216)&lt;/td&gt;&lt;td&gt;0.981&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <ulist> <item>5 CBCL: child behavioral checklist; ADHD: 
attention deficit/hyperactivity disorder; ODD: oppositional defiant disorder; p values were calculated using t-tests for continuous variables and chi-square tests for categorical variables.</item> <item>6 The diagnostic rates of ADHD and ODD were calculated by dividing the number of diagnosed cases by the number of participants with complete K-SADS-E information.</item> </ulist> <p>The distribution of CBCL subscales by diagnosis and model classification is depicted in Figure 2 (model from the clinical cohort) and Figure 3 (model from the community cohort). Although some CBCL subscales showed significant differences between correct and incorrect classifications, their distributions were generally less discriminative in the model from the community cohort (Figure 3).</p> <p>Graph: Figure 2. Classification by model from clinical cohort. The left panel of each CBCL subscale represents the classification among non-autistic comparisons (NAC), while the right panel represents the classification among autistic participants (ASD). On the x-axis, label 1 indicates that the model classification is ASD, whereas label 0 indicates that the model classification is non-ASD. For example, label 1 in the left panel (NAC) signifies the incorrect classification of NAC as autistic, while label 1 in the right panel (ASD) indicates correct classification. * p &lt; 0.05; ** p &lt; 0.01; *** p &lt; 0.001.</p> <p>Graph: Figure 3. Classification by model from community cohort. The left panel of each CBCL subscale represents the classification among non-autistic comparisons (NAC), while the right panel represents the classification among autistic participants (ASD). On the x-axis, label 1 indicates that the model classification is ASD, whereas label 0 indicates that the model classification is non-ASD. 
For example, label 1 in the left panel (NAC) signifies the incorrect classification of NAC as autistic, while label 1 in the right panel (ASD) indicates correct classification. * p &lt; 0.05; ** p &lt; 0.01; *** p &lt; 0.001.</p> <p>The model trained on the sampled community data set was applied back to the original, full community cohort to evaluate its performance under natural class imbalance. The model achieved an AUC of 0.83, with a sensitivity of 57.6% and a specificity of 89.9%. Similarly, for the prediction errors made by the clinical model in the community cohort, almost all CBCL subscales remained significantly different between correct and incorrect classifications (see Supplementary Table 2). In this additional analysis, the rate of ADHD diagnosis was also higher among non-autistic participants misclassified as autistic.</p> <hd id="AN0189325696-25">Evaluation of alternative sampling methods and advanced model</hd> <p>In the community cohort, the XGBoost models trained with and without SMOTE both achieved high performance within the training data set (AUC = 0.99, sensitivity = 0.99, specificity = 0.99). However, in cross-cohort testing using the clinical data set, the model showed signs of overfitting, with decreased generalizability (AUC = 0.88, sensitivity = 0.11, specificity = 1.00). While the trend of CBCL differences between correctly and incorrectly classified autistic participants persisted (Supplementary Table 3), the number of misclassified non-autistic participants was too small for meaningful subgroup analysis.</p> <hd id="AN0189325696-26">Exploratory analysis of the classification condition and SRS scores</hd> <p>The machine-learning models provided improved discrimination, with good sensitivity and specificity. Figure 4 shows the distribution of the SRS total and sub-scores by classification condition, to illustrate how different subscales of the SRS contribute to the classification. 
The model trained on the clinical cohort was used because of its superior performance in the test data set. The distribution revealed better differentiation than any single-point cut-off. The difference in distribution was most pronounced in the social communication subscale but was much less clear in the social awareness subscale.</p> <p>Graph: Figure 4. Distribution of total and sub-scores of SRS by classification condition. Distribution of total scores and four sub-scores (social communication, stereotyped behavior, social awareness, and social emotion) of the Social Responsiveness Scale (SRS) by classification condition. The upper panel (A) shows the classification of autistic participants and the lower panel (B) that of non-autistic participants. The x-axis label indicates whether the model classified the participant as autistic or as non-autistic.</p> <hd id="AN0189325696-27">Discussion</hd> <p>This study represents one of the few attempts to focus specifically on the clinical correlates of misclassification from machine-learning classification models of ASD trained using two distinct types of large data sets. Our models achieved performance comparable to that of previous studies ([<reflink idref="bib23" id="ref80">23</reflink>], [<reflink idref="bib22" id="ref81">22</reflink>]), with AUC &gt; 0.90. Moreover, we found that misclassification is closely linked to general emotional and behavioral problems. Furthermore, although models trained on different data sets can achieve good performance within samples, performance stability across different contexts cannot be guaranteed.</p> <hd id="AN0189325696-28">Models perform well within context but struggle to generalize across cohorts</hd> <p>Our findings are largely in line with our hypothesis and previous research. The two data sets utilized in this study serve different purposes in real-world applications. 
The clinical cohort, comprising autistic participants with more prominent autistic features and a more balanced comparison group, may be more suitable for hospital-based clinical use. However, the complexity of co-occurring conditions in real-world settings could be a limitation ([<reflink idref="bib49" id="ref82">49</reflink>]). Conversely, the community cohort, with autistic participants showing less prominent features but a representative comparison group, may be more appropriate for screening in primary care and community settings, albeit with potentially reduced sensitivity/specificity ([<reflink idref="bib5" id="ref83">5</reflink>]; [<reflink idref="bib31" id="ref84">31</reflink>]). We established a test data set for comparing performance between models and investigating error patterns by separating an independent portion of the clinical cohort. This cohort had better characterization and confirmed diagnoses through clinical evaluations and psychiatric interviews. Therefore, the performance of the model developed using the clinical cohort was unsurprisingly better than that of the model from the community cohort. It is common for model performance to decrease when a model is deployed in a novel setting outside of the primary training situation ([<reflink idref="bib5" id="ref85">5</reflink>]; [<reflink idref="bib22" id="ref86">22</reflink>]). Hence, even a model with good performance during initial validation should still be examined in the local setting before practical use ([<reflink idref="bib44" id="ref87">44</reflink>]). This remained true when we changed the validation set to a community-based one, wherein the model from the clinical cohort performed worse. However, the prevalence of autism was relatively low in the community ([<reflink idref="bib13" id="ref88">13</reflink>]). 
Although prevalence does not statistically affect sensitivity or specificity, the challenge of recruiting enough participants in a community study design could limit the representativeness of these autistic participants. Establishing a representative sample with a balanced case-control ratio remains a long-standing challenge when studying relatively rare conditions, and continued work is warranted.</p> <hd id="AN0189325696-29">Misclassifications are closely tied to emotional and behavioral symptoms</hd> <p>The error patterns in our models remained closely correlated with emotional and behavioral problems. This error pattern may be partly intrinsic to the data modalities, as the manifestation of autistic traits often correlates with a broad spectrum of emotional and behavioral phenotypes ([<reflink idref="bib21" id="ref89">21</reflink>]; [<reflink idref="bib33" id="ref90">33</reflink>]). In addition, questionnaire-based assessments such as the SRS are inherently subject to rater variability, including differences in interpretation or response tendencies, which may contribute to misclassification. For example, raters may interpret other emotional or behavioral problems (such as anxiety or defiance) as autistic traits, or vice versa. Observer leniency or strictness may also lead to inflated or deflated ratings, ultimately affecting classification accuracy ([<reflink idref="bib36" id="ref91">36</reflink>]). Understanding the model's misclassification patterns may thus help reveal whether rater bias is contributing to prediction errors. Moreover, exploratory error analysis of false-positive and false-negative groups within machine-learning diagnostic models has led to the identification of distinct subgroups that were not defined a priori ([<reflink idref="bib44" id="ref92">44</reflink>]). 
These findings may offer deeper insight into the heterogeneity within ASD, which has long been a topic of debate ([<reflink idref="bib48" id="ref93">48</reflink>]). These error patterns suggest that co-occurring emotional and behavioral symptoms could serve as potentially meaningful subgrouping markers of ASD, as previously suggested ([<reflink idref="bib51" id="ref94">51</reflink>]). The clinical correlates of misclassification remained similar in our additional analysis using the clinical-based model to predict a community-based data set, which supports the robustness of our findings.</p> <hd id="AN0189325696-30">Instability in error patterns may arise from overlapping symptoms and modeling constraints</hd> <p>Nevertheless, the error pattern was less stable in the model trained using the community cohort: among autistic participants misclassified as non-autistic, differences in anxiety/depression, delinquent behavior, and somatic problems were non-significant, and aggressive behavior was even higher. This instability may be attributed to several reasons. First, the low prevalence rate in the community cohort may limit the representativeness of the autistic participants. Second, the over- and under-sampling process used to obtain a balanced training set could lead to overfitting and potentially affect model performance ([<reflink idref="bib47" id="ref95">47</reflink>]). Third, the exclusion criteria of the clinical study may have yielded a comparison group with fewer co-occurring emotional and behavioral conditions, whereas the community cohort had a comparison group with diverse diagnoses and autistic participants with less prominent features. 
It is also possible that these components of emotional/behavioral problems are less correlated with core autistic features and/or highly correlated with other potential diagnoses, such as anxiety/depression with anxiety disorder or depressive disorder ([<reflink idref="bib13" id="ref96">13</reflink>]), and aggressive behavior with ADHD ([<reflink idref="bib59" id="ref97">59</reflink>]). In contrast, the significant associations of IQ and ADHD diagnosis with misclassification highlight the difficulty of identifying higher-functioning autism in real-life settings. A large body of evidence has revealed that autistic individuals with higher functioning, such as high IQ, are often diagnosed late, particularly females ([<reflink idref="bib4" id="ref98">4</reflink>]; [<reflink idref="bib41" id="ref99">41</reflink>]). Studies have also reported difficulties in differentiating autism from ADHD ([<reflink idref="bib2" id="ref100">2</reflink>]; [<reflink idref="bib18" id="ref101">18</reflink>]; [<reflink idref="bib59" id="ref102">59</reflink>]). Co-occurring conditions should be carefully considered when deploying any diagnostic tool, and the potential interactions between these conditions and factors such as age, sex, and cognitive ability, as well as their impact on diagnostic algorithms, warrant further investigation, particularly in cases with higher misclassification rates. Our study echoes recent calls for clinical audits of machine-learning models.</p> <hd id="AN0189325696-31">Model insights may support item reduction for more efficient tools</hd> <p>Our exploratory analysis revealed that machine-learning classification models outperformed any single-point cut-off of the SRS, consistent with most previous studies that highlight the robustness of machine-learning methods ([<reflink idref="bib23" id="ref103">23</reflink>]; [<reflink idref="bib54" id="ref104">54</reflink>]). 
We also noted that the social communication subscale seemed to best differentiate between autistic participants and comparisons. This result is reasonable since this subscale comprises the most core autistic features ([<reflink idref="bib29" id="ref105">29</reflink>]). In contrast, the boundary for social awareness seemed to be blurred. Previous psychometric studies have revealed that although all subscales show good reliability and validity, the social awareness subscale has the lowest test–retest correlation (intra-class correlation: 0.751 compared with total and other subscales: 0.767–0.852) and second-lowest internal consistency (Cronbach's alpha 0.866 compared with total and other subscales 0.742–0.951) ([<reflink idref="bib29" id="ref106">29</reflink>]). In addition, the models used in this study (LDA for the clinical cohort and random forest for the community cohort) include internal mechanisms for assessing feature contributions. Future studies could build on this by examining the relative importance of individual items or subscales. Such analysis may help identify a reduced set of features that retain predictive utility, thereby supporting the development of shortened versions of questionnaires to facilitate clinical use.</p> <hd id="AN0189325696-32">Strengths, limitations, and future directions</hd> <p>Our study's strengths lie in the use of large data sets across diverse backgrounds and a comprehensive evaluation of error patterns in classification models. However, some limitations merit consideration when interpreting the results. First, diagnostic methods varied between the two cohorts: diagnoses in the clinical cohort were determined through serial, rigorous clinical evaluation and confirmed by the Mandarin version of the K-SADS-E interview as well as the ADOS and ADI-R, whereas diagnoses in the community cohort were based on medical records and the K-SADS-E interview conducted in school settings. 
This discrepancy in diagnostic rigor may affect model performance, although the specific effects are difficult to anticipate. Second, the composition of the comparison groups differs across the two cohorts. The clinical cohort, subjected to strict exclusion criteria, included only participants with no major psychiatric diagnoses in the comparison group. In contrast, the community cohort included participants with other diagnoses in the comparison group to reflect real-world screening scenarios, potentially introducing confounding factors when comparing error patterns between models. Third, it remains uncertain whether the observed error patterns are unique to behavioral data or whether they also manifest in models based on other data modalities. Further comparative studies are needed to explore biases and errors across different machine-learning models. Fourth, using the CBCL to characterize co-occurring conditions may limit our ability to capture specific conditions. Although it broadly assesses emotional and behavioral symptoms, it does not fully reflect the complexity or specificity of psychiatric diagnoses. We selected the CBCL because of its widespread use, strong psychometric properties, and consistent availability across both cohorts. Nonetheless, future research should incorporate structured diagnostic interviews or multi-informant assessments to better understand the error patterns influencing model performance. Fifth, while we employed simple under-/oversampling methods and classic machine-learning algorithms to prioritize interpretability and exploratory analysis, more advanced sampling strategies and model types may yield different results. Our additional analysis using SMOTE and XGBoost showed high within-sample performance but revealed substantial overfitting when applied to a cross-cohort test set. 
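</p> <p>The simple resampling referred to above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline used in this study: the function name random_resample, the target of 200 cases per class, and the synthetic rows are hypothetical, and only the class sizes (35 versus 3297) are taken from the community cohort. The sketch shows one reason naive oversampling of a very small class may encourage overfitting: the balanced set still contains only 35 distinct minority cases, repeated several times.</p>

```python
import random
from collections import Counter


def random_resample(rows, labels, target_per_class, seed=0):
    """Return a class-balanced copy of (rows, labels).

    Classes with at least target_per_class members are undersampled without
    replacement; smaller classes keep every member and are topped up with
    exact duplicates drawn with replacement (simple random oversampling).
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    out_rows, out_labels = [], []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) >= target_per_class:
            chosen = rng.sample(members, target_per_class)
        else:
            shortfall = target_per_class - len(members)
            chosen = members + [rng.choice(members) for _ in range(shortfall)]
        out_rows.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_rows, out_labels


# Synthetic placeholder rows (hypothetical); 1 = autistic, 0 = comparison.
rows = [[i] for i in range(35 + 3297)]
labels = [1] * 35 + [0] * 3297

bal_rows, bal_labels = random_resample(rows, labels, target_per_class=200)
counts = Counter(bal_labels)
unique_minority = len({tuple(r) for r, y in zip(bal_rows, bal_labels) if y == 1})
print(counts, "distinct minority cases:", unique_minority)
```

<p>If such resampling is used, applying it to the training split only, and leaving held-out or cross-cohort test data at their natural prevalence, helps keep the reported performance interpretable.</p> <p>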
Future research should systematically compare performance and error patterns across various algorithms and sampling techniques to better inform model generalizability. Finally, the exploration of clinical correlates was not exhaustive. Future research should consider additional variables, such as biological indices, familial factors, and social context, to provide a more complete understanding of the factors that affect model performance.</p> <hd id="AN0189325696-33">Conclusion</hd> <p>Our findings indicate that machine-learning models can exhibit error patterns correlated with certain clinical features. Autistic people with fewer emotional and behavioral symptoms and higher IQ may be more likely to be misclassified as non-autistic. In contrast, non-autistic people with more emotional and behavioral symptoms, particularly those with ADHD, are more likely to be misclassified as autistic. These findings underscore the urgent need for additional research to refine machine-learning models for ASD diagnosis by considering the influence of co-occurring conditions and the distinctive traits of different data sets. Specifically, our findings highlight the importance of using comprehensive and well-characterized training data, and of monitoring shifts in error patterns when developing models on highly imbalanced data sets or across different testing contexts. Future research should also consider evaluating feature importance to enhance model interpretability and guide the refinement of diagnostic tools. Our research also highlights the critical need for careful selection of diagnostic tools and acknowledges the challenges encountered in real-life settings. Our study further emphasizes the necessity of regular clinical evaluations and audits of machine-learning models to enhance their accuracy and reliability. 
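</p> <p>One simple form such an audit could take is a comparison of error rates across clinically defined subgroups. The Python sketch below is an assumption-laden illustration rather than the procedure used in this study: the function name subgroup_error_rates, the grouping by ADHD status, and the ten synthetic cases are hypothetical. It tabulates false-positive and false-negative rates separately for each subgroup so that shifts in error patterns become visible.</p>

```python
def subgroup_error_rates(y_true, y_pred, groups):
    """Return false-positive and false-negative rates for each subgroup.

    y_true and y_pred use 1 = autistic and 0 = non-autistic; groups holds one
    subgroup label per case. A rate is None when its denominator is zero.
    """
    tallies = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        t = tallies.setdefault(group, {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
        if truth == 1:
            t["tp" if pred == 1 else "fn"] += 1
        else:
            t["fp" if pred == 1 else "tn"] += 1

    rates = {}
    for group, t in tallies.items():
        negatives = t["fp"] + t["tn"]
        positives = t["fn"] + t["tp"]
        rates[group] = {
            "false_positive_rate": t["fp"] / negatives if negatives else None,
            "false_negative_rate": t["fn"] / positives if positives else None,
        }
    return rates


# Synthetic labels and predictions (hypothetical, not study data).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
groups = ["ADHD", "ADHD", "ADHD", "no ADHD", "no ADHD", "no ADHD",
          "ADHD", "no ADHD", "no ADHD", "ADHD"]

audit = subgroup_error_rates(y_true, y_pred, groups)
for group, r in audit.items():
    print(group, r)
```

<p>The same tabulation could be repeated for other correlates examined here, such as sex, age band, or IQ range, whenever a model is moved to a new testing context.</p> <p>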
Future studies should focus on rigorously assessing the error patterns of these machine-learning models and the impact of training methods to improve their diagnostic precision.</p> <hd id="AN0189325696-34">Supplemental Material</hd> <p>Graph: Supplemental material, sj-docx-1-aut-10.1177_13623613251360271 for Clinical correlates of errors in machine-learning diagnostic model of autism spectrum disorder: Impact of sample cohorts by Yen-Chin Wang, Chung-Yuan Cheng, Chi-Shin Wu, Chi-Chun Lee and Susan Shur-Fen Gau in Autism</p> <p>The authors thank all the participants and their parents for participating in this study.</p> <ref id="AN0189325696-35"> <title> References </title> <blist> <bibl id="bib1" idref="ref35" type="bt">1</bibl> <bibtext> Achenbach T. M. (1991). Manual for the Child Behavior Checklist/4-18 and 1991 profile. University of Vermont, Department of Psychiatry.</bibtext> </blist> <blist> <bibl id="bib2" idref="ref36" type="bt">2</bibl> <bibtext> Antshel K. M., Russo N. (2019). Autism spectrum disorders and ADHD: Overlapping phenomenology, diagnostic issues, and treatment considerations. Current Psychiatry Reports, 21, 1–11.</bibtext> </blist> <blist> <bibl id="bib3" idref="ref37" type="bt">3</bibl> <bibtext> Balakrishnama S., Ganapathiraju A. (1998). Linear discriminant analysis-a brief tutorial. Institute for Signal and Information Processing, 18(1998), 1–8.</bibtext> </blist> <blist> <bibl id="bib4" idref="ref98" type="bt">4</bibl> <bibtext> Begeer S., Mandell D., Wijnker-Holmes B., Venderbosch S., Rem D., Stekelenburg F., Koot H. M. (2013). Sex differences in the timing of identification among children and adults with autism spectrum disorders. Journal of Autism and Developmental Disorders, 43, 1151–1156.</bibtext> </blist> <blist> <bibl id="bib5" idref="ref20" type="bt">5</bibl> <bibtext> Bone D., Goodwin M. S., Black M. P., Lee C.-C., Audhkhasi K., Narayanan S. (2015). 
Applying machine learning to facilitate autism diagnostics: Pitfalls and promises. Journal of Autism and Developmental Disorders, 45(5), 1121–1136. https://doi.org/10.1007/s10803-014-2268-6</bibtext> </blist> <blist> <bibl id="bib6" idref="ref68" type="bt">6</bibl> <bibtext> Breiman L. (2001). Random forests. Machine Learning, 45, 5–32.</bibtext> </blist> <blist> <bibl id="bib7" idref="ref39" type="bt">7</bibl> <bibtext> Chang J. C., Lai M. C., Chien Y. L., Cheng C. Y., Wu Y. Y., Gau S. S. (2023). Psychometric properties of the Mandarin version of the autism diagnostic observation schedule-generic. Journal of the Formosan Medical Association, 122(7), 574–583. https://doi.org/10.1016/j.jfma.2023.01.008</bibtext> </blist> <blist> <bibl id="bib8" idref="ref62" type="bt">8</bibl> <bibtext> Chang J. C., Lin H. Y., Gau S. S. (2024). Distinct developmental changes in regional gray matter volume and covariance in individuals with attention-deficit hyperactivity disorder: A longitudinal voxel-based morphometry study. Asian Journal of Psychiatry, 91, 103860. https://doi.org/10.1016/j.ajp.2023.103860</bibtext> </blist> <blist> <bibl id="bib9" idref="ref61" type="bt">9</bibl> <bibtext> Chang J. P.-C., Lai M.-C., Chou M.-C., Shang C.-Y., Chiu Y.-N., Tsai W.-C., Wu Y.-Y., Gau S. S.-F. (2019). Maternal and family processes in different subgroups of youth with autism spectrum disorder. Journal of Abnormal Child Psychology, 47(1), 177–194.</bibtext> </blist> <blist> <bibtext> Chawla N. V., Bowyer K. W., Hall L. O., Kegelmeyer W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16, 321–357.</bibtext> </blist> <blist> <bibtext> Chen C. P., Pan H. H., Gau S. S. F., Lee C. C. (2024). Using measures of vowel space for autistic traits characterization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 32, 591–607. https://doi.org/10.1109/TASLP.2023.3330605</bibtext> </blist> <blist> <bibtext> Chen T., Guestrin C. 
(2016, August 13–17). Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco.</bibtext> </blist> <blist> <bibtext> Chen Y.-L., Chen W. J., Lin K.-C., Shen L.-J., Gau S. S.-F. (2019). Prevalence of DSM-5 mental disorders in a nationally representative sample of children in Taiwan: Methodology and main findings. Epidemiology and Psychiatric Sciences, 29, Article e15. https://doi.org/10.1017/S2045796018000793</bibtext> </blist> <blist> <bibtext> Chen Y.-L., Shen L.-J., Gau S. S.-F. (2017). The Mandarin version of the Kiddie-Schedule for Affective Disorders and Schizophrenia-Epidemiological version for DSM–5: A psychometric study. Journal of the Formosan Medical Association, 116(9), 671–678.</bibtext> </blist> <blist> <bibtext> Chiang H. L., Gau S. S. (2016). Comorbid psychiatric conditions as mediators to predict later social adjustment in youths with autism spectrum disorder. Journal of Child Psychology and Psychiatry and Allied Disciplines, 57(1), 103–111. https://doi.org/10.1111/jcpp.12450</bibtext> </blist> <blist> <bibtext> Chiang H. L., Kao W. C., Chou M. C., Chou W. J., Chiu Y. N., Wu Y. Y., Gau S. S. (2018). School dysfunction in youth with autistic spectrum disorder in Taiwan: The effect of subtype and ADHD. Autism Research, 11(6), 857–869. https://doi.org/10.1002/aur.1923</bibtext> </blist> <blist> <bibtext> Chiang H. L., Wu C. S., Chen C. L., Tseng W. I., Gau S. S. (2024). Machine-learning-based feature selection to identify attention-deficit hyperactivity disorder using whole-brain white matter microstructure: A longitudinal study. Asian Journal of Psychiatry, 97, 104087. https://doi.org/10.1016/j.ajp.2024.104087</bibtext> </blist> <blist> <bibtext> Chien Y.-L., Chou M.-C., Chiu Y.-N., Chou W.-J., Wu Y.-Y., Tsai W.-C., Gau S. S.-F. (2017). 
ADHD-related symptoms and attention profiles in the unaffected siblings of probands with autism spectrum disorder: Focus on the subtypes of autism and Asperger's disorder. Molecular Autism, 8(1), 37. https://doi.org/10.1186/s13229-017-0153-9</bibtext> </blist> <blist> <bibtext> Constantino J. N. (2021). Social Responsiveness Scale. In Encyclopedia of autism spectrum disorders (pp. 4457–4467). Springer.</bibtext> </blist> <blist> <bibtext> Constantino J. N., Davis S. A., Todd R. D., Schindler M. K., Gross M. M., Brophy S. L., Metzger L. M., Shoushtari C. S., Splinter R., Reich W. (2003). Validation of a brief quantitative measure of autistic traits: Comparison of the Social Responsiveness Scale with the autism diagnostic interview-revised. Journal of Autism and Developmental Disorders, 33(4), 427–433.</bibtext> </blist> <blist> <bibtext> Constantino J. N., Zhang Y., Frazier T., Abbacchi A. M., Law P. (2010). Sibling recurrence and the genetic epidemiology of autism. American Journal of Psychiatry, 167(11), 1349–1356.</bibtext> </blist> <blist> <bibtext> Duda M., Haber N., Daniels J., Wall D. P. (2017). Crowdsourced validation of a machine-learning classification system for autism and ADHD. Translational Psychiatry, 7(5), e1133–e1133. https://doi.org/10.1038/tp.2017.86</bibtext> </blist> <blist> <bibtext> Duda M., Ma R., Haber N., Wall D. P. (2016). Use of machine learning for behavioral distinction of autism and ADHD. Translational Psychiatry, 6(2), e732–e732. https://doi.org/10.1038/tp.2015.221</bibtext> </blist> <blist> <bibtext> Dworzynski K., Ronald A., Bolton P., Happé F. (2012). How different are girls and boys above and below the diagnostic threshold for autism spectrum disorders? Journal of the American Academy of Child and Adolescent Psychiatry, 51(8), 788–797. https://doi.org/10.1016/j.jaac.2012.05.018</bibtext> </blist> <blist> <bibtext> Dwyer D., Koutsouleris N. (2022). 
Annual research review: Translational machine learning for child and adolescent psychiatry. Journal of Child Psychology and Psychiatry, 63(4), 421–443. https://doi.org/10.1111/jcpp.13545</bibtext> </blist> <blist> <bibtext> Engchuan W., Dhindsa K., Lionel A. C., Scherer S. W., Chan J. H., Merico D. (2015). Performance of case-control rare copy number variation annotation in classification of autism. BMC Medical Genomics, 8, 1–10.</bibtext> </blist> <blist> <bibtext> Gau S. S., Lee C. M., Lai M. C., Chiu Y. N., Huang Y. F., Kao J. D., Wu Y. Y. (2011). Psychometric properties of the Chinese version of the social communication questionnaire. Research in Autism Spectrum Disorders, 5(2), 809–818. https://doi.org/10.1016/j.rasd.2010.09.010</bibtext> </blist> <blist> <bibtext> Gau S. S.-F., Chong M.-Y., Chen T. H., Cheng A. T. (2005). A 3-year panel study of mental disorders among adolescents in Taiwan. American Journal of Psychiatry, 162(7), 1344–1350.</bibtext> </blist> <blist> <bibtext> Gau S. S.-F., Liu L.-T., Wu Y.-Y., Chiu Y.-N., Tsai W.-C. (2013). Psychometric properties of the Chinese version of the Social Responsiveness Scale. Research in Autism Spectrum Disorders, 7(2), 349–360.</bibtext> </blist> <blist> <bibtext> Hearst M. A., Dumais S. T., Osuna E., Platt J., Scholkopf B. (1998). Support vector machines. IEEE Intelligent Systems and Their Applications, 13(4), 18–28.</bibtext> </blist> <blist> <bibtext> Holtman G. A., Berger M. Y., Burger H., Deeks J. J., Donner-Banzhoff N., Fanshawe T. R., Koshiaris C., Leeflang M. M., Oke J. L., Perera R., Reitsma J. B., Van den Bruel A. (2019). Development of practical recommendations for diagnostic accuracy studies in low-prevalence situations. Journal of Clinical Epidemiology, 114, 38–48. https://doi.org/10.1016/j.jclinepi.2019.05.018</bibtext> </blist> <blist> <bibtext> Huang C.-F., Lin Y.-S., Chiu Y.-N., Gau S. S.-F., Chen V. C.-H., Lin C.-F., Hsieh Y.-H., Liu W.-S., Chan H.-L., Wu Y.-Y. (2022). 
Validation of the Chinese version of the autism diagnostic interview-revised in autism spectrum disorder. Neuropsychiatric Disease and Treatment, 18, 327.</bibtext> </blist> <blist> <bibtext> Hus V., Bishop S., Gotham K., Huerta M., Lord C. (2013). Factors influencing scores on the Social Responsiveness Scale. Journal of Child Psychology and Psychiatry, 54(2), 216–224. https://doi.org/10.1111/j.1469-7610.2012.02589.x</bibtext> </blist> <blist> <bibtext> Hyde K. K., Novack M. N., LaHaye N., Parlett-Pelleriti C., Anden R., Dixon D. R., Linstead E. (2019). Applications of supervised machine learning in autism spectrum disorder research: A review. Review Journal of Autism and Developmental Disorders, 6(2), 128–146. https://doi.org/10.1007/s40489-019-00158-x</bibtext> </blist> <blist> <bibtext> JASP Team. (2024). JASP (Version 0.18.3). https://jasp-stats.org/</bibtext> </blist> <blist> <bibtext> Kaufman N. K. (2022). Rethinking "gold standards" and "best practices" in the assessment of autism. Applied Neuropsychology: Child, 11(3), 529–540.</bibtext> </blist> <blist> <bibtext> Kirkovski M., Enticott P. G., Fitzgerald P. B. (2013). A review of the role of female gender in autism spectrum disorders. Journal of Autism and Developmental Disorders, 43(11), 2584–2603. https://doi.org/10.1007/s10803-013-1811-1</bibtext> </blist> <blist> <bibtext> Kojovic N., Natraj S., Mohanty S. P., Maillart T., Schaer M. (2021). Using 2D video-based pose estimation for automated prediction of autism spectrum disorders in young children. Scientific Reports, 11(1), 15069.</bibtext> </blist> <blist> <bibtext> Kotsiantis S. B. (2013). Decision trees: A recent overview. Artificial Intelligence Review, 39, 261–283.</bibtext> </blist> <blist> <bibtext> Kwong J. C. C., Khondker A., Lajkosz K., McDermott M. B. A., Frigola X. B., McCradden M. D., Mamdani M., Kulkarni G. S., Johnson A. E. W. (2023). 
APPRAISE-AI tool for quantitative evaluation of AI studies for clinical decision support. JAMA Network Open, 6(9), e2335377–e2335377. https://doi.org/10.1001/jamanetworkopen.2023.35377</bibtext> </blist> <blist> <bibtext> Leedham A., Thompson A. R., Smith R., Freeth M. (2019). 'I was exhausted trying to figure it out': The experiences of females receiving an autism diagnosis in middle to late adulthood. Autism, 24(1), 135–146. https://doi.org/10.1177/1362361319853442</bibtext> </blist> <blist> <bibtext> Lin H.-Y., Kessler D., Tseng W.-Y. I., Gau S. S.-F. (2021). Increased functional segregation related to the salience network in unaffected siblings of youths with attention-deficit/hyperactivity disorder. Journal of the American Academy of Child and Adolescent Psychiatry, 60(1), 152–165.</bibtext> </blist> <blist> <bibtext> Lin Y. S., Gau S. S. F., Lee C. C. (2020). A multimodal interlocutor-modulated attentional BLSTM for classifying autism subgroups during clinical interviews. IEEE Journal of Selected Topics in Signal Processing, 14(2), 299–311. https://doi.org/10.1109/JSTSP.2020.2970578</bibtext> </blist> <blist> <bibtext> Liu X., Glocker B., McCradden M. M., Ghassemi M., Denniston A. K., Oakden-Rayner L. (2022). The medical algorithmic audit. The Lancet Digital Health, 4(5), e384–e397. https://doi.org/10.1016/S2589-7500(22)00003-6</bibtext> </blist> <blist> <bibtext> Lord C., Risi S., Lambrecht L., Cook E. H., Leventhal B. L., DiLavore P. C., Pickles A., Rutter M. (2000). The Autism Diagnostic Observation Schedule—Generic: A standard measure of social and communication deficits associated with the spectrum of autism. Journal of Autism and Developmental Disorders, 30(3), 205–223.</bibtext> </blist> <blist> <bibtext> Lord C., Rutter M., Le Couteur A. (1994). Autism diagnostic interview-revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. 
Journal of Autism and Developmental Disorders, 24(5), 659–685.</bibtext> </blist> <blist> <bibtext> Mohammed R., Rawashdeh J., Abdullah M. (2020). Machine learning with oversampling and undersampling techniques: Overview study and experimental results [Conference session]. 2020 11th International Conference on Information and Communication Systems (ICICS).</bibtext> </blist> <blist> <bibtext> Mottron L., Bzdok D. (2020). Autism spectrum heterogeneity: Fact or artifact? Molecular Psychiatry, 25(12), 3178–3185.</bibtext> </blist> <blist> <bibtext> Mulder R., Singh A. B., Hamilton A., Das P., Outhred T., Morris G., Bassett D., Baune B. T., Berk M., Boyce P. (2018). The limitations of using randomised controlled trials as a basis for developing treatment guidelines. BMJ Mental Health, 21(1), 4–6.</bibtext> </blist> <blist> <bibtext> Nogay H. S., Adeli H. (2020). Machine learning (ML) for the diagnosis of autism spectrum disorder (ASD) using brain imaging. Reviews in the Neurosciences, 31(8), 825–841.</bibtext> </blist> <blist> <bibtext> Nordahl C. W., Iosif A. M., Young G. S., Hechtman A., Heath B., Lee J. K., Libero L., Reinhardt V. P., Winder-Patel B., Amaral D. G., Rogers S., Solomon M., Ozonoff S. (2020). High psychopathology subgroup in young children with autism: Associations with biological sex and Amygdala volume. Journal of the American Academy of Child and Adolescent Psychiatry, 59(12), 1353–1363.e1352. https://doi.org/10.1016/j.jaac.2019.11.022</bibtext> </blist> <blist> <bibtext> Rajpurkar P., Chen E., Banerjee O., Topol E. J. (2022). AI in health and medicine. Nature Medicine, 28(1), 31–38.</bibtext> </blist> <blist> <bibtext> Scarselli F., Gori M., Tsoi A. C., Hagenbuchner M., Monfardini G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 20(1), 61–80.</bibtext> </blist> <blist> <bibtext> Schulte-Rüther M., Kulvicius T., Stroth S., Wolff N., Roessner V., Marschik P. B., Kamp-Becker I., Poustka L. (2023). 
Using machine learning to improve diagnostic assessment of ASD in the light of specific differential and co-occurring diagnoses. Journal of Child Psychology and Psychiatry, 64(1), 16–26. https://doi.org/10.1111/jcpp.13650</bibtext> </blist> <blist> <bibtext> Shang C. Y., Gau S. S., Soong W. T. (2006). Association between childhood sleep problems and perinatal factors, parental mental distress and behavioral problems. Journal of Sleep Research, 15(1), 63–73. https://doi.org/10.1111/j.1365-2869.2006.00492.x</bibtext> </blist> <blist> <bibtext> Shang C. Y., Lin H. Y., Gau S. S. (2021). The norepinephrine transporter gene modulates intrinsic brain activity, visual memory, and visual attention in children with attention-deficit/hyperactivity disorder. Molecular Psychiatry, 26(8), 4026–4035. https://doi.org/10.1038/s41380-019-0545-7</bibtext> </blist> <blist> <bibtext> Taunk K., De S., Verma S., Swetapadma A. (2019, May 15–17). A brief review of nearest neighbor algorithm for learning and classification [Conference session]. 2019 International Conference on Intelligent Computing and Control Systems (ICCS).</bibtext> </blist> <blist> <bibtext> Tseng W. L., Gau S. S. F. (2013). Executive function as a mediator in the link between attention-deficit/hyperactivity disorder and social problems. Journal of Child Psychology and Psychiatry, 54(9), 996–1004.</bibtext> </blist> <blist> <bibtext> Tseng W.-L., Kawabata Y., Gau S. S.-F., Banny A. M., Lingras K. A., Crick N. R. (2012). Relations of inattention and hyperactivity/impulsivity to preadolescent peer functioning: The mediating roles of aggressive and prosocial behaviors. Journal of Clinical Child and Adolescent Psychology, 41(3), 275–287.</bibtext> </blist> <blist> <bibtext> Tung Y.-H., Lin H.-Y., Chen C.-L., Shang C.-Y., Yang L.-Y., Hsu Y.-C., Tseng W.-Y. I., Gau S. S.-F. (2021). 
Whole brain white matter tract deviation and idiosyncrasy from normative development in autism and ADHD and unaffected siblings link with dimensions of psychopathology and cognition. American Journal of Psychiatry, 178(8), 730–743.</bibtext> </blist> <blist> <bibtext> Washington P., Wall D. P. (2023). A review of and roadmap for data science and machine learning for the neuropsychiatric phenotype of autism. Annual Review of Biomedical Data Science, 6, 211–228.</bibtext> </blist> <blist> <bibtext> Wiens J., Saria S., Sendak M., Ghassemi M., Liu V. X., Doshi-Velez F., Jung K., Heller K., Kale D., Saeed M. (2019). Do no harm: A roadmap for responsible machine learning for health care. Nature Medicine, 25(9), 1337–1340.</bibtext> </blist> <blist> <bibtext> Xu M., Calhoun V., Jiang R., Yan W., Sui J. (2021). Brain imaging-based machine learning in autism spectrum disorder: Methods and applications. Journal of Neuroscience Methods, 361, 109271.</bibtext> </blist> </ref> <ref id="AN0189325696-36"> <title> Footnotes </title> <blist> <bibtext> Yen-Chin Wang https://orcid.org/0000-0002-3420-5042</bibtext> </blist> <blist> <bibtext> Chung-Yuan Cheng https://orcid.org/0000-0003-1931-458X</bibtext> </blist> <blist> <bibtext> Susan Shur-Fen Gau https://orcid.org/0000-0002-2718-8221</bibtext> </blist> <blist> <bibtext> The Research Ethics Committee of National Taiwan University Hospital (NTUH), Taipei, Taiwan, approved the data collection of clinical samples (approval number: 201201006RIB, ClinicalTrials.gov number: NCT01582256) and epidemiological samples (approval number: 201411056RIN, ClinicalTrials.gov number: NCT02707848) before recruitment. Complete confidentiality was ensured throughout the study. 
The NTUH Research Ethics Committee approved this work before data analysis (approval number: 202303089RINB, 202002086RIND; ClinicalTrials.gov number: NCT04873674).</bibtext> </blist> <blist> <bibtext> Written informed consent was obtained from both participants and their parents before participation.</bibtext> </blist> <blist> <bibtext> Yen-Chin Wang: Conceptualization; Data curation; Formal analysis; Writing – original draft. Chung-Yuan Cheng: Methodology; Validation; Writing – review &amp; editing. Chi-Shin Wu: Methodology; Validation; Writing – review &amp; editing. Chi-Chun Lee: Methodology; Validation; Writing – review &amp; editing. Susan Shur-Fen Gau: Conceptualization; Data curation; Funding acquisition; Investigation; Methodology; Project administration; Supervision; Validation; Writing – review &amp; editing.</bibtext> </blist> <blist> <bibtext> The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by grants for data collection from the National Science and Technology Council and Ministry of Science and Technology, Taiwan (grant nos. NSC101-2627-B-002-002, NSC 101-2314-B-002-136-MY3, and MOST 103-2314-B-002-055-MY3) and National Taiwan University Hospital (grant no. NTUH101-S1910); and grants for this machine-learning analysis from the National Science and Technology Council and Ministry of Science and Technology, Taiwan (grant nos. MOST 109-2327-B-002-004, 110-2327-B-002-006, 111-2327-B-002-009, NSTC 112-2327-B-002-010, and NSTC 113-2321-B-002-025), and National Taiwan University Hospital (grant no. 
NTUH112-A168).</bibtext> </blist> <blist> <bibtext> The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.</bibtext> </blist> <blist> <bibtext> Patient-level data are not publicly available due to legal and ethical regulations under Taiwan's Personal Data Protection Act and Human Subjects Research Act. Access may be granted upon reasonable academic request to the corresponding author and approval by the relevant ethics committee. No custom code was used in this study; variables used in the analysis are listed in the Supplementary Material.</bibtext> </blist> <blist> <bibtext> Supplemental material for this article is available online.</bibtext> </blist> </ref> <aug> <p>By Yen-Chin Wang; Chung-Yuan Cheng; Chi-Shin Wu; Chi-Chun Lee and Susan Shur-Fen Gau</p> </aug>
Header	DbId: eric DbLabel: ERIC An: EJ1489398 AccessLevel: 3 PubType: Academic Journal PubTypeId: academicJournal PreciseRelevancyScore: 0
IllustrationInfo
Abstractor:	As Provided
Entry Date:	2025
Accession Number:	EJ1489398
Persistent Link:	https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1489398