Reliability of the Commonly Used and Newly-Developed Autism Measures

Bibliographic Details
Title: Reliability of the Commonly Used and Newly-Developed Autism Measures
Language: English
Authors: Thomas W. Frazier (ORCID 0000-0002-6951-2667), Andrew J. O. Whitehouse, Susan R. Leekam, Sarah J. Carrington, Gail A. Alvares, David W. Evans, Antonio Y. Hardan, Mirko Uljarevic
Source: Journal of Autism and Developmental Disorders. 2024 54(6):2158-2169.
Availability: Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Peer Reviewed: Y
Page Count: 12
Publication Date: 2024
Document Type: Journal Articles
Reports - Research
Descriptors: Test Reliability, Item Response Theory, Autism Spectrum Disorders, Clinical Diagnosis, Parents, Symptoms (Individual Disorders), Diagnostic Tests
Assessment and Survey Identifiers: Autism Diagnostic Observation Schedule
DOI: 10.1007/s10803-023-05967-y
ISSN: 0162-3257
1573-3432
Abstract: Purpose: The aim of the present study was to compare scale and conditional reliability derived from item response theory analyses among the most commonly used, as well as several newly developed, observation, interview, and parent-report autism instruments. Methods: When available, data sets were combined to facilitate large-sample evaluation. Scale reliability (internal consistency, average corrected item-total correlations, and model reliability) and conditional reliability estimates were computed for total scores and for measure subscales. Results: Scale reliability was generally good to excellent for the total scores of all measures but weaker for the restricted/repetitive behavior (RRB) subscales of the ADOS and ADI-R, reflecting the relatively small number of items in these subscales. For diagnostic measures, conditional reliability tended to be very good (> 0.80) in the regions of the latent trait where autism spectrum disorder (ASD) and non-ASD developmental disability cases would be differentiated. For parent-report scales, conditional reliability of total scores tended to be excellent (> 0.90) across very wide ranges of autism symptom levels, with a few notable exceptions. Conclusions: These findings support the use of all of the clinical observation, interview, and parent-report autism symptom measures examined, but also suggest specific limitations that warrant consideration when choosing measures for specific clinical or research applications.
Abstractor: As Provided
Entry Date: 2024
Accession Number: EJ1426531
Database: ERIC
Text:
  Value: <title id="AN0177598717-1">Reliability of the Commonly Used and Newly-Developed Autism Measures </title> <p>Keywords: Autism; Reliability; Item response theory; Observation; Interview; Questionnaire</p> <p>Copyright comment Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</p> <p>Comprehensive and sensitive capture of autism symptoms is crucial for screening, diagnosis, and longitudinal monitoring, including tracking symptom change with development and during psychosocial or medical interventions (Charman & Gotham, [<reflink idref="bib9" id="ref1">9</reflink>]). A number of childhood autism measures have been developed over the last three decades, including observational, clinical interview, and informant (parent)-report measures (Lord et al., [<reflink idref="bib37" id="ref2">37</reflink>]). Although many of these were initially developed for screening or diagnosis, most have since been used to monitor change in autism symptom presentation over time. Careful evaluation of each measure's psychometric properties is crucial for judging its utility and appropriateness for specific applications, particularly when limited time or resources make it infeasible to include multiple autism symptom measures.</p> <p>A number of psychometric properties are relevant to instrument development, selection, and use (Boateng et al., [<reflink idref="bib4" id="ref3">4</reflink>]). Reliability is a particularly important characteristic because measurement error constrains validity. More precisely, the maximum attainable validity coefficient between two variables is the square root of the product of their reliability coefficients (Streiner & Norman, [<reflink idref="bib58" id="ref4">58</reflink>]). Thus, lower reliability attenuates observed associations, reducing the statistical power to detect significant relationships in research (Leon et al., [<reflink idref="bib35" id="ref5">35</reflink>]). 
Instrument reliability is particularly important to understand before clinical use, because the error around an individual score can greatly influence its interpretation (Streiner & Norman, [<reflink idref="bib58" id="ref6">58</reflink>]). Although reliability depends on context, measures with higher reliability (lower measurement error) in diverse samples are generally likely to produce more precise estimates of autism symptom level in most clinical and research contexts and are thus preferable, especially for clinical decisions that have important implications for individuals and their families.</p> <p>Internal consistency and model reliability of autism measures have been evaluated relatively frequently, in both US (Frazier et al., [<reflink idref="bib15" id="ref7">15</reflink>]; Taylor et al., [<reflink idref="bib59" id="ref8">59</reflink>]) and international samples (Murray et al., [<reflink idref="bib43" id="ref9">43</reflink>]; Nguyen et al., [<reflink idref="bib45" id="ref10">45</reflink>]). Although conditional reliability derived from item response theory analyses has encouragingly gained more attention in the last five years, it remains underutilized (Janvier et al., [<reflink idref="bib25" id="ref11">25</reflink>]; Taylor et al., [<reflink idref="bib59" id="ref12">59</reflink>]). Conditional reliability is particularly important to examine because it shows how measurement precision varies across score ranges, including the ranges most relevant for clinical assessment (moderately high to very high symptom scores) and for intervention settings (average to high scores). 
In addition, most previous studies of the reliability of autism measures have relied on small to moderate samples.</p> <p>To address these gaps in the autism symptom literature, the present study aimed to evaluate scale and conditional reliability across existing autism symptom measures. This investigation focused on observation, interview, and informant (parent)-report measures and aimed to include both the most widely used, well-established measures and newly developed autism symptom measures for which data were accessible, either through publicly available datasets and data-sharing initiatives or through internal collaborative networks. We hypothesized that scale and conditional reliability would differ significantly across measures, with newly developed measures yielding higher reliability estimates than existing commonly used measures because they were developed and validated within modern measurement development frameworks. 
We further expected that informant-report measures would show higher reliability than observation or interview measures, because informant-report instruments are easier to administer, can draw on parents' knowledge of the child (allowing larger item pools per instrument), and tend to use polytomous ordinal (Likert) scales, which often yield higher reliability (Simms et al., [<reflink idref="bib55" id="ref13">55</reflink>]).</p> <hd id="AN0177598717-2">Methods</hd> <p></p> <hd id="AN0177598717-3">Data Sets</hd> <p>For each measure, available data sets were obtained from the following sources: Simons Simplex Collection (SSC) (Fischbach & Lord, [<reflink idref="bib14" id="ref14">14</reflink>]), the Autism Genetic Resource Exchange (AGRE) (Geschwind et al., [<reflink idref="bib18" id="ref15">18</reflink>]), National Database for Autism Research (NDAR) (Hall et al., [<reflink idref="bib20" id="ref16">20</reflink>]), DISCO clinic data (Leekam et al., [<reflink idref="bib30" id="ref17">30</reflink>], [<reflink idref="bib33" id="ref18">33</reflink>], [<reflink idref="bib31" id="ref19">31</reflink>]; S. R. Leekam et al., [<reflink idref="bib31" id="ref20">31</reflink>]; Wing et al., [<reflink idref="bib68" id="ref21">68</reflink>]), Social Responsiveness Scale normative data (SRS Norm) (Constantino & Gruber, [<reflink idref="bib10" id="ref22">10</reflink>]), Healthy Brain Network (HBN) (Alexander et al., [<reflink idref="bib1" id="ref23">1</reflink>]), ASDQ Norms (Frazier et al., [<reflink idref="bib15" id="ref24">15</reflink>]), SSDS Norms (Phillips et al., [<reflink idref="bib48" id="ref25">48</reflink>]), CRI-R Norms (Evans et al., [<reflink idref="bib13" id="ref26">13</reflink>]), and DARB Norms (Uljarević et al., [<reflink idref="bib62" id="ref27">62</reflink>]). When a measure was available in more than one dataset, samples were combined into an aggregate dataset, with duplicate cases removed where they could be identified. 
Specifically, three samples (SSC, AGRE, and NDAR) were combined to create the Autism Diagnostic Observation Schedule (ADOS) dataset; five samples (HBN, SSC, NDAR, SRS Norm, and AGRE) were combined to create the Social Responsiveness Scale (SRS) dataset; and three samples (HBN, NDAR, and SSC) were combined to create the Social Communication Questionnaire (SCQ) and Repetitive Behavior Scale – Revised (RBS-R) datasets. Information on the cohorts included for each measure is provided in Supplemental Table 1.</p> <p>Table 1 Total scale reliability estimates for autism diagnostic and symptom measures</p> <p> <ephtml> <table frame="hsides" rules="groups"><thead><tr><th align="left" /><th align="left"><p>Number of Items</p></th><th align="left"><p>Internal Consistency</p><p>Cronbach's α</p><p>(95% CI)</p></th><th align="left"><p>Average Corrected</p><p>Item-Total r</p></th><th align="left"><p>Model Reliability</p><p>McDonald's ω</p></th><th align="left"><p>IRT Theta Range</p><p>(reliability ≥ 0.70)</p></th></tr></thead><tbody><tr><td align="left"><p><bold>Diagnostic Observation</bold></p></td><td align="left" /><td align="left" /><td align="left" /><td align="left" /><td align="left" /></tr><tr><td align="left"><p>ADOS – Module 1</p><p>("few to no words")</p></td><td char="." align="char"><p>14</p></td><td char="?" align="char"><p>0.83 (0.81-0.85)</p></td><td align="left"><p>0.50</p></td><td align="left"><p>0.83</p></td><td align="left"><p>-3.2 to + 1.1</p></td></tr><tr><td align="left"><p>ADOS – Module 1</p><p>("some words")</p></td><td char="." align="char"><p>14</p></td><td char="?" align="char"><p>0.92 (0.91-0.92)</p></td><td align="left"><p>0.67</p></td><td align="left"><p>0.93</p></td><td align="left"><p>-2.0 to + 2.2</p></td></tr><tr><td align="left"><p>ADOS – Module 2</p></td><td char="." align="char"><p>14</p></td><td char="?" 
align="char"><p>0.87 (0.86-0.88)</p></td><td align="left"><p>0.56</p></td><td align="left"><p>0.88</p></td><td align="left"><p>-2.1 to + 3.4</p></td></tr><tr><td align="left"><p>ADOS – Module 3</p></td><td char="." align="char"><p>14</p></td><td char="?" align="char"><p>0.84 (0.83-0.85)</p></td><td align="left"><p>0.51</p></td><td align="left"><p>0.85</p></td><td align="left"><p>-2.1 to + 4.5</p></td></tr><tr><td align="left"><p>ADOS – Module 4</p></td><td char="." align="char"><p>15</p></td><td char="?" align="char"><p>0.90 (0.89-0.91)</p></td><td align="left"><p>0.60</p></td><td align="left"><p>0.91</p></td><td align="left"><p>-1.5 to + 2.9</p></td></tr><tr><td align="left"><p><bold>Diagnostic Interview</bold></p></td><td align="left" /><td align="left" /><td align="left" /><td align="left" /><td align="left" /></tr><tr><td align="left"><p>ADI-R</p></td><td char="." align="char"><p>42</p></td><td char="?" align="char"><p>0.92 (0.91-0.92)</p></td><td align="left"><p>0.48</p></td><td align="left"><p>0.91</p></td><td align="left"><p>-2.2 to + 1.6</p></td></tr><tr><td align="left"><p>DISCO</p></td><td char="." align="char"><p>48</p></td><td char="?" align="char"><p>0.85 (0.83-0.88)</p></td><td align="left"><p>0.35</p></td><td align="left"><p>0.83</p></td><td align="left"><p>-2.8 to + 3.2</p></td></tr><tr><td align="left"><p><bold>Parent-Report</bold></p></td><td align="left" /><td align="left" /><td align="left" /><td align="left" /><td align="left" /></tr><tr><td align="left"><p>SRS</p></td><td char="." align="char"><p>65</p></td><td char="?" align="char"><p>0.97 (0.96-0.97)</p></td><td align="left"><p>0.59</p></td><td align="left"><p>0.97</p></td><td align="left"><p>-3.0 to + 4.5</p></td></tr><tr><td align="left"><p>SCQ</p></td><td char="." align="char"><p>39</p></td><td char="?" 
align="char"><p>0.90 (0.90-0.91)</p></td><td align="left"><p>0.43</p></td><td align="left"><p>0.90</p></td><td align="left"><p>-1.4 to + 3.0</p></td></tr><tr><td align="left"><p>ASDQ</p></td><td char="." align="char"><p>39</p></td><td char="?" align="char"><p>0.95 (0.94-0.95)</p></td><td align="left"><p>0.56</p></td><td align="left"><p>0.97</p></td><td align="left"><p>-2.8 to + 5.6</p></td></tr></tbody></table> </ephtml> </p> <p>Note. ADOS module 1 "few to no words" N = 1299; ADOS module 1 "some words" N = 2481; ADOS module 2 N = 1620; ADOS module 3 N = 4932; ADOS module 4 N = 1706; ADI-R N = 1929; DISCO N = 272; SRS N = 16,755; SCQ N = 6214; ASDQ N = 1467; RBS-R N = 5299; CRI-R N = 3031</p> <hd id="AN0177598717-4">Measures</hd> <p></p> <hd id="AN0177598717-5">Autism Diagnostic Observation Schedule (ADOS)</hd> <p>The ADOS/ADOS-2 (first and second editions) is a clinician-observation measure of autism symptoms (Lord et al., [<reflink idref="bib39" id="ref28">39</reflink>], [<reflink idref="bib40" id="ref29">40</reflink>]). The measure includes five modules (the toddler module and modules 1–4), which are selected according to age and expressive language level. For the present study, only data from modules 1–4 were available. Each module was analyzed separately using only the items included in the respective ADOS-2 algorithm scores. Although not typically interpreted separately, social communication/interaction (SCI; termed social affect on the ADOS-2) and restricted/repetitive behavior (RRB) subscales were also scored and analyzed, both to evaluate reliability for these domains independently and to allow comparison with other measures that compute SCI and RRB domain scores.</p> <hd id="AN0177598717-6">Autism Diagnostic Interview-Revised (ADI-R)</hd> <p>The ADI-R (Lord et al., [<reflink idref="bib41" id="ref30">41</reflink>]) is a standardized, semi-structured clinical interview for caregivers of children and adults. For the present study, a DSM-5 item mapping (Huerta et al., [<reflink idref="bib23" id="ref31">23</reflink>]) was used to define the total score and the SCI and RRB subscales. 
For the total score and subscales, item scores of 3 were recoded to 2, consistent with the instrument's algorithm scoring.</p> <hd id="AN0177598717-7">Diagnostic Interview for Social and Communication Disorders (DISCO)</hd> <p>The DISCO is a 320-item semi-structured interview used by clinicians to elicit information from caregivers about the individual's profile of development and behavior. Across different samples, it has shown good sensitivity and specificity (Carrington et al., [<reflink idref="bib8" id="ref32">8</reflink>], [<reflink idref="bib6" id="ref33">6</reflink>]; Kent et al., [<reflink idref="bib27" id="ref34">27</reflink>]; Maljaars et al., [<reflink idref="bib42" id="ref35">42</reflink>]), interrater reliability (κ ≥ 0.7) (Wing et al., [<reflink idref="bib68" id="ref36">68</reflink>]), and criterion validity (Leekam et al., [<reflink idref="bib32" id="ref37">32</reflink>]; Maljaars et al., [<reflink idref="bib42" id="ref38">42</reflink>]; Nygren et al., [<reflink idref="bib47" id="ref39">47</reflink>]). It has a DSM-5 algorithm item set (Kent et al., [<reflink idref="bib27" id="ref40">27</reflink>]), which has also been published in abbreviated form (Carrington et al., [<reflink idref="bib8" id="ref41">8</reflink>], [<reflink idref="bib7" id="ref42">7</reflink>]). The abbreviated 48-item set was used for this analysis.</p> <hd id="AN0177598717-8">Social Responsiveness Scale (SRS)</hd> <p>The SRS/SRS-2 (first and second editions) is a 65-item, ordinally scaled (1 = "not true" to 4 = "almost always true") parent-report quantitative assessment of the severity of autism traits. It is one of the most frequently used quantitative measures of autism symptoms (Constantino & Gruber, [<reflink idref="bib10" id="ref43">10</reflink>]). 
SCI and RRB subscales were derived from SRS-2 scoring.</p> <hd id="AN0177598717-9">Social Communication Questionnaire (SCQ)</hd> <p>The lifetime version of the SCQ is a 40-item parent-report rating scale with dichotomous (yes/no) responses, many of which tap DSM-IV-TR symptom domains (Rutter et al., [<reflink idref="bib52" id="ref44">52</reflink>]). Lifetime ratings refer to the child's behavior across their developmental history, which increases diagnostic validity (Lord et al., [<reflink idref="bib38" id="ref45">38</reflink>]). Items 2–40 (39 items) were summed for the total score, and the SCI and RRB subscales were defined by the authors based on item content and prior factor analyses (Uljarevic et al., [<reflink idref="bib64" id="ref46">64</reflink>]; Uljarević et al., [<reflink idref="bib66" id="ref47">66</reflink>]).</p> <hd id="AN0177598717-10">Autism Symptom Dimensions Questionnaire (ASDQ)</hd> <p>The ASDQ is a newly created 39-item autism symptom measure informed by DSM-5 criteria and recent factor analyses of autism symptom data, with input from informant caregivers and autism clinicians (Frazier et al., [<reflink idref="bib15" id="ref48">15</reflink>]). Items are rated on a 5-point Likert scale (1 = Never, 2 = Rarely, 3 = Sometimes, 4 = Often, 5 = Very Often). The SCI and RRB subscales include 17 and 18 items, respectively.</p> <hd id="AN0177598717-11">Stanford Social Dimensions Scale (SSDS)</hd> <p>The SSDS is a 58-item dimensional measure designed to capture parents' perspectives on their child's social abilities (Phillips et al., [<reflink idref="bib48" id="ref49">48</reflink>]). Factor analyses have suggested a five-factor solution, with factors interpreted as Social Motivation (SM), Social Affiliation (SA), Expressive Social Communication (ESC), Social Recognition (SR), and Unusual Approach (UA). 
Because these factors are positively intercorrelated, items were treated as a single set for evaluation of the SCI domain.</p> <hd id="AN0177598717-12">Repetitive Behavior Scale – Revised (RBS-R)</hd> <p>The RBS-R is a 43-item parent-report rating scale that measures the presence and severity of various forms of restricted, repetitive behavior characteristic of individuals with autism spectrum disorder (ASD) (Lam & Aman, [<reflink idref="bib29" id="ref50">29</reflink>]). The RBS-R consists of six subscales: stereotyped behavior, self-injurious behavior, compulsive behavior, routine behavior, sameness behavior, and restricted behavior. For the present study, items from all subscales except self-injurious behavior were analyzed as a single scale to evaluate measurement of the RRB domain.</p> <hd id="AN0177598717-13">Repetitive Behavior Questionnaire – 2 (RBQ-2)</hd> <p>The RBQ-2 is a 20-item parent-report questionnaire. Factor analytic studies in ASD samples (Barrett et al., [<reflink idref="bib3" id="ref51">3</reflink>]; Lidstone et al., [<reflink idref="bib36" id="ref52">36</reflink>]) and typically developing samples (Leekam et al., [<reflink idref="bib31" id="ref53">31</reflink>]; Uljarevic et al., [<reflink idref="bib61" id="ref54">61</reflink>]) suggest that the RBQ-2 has a stable two-factor structure comprising repetitive sensory-motor and insistence-on-sameness factors. For the present study, all items were analyzed as a single scale to evaluate the RRB domain.</p> <hd id="AN0177598717-14">Childhood Routines Inventory – Revised (CRI-R)</hd> <p>The CRI-R is a 62-item parent-report measure rated on a five-point Likert scale (Evans et al., [<reflink idref="bib13" id="ref55">13</reflink>]). Items evaluate stereotypies, tics, compulsions, habits and routines, rigidity, insistence on sameness, and sensory sensitivities. Although the instrument assesses two broad RRB domains, these domains are highly correlated. 
For the present study, items were treated as a single scale to evaluate measurement precision for the RRB domain.</p> <hd id="AN0177598717-15">Dimensional Assessment for Restricted and Repetitive Behaviors (DARB)</hd> <p>The DARB is a new measure of RRB that was developed and refined through the iterative series of steps described in the Patient-Reported Outcomes Measurement Information System (PROMIS) framework. Concepts guiding item development included (i) good coverage of the full range of symptom severity and presentations, (ii) applicability across the cognitive functioning range, and (iii) applicability across the lifespan. Measure development was informed by recent factor analyses of RRB symptoms, and the final measure includes 98 items rated on a 5-point Likert scale. For the present study, the 96 items with significant loadings on 7 of the 8 subscales were combined into a single RRB scale; self-injury items were excluded.</p> <hd id="AN0177598717-16">Statistical Analyses</hd> <p>Descriptive statistics for demographic (age, sex) and clinical (IQ) variables were computed separately for ASD, developmental delay (DD), and neurotypical (NT) groups, when available, to characterize the combined cohorts for each measure.</p> <hd id="AN0177598717-17">Missing Data Handling</hd> <p>Missing data were modest across measures (0–5.7%; Supplement 1). For each measure with missing data, five imputed datasets were generated using fully conditional specification (a Markov chain Monte Carlo method) with 10 iterations. Classical test theory reliability analyses were computed on the original dataset and on each imputed dataset; in every case, differences across datasets were very small (< 0.01). Mean values across imputed datasets are presented. 
Item response theory analyses were conducted on the original (non-imputed) data under the assumption that data were missing at random.</p> <hd id="AN0177598717-18">Classical Test Theory (CTT) - Scale Reliability</hd> <p>To evaluate measurement precision for each measure, CTT reliability coefficients (internal consistency and average corrected item-total correlations) (Streiner & Norman, [<reflink idref="bib57" id="ref56">57</reflink>]) and model reliability (McDonald's coefficient ω) (Revelle & Condon, [<reflink idref="bib50" id="ref57">50</reflink>]) were computed using all items (total scales), only SCI items (SCI scales), and only RRB items (RRB scales) as inputs. Model reliability was computed from the factor loadings of a single-factor confirmatory factor analysis, using an SPSS macro (Hayes & Coutts, [<reflink idref="bib22" id="ref58">22</reflink>]). Confidence intervals (95%) were also calculated for internal consistency estimates. Internal consistency and model reliability estimates of < 0.70, 0.70 to 0.79, 0.80 to 0.89, and ≥ 0.90 were considered poor, fair, good, and excellent, respectively (Nunnally & Bernstein, [<reflink idref="bib46" id="ref59">46</reflink>]). Average corrected item-total correlations ≥ 0.30 were considered at least adequate (Streiner & Norman, [<reflink idref="bib57" id="ref60">57</reflink>]). 
To evaluate the association between each measure's original publication year and its CTT reliability, nonparametric Spearman's rho correlations were computed between publication year and internal consistency coefficients transformed to Fisher's z.</p> <hd id="AN0177598717-19">Item Response Theory (IRT) - Conditional Reliability</hd> <p>IRT analyses (Embretson & Reise, [<reflink idref="bib12" id="ref61">12</reflink>]; Hambleton et al., [<reflink idref="bib21" id="ref62">21</reflink>]; Reise et al., [<reflink idref="bib49" id="ref63">49</reflink>]) were conducted using maximum likelihood estimation with robust standard errors and a logit link, with the latent factor mean and variance fixed to 0 and 1, respectively. Principal components analyses were first conducted to confirm that each measure had a large first principal component, indicating that a substantial proportion of the variance in item scores reflected a general dimension, consistent with summing items into a single score. After dimensionality was checked, unidimensional IRT models were fit to the items of each measure's total and subscale scores. Scale information estimates were converted to conditional reliability for theta values from −6 to +6 using the formula reliability = 1 – [1/Information(theta)] (Thissen, [<reflink idref="bib60" id="ref64">60</reflink>]).</p> <p>Comparisons between measures within each category (clinical observation, parent interview, parent-report) were conducted using repeated measures analysis of variance, with conditional reliability coefficients (after conversion to Fisher's z) as the dependent variable and measure (e.g., ADOS module 1 vs. module 2) as the independent variable. Comparisons across categories used the same repeated measures approach after first averaging conditional reliability estimates across the measures within each category. 
This analysis examines whether observational, interview, and parent-report total scores differ in conditional reliability.</p> <hd id="AN0177598717-20">Statistical Power</hd> <p>Given the large sample size for each measure, the scale and conditional reliability analyses are likely overpowered. The only exception is the newly developed SSDS, whose sample is much smaller (N = 170) but still adequate for an initial evaluation of reliability. Repeated measures comparisons of conditional reliability estimates were expected to have adequate power (≥ 0.80) to detect small-to-medium or larger effects (d ≥ 0.36), assuming 61 observations along the information curve (theta from −6 to +6) for a two-measure comparison (α = 0.05, two-tailed).</p> <p>Data preparation, descriptive analyses, internal consistency reliability, and corrected item-total correlations were computed in SPSS v28 (IBM Corp, [<reflink idref="bib24" id="ref65">24</reflink>]). Item response theory analyses were computed in Mplus version 8.5 (Muthén & Muthén, [<reflink idref="bib44" id="ref66">44</reflink>]).</p> <hd id="AN0177598717-21">Results</hd> <p></p> <hd id="AN0177598717-22">Measure Cohorts</hd> <p>Sample composition varied widely in the proportions of ASD, non-ASD DD, and neurotypical participants (Supplement 1). However, all samples contained the full range of scores on all items, indicating that a wide range of symptom levels was represented regardless of diagnostic composition. Age and sex distributions also varied widely but generally reflected the younger ages typical of diagnostic clinic samples and the high proportion of males often seen in ASD samples. Where available, mean sample IQs for autistic participants ranged from very low (standard score [SS] = 50) to average (SS = 102), and mean sample IQs in the DD and neurotypical groups were generally in the average range (Supplement 1). 
Missing data rates were low (0–5.7%), suggesting that selective attrition is unlikely to have influenced the results.</p> <hd id="AN0177598717-23">Clinical Observation Measures</hd> <p>Total scale reliability was good to excellent across ADOS modules (Table 1). Conditional reliability was generally adequate (≥ 0.70) in the middle of the score range (theta −1.5 to +1.1) for most ADOS modules (Fig. 1), with better conditional reliability shifted toward lower scores for module 1 ("few to no words") and toward higher scores for module 4. These deviations may reflect the inclusion of "easier" items in module 1 and "harder" items in module 4, or slight population differences: module 1 is often administered to young children with emerging language, and the module 1 "few to no words" cohort had the fewest neurotypical cases. Comparisons across modules indicated stronger conditional reliability for modules 2 and 3 than for module 1 (both forms) and module 4 [F(4, 240) = 7.67, p < .001].</p> <p>Fig. 1 Conditional reliability for total scores across ADOS modules. Note: Module 1 ("few to no words") is shifted left relative to the other modules. This could partly reflect the fact that 93% of the children administered this module met criteria for ASD (n = 1209). The other ADOS modules had a better balance of ASD and non-ASD/TD cases (% ASD: module 1 "some words" = 73%; module 2 = 81%; module 3 = 85%; module 4 = 59%).</p> <p>Scale reliability remained good to excellent for ADOS SCI scales across modules (Table 2). However, scale reliability was poor for ADOS RRB scales, consistent with their small number of items (4–5 per module). 
Similarly, conditional reliability for ADOS SCI scales was good to excellent and only slightly weaker than for total scales (Supplement 2). In contrast, conditional reliability was generally inadequate for ADOS RRB scales (Supplement 3), reaching only fair to good levels in very narrow score ranges; this is not surprising given that the ADOS was not developed to provide an in-depth RRB assessment. Comparisons across modules indicated small but significant differences for SCI, with higher conditional reliability for modules 2 and 3 than for the other modules [F(4, 240) = 4.65, p = .001]. Large significant differences were observed for RRB scales, with the highest conditional reliability for module 3 and the lowest for module 1 (both forms) [F(4, 240) = 15.14, p < .001].</p> <p>Table 2 Social communication / interaction (SCI) and restricted / repetitive behavior (RRB) subscale reliability estimates for autism diagnostic and symptom measures</p> <p> <ephtml> <table frame="hsides" rules="groups"><thead><tr><th align="left" /><th align="left"><p>Number of Items</p></th><th align="left"><p>Internal Consistency</p><p>Cronbach's α</p><p>(95% CI)</p></th><th align="left"><p>Average Corrected</p><p>Item-Total r</p></th><th align="left"><p>Model Reliability</p><p>McDonald's ω</p></th><th align="left"><p>IRT Theta Range</p><p>(reliability ≥ 0.70)</p></th></tr></thead><tbody><tr><td align="left"><p><bold>Social Communication</bold></p><p><bold>/ Interaction</bold></p></td><td align="left" /><td align="left" /><td align="left" /><td align="left" /><td align="left" /></tr><tr><td align="left"><p>ADOS – Module 1</p><p>("few to no words")</p></td><td char="." align="char"><p>10</p></td><td char="?" 
align="char"><p>0.84 (0.82-0.86)</p></td><td align="left"><p>0.56</p></td><td align="left"><p>0.84</p></td><td align="left"><p>-3.2 to + 1.0</p></td></tr><tr><td align="left"><p>ADOS – Module 1</p><p>("some words")</p></td><td char="." align="char"><p>10</p></td><td char="?" align="char"><p>0.93 (0.92-0.93)</p></td><td align="left"><p>0.71</p></td><td align="left"><p>0.92</p></td><td align="left"><p>-1.9 to + 2.0</p></td></tr><tr><td align="left"><p>ADOS – Module 2</p></td><td char="." align="char"><p>10</p></td><td char="?" align="char"><p>0.86 (0.85-0.87)</p></td><td align="left"><p>0.59</p></td><td align="left"><p>0.87</p></td><td align="left"><p>-1.9 to + 3.1</p></td></tr><tr><td align="left"><p>ADOS – Module 3</p></td><td char="." align="char"><p>10</p></td><td char="?" align="char"><p>0.86 (0.85-0.87)</p></td><td align="left"><p>0.58</p></td><td align="left"><p>0.87</p></td><td align="left"><p>-1.7 to + 4.2</p></td></tr><tr><td align="left"><p>ADOS – Module 4</p></td><td char="." align="char"><p>10</p></td><td char="?" align="char"><p>0.90 (0.89-0.91)</p></td><td align="left"><p>0.66</p></td><td align="left"><p>0.90</p></td><td align="left"><p>-1.2 to + 2.5</p></td></tr><tr><td align="left"><p>ADI-R</p></td><td char="." align="char"><p>26</p></td><td char="?" align="char"><p>0.94 (0.94-0.95)</p></td><td align="left"><p>0.66</p></td><td align="left"><p>0.95*</p></td><td align="left"><p>-2.1 to + 1.5</p></td></tr><tr><td align="left"><p>DISCO</p></td><td char="." align="char"><p>24</p></td><td char="?" align="char"><p>0.83 (0.80-0.86)</p></td><td align="left"><p>0.45</p></td><td align="left"><p>0.84</p></td><td align="left"><p>-1.8 to + 2.8</p></td></tr><tr><td align="left"><p>SRS</p></td><td char="." align="char"><p>53</p></td><td char="?" align="char"><p>0.96 (0.96-0.97)</p></td><td align="left"><p>0.56</p></td><td align="left"><p>0.96</p></td><td align="left"><p>-3.0 to + 4.6</p></td></tr><tr><td align="left"><p>SCQ</p></td><td char="." 
align="char"><p>27</p></td><td char="?" align="char"><p>0.90 (0.90-0.91)</p></td><td align="left"><p>0.50</p></td><td align="left"><p>0.91</p></td><td align="left"><p>-1.3 to + 2.4</p></td></tr><tr><td align="left"><p>ASDQ</p></td><td char="." align="char"><p>17</p></td><td char="?" align="char"><p>0.93 (0.92-0.94)</p></td><td align="left"><p>0.65</p></td><td align="left"><p>0.93</p></td><td align="left"><p>-2.0 to + 4.8</p></td></tr><tr><td align="left"><p>SSDS</p></td><td char="." align="char"><p>40</p></td><td char="?" align="char"><p>0.94 (0.92-0.95)</p></td><td align="left"><p>0.52</p></td><td align="left"><p>0.94</p></td><td align="left"><p>-4.9 to + 5.3</p></td></tr><tr><td align="left"><p><bold>Restricted, Repetitive</bold></p><p><bold>Behavior</bold></p></td><td align="left" /><td align="left" /><td align="left" /><td align="left" /><td align="left" /></tr><tr><td align="left"><p>ADOS – Module 1</p><p>("few to no words")</p></td><td char="." align="char"><p>4</p></td><td char="?" align="char"><p>0.56 (0.54-0.58)</p></td><td align="left"><p>0.35</p></td><td align="left"><p>0.56</p></td><td align="left"><p>-1.8 to -0.4</p></td></tr><tr><td align="left"><p>ADOS – Module 1</p><p>("some words")</p></td><td char="." align="char"><p>4</p></td><td char="?" align="char"><p>0.67 (0.66-0.69)</p></td><td align="left"><p>0.45</p></td><td align="left"><p>0.67</p></td><td align="left"><p>-0.9 to + 1.1</p></td></tr><tr><td align="left"><p>ADOS – Module 2</p></td><td char="." align="char"><p>4</p></td><td char="?" align="char"><p>0.69 (0.68-0.70)</p></td><td align="left"><p>0.47</p></td><td align="left"><p>0.69</p></td><td align="left"><p>-0.9 to + 1.3</p></td></tr><tr><td align="left"><p>ADOS – Module 3</p></td><td char="." align="char"><p>4</p></td><td char="?" 
align="char"><p>0.56 (0.55-0.57)</p></td><td align="left"><p>0.35</p></td><td align="left"><p>0.57</p></td><td align="left"><p>-2.5 to + 3.5</p></td></tr><tr><td align="left"><p>ADOS – Module 4</p></td><td char="." align="char"><p>5</p></td><td char="?" align="char"><p>0.68 (0.67-0.69)</p></td><td align="left"><p>0.48</p></td><td align="left"><p>0.71</p></td><td align="left"><p>-0.4 to + 2.8</p></td></tr><tr><td align="left"><p>ADI-R</p></td><td char="." align="char"><p>16</p></td><td char="?" align="char"><p>0.69 (0.67-0.71)</p></td><td align="left"><p>0.29</p></td><td align="left"><p>0.62</p></td><td align="left"><p>-0.9 to + 2.0</p></td></tr><tr><td align="left"><p>DISCO</p></td><td char="." align="char"><p>24</p></td><td char="?" align="char"><p>0.76 (0.72-0.80)</p></td><td align="left"><p>0.30</p></td><td align="left"><p>0.84</p></td><td align="left"><p>-3.4 to + 2.1</p></td></tr><tr><td align="left"><p>SRS</p></td><td char="." align="char"><p>12</p></td><td char="?" align="char"><p>0.93 (0.92-0.93)</p></td><td align="left"><p>0.69</p></td><td align="left"><p>0.93</p></td><td align="left"><p>-1.6 to + 2.9</p></td></tr><tr><td align="left"><p>SCQ</p></td><td char="." align="char"><p>11</p></td><td char="?" align="char"><p>0.84 (0.84-0.85)</p></td><td align="left"><p>0.53</p></td><td align="left"><p>0.85</p></td><td align="left"><p>-0.5 to + 2.2</p></td></tr><tr><td align="left"><p>ASDQ</p></td><td char="." align="char"><p>18</p></td><td char="?" align="char"><p>0.94 (0.93-0.94)</p></td><td align="left"><p>0.65</p></td><td align="left"><p>0.94</p></td><td align="left"><p>-2.0 to + 4.8</p></td></tr><tr><td align="left"><p>RBS-R*</p></td><td char="." align="char"><p>35</p></td><td char="?" align="char"><p>0.94 (0.94-0.95)</p></td><td align="left"><p>0.56</p></td><td align="left"><p>0.95</p></td><td align="left"><p>-1.7 to + 4.8</p></td></tr><tr><td align="left"><p>DARB</p></td><td char="." align="char"><p>96</p></td><td char="?" 
align="char"><p>0.96 (0.96-0.97)</p></td><td align="left"><p>0.46</p></td><td align="left"><p>0.96</p></td><td align="left"><p>-3.7 to + 6.0</p></td></tr><tr><td align="left"><p>CRI-R</p></td><td char="." align="char"><p>62</p></td><td char="?" align="char"><p>0.97 (0.97-0.98)</p></td><td align="left"><p>0.60</p></td><td align="left"><p>0.97</p></td><td align="left"><p>-3.1 to + 4.9</p></td></tr><tr><td align="left"><p>RBQ-2</p></td><td char="." align="char"><p>19</p></td><td char="?" align="char"><p>0.89 (0.88-0.89)</p></td><td align="left"><p>0.51</p></td><td align="left"><p>0.89</p></td><td align="left"><p>-2.3 to + 4.2</p></td></tr></tbody></table> </ephtml> </p> <p>Note. Sample sizes were the same as for total scales. *Self-injury items were excluded from RBS-R and DARB calculations. McDonald's omega for ADI-R total symptoms and SCI symptoms was computed without items 35 (conversation), 46 (attention to voice), and 66 (social distance) because of estimation difficulties when these items were included.</p> <hd id="AN0177598717-24">Parent Interview Measures</hd> <p>Total scale reliability fell in the good to excellent range for both interview measures (Table 1). Conditional reliability was generally adequate (> 0.70) in the middle of the score range (theta −2.0 to +1.8; Fig. 2), with better conditional reliability for the DISCO than for the ADI-R [F(<reflink idref="bib1" id="ref70">1</reflink>, 60) = 14.35, p < .001]. Scale reliability was excellent for the ADI-R SCI subscale and good for the DISCO SCI subscale, whereas the RRB subscales fell in the poor (ADI-R) and fair (DISCO) ranges (Table 2). Conditional reliability for SCI subscales was at least adequate (≥ 0.70) in the average score range (theta −1.8 to +2.5), with no significant difference between the ADI-R and DISCO [F(<reflink idref="bib1" id="ref71">1</reflink>, 60) = 1.59, p = .212] (Supplement 4). 
However, for RRB subscales, conditional reliability was substantially better for the DISCO than for the ADI-R [F(<reflink idref="bib1" id="ref72">1</reflink>, 60) = 58.81, p < .001], with adequate levels extending from extremely low to very high scores (theta −3.4 to +2.1) (Supplement 5).</p> <p>Fig. 2 Conditional reliability for total scale scores across parent-interview measures</p> <hd id="AN0177598717-25">Parent-Report Questionnaire Measures</hd> <p>Scale reliability was excellent (≥ 0.90) for all parent-report total scales (Table 1). The SRS and ASDQ total scores had excellent conditional reliability coverage from very low to extremely high scores (theta −2.8 to +4.5), whereas the SCQ total score maintained good reliability only from low to very high scores (theta −1.4 to +3.0) (Fig. 3). Not surprisingly, given differences in content coverage, average conditional reliability varied substantially across instruments [F(<reflink idref="bib2" id="ref73">2</reflink>, 120) = 123.57, p < .001]. The SRS (r<subs>xx</subs> = 0.86) and ASDQ (r<subs>xx</subs> = 0.85) did not differ significantly (p = .142), whereas the SCQ had much lower average conditional reliability (r<subs>xx</subs> = 0.64) than both measures (both p < .001).</p> <p>Fig. 3 Conditional reliability for total scale scores across parent-report questionnaires</p> <p>Parent-report SCI and RRB subscales also had excellent scale reliability for most instruments, the exceptions being the SCQ RRB scale and the RBQ-2, which fell in the good range (Table 2). Conditional reliability was at least adequate (≥ 0.70) from very low to extremely high scores (theta −2.0 to +4.6) for the SRS, ASDQ, and SSDS SCI subscales (Table 2 and Supplement 6). Similarly, conditional reliability for RRB subscales was at least adequate (≥ 0.70) from low to extremely high scores (theta −1.6 to +2.9) for the SRS, ASDQ, RBS-R, CRI-R, DARB, and RBQ-2 (Supplement 7). 
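</p> <p>The "IRT Theta Range (reliability ≥ 0.70)" column of Table 2 and the coverage ranges reported above summarize where a conditional reliability curve stays above a cutoff. A minimal sketch of that computation for dichotomous two-parameter logistic (2PL) items follows. The item parameters are invented, the function names are hypothetical, and the conversion r(θ) = I(θ)/(I(θ) + 1), which assumes a latent trait with unit variance, is one common convention rather than necessarily the one used in this study (another is 1 − 1/I(θ)).</p>

```python
import numpy as np


def info_2pl(theta, a, b):
    """Test information for dichotomous 2PL items: sum over items of a^2 * P * (1 - P)."""
    t = np.asarray(theta, dtype=float)[:, None]   # (n_theta, 1)
    a = np.asarray(a, dtype=float)                 # (n_items,)
    b = np.asarray(b, dtype=float)
    p = 1.0 / (1.0 + np.exp(-a * (t - b)))        # endorsement probability per theta/item
    return (a ** 2 * p * (1.0 - p)).sum(axis=1)


def conditional_reliability(info):
    """Assumed convention: r(theta) = I / (I + 1) for a unit-variance latent trait."""
    info = np.asarray(info, dtype=float)
    return info / (info + 1.0)


def theta_range_above(theta, reliability, cutoff=0.70):
    """Outermost grid points where reliability >= cutoff, or None if never reached."""
    ok = np.asarray(theta)[np.asarray(reliability) >= cutoff]
    if ok.size == 0:
        return None
    return float(ok.min()), float(ok.max())


if __name__ == "__main__":
    grid = np.linspace(-6.0, 6.0, 121)            # theta grid in steps of 0.1
    # Hypothetical 20-item scale; parameters are invented for illustration.
    a = np.full(20, 1.8)
    b = np.linspace(-1.5, 3.0, 20)
    rel = conditional_reliability(info_2pl(grid, a, b))
    print("theta range with r >= 0.70:", theta_range_above(grid, rel))
```

<p>Under the I/(I + 1) convention, a reliability of 0.70 corresponds to test information of about 2.33, so the reported range is the span of the grid where information stays above that value. The helper returns the outermost qualifying grid points and therefore assumes a single contiguous region, as holds for a unimodal information curve.</p> <p>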
For both SCI and RRB scales, the SCQ had weaker conditional reliability coverage.</p> <p>Average conditional reliability varied significantly across the SCI [F(<reflink idref="bib3" id="ref74">3</reflink>, 180) = 118.89, p < .001] and RRB [F(<reflink idref="bib6" id="ref75">6</reflink>, 360) = 163.05, p < .001] subscales. For SCI scales, the SSDS (r<subs>xx</subs> = 0.90) had the highest average conditional reliability, followed by the SRS (r<subs>xx</subs> = 0.85), ASDQ (r<subs>xx</subs> = 0.77), and SCQ (r<subs>xx</subs> = 0.56). For RRB scales, the DARB (r<subs>xx</subs> = 0.91) and CRI-R (r<subs>xx</subs> = 0.88) had the highest average conditional reliability, followed by the RBS-R (r<subs>xx</subs> = 0.78), ASDQ (r<subs>xx</subs> = 0.77), and RBQ-2 (r<subs>xx</subs> = 0.72). Average conditional reliability was weaker for the RRB subscales of the SRS (r<subs>xx</subs> = 0.60) and SCQ (r<subs>xx</subs> = 0.41).</p> <p>A significant positive correlation was observed between publication year and internal consistency reliability across all instruments (r = .66, p = .007; Supplement 8).</p> <p>Informant-report total scores tended to have higher average conditional reliability (r<subs>xx</subs> = 0.80) than interview total scores (r<subs>xx</subs> = 0.73; p < .001), which in turn were higher than ADOS observation total scores (r<subs>xx</subs> = 0.64; p < .001) [F(<reflink idref="bib2" id="ref76">2</reflink>, 120) = 55.12, p < .001].</p> <hd id="AN0177598717-26">Discussion</hd> <p>High levels of reliability across nearly all total scales and most SCI and RRB subscales indicate that, for many applications, the choice of measure should be driven not primarily by reliability but by information source (clinician observation, parent interview, parent report) and by content, construct, and predictive (diagnostic) validity considerations. 
This is particularly true because, although the measures evaluated here all cover a broad age range, they differ substantially in item content, coverage of autism symptom domains, level of diagnostic differentiation (Sanchez & Constantino, [<reflink idref="bib53" id="ref77">53</reflink>]), and the demographic and clinical factors that influence observed scores (Frazier et al., [<reflink idref="bib17" id="ref78">17</reflink>]).</p> <p>There are, however, important exceptions where reliability should weigh heavily in measure selection. For example, the ADOS modules were not designed for symptom monitoring and contain only a small number of RRB items. Using ADOS modules to assess change over time is therefore not warranted: they provide the measurement precision needed to evaluate individual differences over only a relatively narrow score range, and they have limited scale and conditional reliability for the RRB domain. It is important to emphasize, however, that the ADOS was not developed to provide a comprehensive, dimensional characterization of RRB or to track autism symptom levels across a wide range of trait levels. These limitations are therefore not surprising and do not detract from the instrument's utility for its primary purpose. Recent efforts to develop a brief observation instrument for tracking symptoms in toddlers may address this limitation (Grzadzinski et al., [<reflink idref="bib19" id="ref79">19</reflink>]). Similarly, the SCQ was clearly inferior to the SRS and ASDQ in both scale and conditional reliability. The latter two measures had highly similar total scale and conditional reliability estimates, with the SRS showing slightly stronger reliability for SCI and the ASDQ showing stronger reliability for RRB symptoms. 
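</p> <p>The present analyses point to the SCQ's dichotomous (yes/no) response scale as a likely reason for its weaker conditional reliability. The sketch below illustrates the general mechanism with the graded response model (GRM), a standard IRT model for ordered response categories: for a single item with the same discrimination, replacing one yes/no threshold with several Likert-type thresholds spreads Fisher information across a wider range of θ. The model choice, the function name, and all item parameters are illustrative assumptions, not estimates for the SCQ, SRS, or ASDQ.</p>

```python
import numpy as np


def grm_item_information(theta, a, thresholds):
    """Fisher information of one graded-response-model (GRM) item.

    Boundary curves P*_k = logistic(a * (theta - b_k)) for ordered thresholds b_k;
    category probabilities are differences of adjacent boundary curves, and the
    information is sum_k (dP_k/dtheta)^2 / P_k. With a single threshold this
    reduces to the dichotomous 2PL value a^2 * P * (1 - P).
    """
    t = np.atleast_1d(np.asarray(theta, dtype=float))[:, None]   # (n, 1)
    b = np.asarray(thresholds, dtype=float)[None, :]             # (1, m)
    p_star = 1.0 / (1.0 + np.exp(-a * (t - b)))                  # (n, m)
    n = t.shape[0]
    # Pad with P*_0 = 1 and P*_{m+1} = 0 so every category is a difference.
    bounds = np.hstack([np.ones((n, 1)), p_star, np.zeros((n, 1))])
    slopes = a * bounds * (1.0 - bounds)                          # dP*_k / dtheta
    p_cat = bounds[:, :-1] - bounds[:, 1:]                        # (n, m + 1)
    dp_cat = slopes[:, :-1] - slopes[:, 1:]
    p_cat = np.clip(p_cat, 1e-12, None)                           # avoid 0/0 far in the tails
    return (dp_cat ** 2 / p_cat).sum(axis=1)


if __name__ == "__main__":
    grid = np.array([-2.0, 0.0, 2.0])
    # Illustrative parameters only: same discrimination, 2 vs 4 response categories.
    yes_no = grm_item_information(grid, a=1.5, thresholds=[0.0])
    four_point = grm_item_information(grid, a=1.5, thresholds=[-1.0, 0.0, 1.0])
    for t, i2, i4 in zip(grid, yes_no, four_point):
        print(f"theta={t:+.1f}  yes/no={i2:.3f}  4-point={i4:.3f}")
```

<p>With these illustrative parameters, the 4-point item carries more information than the yes/no item at every grid point, and the advantage grows away from the center of the scale (more than three times as much at θ = ±2). Because test information is the sum of item information, this is one route by which response format can shape the conditional reliability coverage of a whole scale.</p> <p>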
The near-equivalence of the SRS and ASDQ in total scale reliability is notable given that the SRS is a commercial measure with 65 items and limited symptom domain coverage, whereas the ASDQ is a free, open-access, 39-item measure with strong coverage of symptom domains consistent with DSM-5 criteria and recent factor analyses, as well as good coverage of constructs aligned with dimensional frameworks such as the National Institute of Mental Health's Research Domain Criteria initiative (Frazier & Hardan, [<reflink idref="bib16" id="ref80">16</reflink>]; Uljarevic et al., [<reflink idref="bib63" id="ref81">63</reflink>], [<reflink idref="bib64" id="ref82">64</reflink>], [<reflink idref="bib65" id="ref83">65</reflink>]).</p> <p>In general, newly developed parent-report measures had better reliability than older, widely used measures. This is not surprising, given that the newer measures were designed to provide better content coverage of autism symptom domains. Notably, the RBQ-2, although developed almost two decades ago, also showed strong reliability. These patterns suggest that clinicians and researchers should focus on the intended purpose when choosing a measure. For example, if the purpose is simply diagnostic differentiation or broad coverage of autism symptom level, existing measures are likely to be more efficient; if the purpose is detailed coverage of specific symptom domains to generate more targeted treatment recommendations, newer measures should be considered. In this regard, the ASDQ may offer a good balance: it has strong diagnostic differentiation for a parent-report instrument, provides good SCI and RRB coverage, is free, has only 39 items, and assesses well-replicated autism symptom sub-domains. 
For applications requiring a detailed assessment of the SCI or RRB domain, the SSDS, RBS-R, and DARB are the best choices among the parent-report scales.</p> <p>Reliability was stronger for newer measures, and for informant-report measures relative to interview and observation measures. These patterns likely reflect greater attention to psychometrics, especially content and construct coverage, in recent measure development, as well as the fact that informant-report questionnaires can often include more items with polytomous (Likert-type) response scales. While not surprising, this emphasizes the need to favor measures with an adequate number of items and with response scales that capture a useful range of information about specific behaviors or symptoms, while keeping response choices relevant and easy to rate.</p> <p>Based on the present findings, the following recommendations are made for considering reliability in measure selection and future measure development. (<reflink idref="bib1" id="ref84">1</reflink>) All ADOS modules have good reliability for diagnostic evaluation. However, their reliability is insufficient when monitoring a wide range of symptom levels is desired. Future revisions of the ADOS might consider adding RRB items to improve the reliability of this domain. A broader Likert-type scale may also help capture individual differences reliably across a wide range of autism symptom presentations. (<reflink idref="bib2" id="ref85">2</reflink>) The ADI-R had good total scale reliability and adequate conditional reliability for measuring individual differences in the range important for diagnostic differentiation. However, its conditional reliability was generally lower than that of the DISCO, and the ADI-R appears less able to capture more severe social communication/interaction symptoms or subtler restricted/repetitive behaviors. 
When a parent-interview measure is desired for capturing a wider range of autism symptom presentations, the DISCO appears to be the better choice. (<reflink idref="bib3" id="ref86">3</reflink>) The SCQ, given its weaker scale and conditional reliability, should be chosen only when other comparable parent-report measures (SRS and ASDQ) are not feasible. The measure has appealed to some users because of its simple dichotomous (yes/no) response scale, but the present analyses suggest that this scaling approach substantially reduces conditional reliability. (<reflink idref="bib4" id="ref87">4</reflink>) If only a single parent-report measure can be implemented and it must cover both SCI and RRB domains, either the SRS or the ASDQ should be considered; the choice between them should be based largely on content coverage, construct validity, and predictive validity. (<reflink idref="bib5" id="ref88">5</reflink>) Dedicated parent-report SCI and RRB measures, such as the SSDS, DARB, CRI-R, and RBQ-2, have very strong reliability profiles. Here too, the choice among these measures should be based largely on content coverage, construct validity, predictive validity, and practical considerations.</p> <hd id="AN0177598717-27">Limitations and Future Directions</hd> <p>Samples for each measure were selected based on the availability of large datasets. For observation and interview measures, these samples generally reflect at-risk populations rather than the general population including neurotypical individuals. Although the full score range was represented for all measures, these sampling differences may shift the conditional reliability curves toward the lower score range (leftward shift). 
Thus, the middle of the latent trait (and the peaks of the conditional reliability curves) may actually fall in ranges where ASD and non-ASD developmental disability cases are differentiated, rather than representing measurement precision across the full population. Further, although every attempt was made to include a broad set of widely used, well-established measures as well as newly developed ones, this study included only measures for which data were accessible to the authors. For instance, the ADOS modules were the only observation measures available. Future studies including other observation measures, such as the Childhood Autism Rating Scale (Schopler et al., [<reflink idref="bib54" id="ref89">54</reflink>]) or the Autism Observation Scale for Infants (Bryson et al., [<reflink idref="bib5" id="ref90">5</reflink>]), are needed. Future studies should also evaluate additional commonly used screening and diagnostic instruments, such as the Autism Spectrum Quotient (Baron-Cohen et al., [<reflink idref="bib2" id="ref91">2</reflink>]), the Modified Checklist for Autism in Toddlers (Robins et al., [<reflink idref="bib51" id="ref92">51</reflink>]), the Communication and Symbolic Behavior Scales (Wetherby & Prizant, [<reflink idref="bib67" id="ref93">67</reflink>]), the Screening Tool for Autism in Toddlers (Stone et al., [<reflink idref="bib56" id="ref94">56</reflink>]), the TELE-ASD-PEDS (Corona et al., [<reflink idref="bib11" id="ref95">11</reflink>]), the Autism Impact Measure (Kanne et al., [<reflink idref="bib26" id="ref96">26</reflink>]), and the Brief Observation of Social Communication Change (Grzadzinski et al., [<reflink idref="bib19" id="ref97">19</reflink>]; Kitzerow et al., [<reflink idref="bib28" id="ref98">28</reflink>]).</p> <p>Finally, the present study did not evaluate measurement invariance or differential item functioning. 
These properties are also key to understanding whether scales measure consistently across relevant demographic and clinical groups. Unfortunately, the datasets aggregated for the present study did not include sufficient information on race/ethnicity, and all data were from US samples. While prior work suggests that many measures show good invariance, at least across age, sex, and race/ethnicity in US and UK populations, these findings merit replication in large datasets and would be important considerations for future measure choice in clinical and research settings. Nor did this study evaluate test-retest or inter-rater reliability, which are key considerations, particularly for observation and interview measures and for longitudinal monitoring applications.</p> <hd id="AN0177598717-28">Conclusion</hd> <p>In summary, the present study demonstrated good to excellent scale and conditional reliability for nearly all measures evaluated, with the notable exceptions of weaker conditional reliability for the ADOS, ADI-R, and SCQ. Strong reliability estimates extended to measures of the SCI and RRB domains, except that ADOS and ADI-R measurement precision was weaker for the RRB domain. 
Future studies should consider scale and conditional reliability when selecting measures; where reliability is roughly equivalent, choices should be based on intended use, cost and availability, applicability to the target population, and validity considerations.</p> <hd id="AN0177598717-29">Publisher's Note</hd> <p>Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p> <ref id="AN0177598717-30"> <title> References </title> <blist> <bibl id="bib1" idref="ref23" type="bt">1</bibl> <bibtext> Alexander LM, Escalera J, Ai L, Andreotti C, Febre K, Mangone A, Vega Potler N, Langer N, Alexander A, Kovacs M, Litke S, O'Hagan B, Andersen J, Bronstein B, Bui A, Bushey M, Butler H, Castagna V, Camacho N, Chan E, Citera D, Clucas J, Cohen S, Dufek S, Eaves M, Gregory C. The Healthy Brain Network Biobank: An open resource for transdiagnostic research in pediatric mental health and learning disorders. Cold Spring Harbor Laboratory. 2017. 10.1101/149369</bibtext> </blist> <blist> <bibl id="bib2" idref="ref73" type="bt">2</bibl> <bibtext> Baron-Cohen S, Wheelwright S, Skinner R, Martin J, Clubley E. The autism-spectrum quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. Journal of Autism and Developmental Disorders. 2001; 31; 1: 5-17. 10.1023/a:1005653411471. 11439754</bibtext> </blist> <blist> <bibl id="bib3" idref="ref51" type="bt">3</bibl> <bibtext> Barrett SL, Uljarevic M, Baker EK, Richdale AL, Jones CR, Leekam SR. The adult repetitive Behaviours Questionnaire-2 (RBQ-2A): A self-report measure of restricted and repetitive behaviours. Journal of Autism and Developmental Disorders. 2015; 45; 11: 3680-3692. 10.1007/s10803-015-2514-6. 26155763. 4608982</bibtext> </blist> <blist> <bibl id="bib4" idref="ref3" type="bt">4</bibl> <bibtext> Boateng GO, Neilands TB, Frongillo EA, Melgar-Quinonez HR, Young SL. 
Best Practices for developing and validating Scales for Health, Social, and behavioral research: A primer. Front Public Health. 2018; 6: 149. 10.3389/fpubh.2018.00149. 29942800. 6004510</bibtext> </blist> <blist> <bibl id="bib5" idref="ref88" type="bt">5</bibl> <bibtext> Bryson SE, Zwaigenbaum L, McDermott C, Rombough V, Brian J. The Autism Observation Scale for Infants: Scale development and reliability data. Journal of Autism and Developmental Disorders. 2008; 38; 4: 731-738. 10.1007/s10803-007-0440-y. 17874180</bibtext> </blist> <blist> <bibl id="bib6" idref="ref33" type="bt">6</bibl> <bibtext> Carrington S, Leekam S, Kent R, Maljaars J, Gould J, Wing L, Le Couteur A, Van Berckelaer-Onnes I, Noens I. Signposting for diagnosis of Autism Spectrum disorder using the diagnostic interview for Social and Communication Disorders (DISCO). Research in Autism Spectrum Disorders. 2015; 9: 45-52. 10.1016/j.rasd.2014.10.003</bibtext> </blist> <blist> <bibl id="bib7" idref="ref42" type="bt">7</bibl> <bibtext> Carrington SJ, Barrett SL, Sivagamasundari U, Fretwell C, Noens I, Maljaars J, Leekam SR. Describing the Profile of Diagnostic features in autistic adults using an Abbreviated Version of the diagnostic interview for Social and Communication Disorders (DISCO-Abbreviated). Journal of Autism and Developmental Disorders. 2019; 49; 12: 5036-5046. 10.1007/s10803-019-04214-7. 31494785. 6841916</bibtext> </blist> <blist> <bibl id="bib8" idref="ref32" type="bt">8</bibl> <bibtext> Carrington SJ, Kent RG, Maljaars J, Le Couteur A, Gould J, Wing L, Noens I, Van Berckelaer-Onnes I, Leekam SR. DSM-5 Autism Spectrum Disorder: In search of essential behaviours for diagnosis. Research in Autism Spectrum Disorders. 2014; 8; 6: 701-715. 10.1016/j.rasd.2014.03.017</bibtext> </blist> <blist> <bibl id="bib9" idref="ref1" type="bt">9</bibl> <bibtext> Charman T, Gotham K. Measurement issues: Screening and diagnostic instruments for autism spectrum disorders - lessons from research and practice. 
Child Adolesc Ment Health. 2013; 18; 1: 52-63. 10.1111/j.1475-3588.2012.00664.x. 23539140</bibtext> </blist> <blist> <bibtext> Constantino JN, Gruber CP. The social responsiveness scale manual, second edition (SRS-2). Western Psychological Services. 2012.</bibtext> </blist> <blist> <bibtext> Corona LL, Wagner L, Wade J, Weitlauf AS, Hine J, Nicholson A, Stone C, Vehorn A, Warren Z. Toward Novel Tools for Autism Identification: Fusing computational and clinical expertise. Journal of Autism and Developmental Disorders. 2021; 51; 11: 4003-4012. 10.1007/s10803-020-04857-x. 33417138. 7791904</bibtext> </blist> <blist> <bibtext> Embretson SE, Reise SP. Item response theory for psychologists. Lawrence Erlbaum Associates. 2000.</bibtext> </blist> <blist> <bibtext> Evans DW, Uljarevic M, Lusk LG, Loth E, Frazier T. Development of two dimensional measures of restricted and repetitive behavior in parents and children. Journal of the American Academy of Child and Adolescent Psychiatry. 2017; 56; 1: 51-58. 10.1016/j.jaac.2016.10.014. 27993229</bibtext> </blist> <blist> <bibtext> Fischbach GD, Lord C. The Simons Simplex Collection: A resource for identification of autism genetic risk factors. Neuron. 2010; 68; 2: 192-195. 10.1016/j.neuron.2010.10.006. 20955926</bibtext> </blist> <blist> <bibtext> Frazier TW, Dimitropoulos A, Abbeduto L, Armstrong-Brine M, Kralovic S, Shih A, Hardan AY, Youngstrom EA, Uljarevic M. Quadrant Biosciences - As You Are. 2023. 10.1111/dmcn.15497</bibtext> </blist> <blist> <bibtext> Frazier TW, Hardan AY. Equivalence of symptom dimensions in females and males with autism. Autism. 2017; 21; 6: 749-759. 10.1177/1362361316660066. 27503465</bibtext> </blist> <blist> <bibtext> Frazier TW, Youngstrom EA, Embacher R, Hardan AY, Constantino JN, Law P, Findling RL, Eng C. Demographic and clinical correlates of autism symptom domains and autism spectrum diagnosis. Autism. 2014; 18; 5: 571-582. 10.1177/1362361313481506. 
24104512</bibtext> </blist> <blist> <bibtext> Geschwind DH, Sowinski J, Lord C, Iversen P, Shestack J, Jones P, Ducat L, Spence S. The autism genetic resource exchange: A resource for the study of autism and related neuropsychiatric conditions. American Journal of Human Genetics. 2001; 69: 463-466. 10.1086/321292. 11452364. 1235320</bibtext> </blist> <blist> <bibtext> Grzadzinski R, Carr T, Colombi C, McGuire K, Dufek S, Pickles A, Lord C. Measuring changes in Social Communication Behaviors: Preliminary Development of the brief Observation of Social Communication Change (BOSCC). Journal of Autism and Developmental Disorders. 2016; 46; 7: 2464-2479. 10.1007/s10803-016-2782-9. 27062034</bibtext> </blist> <blist> <bibtext> Hall D, Huerta MF, McAuliffe MJ, Farber GK. Sharing heterogeneous data: The national database for autism research. Neuroinformatics. 2012; 10; 4: 331-339. 10.1007/s12021-012-9151-4. 22622767. 4219200</bibtext> </blist> <blist> <bibtext> Hambleton RK, Swaminathan H, Rogers HJ. Fundamentals of item response theory. Sage. 1991.</bibtext> </blist> <blist> <bibtext> Hayes AF, Coutts JJ. Use omega rather than Cronbach's alpha for estimating reliability. But... Communication Methods and Measures. 2020; 14; 1: 1-24. 10.1080/19312458.2020.1718629</bibtext> </blist> <blist> <bibtext> Huerta M, Bishop SL, Duncan A, Hus V, Lord C. Application of DSM-5 criteria for autism spectrum disorder to three samples of children with DSM-IV diagnoses of pervasive developmental disorders. American Journal of Psychiatry. 2012; 169; 10: 1056-1064. 10.1176/appi.ajp.2012.12020276. 23032385</bibtext> </blist> <blist> <bibtext> IBM Corp. IBM SPSS Statistics for Windows (Version 28.0). IBM Corp. 2021.</bibtext> </blist> <blist> <bibtext> Janvier D, Choi YB, Klein C, Lord C, Kim SH. Brief report: Examining test-retest reliability of the Autism Diagnostic Observation schedule (ADOS-2) calibrated severity scores (CSS). Journal of Autism and Developmental Disorders. 
2022; 52; 3: 1388-1394. 10.1007/s10803-021-04952-7. 33826039</bibtext> </blist> <blist> <bibtext> Kanne SM, Mazurek MO, Sikora D, Bellando J, Branum-Martin L, Handen B, Katz T, Freedman B, Powell MP, Warren Z. The Autism Impact measure (AIM): Initial development of a new tool for treatment outcome measurement. Journal of Autism and Developmental Disorders. 2014; 44; 1: 168-179. 10.1007/s10803-013-1862-3. 23748386</bibtext> </blist> <blist> <bibtext> Kent RG, Carrington SJ, Le Couteur A, Gould J, Wing L, Maljaars J, Noens I, van Berckelaer-Onnes I, Leekam SR. Diagnosing autism spectrum disorder: Who will get a DSM-5 diagnosis? Journal of Child Psychology and Psychiatry and Allied Disciplines. 2013; 54; 11: 1242-1250. 10.1111/jcpp.12085. 23701321</bibtext> </blist> <blist> <bibtext> Kitzerow J, Teufel K, Wilker C, Freitag CM. Using the brief observation of social communication change (BOSCC) to measure autism-specific development. Autism Research. 2016; 9; 9: 940-950. 10.1002/aur.1588. 26643669</bibtext> </blist> <blist> <bibtext> Lam KS, Aman MG. The repetitive behavior Scale-Revised: Independent validation in individuals with autism spectrum disorders. Journal of Autism and Developmental Disorders. 2007; 37; 5: 855-866. 10.1007/s10803-006-0213-z. 17048092</bibtext> </blist> <blist> <bibtext> Leekam S, Libby S, Wing L, Gould J, Gillberg C. Comparison of ICD-10 and Gillberg's criteria for Asperger syndrome. Autism. 2000; 4; 1: 11-28. 10.1177/1362361300004001002</bibtext> </blist> <blist> <bibtext> Leekam S, Tandos J, McConachie H, Meins E, Parkinson K, Wright C, Turner M, Arnott B, Vittorini L, Le Couteur A. Repetitive behaviours in typically developing 2-year-olds. Journal of Child Psychology and Psychiatry. 2007; 48; 11: 1131-1138. 10.1111/j.1469-7610.2007.01778.x. 17995489</bibtext> </blist> <blist> <bibtext> Leekam SR, Libby SJ, Wing L, Gould J, Taylor C. 
The diagnostic interview for Social and Communication Disorders: Algorithms for ICD-10 childhood autism and Wing and Gould autistic spectrum disorder. Journal of Child Psychology and Psychiatry and Allied Disciplines. 2002; 43; 3: 327-342. 10.1111/1469-7610.00024. 11944875</bibtext> </blist> <blist> <bibtext> Leekam SR, Libby SJ, Wing L, Gould J, Taylor C. The diagnostic interview for Social and Communication Disorders: Algorithms for ICD-10 childhood autism and Wing and Gould autistic spectrum disorder. Journal of Child Psychology and Psychiatry and Allied Disciplines. 2002; 43; 3: 327-342. 10.1111/1469-7610.00024. 11944875</bibtext> </blist> <blist> <bibtext> Leekam SR, Nieto C, Libby SJ, Wing L, Gould J. Describing the sensory abnormalities of children and adults with autism. Journal of Autism and Developmental Disorders. 2007; 37; 5: 894-910. 10.1007/s10803-006-0218-7. 17016677</bibtext> </blist> <blist> <bibtext> Leon AC, Marzuk PM, Portera L. More reliable outcome measures can reduce sample size requirements. Archives of General Psychiatry. 1995; 52; 10: 867-871. 10.1001/archpsyc.1995.03950220077014. 7575107</bibtext> </blist> <blist> <bibtext> Lidstone J, Uljarevic M, Sullivan JP, Rodgers J, McConachie H, Freeston M, Le Couteur A, Prior M, Leekam S. Relations among restricted and repetitive behaviors, anxiety and sensory features in children with autism spectrum disorders. Research in Autism Spectrum Disorders. 2014; 8; 2: 82-92. 10.1016/j.rasd.2013.10.001</bibtext> </blist> <blist> <bibtext> Lord C, Brugha TS, Charman T, Cusack J, Dumas G, Frazier T, Jones EJH, Jones RM, Pickles A, State MW, Taylor JL, Veenstra-VanderWeele J. Autism spectrum disorder. Nat Rev Dis Primers. 2020; 6; 1: 5. 10.1038/s41572-019-0138-4. 31949163. 8900942</bibtext> </blist> <blist> <bibtext> Lord C, Pickles A, McLennan J, Rutter M, Bregman J, Folstein S, Fombonne E, Leboyer M, Minshew N. 
Diagnosing autism: Analyses of data from the Autism Diagnostic Interview. Journal of Autism and Developmental Disorders. 1997; 27; 5: 501-517. 10.1023/A:1025873925661. 9403369</bibtext> </blist> <blist> <bibtext> Lord C, Rutter M, DiLavore PC, Risi S. Autism Diagnostic Observation Schedule: ADOS manual. Western Psychological Services. 2002.</bibtext> </blist> <blist> <bibtext> Lord C, Rutter M, DiLavore PC, Risi S, Gotham K, Bishop SL. Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) manual (Part 1): Modules 1–4. Western Psychological Services. 2012.</bibtext> </blist> <blist> <bibtext> Lord C, Rutter M, Le Couteur A. Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. Journal of Autism and Developmental Disorders. 1994; 24; 5: 659-685. 10.1007/BF02172145</bibtext> </blist> <blist> <bibtext> Maljaars J, Noens I, Scholte E, van Berckelaer-Onnes I. Evaluation of the criterion and convergent validity of the Diagnostic Interview for Social and Communication Disorders in young and low-functioning children. Autism. 2012; 16; 5: 487-497. 10.1177/1362361311402857. 21690082</bibtext> </blist> <blist> <bibtext> Murray AL, McKenzie K, Kuenssberg R, Booth T. Do the Autism Spectrum Quotient (AQ) and Autism Spectrum Quotient Short Form (AQ-S) primarily reflect general ASD traits or specific ASD traits? A bi-factor analysis. Assessment. 2017; 24; 4: 444-457. 10.1177/1073191115611230. 26475839</bibtext> </blist> <blist> <bibtext> Muthén LK, Muthén BO. Mplus user's guide. 8th ed. Muthén & Muthén. 1998–2017.</bibtext> </blist> <blist> <bibtext> Nguyen PH, Ocansey ME, Miller M, Le DTK, Schmidt RJ, Prado EL. The reliability and validity of the Social Responsiveness Scale to measure autism symptomology in Vietnamese children. Autism Research. 2019; 12; 11: 1706-1718. 10.1002/aur.2179. 31355545. 7397486</bibtext> </blist> <blist> <bibtext> Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. McGraw-Hill. 1994.</bibtext> </blist>
<blist> <bibtext> Nygren G, Hagberg B, Billstedt E, Skoglund A, Gillberg C, Johansson M. The Swedish version of the Diagnostic Interview for Social and Communication Disorders (DISCO-10). Psychometric properties. Journal of Autism and Developmental Disorders. 2009; 39; 5: 730-741. 10.1007/s10803-008-0678-z. 19148741</bibtext> </blist> <blist> <bibtext> Phillips JM, Uljarevic M, Schuck RK, Schapp S, Solomon EM, Salzman E, Allerhand L, Libove RA, Frazier TW, Hardan AY. Development of the Stanford Social Dimensions Scale: Initial validation in autism spectrum disorder and in neurotypicals. Molecular Autism. 2019; 10: 48. 10.1186/s13229-019-0298-9. 31890146. 6921422</bibtext> </blist> <blist> <bibtext> Reise SP, Widaman KF, Pugh RH. Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin. 1993; 114; 3: 552-566. 10.1037/0033-2909.114.3.552. 8272470</bibtext> </blist> <blist> <bibtext> Revelle W, Condon DM. Reliability from alpha to omega: A tutorial. Psychological Assessment. 2019; 31; 12: 1395-1411. 10.1037/pas0000754. 31380696</bibtext> </blist> <blist> <bibtext> Robins DL, Casagrande K, Barton M, Chen CM, Dumont-Mathieu T, Fein D. Validation of the Modified Checklist for Autism in Toddlers, Revised with Follow-up (M-CHAT-R/F). Pediatrics. 2014; 133; 1: 37-45. 10.1542/peds.2013-1813. 24366990. 3876182</bibtext> </blist> <blist> <bibtext> Rutter M, Bailey A, Lord C. The Social Communication Questionnaire manual. Western Psychological Services. 2003.</bibtext> </blist> <blist> <bibtext> Sanchez MJ, Constantino JN. Expediting clinician assessment in the diagnosis of autism spectrum disorder. Developmental Medicine and Child Neurology. 2020; 62; 7: 806-812. 10.1111/dmcn.14530. 32239502. 7540056</bibtext> </blist> <blist> <bibtext> Schopler E, Van Bourgondien M, Wellman G, Love S. Childhood Autism Rating Scale, Second Edition. Western Psychological Services. 2010.</bibtext> </blist>
<blist> <bibtext> Simms LJ, Zelazny K, Williams TF, Bernstein L. Does the number of response options matter? Psychometric perspectives using personality questionnaire data. Psychological Assessment. 2019; 31; 4: 557-566. 10.1037/pas0000648. 30869956</bibtext> </blist> <blist> <bibtext> Stone WL, McMahon CR, Henderson LM. Use of the Screening Tool for Autism in Two-Year-Olds (STAT) for children under 24 months: An exploratory study. Autism. 2008; 12; 5: 557-573. 10.1177/1362361308096403. 18805947</bibtext> </blist> <blist> <bibtext> Streiner DL, Norman GR. Health measurement scales: A practical guide to their development and use. 2nd ed. Oxford University Press. 1995.</bibtext> </blist> <blist> <bibtext> Streiner DL, Norman GR. Health measurement scales: A practical guide to their development and use. 4th ed. Oxford University Press. 2008. 10.1093/acprof:oso/9780199231881.001.0001</bibtext> </blist> <blist> <bibtext> Taylor BP, Liu J, Mowrey W, Eule E, Bolognani F, Hollander E. The Montefiore-Einstein Rigidity Scale-Revised (MERS-R). Journal of Psychiatric Research. 2022. 10.1016/j.jpsychires.2021.12.055</bibtext> </blist> <blist> <bibtext> Thissen D. Reliability and measurement precision. 2nd ed. Lawrence Erlbaum Associates. 2000.</bibtext> </blist> <blist> <bibtext> Uljarevic M, Arnott B, Carrington SJ, Meins E, Fernyhough C, McConachie H, Le Couteur A, Leekam SR. Development of restricted and repetitive behaviors from 15 to 77 months: Stability of two distinct subtypes? Developmental Psychology. 2017; 53; 10: 1859-1868. 10.1037/dev0000324. 28758781</bibtext> </blist> <blist> <bibtext> Uljarević M, Frazier TW, Jo B, Scahill L, Youngstrom EA, Spackman E, Phillips JM, Billingham W, Hardan AY. Dimensional Assessment of Restricted and Repetitive Behaviors: Development and preliminary validation of a new measure. Journal of the American Academy of Child and Adolescent Psychiatry. 2022. 10.1016/j.jaac.2022.07.863</bibtext> </blist>
<blist> <bibtext> Uljarevic M, Frazier TW, Phillips JM, Jo B, Littlefield S, Hardan AY. Mapping the Research Domain Criteria social processes constructs to the Social Responsiveness Scale. Journal of the American Academy of Child and Adolescent Psychiatry. 2019; 58; 10S: S311. 10.1016/j.jaac.2019.07.938</bibtext> </blist> <blist> <bibtext> Uljarevic M, Frazier TW, Phillips JM, Jo B, Littlefield S, Hardan AY. Quantifying Research Domain Criteria social communication subconstructs using the Social Communication Questionnaire in youth. Journal of Clinical Child and Adolescent Psychology. 2020: 1-11. 10.1080/15374416.2019.1669156</bibtext> </blist> <blist> <bibtext> Uljarevic M, Jo B, Frazier TW, Scahill L, Youngstrom EA, Hardan AY. Using the big data approach to clarify the structure of restricted and repetitive behaviors across the most commonly used autism spectrum disorder measures. Molecular Autism. 2021; 12; 1: 39. 10.1186/s13229-021-00419-9. 34044873. 8162018</bibtext> </blist> <blist> <bibtext> Wetherby AM, Prizant BM. Communication and Symbolic Behavior Scales (CSBS), normed edition. Paul H. Brookes Publishing Co. 2003.</bibtext> </blist> <blist> <bibtext> Wing L, Leekam SR, Libby SJ, Gould J, Larcombe M. The Diagnostic Interview for Social and Communication Disorders: Background, inter-rater reliability and clinical use. Journal of Child Psychology and Psychiatry and Allied Disciplines. 2002; 43; 3: 307-325. 10.1111/1469-7610.00023.
11944874</bibtext> </blist> </ref> <aug> <p>By Thomas W. Frazier; Andrew J. O. Whitehouse; Susan R. Leekam; Sarah J. Carrington; Gail A. Alvares; David W. Evans; Antonio Y. Hardan and Mirko Uljarević</p> </aug>
Header DbId: eric
DbLabel: ERIC
An: EJ1426531
PubType: Academic Journal
PubTypeId: academicJournal
Items – Name: AbstractInfo
  Label: Abstractor
  Group: Ab
  Data: As Provided
– Name: DateEntry
  Label: Entry Date
  Group: Date
  Data: 2024
– Name: AN
  Label: Accession Number
  Group: ID
  Data: EJ1426531
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1426531
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1007/s10803-023-05967-y
    Languages:
      – Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 12
        StartPage: 2158
    Subjects:
      – SubjectFull: Test Reliability
        Type: general
      – SubjectFull: Item Response Theory
        Type: general
      – SubjectFull: Autism Spectrum Disorders
        Type: general
      – SubjectFull: Clinical Diagnosis
        Type: general
      – SubjectFull: Parents
        Type: general
      – SubjectFull: Symptoms (Individual Disorders)
        Type: general
      – SubjectFull: Diagnostic Tests
        Type: general
      – SubjectFull: Autism Diagnostic Observation Schedule
        Type: general
    Titles:
      – TitleFull: Reliability of the Commonly Used and Newly-Developed Autism Measures
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Thomas W. Frazier
      – PersonEntity:
          Name:
            NameFull: Andrew J. O. Whitehouse
      – PersonEntity:
          Name:
            NameFull: Susan R. Leekam
      – PersonEntity:
          Name:
            NameFull: Sarah J. Carrington
      – PersonEntity:
          Name:
            NameFull: Gail A. Alvares
      – PersonEntity:
          Name:
            NameFull: David W. Evans
      – PersonEntity:
          Name:
            NameFull: Antonio Y. Hardan
      – PersonEntity:
          Name:
            NameFull: Mirko Uljarevic
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 06
              Type: published
              Y: 2024
          Identifiers:
            – Type: issn-print
              Value: 0162-3257
            – Type: issn-electronic
              Value: 1573-3432
          Numbering:
            – Type: volume
              Value: 54
            – Type: issue
              Value: 6
          Titles:
            – TitleFull: Journal of Autism and Developmental Disorders
              Type: main