Item-Level Heterogeneity in Value Added Models: Implications for Reliability, Cross-Study Comparability, and Effect Sizes
| Title: | Item-Level Heterogeneity in Value Added Models: Implications for Reliability, Cross-Study Comparability, and Effect Sizes |
|---|---|
| Language: | English |
| Authors: | Joshua B. Gilbert, Zachary Himmelsbach, Luke W. Miratrix, Andrew D. Ho, Benjamin W. Domingue |
| Source: | Grantee Submission. 2026. |
| Peer Reviewed: | Y |
| Page Count: | 63 |
| Publication Date: | 2026 |
| Sponsoring Agency: | Institute of Education Sciences (ED) |
| Contract Number: | R305D240025 |
| Document Type: | Reports - Research |
| Education Level: | Elementary Secondary Education |
| Descriptors: | Value Added Models, Reliability, Comparative Analysis, Effect Size, Generalizability Theory, Educational Policy, Accountability, Equations (Mathematics), Simulation |
| DOI: | 10.3102/10769986251393339 |
| Abstract: | Value added models (VAMs) attempt to estimate the causal effects of teachers and schools on student test scores. We apply Generalizability Theory to show how estimated VA effects depend upon the selection of test items. Standard VAMs estimate causal effects on the items that are included on the test. Generalizability demands consideration of how estimates would differ had the test included alternative items. We introduce a model that estimates the magnitude of item-by-teacher/school variance accurately, revealing that standard VAMs can overstate reliability and overestimate differences between units. Using 16 academic outcomes from 8 studies with item-level data, we show how standard VAMs overstate reliability by a median of 0.04 on the 0-1 reliability scale (mean = 0.09, SD = 0.10) and provide standard deviations of teacher/school effects that are a median of 3% too large (mean = 12%, SD = 23 percentage points). We discuss how imprecision due to heterogeneous VA effects across items attenuates effect sizes, complicates comparisons across studies, and contributes to temporal instability, though these effects are reduced when the number of items is high. Our results suggest that accurate estimation and interpretation of VAMs may be improved using item-level data, including qualitative data about how items represent the content domain. [This paper was published in "Journal of Educational and Behavioral Statistics" 2025.] |
| Abstractor: | As Provided |
| Notes: | https://doi.org/10.7910/DVN/89YITQ |
| IES Funded: | Yes |
| Entry Date: | 2026 |
| Accession Number: | ED679453 |
| Database: | ERIC |
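
The abstract's core reliability argument can be illustrated with a simplified generalizability-theory decomposition. The sketch below is not the authors' model. It assumes that the observed variance of teachers' or schools' estimated effects splits into three hypothetical components: true unit variance, unit-by-item interaction variance averaged over the items on the test, and student sampling error held fixed. All names (`VarianceComponents`, `fixed_item_summary`, `item_generalizable_summary`) and all numbers are illustrative. If the item set is treated as fixed, the averaged interaction term counts as signal. If items are treated as a sample from the content domain, it counts as noise.

```python
from dataclasses import dataclass
import math


@dataclass
class VarianceComponents:
    """Hypothetical variance components on the test-score scale (illustrative only)."""
    unit: float       # true teacher/school variance that generalizes across items
    unit_item: float  # teacher/school-by-item interaction variance for a single item
    error: float      # student sampling error variance of a unit's estimate (assumed fixed)


def fixed_item_summary(vc: VarianceComponents, n_items: int) -> tuple[float, float]:
    """Reliability and SD of unit effects when the item set is treated as fixed.

    The unit-by-item term, averaged over n_items, cannot be separated from
    true unit variance, so it is counted as signal.
    """
    signal = vc.unit + vc.unit_item / n_items
    return signal / (signal + vc.error), math.sqrt(signal)


def item_generalizable_summary(vc: VarianceComponents, n_items: int) -> tuple[float, float]:
    """Reliability and SD of unit effects when items are a random sample from the domain.

    The averaged unit-by-item term is treated as noise, so only vc.unit is signal.
    """
    noise = vc.unit_item / n_items + vc.error
    return vc.unit / (vc.unit + noise), math.sqrt(vc.unit)


# Hypothetical values chosen only to show the direction of the bias.
vc = VarianceComponents(unit=0.04, unit_item=0.04, error=0.01)
for n in (10, 40):
    rel_fixed, sd_fixed = fixed_item_summary(vc, n)
    rel_gen, sd_gen = item_generalizable_summary(vc, n)
    print(f"{n:>3} items: reliability {rel_fixed:.3f} vs {rel_gen:.3f}, "
          f"SD {sd_fixed:.3f} vs {sd_gen:.3f}")
```

Both functions divide by the same total observed variance. The only difference is whether the averaged unit-by-item piece is labeled signal or noise. Under these assumptions, the fixed-item view always reports higher reliability and a larger SD of unit effects, and the gap shrinks as the number of items grows, consistent in direction with the pattern the abstract describes.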