Reassessing Weights in Large-Scale Assessments and Multilevel Models

Bibliographic Details
Title: Reassessing Weights in Large-Scale Assessments and Multilevel Models
Language: English
Authors: Umut Atasever, Francis L. Huang, Leslie Rutkowski
Source: Large-scale Assessments in Education. 2025 13.
Availability: Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Peer Reviewed: Y
Page Count: 27
Publication Date: 2025
Document Type: Journal Articles; Reports - Research
Education Level: Elementary Secondary Education; Secondary Education
Descriptors: Mathematics Tests, International Assessment, Elementary Secondary Education, Foreign Countries, Mathematics Achievement, Science Tests, Science Achievement, Achievement Tests, Secondary School Students, Sampling, Probability, Statistical Bias, Scaling, Hierarchical Linear Modeling, Monte Carlo Methods
Assessment and Survey Identifiers: Trends in International Mathematics and Science Study, Program for International Student Assessment
DOI: 10.1186/s40536-025-00245-y
ISSN: 2196-0739
Abstract: When analyzing large-scale assessments (LSAs) that use complex sampling designs, it is important to account for probability sampling using weights. However, the use of these weights in multilevel models has been widely debated, particularly regarding their application at different levels of the model. Yet, no consensus has been reached on the best method to apply weights. To address this, we conducted a Monte Carlo simulation, modeling a typical LSA population with known true values for the variables of interest. Using repeated sampling from this population, we generated weights using a stratified two-stage cluster design, where clusters (schools) were selected using probability proportional to size (PPS) sampling from designated explicit strata. We examined both class-level and student-level sampling structures and applied a nonresponse model at both the school and student levels. For each sample drawn, we assessed bias and coverage rates across models that applied weights at two levels, only at level 2, only at level 1, and without weights. Our findings show that applying only level-2 weights produced the most precise estimates, while models with no weights or only rescaled level-1 weights led to the highest bias. Using both level-1 and level-2 weights together was acceptable, although variance components were slightly underestimated. However, scaling level-1 weights would mirror using only the level-2 weights in datasets where there is no variation of weights within clusters. An applied example using TIMSS data supports these findings. This study contributes to the literature by explaining the least biased weight methods with complex sampling scenarios and offering practical guidance on using weights in multilevel models. We provide the R syntax for both the simulation and the applied example for reproducibility.
Abstractor: As Provided
Entry Date: 2025
Accession Number: EJ1464974
Database: ERIC
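Illustrative note: The abstract states that scaling level-1 weights mirrors using only level-2 weights when weights do not vary within clusters. The minimal sketch below shows why under one common rescaling, in which within-cluster weights are scaled to sum to the cluster sample size. That choice of scaling, the function name, and the weight values are assumptions made here for illustration; they are not taken from the article or its R syntax.

```python
def rescale_to_cluster_size(weights):
    """Scale one cluster's level-1 weights so they sum to the cluster sample size.

    Assumed scaling method for illustration only; the article may use another.
    """
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]


# Hypothetical school sampled by intact class: every student shares one weight.
constant = rescale_to_cluster_size([12.5, 12.5, 12.5, 12.5])

# Hypothetical school with student-level sampling and nonresponse adjustment:
# weights differ within the school.
varying = rescale_to_cluster_size([10.0, 10.0, 15.0, 15.0])

print(constant)  # every rescaled weight equals 1
print(varying)   # rescaled weights differ but still sum to the cluster size
```

When every rescaled level-1 weight equals 1, the level-1 weights no longer affect the fit, so only the level-2 (school) weights shape the estimates. With varying weights the rescaled values keep their relative differences, so the two specifications can diverge.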