Predictive Performance of Bayesian Stacking in Multilevel Education Data

Bibliographic Details
Title: Predictive Performance of Bayesian Stacking in Multilevel Education Data
Language: English
Authors: Mingya Huang (ORCID 0000-0002-0647-7390), David Kaplan
Source: Journal of Educational and Behavioral Statistics. 2025 50(2):214-238.
Availability: SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: https://sagepub.com
Peer Reviewed: Y
Page Count: 25
Publication Date: 2025
Document Type: Journal Articles; Reports - Research
Education Level: Secondary Education
Descriptors: Bayesian Statistics, Hierarchical Linear Modeling, Statistical Inference, Predictor Variables, Evaluation Methods, Simulation, Achievement Tests, Foreign Countries, Secondary School Students, International Assessment, Academic Achievement
Assessment and Survey Identifiers: Program for International Student Assessment
DOI: 10.3102/10769986241255969
ISSN: 1076-9986; 1935-1054
Abstract: The issue of model uncertainty has been gaining interest in the education and social science communities over the years, and the dominant methods for handling model uncertainty are based on Bayesian inference, in particular Bayesian model averaging. However, Bayesian model averaging assumes that the true data-generating model is within the candidate model space over which averaging takes place. Unlike Bayesian model averaging, Bayesian stacking can account for model uncertainty without assuming that a true model exists. An issue with Bayesian stacking, however, is that it is an optimization technique that uses predictor-independent model weights and is, therefore, not fully Bayesian. Bayesian hierarchical stacking, proposed by Yao et al., further incorporates uncertainty by placing a hyperprior on the stacking weights. Considering the importance of the multilevel models commonly applied in educational settings, this paper uses a simulation study and a real data example to investigate the predictive performance of original Bayesian stacking and Bayesian hierarchical stacking, along with two other readily available weighting methods: pseudo-BMA and pseudo-BMA with the Bayesian bootstrap (PBMA and PBMA+). Predictive performance is measured by the Kullback-Leibler divergence score. Although the differences in predictive performance among these four weighting methods are small, we find that Bayesian hierarchical stacking performs as well as conventional stacking, PBMA, and PBMA+ in settings where a true model is not assumed to exist.
Abstractor: As Provided
Entry Date: 2025
Accession Number: EJ1468025
Database: ERIC
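The abstract compares several ways of turning per-model predictive performance into model weights. The Python sketch below illustrates three of them under stated assumptions. It takes a hypothetical matrix `lpd` whose entry `lpd[i][k]` is the leave-one-out log predictive density of observation i under candidate model k; that matrix is assumed to have been computed already (for example by PSIS-LOO), and that step is not shown. Pseudo-BMA weights are proportional to exp(elpd_k), the exponentiated sum of each model's pointwise values. Pseudo-BMA+ averages those weights over Bayesian-bootstrap (Dirichlet) reweightings of the observations. Stacking weights maximize the average log score of the weighted mixture; here they are found with an EM-style fixed-point update, which is one of several ways to optimize this concave objective. All function names are illustrative, this is not the authors' code, and neither Bayesian hierarchical stacking (predictor-dependent weights under a hyperprior) nor the Kullback-Leibler score is implemented.

```python
import math
import random

# Illustrative sketch only: function names are hypothetical and this is not
# the authors' implementation. lpd[i][k] is assumed to hold the leave-one-out
# log predictive density of observation i under candidate model k.


def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def _softmax(zs):
    lse = _logsumexp(zs)
    return [math.exp(z - lse) for z in zs]


def pseudo_bma_weights(lpd):
    """Weights proportional to exp(elpd_k), with elpd_k = sum_i lpd[i][k]."""
    n_models = len(lpd[0])
    elpd = [sum(row[k] for row in lpd) for k in range(n_models)]
    return _softmax(elpd)


def pseudo_bma_plus_weights(lpd, n_boot=1000, seed=0):
    """Pseudo-BMA weights averaged over Bayesian-bootstrap replicates."""
    rng = random.Random(seed)
    n, n_models = len(lpd), len(lpd[0])
    avg = [0.0] * n_models
    for _ in range(n_boot):
        # Dirichlet(1, ..., 1) draw via normalized Exp(1) variates.
        g = [rng.expovariate(1.0) for _ in range(n)]
        total = sum(g)
        alpha = [x / total for x in g]
        # Bootstrap-reweighted elpd, rescaled by n.
        z = [n * sum(a * row[k] for a, row in zip(alpha, lpd))
             for k in range(n_models)]
        avg = [s + w / n_boot for s, w in zip(avg, _softmax(z))]
    return avg


def stacking_weights(lpd, max_iter=5000, tol=1e-10):
    """Simplex weights maximizing (1/n) sum_i log sum_k w_k exp(lpd[i][k]).

    Solved with the EM update for mixture proportions (components held
    fixed); each iteration stays on the simplex and never decreases the
    objective.
    """
    n, n_models = len(lpd), len(lpd[0])
    # Divide each row's densities by its largest one to avoid underflow;
    # the responsibilities below are unchanged by this per-row scaling.
    dens = []
    for row in lpd:
        m = max(row)
        dens.append([math.exp(x - m) for x in row])
    w = [1.0 / n_models] * n_models
    for _ in range(max_iter):
        new_w = [0.0] * n_models
        for p in dens:
            mix = sum(wk * pk for wk, pk in zip(w, p))
            for k in range(n_models):
                new_w[k] += w[k] * p[k] / (mix * n)
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w


# Toy example with two models: A predicts 6 observations better, B predicts 4.
toy_lpd = [[-1.0, -3.0]] * 6 + [[-3.0, -1.0]] * 4
print("pseudo-BMA :", pseudo_bma_weights(toy_lpd))   # about [0.982, 0.018]
print("pseudo-BMA+:", pseudo_bma_plus_weights(toy_lpd))
print("stacking   :", stacking_weights(toy_lpd))     # about [0.631, 0.369]
```

On this toy input the total elpd difference between the two models is 4, so pseudo-BMA puts about 0.98 of the weight on model A. Stacking instead settles near 0.63, the weight at which the mixture's average log score is highest, because model B still predicts 4 of the 10 observations better. The bootstrap in pseudo-BMA+ propagates uncertainty in the elpd difference and pulls the weight on A away from the pseudo-BMA extreme.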