How Raters Differ: A Study of Structured Oral Mathematics Assessment

Bibliographic Details
Title: How Raters Differ: A Study of Structured Oral Mathematics Assessment
Language: English
Authors: Samuel Sollerman (ORCID 0000-0002-9676-9521)
Source: Practical Assessment, Research & Evaluation, 2026, 31(1).
Availability: University of Massachusetts Amherst Libraries. 154 Hicks Way, Amherst, MA 01003. e-mail: pare@umass.edu; Web site: https://openpublishing.library.umass.edu/pare/
Peer Reviewed: Y
Page Count: 16
Publication Date: 2026
Document Type: Journal Articles; Reports - Research
Education Level: Secondary Education
Descriptors: Mathematics Achievement, Student Evaluation, Foreign Countries, Verbal Tests, Mathematics Tests, Evaluation Methods, Experienced Teachers, Test Format, Secondary School Mathematics, Scoring Rubrics, Secondary School Students, Scoring, National Competency Tests, Performance Based Assessment
Geographic Terms: Sweden
ISSN: 1531-7714
Abstract: This study examines the nature and extent of interpretive variability in structured oral mathematics assessments. Using Swedish national test data from 74 students across three oral formats, six experienced teachers independently rated reasoning, communication, and method using shared rubrics. Multiple reliability indicators and Svensson's method were employed to distinguish systematic and unsystematic interpretive variation. Exact agreement was low across formats, with higher but still modest adjacent agreement. Relative Position effects were frequent, indicating systematic differences in rater thresholds. In contrast, the most dialogic format showed greater Relative Rank Variance, suggesting more random inconsistency. Raters reported high confidence even when statistical agreement was low, revealing a gap between perceived certainty and interpretive alignment. The analysis indicates that assessment structure and interactional demands shape both what students display and how raters apply criteria, making variability a feature of professional judgment rather than merely error. Implications include the use of calibrated exemplars, targeted calibration activities, and collaborative scoring practices to enhance reliability without sacrificing the diagnostic value of oral assessment in competency-based systems.
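The agreement indicators named in the abstract can be illustrated with a short sketch. The Python below is not the study's analysis code. It assumes integer rubric levels and two score lists aligned by student, uses hypothetical function names, and follows one reading of Svensson's definitions of Relative Position (RP) and Relative Rank Variance (RV). Relative Concentration is omitted, and the formulas should be checked against Svensson's original papers before any real use.

```python
"""Sketch of agreement indicators for two raters scoring the same students.

Assumptions (not taken from the study): scores are integer rubric levels,
both lists are aligned by student, and the function names are hypothetical.
"""


def exact_agreement(a, b):
    """Proportion of students given the identical level by both raters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a, b):
    """Proportion of students whose two levels differ by at most one step."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


def relative_position(a, b):
    """Svensson-style RP = P(X < Y) - P(X > Y), estimated over all n*n
    pairings of rater A's and rater B's scores (i.e. from the marginals).
    Positive values mean rater B tends to use higher levels."""
    n = len(a)
    less = sum(x < y for x in a for y in b)
    greater = sum(x > y for x in a for y in b)
    return (less - greater) / (n * n)


def augmented_mean_ranks(primary, secondary):
    """Rank students by `primary`, breaking ties by `secondary`;
    students with identical (primary, secondary) pairs share a mid-rank."""
    keys = list(zip(primary, secondary))
    ranks = []
    for key in keys:
        below = sum(other < key for other in keys)  # tuples compare lexicographically
        tied = sum(other == key for other in keys)
        ranks.append(below + (tied + 1) / 2)
    return ranks


def relative_rank_variance(a, b):
    """Svensson-style RV = (6 / n^3) * sum of squared differences between
    each student's augmented mean rank by rater A and by rater B."""
    n = len(a)
    rank_a = augmented_mean_ranks(a, b)
    rank_b = augmented_mean_ranks(b, a)
    return 6 * sum((ra - rb) ** 2 for ra, rb in zip(rank_a, rank_b)) / n ** 3


if __name__ == "__main__":
    # Invented scores for illustration only; not data from the study.
    shifted = ([1, 2, 2, 3, 3, 4], [2, 2, 3, 3, 4, 4])  # B scores higher, same order
    swapped = ([1, 2, 2, 3], [2, 1, 2, 3])               # two students trade ranks
    for name, (a, b) in [("shifted", shifted), ("swapped", swapped)]:
        print(name,
              exact_agreement(a, b),
              adjacent_agreement(a, b),
              round(relative_position(a, b), 3),
              round(relative_rank_variance(a, b), 4))
    # prints:
    # shifted 0.5 1.0 0.278 0.0
    # swapped 0.5 1.0 0.0 0.1875
```

In the first invented example, rater B scores one level higher on several students while keeping the rank order, so RP is positive and RV is zero, the pattern of a threshold difference. In the second, the two raters have identical score distributions but two students trade ranks, so RP is zero and RV is positive, the pattern of unsystematic variation. The all-pairs loops are O(n^2), which is adequate for a sample of the size described above.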
Abstractor: As Provided
Entry Date: 2026
Accession Number: EJ1495825
Database: ERIC