How Raters Differ: A Study of Structured Oral Mathematics Assessment

Bibliographic Details
Title: How Raters Differ: A Study of Structured Oral Mathematics Assessment
Language: English
Authors: Samuel Sollerman (ORCID 0000-0002-9676-9521)
Source: Practical Assessment, Research & Evaluation. 2026 31(1).
Availability: University of Massachusetts Amherst Libraries. 154 Hicks Way, Amherst, MA 01003. e-mail: pare@umass.edu; Web site: https://openpublishing.library.umass.edu/pare/
Peer Reviewed: Y
Page Count: 16
Publication Date: 2026
Document Type: Journal Articles; Reports - Research
Education Level: Secondary Education
Descriptors: Mathematics Achievement, Student Evaluation, Foreign Countries, Verbal Tests, Mathematics Tests, Evaluation Methods, Experienced Teachers, Test Format, Secondary School Mathematics, Scoring Rubrics, Secondary School Students, Scoring, National Competency Tests, Performance Based Assessment
Geographic Terms: Sweden
ISSN: 1531-7714
Abstract: This study examines the nature and extent of interpretive variability in structured oral mathematics assessments. Using Swedish national test data from 74 students across three oral formats, six experienced teachers independently rated reasoning, communication, and method using shared rubrics. Multiple reliability indicators and Svensson's method were employed to distinguish systematic and unsystematic interpretive variation. Exact agreement was low across formats, with higher but still modest adjacent agreement. Relative Position effects were frequent, indicating systematic differences in rater thresholds. In contrast, the most dialogic format showed greater Relative Rank Variance, suggesting more random inconsistency. Raters reported high confidence even when statistical agreement was low, revealing a gap between perceived certainty and interpretive alignment. The analysis indicates that assessment structure and interactional demands shape both what students display and how raters apply criteria, making variability a feature of professional judgment rather than merely error. Implications include the use of calibrated exemplars, targeted calibration activities, and collaborative scoring practices to enhance reliability without sacrificing the diagnostic value of oral assessment in competency-based systems.
Abstractor: As Provided
Entry Date: 2026
Accession Number: EJ1495825
Database: ERIC
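
The abstract reports exact and adjacent agreement between raters. As a minimal sketch of how such rates can be computed for one pair of raters, the Python below assumes ordinal scores coded as integers and takes "adjacent" to mean scores within one category of each other (exact matches included). That definition, the function name `agreement_rates`, and the scores shown are illustrative assumptions, not the study's data or procedure.

```python
def agreement_rates(rater_a, rater_b, tolerance=1):
    """Return (exact, adjacent) agreement proportions for two raters.

    Assumption: scores are integers on an ordinal scale, and "adjacent"
    means the two scores differ by at most `tolerance` categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of students")

    n = len(rater_a)
    pairs = list(zip(rater_a, rater_b))
    # Share of students given the identical score by both raters.
    exact = sum(a == b for a, b in pairs) / n
    # Share of students whose two scores are at most `tolerance` apart.
    adjacent = sum(abs(a - b) <= tolerance for a, b in pairs) / n
    return exact, adjacent


# Hypothetical scores for eight students on a 0-2 scale (not from the study).
rater_a = [0, 1, 2, 2, 1, 0, 2, 1]
rater_b = [0, 2, 2, 1, 1, 1, 0, 1]

exact, adjacent = agreement_rates(rater_a, rater_b)
print(f"exact={exact:.3f} adjacent={adjacent:.3f}")
```

With six raters, one option is to average these rates over all rater pairs. Svensson's method, which the abstract also names, separates systematic from unsystematic disagreement and is not covered by this sketch.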
Full Text (ERIC): https://eric.ed.gov/contentdelivery/servlet/ERICServlet?accno=EJ1495825
Permalink: https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1495825