Learning to Love LLMs for Answer Interpretation: Chain-of-Thought Prompting and the AMMORE Dataset

Bibliographic Details
Title: Learning to Love LLMs for Answer Interpretation: Chain-of-Thought Prompting and the AMMORE Dataset
Language: English
Authors: Owen Henkel (ORCID 0009-0001-8850-067X), Hannah Horne-Robinson, Maria Dyshel, Greg Thompson, Ralph Abboud (ORCID 0000-0002-2332-0504), Nabil Al Nahin Ch (ORCID 0000-0002-0202-1724), Baptiste Moreau-Pernet (ORCID 0009-0006-9424-455X), Kirk Vanacore (ORCID 0000-0003-0673-5721)
Source: Journal of Learning Analytics. 2025 12(1):50-64.
Availability: Society for Learning Analytics Research. 121 Pointe Marsan, Beaumont, AB T4X 0A2, Canada. Tel: +61-429-920-838; e-mail: info@solaresearch.org; Web site: https://learning-analytics.info/index.php/JLA/index
Peer Reviewed: Y
Page Count: 15
Publication Date: 2025
Document Type: Journal Articles; Reports - Research
Education Level: Junior High Schools, Middle Schools, Secondary Education, High Schools
Descriptors: Learning Analytics, Learning Management Systems, Mathematics Instruction, Middle School Students, High School Students, Computational Linguistics, Grading, Test Items, Mathematics Tests, Artificial Intelligence, Computer Software, Bayesian Statistics, Classification, Foreign Countries, Cues
Geographic Terms: Nigeria, South Africa, Ghana, Africa
ISSN: 1929-7750
Abstract: This paper introduces AMMORE, a new dataset of 53,000 math open-response question-answer pairs from Rori, a mathematics learning platform used by middle and high school students in several African countries. Using this dataset, we conducted two experiments to evaluate the use of large language models (LLMs) for grading particularly challenging student answers. In experiment 1, we use a variety of LLM-driven approaches, including zero-shot, few-shot, and chain-of-thought prompting, to grade the 1% of student answers that a rule-based classifier fails to grade accurately. We find that the best-performing approach, chain-of-thought prompting, accurately scored 97% of these edge cases, effectively boosting the overall accuracy of the grading from 96% to 97%. In experiment 2, we aim to better understand the consequential validity of the improved grading accuracy by passing grades generated by the best-performing LLM-based approach to a Bayesian Knowledge Tracing (BKT) model, which estimated student mastery of specific lessons. We find that modest improvements in model accuracy can lead to significant changes in mastery estimation. Where the rule-based classifier misclassified the mastery status of 6.9% of students across completed lessons, using the LLM chain-of-thought approach reduced this to 2.6%. These findings suggest that LLMs could be valuable for grading fill-in questions in mathematics education, potentially enabling wider adoption of open-response questions in learning systems.
Abstractor: As Provided
Entry Date: 2025
Accession Number: EJ1465703
Database: ERIC
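Note: The abstract reports that small grading errors can shift Bayesian Knowledge Tracing (BKT) mastery estimates. The sketch below illustrates why, using the standard textbook BKT update. It is not the authors' implementation: the class `BKTParams`, the functions `bkt_update` and `estimate_mastery`, and all parameter values are hypothetical and chosen only for illustration, since this record does not state the paper's BKT configuration.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    # Illustrative values only (assumptions, not taken from the paper).
    p_init: float = 0.2   # prior probability the skill is already mastered
    p_learn: float = 0.15  # probability of learning the skill after an attempt
    p_guess: float = 0.2  # probability of a correct answer without mastery
    p_slip: float = 0.1   # probability of a wrong answer despite mastery


def bkt_update(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """Apply one standard BKT step: Bayesian posterior, then learning transition."""
    if correct:
        num = p_mastery * (1 - params.p_slip)
        den = num + (1 - p_mastery) * params.p_guess
    else:
        num = p_mastery * params.p_slip
        den = num + (1 - p_mastery) * (1 - params.p_guess)
    posterior = num / den
    # The student may also learn the skill during this attempt.
    return posterior + (1 - posterior) * params.p_learn


def estimate_mastery(grades: list[bool], params: BKTParams) -> float:
    """Run the update over a sequence of graded answers and return P(mastery)."""
    p = params.p_init
    for grade in grades:
        p = bkt_update(p, grade, params)
    return p


if __name__ == "__main__":
    params = BKTParams()
    true_grades = [True, True, True, True, True]
    # Same answers, but the grader wrongly marks the third one incorrect.
    misgraded = [True, True, False, True, True]
    print("all graded correctly:", round(estimate_mastery(true_grades, params), 3))
    print("one answer misgraded:", round(estimate_mastery(misgraded, params), 3))
```

Under these assumed parameters, a single misgraded answer lowers the final mastery estimate, so a student near a mastery threshold could be classified differently depending on the grader.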
Full Text: https://eric.ed.gov/contentdelivery/servlet/ERICServlet?accno=EJ1465703
Permalink: https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1465703