Generating In-Context, Personalized Feedback for Intelligent Tutors with Large Language Models
| Title: | Generating In-Context, Personalized Feedback for Intelligent Tutors with Large Language Models |
|---|---|
| Language: | English |
| Authors: | Jennifer M. Reddig, Arav Arora, Christopher J. MacLellan |
| Source: | International Journal of Artificial Intelligence in Education. 2025 35(6):3459-3500. |
| Availability: | Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/ |
| Peer Reviewed: | Y |
| Page Count: | 42 |
| Publication Date: | 2025 |
| Sponsoring Agency: | National Science Foundation (NSF) |
| Contract Number: | 2112532 |
| Document Type: | Journal Articles; Reports - Research |
| Education Level: | Higher Education; Postsecondary Education |
| Descriptors: | Intelligent Tutoring Systems, Artificial Intelligence, Feedback (Response), Error Correction, Accuracy, Evaluation, College Mathematics, Algebra, Models |
| DOI: | 10.1007/s40593-025-00505-6 |
| ISSN: | 1560-4292 (print); 1560-4306 (electronic) |
| Abstract: | This study explores how large language models (LLMs), specifically GPT-4, could be used to generate personalized feedback within an Intelligent Tutoring System (ITS). The research focuses on evaluating the model's ability to (1) diagnose student errors, (2) generate personalized corrective feedback, and (3) assess the accuracy of diagnoses and helpfulness of the feedback. We analyze student errors from the Apprentice Tutor College Algebra ITS and prompt GPT-4 to give targeted feedback on those errors. The findings suggest that while this model can effectively diagnose a range of student errors, its feedback varies in effectiveness based on the complexity of the problem and the type of error. While GPT-4 generates relevant, specific feedback a majority of the time, 35% of the hints were too general, incorrect, or give away the correct answer. The study also explores methods for using an LLM to automatically evaluate the validity of generated feedback, and finds that only 35% of feedback passes automated helpfulness evaluations. |
| Abstractor: | As Provided |
| Entry Date: | 2026 |
| Accession Number: | EJ1500144 |
| Database: | ERIC |