NLP‐enabled automated assessment of scientific explanations: Towards eliminating linguistic discrimination.

Saved in:
Bibliographic Details
Title: NLP‐enabled automated assessment of scientific explanations: Towards eliminating linguistic discrimination.
Authors: Kim, ChanMin1 (AUTHOR) cmk604@psu.edu, Passonneau, Rebecca J.2 (AUTHOR), Lee, Eunseo3 (AUTHOR), Sheikhi Karizaki, Mahsa2 (AUTHOR), Gnesdilow, Dana4 (AUTHOR), Puntambekar, Sadhana4,5 (AUTHOR)
Source: British Journal of Educational Technology. Jan2026, Vol. 57 Issue 1, p79-111. 33p.
Subject Terms: *Algorithmic bias, *Language ability testing, *Scientific communication, *Middle school education, *Educational evaluation, *Artificial intelligence, *Computational linguistics, Discriminatory language
Abstract: As use of artificial intelligence (AI) has increased, concerns about AI bias and discrimination have been growing. This paper discusses an application called PyrEval in which natural language processing (NLP) was used to automate assessment and provide feedback on middle school science writing without linguistic discrimination. Linguistic discrimination in this study was operationalized as unfair assessment of scientific essays based on writing features that are not considered normative such as subject‐verb disagreement. Such unfair assessment is especially problematic when the purpose of assessment is not assessing English writing but rather assessing the content of scientific explanations. PyrEval was implemented in middle school science classrooms. Students explained their roller coaster design by stating relationships among such science concepts as potential energy, kinetic energy and law of conservation of energy. Initial and revised versions of scientific essays written by 307 eighth‐grade students were analyzed. Our manual and NLP assessment comparison analysis showed that PyrEval did not penalize student essays that contained non‐normative writing features. Repeated measures ANOVAs and GLMM analysis results revealed that essay quality significantly improved from initial to revised essays after receiving the NLP feedback, regardless of non‐normative writing features. Findings and implications are discussed. Practitioner notesWhat is already known about this topic Advancement in AI has created a variety of opportunities in education, including automated assessment, but AI is not bias‐free.Automated writing assessment designed to improve students' scientific explanations has been studied.While limited, some studies reported biased performance of automated writing assessment tools, but without looking into actual linguistic features about which the tools may have discriminated.What this paper adds This study conducted an actual examination of non‐normative linguistic features in essays written by middle school students to uncover how our NLP tool called PyrEval worked to assess them.PyrEval did not penalize essays containing non‐normative linguistic features.Regardless of non‐normative linguistic features, students' essay quality scores significantly improved from initial to revised essays after receiving feedback from PyrEval. Essay quality improvement was observed regardless of students' prior knowledge, school district and teacher variables.Implications for practice and/or policy This paper inspires practitioners to attend to linguistic discrimination (re)produced by AI.This paper offers possibilities of using PyrEval as a reflection tool, to which human assessors compare their assessment and discover implicit bias against non‐normative linguistic features.PyrEval is available for use on github.com/psunlpgroup/PyrEvalv2. [ABSTRACT FROM AUTHOR]
Copyright of British Journal of Educational Technology is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Education Research Complete
Full text is not displayed to guests.
Description
Abstract:As use of artificial intelligence (AI) has increased, concerns about AI bias and discrimination have been growing. This paper discusses an application called PyrEval in which natural language processing (NLP) was used to automate assessment and provide feedback on middle school science writing without linguistic discrimination. Linguistic discrimination in this study was operationalized as unfair assessment of scientific essays based on writing features that are not considered normative such as subject‐verb disagreement. Such unfair assessment is especially problematic when the purpose of assessment is not assessing English writing but rather assessing the content of scientific explanations. PyrEval was implemented in middle school science classrooms. Students explained their roller coaster design by stating relationships among such science concepts as potential energy, kinetic energy and law of conservation of energy. Initial and revised versions of scientific essays written by 307 eighth‐grade students were analyzed. Our manual and NLP assessment comparison analysis showed that PyrEval did not penalize student essays that contained non‐normative writing features. Repeated measures ANOVAs and GLMM analysis results revealed that essay quality significantly improved from initial to revised essays after receiving the NLP feedback, regardless of non‐normative writing features. Findings and implications are discussed. Practitioner notesWhat is already known about this topic Advancement in AI has created a variety of opportunities in education, including automated assessment, but AI is not bias‐free.Automated writing assessment designed to improve students' scientific explanations has been studied.While limited, some studies reported biased performance of automated writing assessment tools, but without looking into actual linguistic features about which the tools may have discriminated.What this paper adds This study conducted an actual examination of non‐normative linguistic features in essays written by middle school students to uncover how our NLP tool called PyrEval worked to assess them.PyrEval did not penalize essays containing non‐normative linguistic features.Regardless of non‐normative linguistic features, students' essay quality scores significantly improved from initial to revised essays after receiving feedback from PyrEval. Essay quality improvement was observed regardless of students' prior knowledge, school district and teacher variables.Implications for practice and/or policy This paper inspires practitioners to attend to linguistic discrimination (re)produced by AI.This paper offers possibilities of using PyrEval as a reflection tool, to which human assessors compare their assessment and discover implicit bias against non‐normative linguistic features.PyrEval is available for use on github.com/psunlpgroup/PyrEvalv2. [ABSTRACT FROM AUTHOR]
ISSN:00071013
DOI:10.1111/bjet.13596