Human vs. Machine Marking: A Comparative Study of Chemistry Assessments.

Bibliographic Details
Title:	Human vs. Machine Marking: A Comparative Study of Chemistry Assessments.
Authors:	Ade-Ibijola, Abejide¹ (AUTHOR) abejide@jbs.ac.za; Chikezie, Ijeoma Joy² (AUTHOR) drijeomajchikezie@gmail.com; Oyelere, Solomon Sunday¹,³ (AUTHOR) s.oyelere@exeter.ac.uk
Source:	Journal of Science Education & Technology. Dec2025, Vol. 34 Issue 6, p1430-1440. 11p.
Subject Terms:	Artificial intelligence, Comparative studies, Evaluation methodology, Students, *Educational evaluation, Chemical testing
Geographic Terms:	Nigeria
Abstract:	Artificial intelligence (AI) has transformed educational assessment through automated marking, which improves efficiency and objectivity, provides immediate feedback, and identifies patterns in students' responses. This paper compared human expert marking with machine marking in a chemistry class, using a comparative research design. The participants comprised two chemistry experts and 30 Senior Secondary Two (SS2) students from the National Institute for Nigerian Languages (NDSS), Abia State, Nigeria; the students were randomly drawn from the 98 students offering chemistry. Three chemistry short answer questions (SAQs) adopted from NECOSSCE past examination papers were used for data collection. The students' responses were marked by the two human chemistry experts and by ChatGPT, all using a marking guide. Pearson product moment correlation (PPMC) was employed to evaluate the relationship between the scores assigned by the human experts and those assigned by ChatGPT. The results revealed a substantial correlation between the two human experts (r = 0.75), while the correlations between each human expert and ChatGPT were lower (r = 0.56 and 0.57, respectively). Nevertheless, most differences between the human experts' and ChatGPT's scores were within one point, and larger discrepancies occurred less frequently. Item-by-item analyses indicated that ChatGPT's scores fell within an acceptable range of the human experts' scores, although ChatGPT's marking showed some inconsistencies, particularly on the more complex SAQs. Among its recommendations, the study suggests combining human and machine marking to enhance assessment practices in secondary school chemistry, leveraging the strengths of both methods. [ABSTRACT FROM AUTHOR]
	Copyright of Journal of Science Education & Technology is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
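The two agreement measures named in the abstract, the Pearson product moment correlation between two markers and the share of scripts on which two markers differ by at most one point, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names pearson_r and share_within and the five pairs of marks are hypothetical, invented only to show the arithmetic, and are not data from the study. Pearson's r is computed directly from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²); in Python 3.10 and later, statistics.correlation returns the same value.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation between two markers' score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (n >= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations (numerator) and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a marker gives every script the same score")
    return cov / sqrt(ss_x * ss_y)


def share_within(x, y, tolerance=1):
    """Fraction of scripts whose two scores differ by at most `tolerance` points.

    Hypothetical helper: the abstract reports that most differences were
    within one point but does not say how this was tabulated.
    """
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(abs(a - b) <= tolerance for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Hypothetical marks for five scripts (NOT data from the study).
    expert = [3, 5, 2, 4, 6]
    machine = [4, 5, 2, 2, 6]
    print(f"r = {pearson_r(expert, machine):.2f}")  # prints r = 0.80
    print(f"within 1 point: {share_within(expert, machine):.0%}")  # prints within 1 point: 80%
```

The two measures answer different questions, which is why they can diverge as they do in the abstract: r reflects whether markers rank scripts consistently, while the within-one-point share reflects how close the raw marks are. A single two-point disagreement (the fourth script above) pulls r well below 1 even though four of the five marks agree within a point.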
Database:	Education Research Complete
ISSN:	1059-0145
DOI:	10.1007/s10956-025-10223-2