
Statistical and Qualitative Analysis of ChatGPT and Human Raters in Preservice Teachers' Writing Assessment

Bibliographic Details
Title:	Statistical and Qualitative Analysis of ChatGPT and Human Raters in Preservice Teachers' Writing Assessment
Language:	English
Authors:	Bahadir Gülden (ORCID 0000-0003-1917-8813), Huzeyfe Bilge (ORCID 0000-0001-7664-488X), Pinar Kanik Uysal (ORCID 0000-0003-1208-9535)
Source:	International Journal of Assessment Tools in Education. 2026 13(1):248-269.
Availability:	International Journal of Assessment Tools in Education. Pamukkale University, Faculty of Education, Kinikli Campus, Denizli 20070, Turkey. e-mail: ijate.editor@gmail.com; Web site: https://dergipark.org.tr/en/pub/ijate
Peer Reviewed:	Y
Page Count:	22
Publication Date:	2026
Document Type:	Journal Articles; Reports - Research
Education Level:	Higher Education; Postsecondary Education
Descriptors:	Preservice Teachers, Writing Evaluation, Artificial Intelligence, Evaluation Methods, Feedback (Response), Technology Uses in Education, Foreign Countries, Undergraduate Students, Writing Skills, Reliability, Scores, Barriers, Expertise, Turkish, Language Teachers, Scoring, Writing Assignments
Geographic Terms:	Turkey
ISSN:	2148-7456
Abstract:	Teachers spend a significant amount of time providing feedback. This study compared expert and ChatGPT assessments and feedback on written texts to determine the suitability of AI for writing skill assessment, where scoring and providing feedback are time-consuming. Three experts and ChatGPT graded 14 Turkish undergraduate students' assignments using a rubric that included content, language use, vocabulary, organization, and mechanics, and justified their decisions. The study used a qualitative design involving document review and triangulation. In addition, an intraclass correlation coefficient was used to assess the consistency of ChatGPT's and the experts' scores. All feedback was qualitatively analyzed to identify the strengths and weaknesses of the experts and their similarities with ChatGPT. Experts and ChatGPT had moderate to weak consistency in the writing subscales, while good reliability was found in the total score. Experts excelled in 'explanatory feedback', 'interpretation' and 'experience', while ChatGPT excelled in 'automation and continuity' and 'data processing capacity'. Experts' weaknesses included 'limited time and energy' and 'comparison bias', while ChatGPT's weaknesses were 'ambiguous expressions' and 'repetition'. The study also found that experts and ChatGPT preferred to provide constructive and supportive feedback.
Abstractor:	As Provided
Entry Date:	2026
Accession Number:	EJ1495754
Database:	ERIC
