AI-Generated Essays: Characteristics and Implications on Automated Scoring and Academic Integrity

Bibliographic Details
Title:	AI-Generated Essays: Characteristics and Implications on Automated Scoring and Academic Integrity
Language:	English
Authors:	Yang Zhong (ORCID 0009-0003-1982-4667), Jiangang Hao (ORCID 0000-0003-0502-7571), Michael Fauss, Chen Li, Yuan Wang
Source:	Educational Measurement: Issues and Practice, 45(1), 2026.
Availability:	Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us
Peer Reviewed:	Y
Page Count:	16
Publication Date:	2026
Document Type:	Journal Articles; Reports - Research
Descriptors:	Artificial Intelligence, Essays, Writing (Composition), Automation, Scoring, Integrity
DOI:	10.1111/emip.70013
ISSN:	0731-1745; 1745-3992
Abstract:	The rapid advancement of large language models (LLMs) has enabled the generation of coherent essays, making AI-assisted writing increasingly common in educational and professional settings. Using large-scale empirical data, we examine and benchmark the characteristics and quality of essays generated by popular LLMs and discuss their implications for two key components of writing assessments: automated scoring and academic integrity. Our findings highlight limitations in existing automated scoring systems when applied to essays generated or heavily influenced by AI, and identify areas for improvement, including the development of new features to capture deeper thinking and recalibrating feature weights. Despite growing concerns that the increasing variety of LLMs may undermine the feasibility of detecting AI-generated essays, our results show that detectors trained on essays generated from one model can often identify texts from others with high accuracy, suggesting that effective detection could remain manageable in practice.
Abstractor:	As Provided
Entry Date:	2026
Accession Number:	EJ1498452
Database:	ERIC
