AI-Generated Essays: Characteristics and Implications on Automated Scoring and Academic Integrity
| Title: | AI-Generated Essays: Characteristics and Implications on Automated Scoring and Academic Integrity |
|---|---|
| Language: | English |
| Authors: | Yang Zhong (ORCID |
| Source: | Educational Measurement: Issues and Practice. 2026 45(1). |
| Availability: | Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us |
| Peer Reviewed: | Y |
| Page Count: | 16 |
| Publication Date: | 2026 |
| Document Type: | Journal Articles; Reports - Research |
| Descriptors: | Artificial Intelligence, Essays, Writing (Composition), Automation, Scoring, Integrity |
| DOI: | 10.1111/emip.70013 |
| ISSN: | 0731-1745 1745-3992 |
| Abstract: | The rapid advancement of large language models (LLMs) has enabled the generation of coherent essays, making AI-assisted writing increasingly common in educational and professional settings. Using large-scale empirical data, we examine and benchmark the characteristics and quality of essays generated by popular LLMs and discuss their implications for two key components of writing assessments: automated scoring and academic integrity. Our findings highlight limitations in existing automated scoring systems when applied to essays generated or heavily influenced by AI, and identify areas for improvement, including the development of new features to capture deeper thinking and recalibrating feature weights. Despite growing concerns that the increasing variety of LLMs may undermine the feasibility of detecting AI-generated essays, our results show that detectors trained on essays generated from one model can often identify texts from others with high accuracy, suggesting that effective detection could remain manageable in practice. |
| Abstractor: | As Provided |
| Entry Date: | 2026 |
| Accession Number: | EJ1498452 |
| Database: | ERIC |