GPT vs. Llama2: Which Comes Closer to Human Writing?
| Title: | GPT vs. Llama2: Which Comes Closer to Human Writing? |
|---|---|
| Language: | English |
| Authors: | Fernando Martinez, Gary M. Weiss, Miguel Palma, Haoran Xue, Alexander Borelli, Yijun Zhao |
| Source: | International Educational Data Mining Society. 2024. |
| Availability: | International Educational Data Mining Society. e-mail: admin@educationaldatamining.org; Web site: https://educationaldatamining.org/conferences/ |
| Peer Reviewed: | Y |
| Page Count: | 10 |
| Publication Date: | 2024 |
| Document Type: | Speeches/Meeting Papers; Reports - Research |
| Education Level: | Higher Education; Postsecondary Education |
| Descriptors: | Artificial Intelligence, Technology Uses in Education, Higher Education, Natural Language Processing, Intelligent Tutoring Systems, Writing Evaluation, Accuracy, Vocabulary, Syntax, Authors, Language Usage |
| Abstract: | Large Language Models (LLMs) have been applied widely across diverse domains. In some applications, such as chatbots, human-like output is essential for a good user experience and for credibility. Conversely, LLM use raises concerns in contexts where human authenticity is crucial, notably in higher education, with materials such as Letters of Recommendation (LOR) and Statements of Intent (SOI). Despite extensive research in this area, accurately distinguishing between human-written and LLM-generated content remains challenging. This study conducts a comparative analysis of two leading LLMs, GPT-3.5 and Llama2-7B, evaluating how closely their output resembles human writing through vocabulary and structure analysis. Additionally, we apply classification models to distinguish human-written from LLM-generated content, with higher classification accuracy signaling greater deviation from human-like writing. Our findings suggest that both LLMs deviate significantly from human writing in vocabulary and paragraph structure, with GPT-3.5 appearing closer to human writing. Furthermore, our classification models achieved near-perfect performance in identifying LORs and SOIs generated by LLMs during our evaluation, and we have made these models available as open-access online tools. However, these models were trained specifically for our tasks; applying them to other domains requires further research and validation. [For the complete proceedings, see ED675485.] |
| Abstractor: | As Provided |
| Entry Date: | 2025 |
| Accession Number: | ED675637 |
| Database: | ERIC |