Enhancing Second Language Speaking Assessment: Integrating Large Language Models for Finnish and Finland Swedish Proficiency Scoring
| Title: | Enhancing Second Language Speaking Assessment: Integrating Large Language Models for Finnish and Finland Swedish Proficiency Scoring |
|---|---|
| Language: | English |
| Authors: | Ekaterina Voskoboinik (ORCID |
| Source: | Language Testing. 2025 42(4):508-538. |
| Availability: | SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: https://sagepub.com |
| Peer Reviewed: | Y |
| Page Count: | 31 |
| Publication Date: | 2025 |
| Document Type: | Journal Articles; Reports - Research |
| Descriptors: | Second Languages, Language Tests, Speech Tests, Finno Ugric Languages, Swedish, Artificial Intelligence, Automation, Uncommonly Taught Languages, Language Proficiency, Scoring, Transcripts (Written Records), Foreign Countries |
| Geographic Terms: | Finland, Sweden |
| DOI: | 10.1177/02655322251351648 |
| ISSN: | 0265-5322, 1477-0946 |
| Abstract: | Automated speaking assessment (ASA) of second language proficiency benefits both learners and educators. However, developing these systems for less commonly taught languages like Finnish and Finland Swedish is hindered by the need for large datasets with equal representation of all proficiency levels. Traditional machine learning algorithms used in ASA are data-driven and consequently struggle to generalize to underrepresented proficiency levels. This study leverages large language models (LLMs) to enhance scoring performance in underrepresented proficiency levels through two approaches: augmenting the learner's corpus with LLM-generated transcripts (simulating data) and applying LLMs to score the transcripts of learners' responses directly. Our findings show that both solutions are comparable to or better than a traditional machine learning model trained on the original data for proficiency levels with fewer examples. Additionally, we found that providing LLMs with examples of human grading at various proficiency levels significantly enhances their performance as graders, especially when compared to using a single demonstration or none at all. Finally, our study confirms that using automatic speech recognition transcripts instead of human transcripts does not compromise assessment quality, enabling the development of LLM-based systems that can generate proficiency ratings directly from audio input. |
| Abstractor: | As Provided |
| Entry Date: | 2025 |
| Accession Number: | EJ1486523 |
| Database: | ERIC |