Enhancing Second Language Speaking Assessment: Integrating Large Language Models for Finnish and Finland Swedish Proficiency Scoring

Bibliographic Details
Title: Enhancing Second Language Speaking Assessment: Integrating Large Language Models for Finnish and Finland Swedish Proficiency Scoring
Language: English
Authors: Ekaterina Voskoboinik (ORCID 0009-0007-2691-5793), Anna von Zansen (ORCID 0000-0002-6444-7667), Nhan Chi Phan (ORCID 0000-0003-2040-9834), Yaroslav Getman (ORCID 0000-0003-4680-8294), Tamás Grósz (ORCID 0000-0001-7918-9579), Mikko Kurimo (ORCID 0000-0001-5278-7974)
Source: Language Testing. 2025 42(4):508-538.
Availability: SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: https://sagepub.com
Peer Reviewed: Y
Page Count: 31
Publication Date: 2025
Document Type: Journal Articles; Reports - Research
Descriptors: Second Languages, Language Tests, Speech Tests, Finno Ugric Languages, Swedish, Artificial Intelligence, Automation, Uncommonly Taught Languages, Language Proficiency, Scoring, Transcripts (Written Records), Foreign Countries
Geographic Terms: Finland, Sweden
DOI: 10.1177/02655322251351648
ISSN: 0265-5322 (print); 1477-0946 (electronic)
Abstract: Automated speaking assessment (ASA) of second language proficiency benefits both learners and educators. However, developing these systems for less commonly taught languages like Finnish and Finland Swedish is hindered by the need for large datasets with equal representation of all proficiency levels. Traditional machine learning algorithms used in ASA are data-driven and consequently struggle to generalize to underrepresented proficiency levels. This study leverages large language models (LLMs) to enhance scoring performance in underrepresented proficiency levels through two approaches: augmenting the learner's corpus with LLM-generated transcripts (simulating data) and applying LLMs to score the transcripts of learners' responses directly. Our findings show that both solutions are comparable to or better than a traditional machine learning model trained on the original data for proficiency levels with fewer examples. Additionally, we found that providing LLMs with examples of human grading at various proficiency levels significantly enhances their performance as graders, especially when compared to using a single demonstration or none at all. Finally, our study confirms that using automatic speech recognition transcripts instead of human transcripts does not compromise assessment quality, enabling the development of LLM-based systems that can generate proficiency ratings directly from audio input.
Abstractor: As Provided
Entry Date: 2025
Accession Number: EJ1486523
Database: ERIC
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1486523
Issue Date: 2025-10-01