The Knowledge Component Attribution Problem for Programming: Methods and Tradeoffs with Limited Labeled Data

Bibliographic Details
Title: The Knowledge Component Attribution Problem for Programming: Methods and Tradeoffs with Limited Labeled Data
Language: English
Authors: Yang Shi (ORCID 0000-0001-6486-4340), Robin Schmucker, Keith Tran, John Bacher, Kenneth Koedinger, Thomas Price (ORCID 0000-0001-9375-2292), Min Chi, Tiffany Barnes
Source: Journal of Educational Data Mining. 2024 16(1):1-33.
Availability: International Educational Data Mining. e-mail: jedm.editor@gmail.com; Web site: https://jedm.educationaldatamining.org/index.php/JEDM
Peer Reviewed: Y
Page Count: 33
Publication Date: 2024
Sponsoring Agency: National Science Foundation (NSF)
Contract Number: 2013502, 2112635
Document Type: Journal Articles, Reports - Research
Education Level: Higher Education, Postsecondary Education
Descriptors: Programming Languages, Undergraduate Students, Learning Processes, Teaching Models, Information Transfer, Data Collection, Data Use, Program Design, Cognitive Structures, Cognitive Processes
Geographic Terms: Virginia
ISSN: 2157-2100
Abstract: Understanding students' learning of knowledge components (KCs) is an important educational data mining task that enables many educational applications. However, in computing education, where programming exercises require students to practice many KCs simultaneously, it is challenging to attribute their errors to specific KCs and, therefore, to model student knowledge of these KCs. In this paper, we define this task as the KC attribution problem. We first demonstrate a novel approach to this task using deep neural networks and explore its performance in identifying expert-defined KCs (RQ1). Because labeling requires costly expert effort, we further evaluate the effectiveness of transfer learning for KC attribution using more easily acquired labels, such as problem correctness (RQ2). Finally, because prior research indicates that incorporating educational theory into deep learning models could enhance model performance, we investigate how to incorporate learning curves into the model design and evaluate their performance (RQ3). Our results show that in a supervised learning scenario, a deep learning model, code2vec, can attribute KCs with relatively high performance (AUC > 75% for two of the three examined KCs). Further, using transfer learning, we achieve reasonable performance on the task without any costly expert labeling. However, incorporating learning curves shows limited effectiveness for this task. Our research lays important groundwork for personalized feedback based on which KCs students applied correctly, as well as for more interpretable and accurate student models.
Abstractor: As Provided
Entry Date: 2024
Accession Number: EJ1430503
Database: ERIC
Full Text: ERIC Full Text, https://eric.ed.gov/contentdelivery/servlet/ERICServlet?accno=EJ1430503
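The abstract reports KC-attribution quality as AUC per KC (above 75% for two of the three examined KCs). As a minimal sketch, assuming each KC is evaluated as its own binary task in which a model outputs a score that the KC was applied correctly, per-KC AUC can be computed with the rank (Mann-Whitney) formulation below. The helper names `roc_auc` and `per_kc_auc`, the KC names `kc_a`/`kc_b`, and all numbers are hypothetical toy values, not data or code from the paper, whose evaluation pipeline may differ.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based ROC AUC: the probability that a randomly chosen positive
    example is scored higher than a randomly chosen negative one.
    Ties count as half a win. O(P * N), fine for a small illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_kc_auc(labels: Dict[str, List[int]],
               scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Evaluate each KC separately (assumed label: 1 = KC applied correctly)."""
    return {kc: roc_auc(labels[kc], scores[kc]) for kc in labels}


if __name__ == "__main__":
    # Hypothetical toy labels and model scores, NOT results from the paper.
    labels = {"kc_a": [1, 0, 1, 0], "kc_b": [1, 0, 1, 0]}
    scores = {"kc_a": [0.9, 0.2, 0.6, 0.4], "kc_b": [0.3, 0.7, 0.8, 0.5]}
    results = per_kc_auc(labels, scores)
    print(results)  # {'kc_a': 1.0, 'kc_b': 0.5}
    # KCs clearing the 0.75 threshold mentioned in the abstract
    print([kc for kc, auc in results.items() if auc > 0.75])  # ['kc_a']
```

Scoring each KC separately, rather than pooling all predictions, keeps a well-attributed KC from masking a poorly attributed one, which matches how the abstract states its result ("two of the three examined KCs").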