AI-Powered Scoring for Creative Thinking: Methods and Challenges in PISA Assessment
| Title: | AI-Powered Scoring for Creative Thinking: Methods and Challenges in PISA Assessment |
|---|---|
| Language: | English |
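| Authors: | Ricardo Primi (ORCID 0000-0003-4227-6745); Roger E. Beaty (ORCID 0000-0001-6114-5973); Mathias Benedek (ORCID 0000-0001-6258-4476); Denis Dumas (ORCID 0000-0002-8446-4720); Peter Organisciak (ORCID 0000-0002-9058-2280); John D. Patterson (ORCID 0000-0002-7455-3535); Tiago Calico (ORCID 0000-0003-3080-343X); Mario Piacentini (ORCID 0000-0001-8624-2833) |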
| Source: | Journal of Creative Behavior. 2026 60(1). |
| Availability: | Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us |
| Peer Reviewed: | Y |
| Page Count: | 12 |
| Publication Date: | 2026 |
| Document Type: | Journal Articles; Reports - Research |
| Education Level: | Secondary Education |
| Descriptors: | Artificial Intelligence, Computer Assisted Testing, Scoring, Creativity Tests, Creative Thinking, Natural Language Processing, Automation, Semantics, Prompting, Inferences, Psychometrics, Item Response Theory, Achievement Tests, Foreign Countries, International Assessment, Secondary School Students |
| Assessment and Survey Identifiers: | Program for International Student Assessment |
| DOI: | 10.1002/jocb.70082 |
| ISSN: | 0022-0175 (print); 2162-6057 (electronic) |
| Abstract: | The introduction of the PISA 2022 Creative Thinking assessment underscores the growing need for scalable, valid, and reliable methods to evaluate creativity in international large-scale assessments. Traditional human scoring, while nuanced, is time-consuming, costly, and subject to inconsistencies. This paper explores recent advances in artificial intelligence (AI) and natural language processing (NLP)--particularly transformer-based large language models (LLMs)--as promising alternatives for automated scoring. We review three methodological approaches: (1) unsupervised methods using semantic distance, (2) supervised fine-tuning with labeled data, and (3) few-/zero-shot learning using prompt-based inference. Empirical findings from verbal and visual creative tasks show that AI-based scoring systems can approximate human ratings with substantial accuracy (r = 0.70-0.85), even across different languages and task formats. A case study using the PISA Book Covers task demonstrates convergence between AI and human scores, with reliability levels comparable to traditional scoring. However, key challenges remain, particularly regarding cross-cultural comparability, bias mitigation, and interpretability. We discuss psychometric strategies (e.g., Many-Facet Rasch Models) to model these issues and propose future directions, including scoring of distinct creativity dimensions and developing transparent, open-source platforms. If rigorously validated, AI-based scoring offers a feasible and equitable path forward for assessing creativity globally. |
| Abstractor: | As Provided |
| Entry Date: | 2026 |
| Accession Number: | EJ1500530 |
| Database: | ERIC |
| Persistent Link: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1500530 |
| Issue Date: | 2026-03-01 (Volume 60, Issue 1) |