AI-Powered Scoring for Creative Thinking: Methods and Challenges in PISA Assessment

Bibliographic Details
Title: AI-Powered Scoring for Creative Thinking: Methods and Challenges in PISA Assessment
Language: English
Authors: Ricardo Primi (ORCID 0000-0003-4227-6745), Roger E. Beaty (ORCID 0000-0001-6114-5973), Mathias Benedek (ORCID 0000-0001-6258-4476), Denis Dumas (ORCID 0000-0002-8446-4720), Peter Organisciak (ORCID 0000-0002-9058-2280), John D. Patterson (ORCID 0000-0002-7455-3535), Tiago Calico (ORCID 0000-0003-3080-343X), Mario Piacentini (ORCID 0000-0001-8624-2833)
Source: Journal of Creative Behavior. 2026 60(1).
Availability: Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us
Peer Reviewed: Y
Page Count: 12
Publication Date: 2026
Document Type: Journal Articles; Reports - Research
Education Level: Secondary Education
Descriptors: Artificial Intelligence, Computer Assisted Testing, Scoring, Creativity Tests, Creative Thinking, Natural Language Processing, Automation, Semantics, Prompting, Inferences, Psychometrics, Item Response Theory, Achievement Tests, Foreign Countries, International Assessment, Secondary School Students
Assessment and Survey Identifiers: Program for International Student Assessment
DOI: 10.1002/jocb.70082
ISSN: 0022-0175; 2162-6057
Abstract: The introduction of the PISA 2022 Creative Thinking assessment underscores the growing need for scalable, valid, and reliable methods to evaluate creativity in international large-scale assessments. Traditional human scoring, while nuanced, is time-consuming, costly, and subject to inconsistencies. This paper explores recent advances in artificial intelligence (AI) and natural language processing (NLP), particularly transformer-based large language models (LLMs), as promising alternatives for automated scoring. We review three methodological approaches: (1) unsupervised methods using semantic distance, (2) supervised fine-tuning with labeled data, and (3) few-/zero-shot learning using prompt-based inference. Empirical findings from verbal and visual creative tasks show that AI-based scoring systems can approximate human ratings with substantial accuracy (r = 0.70-0.85), even across different languages and task formats. A case study using the PISA Book Covers task demonstrates convergence between AI and human scores, with reliability levels comparable to traditional scoring. However, key challenges remain, particularly regarding cross-cultural comparability, bias mitigation, and interpretability. We discuss psychometric strategies (e.g., Many-Facet Rasch Models) to model these issues and propose future directions, including scoring of distinct creativity dimensions and developing transparent, open-source platforms. If rigorously validated, AI-based scoring offers a feasible and equitable path forward for assessing creativity globally.
Abstractor: As Provided
Entry Date: 2026
Accession Number: EJ1500530
Database: ERIC