AI-Powered Scoring for Creative Thinking: Methods and Challenges in PISA Assessment
| Title: | AI-Powered Scoring for Creative Thinking: Methods and Challenges in PISA Assessment |
|---|---|
| Language: | English |
| Authors: | Ricardo Primi |
| Source: | Journal of Creative Behavior. 2026 60(1). |
| Availability: | Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us |
| Peer Reviewed: | Y |
| Page Count: | 12 |
| Publication Date: | 2026 |
| Document Type: | Journal Articles; Reports - Research |
| Education Level: | Secondary Education |
| Descriptors: | Artificial Intelligence, Computer Assisted Testing, Scoring, Creativity Tests, Creative Thinking, Natural Language Processing, Automation, Semantics, Prompting, Inferences, Psychometrics, Item Response Theory, Achievement Tests, Foreign Countries, International Assessment, Secondary School Students |
| Assessment and Survey Identifiers: | Program for International Student Assessment |
| DOI: | 10.1002/jocb.70082 |
| ISSN: | 0022-0175; 2162-6057 |
| Abstract: | The introduction of the PISA 2022 Creative Thinking assessment underscores the growing need for scalable, valid, and reliable methods to evaluate creativity in international large-scale assessments. Traditional human scoring, while nuanced, is time-consuming, costly, and subject to inconsistencies. This paper explores recent advances in artificial intelligence (AI) and natural language processing (NLP), particularly transformer-based large language models (LLMs), as promising alternatives for automated scoring. We review three methodological approaches: (1) unsupervised methods using semantic distance, (2) supervised fine-tuning with labeled data, and (3) few-/zero-shot learning using prompt-based inference. Empirical findings from verbal and visual creative tasks show that AI-based scoring systems can approximate human ratings with substantial accuracy (r = 0.70-0.85), even across different languages and task formats. A case study using the PISA Book Covers task demonstrates convergence between AI and human scores, with reliability levels comparable to traditional scoring. However, key challenges remain, particularly regarding cross-cultural comparability, bias mitigation, and interpretability. We discuss psychometric strategies (e.g., Many-Facet Rasch Models) to model these issues and propose future directions, including scoring of distinct creativity dimensions and developing transparent, open-source platforms. If rigorously validated, AI-based scoring offers a feasible and equitable path forward for assessing creativity globally. |
| Abstractor: | As Provided |
| Entry Date: | 2026 |
| Accession Number: | EJ1500530 |
| Database: | ERIC |
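The first approach named in the abstract, unsupervised scoring by semantic distance, can be illustrated with a minimal sketch. It assumes each prompt and response has already been converted to an embedding vector by some language model (not specified here) and scores originality as 1 minus the cosine similarity between prompt and response. Agreement with human raters is then summarized with a Pearson correlation, the statistic behind the r values the abstract reports. The function names, the 3-dimensional vectors, and the human ratings are hypothetical toy values for illustration only; they are not the author's method or data from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def semantic_distance(prompt_vec, response_vec):
    """Originality proxy (hypothetical helper): 1 - cosine similarity."""
    return 1.0 - cosine_similarity(prompt_vec, response_vec)


def pearson_r(x, y):
    """Pearson correlation, used here as a human-AI agreement index."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Toy, hypothetical data: 3-dimensional vectors stand in for real model embeddings.
prompt = [1.0, 0.0, 0.0]
responses = [
    [0.9, 0.1, 0.0],  # close to the prompt: a common idea
    [0.5, 0.5, 0.2],
    [0.1, 0.3, 0.9],  # far from the prompt: a more remote idea
]
human_ratings = [1, 3, 4]  # made-up originality ratings on a 1-5 scale

ai_scores = [semantic_distance(prompt, r) for r in responses]
agreement = pearson_r(ai_scores, human_ratings)
print([round(s, 3) for s in ai_scores], round(agreement, 3))
```

Semantic distance measures how remote a response is from the prompt, not whether it is appropriate or well executed, so in a real pipeline it would be one signal among several; the supervised and prompt-based approaches listed in the abstract learn from or are guided by human judgments instead.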