AI-Powered Scoring for Creative Thinking: Methods and Challenges in PISA Assessment

Bibliographic Details
Title:	AI-Powered Scoring for Creative Thinking: Methods and Challenges in PISA Assessment
Language:	English
Authors:	Ricardo Primi (ORCID 0000-0003-4227-6745), Roger E. Beaty (ORCID 0000-0001-6114-5973), Mathias Benedek (ORCID 0000-0001-6258-4476), Denis Dumas (ORCID 0000-0002-8446-4720), Peter Organisciak (ORCID 0000-0002-9058-2280), John D. Patterson (ORCID 0000-0002-7455-3535), Tiago Calico (ORCID 0000-0003-3080-343X), Mario Piacentini (ORCID 0000-0001-8624-2833)
Source:	Journal of Creative Behavior. 2026 60(1).
Availability:	Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us
Peer Reviewed:	Y
Page Count:	12
Publication Date:	2026
Document Type:	Journal Articles; Reports - Research
Education Level:	Secondary Education
Descriptors:	Artificial Intelligence, Computer Assisted Testing, Scoring, Creativity Tests, Creative Thinking, Natural Language Processing, Automation, Semantics, Prompting, Inferences, Psychometrics, Item Response Theory, Achievement Tests, Foreign Countries, International Assessment, Secondary School Students
Assessment and Survey Identifiers:	Program for International Student Assessment
DOI:	10.1002/jocb.70082
ISSN:	0022-0175 (print); 2162-6057 (electronic)
Abstract:	The introduction of the PISA 2022 Creative Thinking assessment underscores the growing need for scalable, valid, and reliable methods to evaluate creativity in international large-scale assessments. Traditional human scoring, while nuanced, is time-consuming, costly, and subject to inconsistencies. This paper explores recent advances in artificial intelligence (AI) and natural language processing (NLP)--particularly transformer-based large language models (LLMs)--as promising alternatives for automated scoring. We review three methodological approaches: (1) unsupervised methods using semantic distance, (2) supervised fine-tuning with labeled data, and (3) few-/zero-shot learning using prompt-based inference. Empirical findings from verbal and visual creative tasks show that AI-based scoring systems can approximate human ratings with substantial accuracy (r = 0.70-0.85), even across different languages and task formats. A case study using the PISA Book Covers task demonstrates convergence between AI and human scores, with reliability levels comparable to traditional scoring. However, key challenges remain, particularly regarding cross-cultural comparability, bias mitigation, and interpretability. We discuss psychometric strategies (e.g., Many-Facet Rasch Models) to model these issues and propose future directions, including scoring of distinct creativity dimensions and developing transparent, open-source platforms. If rigorously validated, AI-based scoring offers a feasible and equitable path forward for assessing creativity globally.
Abstractor:	As Provided
Entry Date:	2026
Accession Number:	EJ1500530
Database:	ERIC

PLink	https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1500530