Developing and Validating a Historical Thinking Skills Assessment Instrument: A Psychometric Study Using Classical Test Theory and Item Response Theory
| Title: | Developing and Validating a Historical Thinking Skills Assessment Instrument: A Psychometric Study Using Classical Test Theory and Item Response Theory |
|---|---|
| Language: | English |
| Authors: | Tri Zahra Ningsih (ORCID |
| Source: | International Society for Technology, Education, and Science. 2025. |
| Availability: | International Society for Technology, Education, and Science. 944 Maysey Drive, San Antonio, TX 78227. Tel: 515-294-1075; Fax: 515-294-1003; email: istesoffice@gmail.com; Web site: http://www.istes.org |
| Peer Reviewed: | Y |
| Page Count: | 23 |
| Publication Date: | 2025 |
| Document Type: | Speeches/Meeting Papers; Reports - Research |
| Education Level: | Secondary Education; Grade 11; High Schools |
| Descriptors: | Test Construction, Test Validity, Psychometrics, Test Theory, Thinking Skills, History Instruction, Item Response Theory, Evaluation Methods, Secondary School Curriculum, Student Evaluation, Historical Interpretation, Perspective Taking, Grade 11, Foreign Countries |
| Geographic Terms: | Indonesia |
| Abstract: | The gap between history learning objectives that emphasize historical thinking skills and assessment practices that still focus on memorizing facts is a major challenge in history education. This study aims to develop and validate a competency-based assessment instrument to measure historical thinking skills holistically at the high school level. The instrument is constructed based on four main domains: source evaluation and interpretation, causal reasoning, continuity and change, and ethical reflection and perspective taking. Through a Research and Development (R&D) approach, validity testing was conducted using Aiken's V analysis on 16 items, while empirical testing involved 134 students and was analyzed using Classical Test Theory (CTT) and Item Response Theory (IRT), specifically the Graded Response Model (GRM). The results show that the instrument has high content validity (Aiken's V = 0.85-1.00), strong internal reliability (α = 0.804), adequate discriminatory power (r ≥ 0.30), and the best model fit on the GRM. The test's information function and conditional reliability indicate that the instrument is most effective for measuring students' abilities at moderate to low levels (θ = -1 to 0). The study concludes that the instrument meets solid psychometric criteria and is relevant for use in the context of competency-based formative and summative assessments within the Independent Curriculum. [For the complete proceedings, see ED678749.] |
| Abstractor: | As Provided |
| Entry Date: | 2026 |
| Accession Number: | ED678758 |
| Database: | ERIC |
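
The abstract reports two summary statistics, Aiken's V for content validity and Cronbach's alpha for internal reliability. The sketch below shows how these are commonly computed using their standard textbook formulas. It is not the authors' analysis code, and the function names, the 1-to-5 rating scale, and all input values are hypothetical illustrations rather than data from the study.

```python
from statistics import variance


def aikens_v(ratings, lo, hi):
    """Aiken's V for one item.

    ratings: expert ratings for the item, each on the integer scale lo..hi.
    Returns a value in [0, 1]; 1 means every expert gave the top rating.
    """
    n = len(ratings)
    return sum(r - lo for r in ratings) / (n * (hi - lo))


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: one row per respondent, each row holding k item scores.
    Uses sample variances (n - 1 denominator).
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical inputs: four experts rating one item on a 1-5 scale,
# and five respondents answering three polytomous items.
expert_ratings = [4, 5, 4, 5]
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 4, 4],
]

print(f"Aiken's V: {aikens_v(expert_ratings, lo=1, hi=5):.3f}")
print(f"Cronbach's alpha: {cronbach_alpha(responses):.3f}")
```

Both functions use only the Python standard library. Fitting the Graded Response Model itself requires a dedicated IRT package and is not sketched here.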