Bibliographic Details
| Title: |
Transforming Combat Casualty Care Training: Generative AI-Enabled Adaptive Learning in Forward Medical Settings |
| Language: |
English |
| Authors: |
Alan D. Koenig, John J. Lee, Eric Savitsky, Gabriele Nataneli, Karson Lindstrom, David L. Schriger, Tyler Savitsky, National Center for Research on Evaluation, Standards, and Student Testing (CRESST) |
| Source: |
National Center for Research on Evaluation, Standards, and Student Testing (CRESST). 2025. |
| Availability: |
National Center for Research on Evaluation, Standards, and Student Testing (CRESST). 300 Charles E Young Drive N, GSE&IS Building 3rd Floor, Mailbox 951522, Los Angeles, CA 90095-1522. Tel: 310-206-1532; Fax: 310-825-3883; Web site: http://www.cresst.org |
| Peer Reviewed: |
N |
| Page Count: |
20 |
| Publication Date: |
2025 |
| Document Type: |
Reports - Research |
| Descriptors: |
Artificial Intelligence, Computer Uses in Education, Medical Education, Multiple Choice Tests, Cost Effectiveness, Military Personnel, Electronic Learning, War, Conflict |
| Abstract: |
The urgent need to train military and civilian responders in combat casualty care during large-scale operations presents challenges due to the variability of learner preparedness and the resource demands of traditional curriculum development. This study examines the application of generative artificial intelligence (AI) in authoring and evaluating multiple-choice question-and-answer (QA) sets for medical training, with a specific focus on far-forward combat environments. Leveraging OpenAI's latest large language models (LLMs)--including GPT-4 (OpenAI, 2023), GPT-4o (OpenAI, 2024a), o1 (OpenAI, 2024c), and o1-mini (OpenAI, 2024d)--the study compares AI-generated QA sets to those created by a seasoned human subject matter expert (SME), using National Board of Medical Examiners (NBME) guidelines as the benchmark. Results show that GPT-4o produced high-quality QA sets in 86.6% of cases, while interrater agreement between human and AI raters was strong (Krippendorff's α = 0.85; Gwet's AC2 = 0.96). The AI-generated QA sets were created with a 31-fold time savings and over 4,000-fold cost reduction relative to SME-authored items. Beyond performance metrics, the study introduces a replicable human-in-the-loop methodology for AI-assisted educational assessment design, striking a balance between scalability and pedagogical integrity. This framework provides a viable path for integrating LLMs into adaptive learning systems across various domains, while emphasizing the continued need for expert oversight to ensure contextual fidelity, instructional relevance, and quality assurance. |
| Abstractor: |
As Provided |
| Entry Date: |
2025 |
| Accession Number: |
ED676628 |
| Database: |
ERIC |