A Pilot Study on Generative Artificial Intelligence's Reliability in Qualitative Research Quality Appraisal Using CASP and JBI Checklists.

Bibliographic Details
Title:	A Pilot Study on Generative Artificial Intelligence's Reliability in Qualitative Research Quality Appraisal Using CASP and JBI Checklists.
Authors:	Shereefdeen, Hisba¹ (AUTHOR), Thaivalappil, Abhinand²,³ (AUTHOR), Young, Ian³ (AUTHOR), MacKay, Melissa¹ (AUTHOR) melissam@uoguelph.ca
Source:	Inquiry (00469580). 11/29/2025, Vol. 62, p1-9. 9p.
Subject Terms:	Generative artificial intelligence, Qualitative research, Experimental design, Research bias, Computer assisted instruction, Inter-observer reliability, Consensus (Social sciences), Professional practice, Research evaluation, Pilot projects, Quality assurance, Evidence-based medicine, Judgment (Psychology), Human voice, Research ethics, User interfaces
Abstract:	Generative artificial intelligence (genAI) tools are transforming workflows, with growing interest in their potential applications in qualitative research. While the use of genAI in facilitating the systematic review process has been explored, its application in the quality appraisal of qualitative research remains to be understood. This pilot study aims to evaluate the degree to which ChatGPT appraises qualitative research using popular appraisal tools compared to human assessments. Two reviewers applied the Critical Appraisal Skills Program (CASP) and Joanna Briggs Institute (JBI) checklists for qualitative research to studies identified through a previously published review (n = 21). Next, iteratively developed prompts along with a copy of each study were uploaded to ChatGPT to instruct it to appraise each article. Interrater reliability measures and crude agreements were conducted to estimate the level of agreement between human and genAI assessments. Interrater reliability assessments between human and ChatGPT (GPT-5) revealed no agreement to moderate agreement for CASP checklist items (kappa: <.00-.46; crude agreement: 23.8%-100%) and from none to substantial for JBI items (kappa: <.00-.83; crude agreement: 4.8%-95.2%). Agreement was highest for reporting-based elements such as study aims, ethics approval, value of research (CASP), and participant voices and conclusions (JBI). Disagreements were greatest for interpretive and context-dependent items such as research design, researcher–participant relationships, and worldview–methodology congruity. Findings demonstrate that ChatGPT (GPT-5) can reliably identify objective components yet performs inconsistently when assessing items requiring nuance and contextual understanding across both checklists. Currently, any adoption of genAI for quality appraisal of qualitative research must be carefully applied only alongside human assessments and uphold principles of transparency and data privacy. 
[ABSTRACT FROM AUTHOR]
	Copyright of Inquiry (00469580) is the property of Sage Publications Inc. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database:	Education Research Complete

Publication Type:	Academic Journal
Language:	English
DOI:	10.1177/00469580251399374
Accession Number:	189855744
Permalink:	https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=ehh&AN=189855744