A Pilot Study on Generative Artificial Intelligence's Reliability in Qualitative Research Quality Appraisal Using CASP and JBI Checklists.

Bibliographic Details
Title: A Pilot Study on Generative Artificial Intelligence's Reliability in Qualitative Research Quality Appraisal Using CASP and JBI Checklists.
Authors: Shereefdeen, Hisba [1] (AUTHOR), Thaivalappil, Abhinand [2, 3] (AUTHOR), Young, Ian [3] (AUTHOR), MacKay, Melissa [1] (AUTHOR) melissam@uoguelph.ca
Source: Inquiry (00469580). 11/29/2025, Vol. 62, p1-9. 9p.
Subject Terms: *Generative artificial intelligence, *Qualitative research, *Experimental design, *Research bias, *Computer assisted instruction, *Inter-observer reliability, Consensus (Social sciences), Professional practice, Research evaluation, Pilot projects, Quality assurance, Evidence-based medicine, Judgment (Psychology), Human voice, Research ethics, User interfaces
Abstract: Generative artificial intelligence (genAI) tools are transforming workflows, with growing interest in their potential applications in qualitative research. While the use of genAI in facilitating the systematic review process has been explored, its application in the quality appraisal of qualitative research remains to be understood. This pilot study aims to evaluate the degree to which ChatGPT appraises qualitative research using popular appraisal tools compared to human assessments. Two reviewers applied the Critical Appraisal Skills Program (CASP) and Joanna Briggs Institute (JBI) checklists for qualitative research to studies identified through a previously published review (n = 21). Next, iteratively developed prompts along with a copy of each study were uploaded to ChatGPT to instruct it to appraise each article. Interrater reliability measures and crude agreements were conducted to estimate the level of agreement between human and genAI assessments. Interrater reliability assessments between human and ChatGPT (GPT-5) revealed no agreement to moderate agreement for CASP checklist items (kappa: <.00-.46; crude agreement: 23.8%-100%) and from none to substantial for JBI items (kappa: <.00-.83; crude agreement: 4.8%-95.2%). Agreement was highest for reporting-based elements such as study aims, ethics approval, value of research (CASP), and participant voices and conclusions (JBI). Disagreements were greatest for interpretive and context-dependent items such as research design, researcher–participant relationships, and worldview–methodology congruity. Findings demonstrate that ChatGPT (GPT-5) can reliably identify objective components yet performs inconsistently when assessing items requiring nuance and contextual understanding across both checklists. Currently, any adoption of genAI for quality appraisal of qualitative research must be carefully applied only alongside human assessments and uphold principles of transparency and data privacy. 
[ABSTRACT FROM AUTHOR]
Copyright of Inquiry (00469580) is the property of Sage Publications Inc. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
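The abstract reports two statistics for each checklist item: kappa and crude (percent) agreement between the human and ChatGPT ratings. Below is a minimal Python sketch of both. It assumes Cohen's kappa for two raters, with the human assessment treated as one rater and ChatGPT as the other, and nominal ratings such as CASP's "Yes"/"No"/"Can't tell". The abstract does not say which kappa variant or software the authors used. The function names and the ratings are hypothetical and are not data from the study. With n = 21 studies, crude agreement moves in steps of 1/21. The reported 4.8%, 23.8%, and 95.2% therefore correspond to agreement on 1, 5, and 20 studies.

```python
from collections import Counter


def crude_agreement(a, b):
    """Proportion of studies on which the two raters gave identical ratings."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa for two raters assigning nominal categories to the same items."""
    p_o = crude_agreement(a, b)  # observed agreement
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: for each category, multiply the two raters' marginal
    # proportions, then sum over every category either rater used.
    p_e = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_e == 1:
        # Both raters gave every study the same single rating: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for one checklist item across 21 studies (not study data).
human = ["Yes"] * 15 + ["No"] * 6
genai = ["Yes"] * 17 + ["No"] * 4

print(f"crude agreement: {crude_agreement(human, genai):.1%}")  # 90.5%
print(f"Cohen's kappa:   {cohens_kappa(human, genai):.2f}")     # 0.74
```

Kappa subtracts the agreement expected by chance, which is computed from each rater's rating frequencies. As a result, an item that nearly every study satisfies can show high crude agreement but only modest kappa. In the limiting case, both raters give every study the same single rating, so p_e = 1 and kappa is undefined (this sketch returns NaN). This is one general reason crude agreement is often reported alongside kappa.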
Database: Education Research Complete
Header DbId: ehh
DbLabel: Education Research Complete
An: 189855744
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=ehh&AN=189855744
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1177/00469580251399374
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 9
        StartPage: 1
    Titles:
      – TitleFull: A Pilot Study on Generative Artificial Intelligence's Reliability in Qualitative Research Quality Appraisal Using CASP and JBI Checklists.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Shereefdeen, Hisba
      – PersonEntity:
          Name:
            NameFull: Thaivalappil, Abhinand
      – PersonEntity:
          Name:
            NameFull: Young, Ian
      – PersonEntity:
          Name:
            NameFull: MacKay, Melissa
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 29
              M: 11
              Text: 11/29/2025
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 00469580
          Numbering:
            – Type: volume
              Value: 62
          Titles:
            – TitleFull: Inquiry (00469580)
              Type: main