Towards the Automatic Risk of Bias Assessment on Randomized Controlled Trials: A Comparison of RobotReviewer and Humans

Bibliographic Details
Title: Towards the Automatic Risk of Bias Assessment on Randomized Controlled Trials: A Comparison of RobotReviewer and Humans
Language: English
Authors: Yuan Tian, Xi Yang, Suhail A. Doi, Luis Furuya-Kanamori (ORCID 0000-0002-4337-9757), Lifeng Lin (ORCID 0000-0002-3562-9816), Joey S. W. Kwong, Chang Xu (ORCID 0000-0002-2627-1250)
Source: Research Synthesis Methods. 2024 15(6):1111-1119.
Availability: Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us
Peer Reviewed: Y
Page Count: 9
Publication Date: 2024
Document Type: Journal Articles; Reports - Evaluative
Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics, Artificial Intelligence, Evaluators, Computer Software, Evaluation Methods, Comparative Analysis, Research Problems, Reliability
DOI: 10.1002/jrsm.1761
ISSN: 1759-2879; 1759-2887
Abstract: RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two different approaches: (1) manually by human reviewers, and (2) automatically by RobotReviewer. The manual assessment was conducted independently by two groups, with two additional rounds of verification. The agreement between RobotReviewer and humans was measured via the concordance rate and Cohen's kappa statistic, based on a binary classification of the risk of bias (low vs. high/unclear), as restricted by RobotReviewer. The concordance rates varied by domain, ranging from 63.07% to 83.32%. Cohen's kappa showed poor agreement between humans and RobotReviewer for allocation concealment (K = 0.25, 95% CI: 0.21-0.30) and blinding of outcome assessors (K = 0.27, 95% CI: 0.23-0.31), and moderate agreement for random sequence generation (K = 0.46, 95% CI: 0.41-0.50) and blinding of participants and personnel (K = 0.59, 95% CI: 0.55-0.64). The findings demonstrate that there were domain-specific differences in the level of agreement between RobotReviewer and humans. We suggest that it might be a useful auxiliary tool, but the specific manner of its integration as a complementary tool requires further discussion.
Abstractor: As Provided
Notes: https://osf.io/k6w9q
Entry Date: 2024
Accession Number: EJ1447269
Database: ERIC
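Illustration: The abstract reports agreement as a concordance rate and Cohen's kappa on a binary classification (low vs. high/unclear). The following is a minimal sketch of how these two measures are conventionally computed for one domain. It is not the authors' code. The function name `concordance_and_kappa` and the ten toy ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def concordance_and_kappa(human, robot):
    """Return (concordance rate, Cohen's kappa) for two equal-length label lists."""
    if len(human) != len(robot) or not human:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(human)

    # Observed agreement: share of trials where both raters give the same label.
    p_o = sum(h == r for h, r in zip(human, robot)) / n

    # Chance agreement: for each label, multiply the two raters' marginal
    # proportions, then sum over labels.
    h_counts, r_counts = Counter(human), Counter(robot)
    labels = set(human) | set(robot)
    p_e = sum(h_counts[c] * r_counts[c] for c in labels) / (n * n)

    # Kappa is undefined when chance agreement is already perfect.
    if p_e == 1.0:
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical toy ratings for one domain (NOT data from the study).
# "other" stands for the combined high/unclear category.
human = ["low", "low", "low", "low", "other",
         "other", "other", "other", "other", "other"]
robot = ["low", "low", "low", "other", "low",
         "other", "other", "other", "other", "other"]

p_o, kappa = concordance_and_kappa(human, robot)
print(f"concordance = {p_o:.2f}, kappa = {kappa:.2f}")
# prints: concordance = 0.80, kappa = 0.58
```

In this toy case the raters agree on 8 of 10 trials, but 52% agreement would be expected by chance from the marginal totals alone, so kappa is noticeably lower than the concordance rate. The same effect explains why a domain can show a high concordance rate alongside a low kappa.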