Choosing the Right Tool for the Job: Screening Tools for Systematic Reviews in Education
| Title: | Choosing the Right Tool for the Job: Screening Tools for Systematic Reviews in Education |
|---|---|
| Language: | English |
| Authors: | Qiyang Zhang (ORCID |
| Source: | Journal of Research on Educational Effectiveness. 2024 17(3):513-539. |
| Availability: | Routledge. Available from: Taylor & Francis, Ltd. 530 Walnut Street Suite 850, Philadelphia, PA 19106. Tel: 800-354-1420; Tel: 215-625-8900; Fax: 215-207-0050; Web site: http://www.tandf.co.uk/journals |
| Peer Reviewed: | Y |
| Page Count: | 27 |
| Publication Date: | 2024 |
| Document Type: | Journal Articles Information Analyses |
| Descriptors: | Selection Tools, Educational Resources, Artificial Intelligence, Selection Criteria, Computer Software Selection, Computer Software Evaluation, Literature Reviews |
| DOI: | 10.1080/19345747.2023.2209079 |
| ISSN: | 1934-5747 1934-5739 |
| Abstract: | In recent years, the rapid development of artificial intelligence has enabled the launch of many new screening tools. This review aims to facilitate screening tool selection through a systematic narrative review and feature analysis. The current adoption rate of transparent tool reporting is low: by screening 191 studies published in the "Review of Educational Research" since 2015, we found that only eight studies reported screening tools. More research is needed to understand the reasons behind this phenomenon. After consulting various sources, 26 available screening tools in the market were found. Among them, we identified and evaluated 12 screening tools for educational reviewers and ranked them in descending order of feature score: Covidence (1), DistillerSR (2, tied), EPPI-Reviewer (2, tied), CADIMA (4), Swift-Active (5), Rayyan (6, tied), SysRev (6, tied), Abstrackr (8, tied), ReLiS (8, tied), RevMan (8, tied), ASReview (11), and Excel (12). In the discussion, we provide insights into the promise and bias in tools' machine learning algorithms. Our results encourage researchers to report their tool usage in publications and select tools based on suitability instead of convenience. |
| Abstractor: | As Provided |
| Entry Date: | 2024 |
| Accession Number: | EJ1431206 |
| Database: | ERIC |
<title id="AN0178458564-1">Choosing the Right Tool for the Job: Screening Tools for Systematic Reviews in Education </title> <p>In recent years, the rapid development of artificial intelligence has enabled the launch of many new screening tools. This review aims to facilitate screening tool selection through a systematic narrative review and feature analysis. The current adoption rate of transparent tool reporting is low: by screening 191 studies published in the Review of Educational Research since 2015, we found that only eight studies reported screening tools. More research is needed to understand the reasons behind this phenomenon. After consulting various sources, 26 available screening tools in the market were found. 
Among them, we identified and evaluated 12 screening tools for educational reviewers and ranked them in descending order of feature score: Covidence (<reflink idref="bib1" id="ref1">1</reflink>), DistillerSR (<reflink idref="bib2" id="ref2">2</reflink>, tied), EPPI-Reviewer (<reflink idref="bib2" id="ref3">2</reflink>, tied), CADIMA (<reflink idref="bib4" id="ref4">4</reflink>), Swift-Active (<reflink idref="bib5" id="ref5">5</reflink>), Rayyan (<reflink idref="bib6" id="ref6">6</reflink>, tied), SysRev (<reflink idref="bib6" id="ref7">6</reflink>, tied), Abstrackr (<reflink idref="bib8" id="ref8">8</reflink>, tied), ReLiS (<reflink idref="bib8" id="ref9">8</reflink>, tied), RevMan (<reflink idref="bib8" id="ref10">8</reflink>, tied), ASReview (<reflink idref="bib11" id="ref11">11</reflink>), and Excel (<reflink idref="bib12" id="ref12">12</reflink>). In the discussion, we provide insights into the promise and bias in tools' machine learning algorithms. Our results encourage researchers to report their tool usage in publications and select tools based on suitability instead of convenience.</p> <p>Keywords: Systematic reviews; screening tools; machine learning; feature analysis; decision tree</p> <p> <emph>If I had eight hours to chop a tree, I'd spend six sharpening my ax.-Abraham Lincoln</emph> </p> <hd id="AN0178458564-2">History of Tool Development in Systematic Review</hd> <p>In systematic reviews, the <emph>screening process</emph> is the procedure for identifying eligible studies for inclusion and analysis. A screening process may include two stages, starting with a title and abstract screening stage where the studies are assessed for relevance, and a second stage where the full-text of the article is reviewed for inclusion. The standard practice of screening involves multiple reviewers making inclusion-exclusion decisions on the same set of studies as a consistency check (Cooper et al., [<reflink idref="bib15" id="ref13">15</reflink>]). 
Historically, systematic reviewers often collaborated manually to screen studies using hard copies. A folder of printed-out potential studies passed around an office often underpinned early systematic reviews and meta-analyses. This manual screening process took a considerable amount of time to manage literature and precluded blind review as well as multi-geographical collaboration. Fortunately, technological development has transformed screening routines, drastically saving time and labor committed to this intensive process.</p> <p>With Excel's launch in 1985, reviewers gradually started to digitize the screening and coding process using spreadsheets. In the early 2000s, citation management software, such as EndNote and Zotero, started to provide a more convenient platform for literature storage and retrieval. These reference management tools began to replace spreadsheets. Tools specifically designed to support the screening and reviewing stages of systematic reviews have also been developed. In the 2010s, the rapid boom in artificial intelligence (AI) and machine-learning applications prompted the launch of some new meta-analysis software tools with AI-enhanced features.</p> <hd id="AN0178458564-3">Why Screening Tools Are Better than Spreadsheets and Citation Management Tools</hd> <p>Despite the convenience and benefits spreadsheets and citation management tools brought, they have major limitations in essential features because they were not developed specifically for screening purposes. Excel spreadsheets were primarily developed for task management, data organization, analysis, and visualization purposes (Microsoft, [<reflink idref="bib34" id="ref14">34</reflink>]) and citation management tools were developed to collect, organize, annotate, cite, and share research. Systematic reviewers and meta-analysts who rely on spreadsheets and citation management tools face various problems in conducting recommended practices. 
For instance, best practices in screening abstracts include ensuring independent double-screening, conducting pilot testing of screening, and arranging regular team meetings to reconcile disagreements (Polanin et al., [<reflink idref="bib45" id="ref15">45</reflink>]). Screening tools developed based on these guidelines ideally embed features, such as blind reviewing, screening progress overview, conflict resolution, inter-rater reliability, and reviewers' progress tracking. Domain knowledge enabled the screening tool developers to incorporate protocols for high-quality systematic reviews. These added functions partially explain the increasing popularity of screening tools among systematic review teams.</p> <p>The other reason for review teams to switch to screening tools is that advanced algorithms can help expedite the often manual and tiresome screening process. Empowering systematic review teams to screen faster is the main attraction of screening tools. For example, some new screening tools were equipped with machine learning-based text classification algorithms to rank the relevancy of the studies based on human reviewers' screening patterns, which significantly reduce resources required for conducting reviews (Cohen et al., [<reflink idref="bib14" id="ref16">14</reflink>]). These features are particularly helpful when conducting large-scale reviews in cross-disciplinary fields, such as education. The additional functions enabled by these semi-automated software tools hold promise to save a tremendous amount of time and labor in evidence-based synthesis (Ouzzani et al., [<reflink idref="bib39" id="ref17">39</reflink>]; Rathbone et al., [<reflink idref="bib46" id="ref18">46</reflink>]) if selected and used appropriately.</p> <p>The availability of new screening tools presents new problems. Not all screening tools are the same and researchers' selection of the tools can be consequential. 
For instance, some tools may have less accurate deduplication functions (McKeown &amp; Mir, [<reflink idref="bib31" id="ref19">31</reflink>]) while some other tools may have biased algorithms embedded in machine learning features (Varghese et al., [<reflink idref="bib59" id="ref20">59</reflink>]). As readers, information on tool usage behind systematic reviews enables us to better compare different reviews of the same topic or interpret the review results. As researchers, understanding the reasons behind tools used in past reviews can guide us to select suitable tools and conduct better research. In published reviews, the screening process is generally not well-explained and the rationale behind tool selection is rarely transparent. This article aims to provide information for educational researchers to select the right tool for a particular set of needs. We will identify and compare software tools specifically designed to perform title and abstract screening and/or full-text review. Meanwhile, we will assess published review articles in education to understand the current tool usage among educational researchers. Eventually, our goal is to provide sufficient and simple information, such as a decision tree, to assist reviewers' tool selection.</p> <hd id="AN0178458564-4">Screening Tools in Education</hd> <p>Within the field of education, there is a lack of research comparing different screening tools. This kind of research has been conducted in some other fields, such as biomedical research (Van der Mierden et al., [<reflink idref="bib58" id="ref21">58</reflink>]) and healthcare (Harrison et al., [<reflink idref="bib24" id="ref22">24</reflink>]). Van der Mierden et al. ([<reflink idref="bib58" id="ref23">58</reflink>]) performed a feature analysis on 16 available screening tools in biomedical research and ranked Rayyan as the best free tool and Microsoft products as the least preferable tools for biomedical researchers. Harrison et al. 
([<reflink idref="bib24" id="ref24">24</reflink>]) conducted a weighted feature analysis on six screening tools in healthcare and recommended Covidence and Rayyan for healthcare systematic reviewers. However, considering this question from the perspective of education is valuable. While the screening process (title and abstract screening followed by full-text review against inclusion/exclusion criteria) is universal and some features, such as the ability to collaborate by allowing multiple users, would be of interest regardless of field, there are specific features discussed below that would be particularly helpful within education. We build on similar work in other fields (e.g., Van der Mierden et al., [<reflink idref="bib58" id="ref25">58</reflink>]) by including an additional focus on these features of particular relevance in education.</p> <p>Evidence-based reform in education has contributed to a proliferation of well-designed and large-scale experiments in recent years, with a corresponding call for up-to-date, comprehensive systematic reviews across a broad set of topics. However, this proliferation can be regarded as a double-edged sword. On one hand, more evidence contributes to the development of rigorous educational research synthesis. On the other hand, a higher volume of studies can make meta-analysis more labor- and time-consuming. In addition, education is an interdisciplinary field that requires a literature search from multiple databases and sources. Educational systematic reviewers urgently need technology-based screening tools with time-saving features, such as deduplication and machine learning, to expedite the screening process. These needs may differ in other fields, where reviewers may need to search only one or a few databases and the results may include only a few hundred studies. 
In the field of education, there is a particularly acute need for tools that can make screening less resource-intensive, since interdisciplinary searches often retrieve thousands of studies to screen. In large educational systematic reviews, machine learning approaches are particularly helpful. Since machine learning is of special interest, we take a deeper look at the approaches used, so that the question is not just whether machine learning is available, but whether it also includes more modern deep neural networks.</p> <p>The education field develops, tests, and markets new approaches quickly. Vendors and developers are constantly bringing to market new programs and practices that school districts must decide whether to adopt. Findings on these programs and practices must be rapidly integrated into existing reviews, so the ability to update existing reviews is an important feature in education that may not be relevant in other fields, where the pace of testing and change is slower or more regulated by other forces.</p> <p>Another factor that may determine which features are especially relevant is domain-specific definitions of what constitutes a "high-quality" review. Standards for what is "good enough" are expected to vary if we accept a best-evidence approach, which focuses the synthesis on the best available evidence (Slavin, [<reflink idref="bib55" id="ref26">55</reflink>]). It is natural to believe that reviews in fields with more randomized controlled trials would focus on those trials, while fields with fewer randomized studies or fields for which randomized studies are not always appropriate or possible would include a more diverse set of research designs. 
Within education, the level of evidence available depends on the particular question of interest, so there must be flexibility in how or even whether those standards are handled during the screening process or instead are addressed during the coding process. The process of assessing the research design could be included in the software tool if reviewers wish to focus only on including studies that meet a particular standard of methodological rigor. The tools that are designed for fields that prize experiments may not be able to flexibly handle additional study designs, such as quasi-experiments or case studies.</p> <p>Thirdly, there are even particular software tools that were developed for specific fields that may struggle to be adapted to the education field. For example, Parsifal, a screening tool developed for the software engineering field, is fully integrated with one database, Scopus, so that one could conduct searches of that database from within the software, save the results of the search, and then import to the screening tool. This is a remarkable feature that simplifies the process but has not yet been implemented with education-specific databases, such as ERIC. Therefore, while the integration of databases directly into the screening software is of value in some fields, it is less relevant in education when those integrated databases are not those with the most coverage of education-related topics. This ability is even of less relevance for educational researchers because education is an interdisciplinary field, requiring the use of many different databases, making the integration of a single database less useful. Because of these differences, the conclusions drawn from biomedical research and healthcare on choosing screening tools are not necessarily transferable to educational researchers. 
Therefore, we are motivated to conduct this study specifically in the field of education.</p> <p>This study intends to identify screening tools used by educational researchers and provide guidance for tool selection in the field of education. To the best of our knowledge, this study is the first to address this issue for educational researchers. Through systematic narrative review and feature analysis of tools' features, this article aims to examine currently available screening tools in education to empower educational meta-analysts with the information necessary to optimize tool selection. To facilitate the selection process of screening tools, we present an opinion-based decision tree at the end of the article.</p> <hd id="AN0178458564-5">Review Objectives</hd> <p>The three objectives of this study are to:</p> <p></p> <ulist> <item> Identify available screening tools for educational analysts. To do so, we will first (1a) identify all available screening tools in the market, and then (1b) identify the subset of those screening tools that could be used by educational researchers.</item> <p></p> <item> Assess the tools educational researchers are already using through tool reporting in published reviews. To achieve this objective, we used two steps: (2a) screening published articles in <emph>Review of Educational Research (RER)</emph> to determine the prevalence of reporting screening tools in published educational research, and (2b) cross-checking whether we have identified a complete set of screening tools used by educational researchers in Objective 1b.</item> <p></p> <item> Analyze differing features of available screening tools to inform educational meta-analysts' tool selection. 
In this objective, we have two procedures: (3a) collecting information on screening tools identified in Objective 1b, and (3b) ranking tools through feature analysis to inform educational systematic reviewers' tool selection.</item> </ulist> <hd id="AN0178458564-6">Methods</hd> <p></p> <hd id="AN0178458564-7">Objective 1. Identifying Available Screening Tools for Educational Analysts</hd> <p>In the current screening tool software market, there are many meta-analytical tools developed for specific purposes in particular fields. For example, SyRF was developed for preclinical studies; SRDB.PRO was developed for the pharmaceutical industry and healthcare consultancies; Colandr was launched by biodiversity researchers; RobotAnalyst and SRDR Plus were developed for healthcare researchers; and Parsifal was developed for software engineering. As discussed in the previous section, not all screening tools are feasible for educational systematic research due to niche differences across fields. To locate screening tools currently available in the market, we conducted a web-based search to obtain names of different tools from academic publications (Harrison et al., [<reflink idref="bib24" id="ref27">24</reflink>]; Schoot et al., [<reflink idref="bib52" id="ref28">52</reflink>]; Van der Mierden et al., [<reflink idref="bib58" id="ref29">58</reflink>]), personal blogs (Bradburn, [<reflink idref="bib8" id="ref30">8</reflink>]), university library resources (Roth, [<reflink idref="bib48" id="ref31">48</reflink>]), and research centers (Center for Evidence Synthesis in Health, [<reflink idref="bib11" id="ref32">11</reflink>]). To narrow down and identify available and appropriate tools for educational researchers, we included and analyzed the screening tools that have been used in educational research. 
To assess this criterion, we searched for "tool name + education/school/students + meta-analysis/systematic review" on Google Scholar to find systematic reviews in education that cite the screening tools. Finally, among all the identified tools, we excluded tools that were designed for a specific non-education field and tools that are coding frameworks or R packages.</p> <hd id="AN0178458564-8">Objective 2. Assessing the Tools Educational Researchers Are Already Using through Tool Repor...</hd> <p>Variations in reviewers' screening procedures and usage of different screening tools can result in systematic differences in their research results. With the advancement of the open science movement and transparent research, more researchers are expected to report their screening procedures in detail (Patall, [<reflink idref="bib41" id="ref33">41</reflink>]). We are interested in finding out the current status of reporting the tools used for screening in educational research. Therefore, we conducted a systematic search and screening of publications from one journal, <emph>Review of Educational Research (RER)</emph>. <emph>RER</emph> is among the most impactful and representative peer-reviewed journals for systematic reviews in the field of education. Because almost all new screening software tools, except RevMan and Excel, were released post-2010 (see Table 1) and it takes time for research teams to train with and implement new software effectively, we selected the cutoff year of 2015 to assess published studies' reporting status of screening tools. This also ensures that we are examining the tools currently in use in the field and most likely to still be available.</p> <p>Table 1. 
Basic information on the included screening tools.</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;td&gt;Name&lt;/td&gt;&lt;td&gt;Info&lt;/td&gt;&lt;td&gt;Year of launch&lt;/td&gt;&lt;td&gt;Developer publication&lt;/td&gt;&lt;td&gt;Developer&lt;/td&gt;&lt;td&gt;Country&lt;/td&gt;&lt;td&gt;Examples in education&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody valign="top"&gt;&lt;tr&gt;&lt;td&gt;Abstrackr&lt;/td&gt;&lt;td&gt;Semi-automated tool for teamwork in meta-analysis&lt;/td&gt;&lt;td char="."&gt;2012&lt;/td&gt;&lt;td&gt;Wallace et al.&lt;/td&gt;&lt;td&gt;Tufts Evidence-based Practice Center, maintained by Brown Center for Evidence Synthesis in Health&lt;/td&gt;&lt;td&gt;USA&lt;/td&gt;&lt;td&gt;Clinton &amp; Khan, &lt;xref ref-type="bibr" rid="bibr13"&gt;2019&lt;/xref&gt;; Polanin et al., &lt;xref ref-type="bibr" rid="bibr44"&gt;2021&lt;/xref&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Covidence&lt;/td&gt;&lt;td&gt;Web-based systematic review tool for screening, data extraction, and analysis&lt;/td&gt;&lt;td char="."&gt;2013&lt;/td&gt;&lt;td&gt;&amp;#8211;&lt;/td&gt;&lt;td&gt;Australian nonprofit company Veritas Health Innovation&lt;/td&gt;&lt;td&gt;Australia&lt;/td&gt;&lt;td&gt;Car et al., &lt;xref ref-type="bibr" rid="bibr10"&gt;2019&lt;/xref&gt;; Lee et al., &lt;xref ref-type="bibr" rid="bibr30"&gt;2021&lt;/xref&gt;; Zhang et al., &lt;xref ref-type="bibr" rid="bibr64"&gt;2021&lt;/xref&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;ASReview&lt;/td&gt;&lt;td&gt;Uses the latest machine learning algorithms to minimize errors and maximize accuracy&lt;/td&gt;&lt;td char="."&gt;2019&lt;/td&gt;&lt;td&gt;Schoot et al.&lt;/td&gt;&lt;td&gt;Utrecht University&lt;/td&gt;&lt;td&gt;Netherlands&lt;/td&gt;&lt;td&gt;Zhang et al., &lt;xref ref-type="bibr" rid="bibr65"&gt;2023&lt;/xref&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RevMan Web/RevMan 5&lt;/td&gt;&lt;td&gt;- Facilitates protocol development, screening, and full-text reviews - Presents the results 
graphically&lt;/td&gt;&lt;td char="."&gt;2008&lt;/td&gt;&lt;td&gt;The Cochrane Collaboration&lt;/td&gt;&lt;td&gt;The Cochrane Collaboration&lt;/td&gt;&lt;td&gt;UK&lt;/td&gt;&lt;td&gt;Pei &amp; Wu, &lt;xref ref-type="bibr" rid="bibr42"&gt;2019&lt;/xref&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Rayyan&lt;/td&gt;&lt;td&gt;Accelerates abstract/title screening with semi-automation&lt;/td&gt;&lt;td char="."&gt;2014&lt;/td&gt;&lt;td&gt;Ouzzani et al.&lt;/td&gt;&lt;td&gt;Qatar Computing Research Institute: Rayyan Systems Inc.&lt;/td&gt;&lt;td&gt;Qatar&lt;/td&gt;&lt;td&gt;&amp;#216;degaard et al., &lt;xref ref-type="bibr" rid="bibr38"&gt;2021&lt;/xref&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;EPPI-Reviewer&lt;/td&gt;&lt;td&gt;- Supports study screening through data collection, analysis, and synthesis - Includes features, such as text mining, data clustering, classification, term extraction, and machine learning&lt;/td&gt;&lt;td char="."&gt;2010&lt;/td&gt;&lt;td&gt;Thomas et al.&lt;/td&gt;&lt;td&gt;EPPI-Center at the Social Science Research Unit at the Institute of Education, University College London, and University of London&lt;/td&gt;&lt;td&gt;UK&lt;/td&gt;&lt;td&gt;Merrill, &lt;xref ref-type="bibr" rid="bibr33"&gt;2021&lt;/xref&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;DistillerSR&lt;/td&gt;&lt;td&gt;Automates literature collection, triage, and assessment using AI and intelligent workflows&lt;/td&gt;&lt;td char="."&gt;2018&lt;/td&gt;&lt;td&gt;&amp;#8211;&lt;/td&gt;&lt;td&gt;Evidence Partners Inc., Ottawa&lt;/td&gt;&lt;td&gt;Canada&lt;/td&gt;&lt;td&gt;Salter et al., &lt;xref ref-type="bibr" rid="bibr51"&gt;2014&lt;/xref&gt;; Noetel et al., &lt;xref ref-type="bibr" rid="bibr37"&gt;2021&lt;/xref&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Excel&lt;/td&gt;&lt;td&gt;Excel's VonVille can help with systematic reviews&lt;/td&gt;&lt;td char="."&gt;1985&lt;/td&gt;&lt;td&gt;&amp;#8211;&lt;/td&gt;&lt;td&gt;Microsoft 
Corporation&lt;/td&gt;&lt;td&gt;USA&lt;/td&gt;&lt;td&gt;Billingsley &amp; Bettini, &lt;xref ref-type="bibr" rid="bibr5"&gt;2019&lt;/xref&gt;; Firestone et al., &lt;xref ref-type="bibr" rid="bibr19"&gt;2020&lt;/xref&gt;; Hallinger &amp; Kova&amp;#269;evi&amp;#263;, &lt;xref ref-type="bibr" rid="bibr23"&gt;2019&lt;/xref&gt;; Rowan et al., &lt;xref ref-type="bibr" rid="bibr49"&gt;2021&lt;/xref&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Other four tools not used in education yet but have the potential to be utilized&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;SysRev&lt;/td&gt;&lt;td&gt;SysRev helps systematic reviewers with machines to extract data from documents. It claims to be built for general purpose data miners and is available on GitHub (&lt;ext-link ext-link-type="url" href="https://github.com/sysrev/Sysrev%5fDocumentation" /&gt;)&lt;/td&gt;&lt;td char="."&gt;2021&lt;/td&gt;&lt;td&gt;Bozada et al., &lt;xref ref-type="bibr" rid="bibr7"&gt;2021&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;Insilica&lt;/td&gt;&lt;td&gt;USA&lt;/td&gt;&lt;td&gt;&amp;#8211;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;SWIFT-Active Screener&lt;/td&gt;&lt;td&gt;SWIFT-Active Screener is a web-based, collaborative systematic review software application. 
It was developed by Sciome LLC, which is an innovative research and technology consulting company.&lt;/td&gt;&lt;td char="."&gt;2016&lt;/td&gt;&lt;td&gt;Miller et al., &lt;xref ref-type="bibr" rid="bibr35"&gt;2016&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;Sciome LLC&lt;/td&gt;&lt;td&gt;USA&lt;/td&gt;&lt;td&gt;&amp;#8211;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CADIMA&lt;/td&gt;&lt;td&gt;CADIMA is a free web tool facilitating the conduct and assuring for the documentation of systematic reviews, systematic maps, and further literature reviews.&lt;/td&gt;&lt;td char="."&gt;2018&lt;/td&gt;&lt;td&gt;Kohl et al., &lt;xref ref-type="bibr" rid="bibr28"&gt;2018&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;Julius K&amp;#252;hn-Institut&lt;/td&gt;&lt;td&gt;Germany&lt;/td&gt;&lt;td&gt;&amp;#8211;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;ReLiS&lt;/td&gt;&lt;td&gt;ReLiS stands for Revue Litt&amp;#233;raire Syst&amp;#233;matique, which is French for Systematic Literature Review. ReLiS literally translates to "reread." It is available on GitHub (&lt;ext-link ext-link-type="url" href="https://github.com/geodes-sms/relis" /&gt;). ReLiS is a highly configurable tool to conduct systematic reviews collaboratively and iteratively on the cloud.&lt;/td&gt;&lt;td char="."&gt;2018&lt;/td&gt;&lt;td&gt;Bigendako &amp; Syriani, &lt;xref ref-type="bibr" rid="bibr4"&gt;2018&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;Software engineering lab GEODES in the department of computer science and operations research, University of Montreal&lt;/td&gt;&lt;td&gt;Canada&lt;/td&gt;&lt;td&gt;&amp;#8211;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Article retrieval was enabled with an open-source web-app called Paperfetcher (Pallath &amp; Zhang, [<reflink idref="bib40" id="ref34">40</reflink>]), which automates handsearching as well as forward and backward citation chasing, and allows bulk export to common bibliographic data formats used by screening tools. 
We used the handsearching function in Paperfetcher to retrieve studies published in <emph>RER</emph> from January 1, 2015 to October 2, 2021. The screening process involved two steps. First, we examined each study's eligibility by excluding studies that did not report any tool usage in the method section. Second, in the full-text screening, we classified studies by the tools they reported using during the screening process. In the second step, we only included studies that reported the use of screening tools. We chose Covidence to conduct this screening process because it enables both title and abstract screening and full-text review, and because we have an institutional license to use this tool.</p> <hd id="AN0178458564-9">Objective 3. Analyzing Differing Features of Available Screening Tools to Inform Educational...</hd> <p>To understand the specific features and functions of our identified screening tools, we conducted a feature analysis. The process of selecting and assigning scores to features was guided by Kitchenham's ([<reflink idref="bib26" id="ref35">26</reflink>]) DESMET method, a validated method for evaluating software (Kitchenham &amp; Linkman, [<reflink idref="bib27" id="ref36">27</reflink>]). Out of all the evaluation methods presented in DESMET, we selected the screening mode under feature analysis since this method requires the least amount of labor and is the fastest and the least costly method. The first step in the screening mode approach is to identify candidate tools for evaluation, which was completed in Objective 1b. The second step is to choose features, which was completed after an iterative process of discussion among the authors and within the authors' internal network. In addition, previous research on similar topics and the PRISMA statement were consulted and considered in the feature selection process. 
After deciding on which features to compare, we then examined these screening tools' official websites, publicly-accessible training videos, and academic publications introducing the tools. Tools that were developed by academics gave detailed documentation on their functions, robustness, and limitations in publication. There was no restriction on the date of development of tools—tools could have been developed at any point.</p> <p>Tools' features were coded as dummy variables (0/1), where 0 means the absence of a feature and 1 means the presence of a feature. We separated various features into general functions and the two stages of the screening process: title and abstract screening stage, and full-text review stage. General functions include bulk import and export, teamwork, blind review, duplicate removal, and research update. The title and abstract screening stage's features include the possibility of title/abstract screening, machine learning classifiers, deep learning, and inter-rater reliability. Apart from machine learning classifiers, we also coded whether the screening tool uses deep neural network models for improved performance. Full-text review stage's features include the possibility of full-text review, inter-rater reliability, and reason labels. Categorizing these features gives reviewers a more holistic comparison for each stage of the systematic reviewing process and enables them to compare and select the most suitable tool. We recorded some other non-quantifiable features separately, such as cost and privacy policy. We also calculated a feature analysis score that is simply the sum of the number of features coded above. While this approach treats each feature as equally important, we recognize this is likely not the case, because each researcher will rank them differently. 
However, this does give one way to identify which tools have the greatest number of the important features we have identified.</p> <hd id="AN0178458564-10">Title/Abstract Screening and Full-Text Review</hd> <p>These are the two basic features a screening tool should address to meet the PRISMA standard. All tools analyzed in this article provide a mechanism for screening based on the content of the title and abstract; however, some tools provide the additional capability of streaming the included studies into the second stage of full-text review. Depending on the research goals and methods, it is sometimes sufficient to use tools that only provide title and abstract screening. For comprehensive systematic reviews, it is important for tools to handle full-text reviews as well, since doing so helps generate the flow chart diagram, keeps the record of screening decisions together, and makes it easier to change screening decisions. This helps researchers better document decisions at different stages of the review process.</p> <hd id="AN0178458564-11">Bulk Import and Export</hd> <p>After a literature search, researchers usually download files with their search results directly from databases or store retrieved articles in citation management tools. In both cases, citations of retrieved articles can be stored in common bibliographic formats (e.g., RIS, BibTeX, CSV). These standard data formats are compatible with a wide range of applications, including citation management tools, such as EndNote and Zotero, and screening tools, such as Covidence and Abstrackr. Bulk import and export comprises two functions. First, researchers can import thousands of studies into the screening tool in their bibliographic formats. Second, upon completion of screening, researchers can export screening decisions in bulk to facilitate further coding analysis. This feature is coded 1 if the tool allows researchers to import or export thousands of studies in a single action. 
Bulk import and export is a necessary function for researchers to manage numerous studies efficiently. It saves researchers time because they can efficiently combine results from multiple sources.</p> <hd id="AN0178458564-12">Machine Learning Classifiers and Deep Learning</hd> <p>Machine learning algorithms require labeled training data (such as the abstract of a study and a label that classifies the study as relevant or irrelevant) to make predictions for unseen data. In the case of systematic review, labels for data are not available <emph>a priori</emph>. Therefore, a systematic review can benefit from active learning algorithms, a special case of machine learning, which iteratively choose the data points they would like the user to label to gradually improve classification performance (Singh et al., [<reflink idref="bib54" id="ref37">54</reflink>]). Machine learning has immense potential to change how modern systematic reviews are conducted. Traditionally, screening has consumed significant time and labor and is prone to human error. Machine learning-based text classification algorithms can help reduce time, labor, and human error by classifying or ranking studies by their relevance to expedite the systematic review process (Cohen et al., [<reflink idref="bib14" id="ref38">14</reflink>]). While these algorithms are increasingly being applied in the title and abstract screening stage, they are not yet ready to replace human reviewers in the full-text article quality assessment stage (Pigott &amp; Polanin, [<reflink idref="bib43" id="ref39">43</reflink>]).</p> <p>Feature extraction algorithms process the title and abstract text and convert them into a form that can be used by text classifiers to predict the relevance of the study. Machine learning algorithms use these features to classify the text. 
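</p> <p>To make these two steps concrete, the following minimal sketch turns titles and abstracts into bag-of-words count vectors (feature extraction) and then assigns each unseen record to the closer of two class centroids by cosine similarity (classification). It is an illustration only: the nearest-centroid rule is a deliberately simple stand-in rather than the SVM or neural network models discussed in this article, it is not the algorithm of any screening tool reviewed here, and all function names and example abstracts are hypothetical.</p>

```python
import math
import re
from collections import Counter


def extract_features(text):
    """Feature extraction: map text to lowercase word counts (bag of words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def summed_vector(vectors):
    """Sum count vectors; cosine similarity ignores scale, so no averaging is needed."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled):
    """Build one centroid per class from (text, label) pairs; 1 = relevant, 0 = irrelevant."""
    relevant = summed_vector(extract_features(t) for t, y in labeled if y == 1)
    irrelevant = summed_vector(extract_features(t) for t, y in labeled if y == 0)
    return relevant, irrelevant


def relevance_score(model, text):
    """Positive when the text is closer to the relevant centroid than to the irrelevant one."""
    relevant, irrelevant = model
    features = extract_features(text)
    return cosine(features, relevant) - cosine(features, irrelevant)


def classify(model, text):
    return 1 if relevance_score(model, text) > 0 else 0


# Hypothetical labeled records (in practice, screening decisions made by reviewers).
LABELED = [
    ("Randomized trial of a reading intervention in elementary schools", 1),
    ("Effects of tutoring on student reading achievement", 1),
    ("Protein folding dynamics in yeast cells", 0),
    ("Corrosion resistance of steel alloys", 0),
]
# Hypothetical unseen records awaiting screening.
UNSEEN = [
    "Soil erosion and steel corrosion in coastal bridges",
    "A tutoring intervention for student reading",
]

model = train(LABELED)
# Rank unseen records so that likely relevant ones are shown to reviewers first.
ranked = sorted(UNSEEN, key=lambda t: relevance_score(model, t), reverse=True)

if __name__ == "__main__":
    for record in ranked:
        print(classify(model, record), record)
```

<p>In this toy example, the second unseen record shares several terms with the records labeled relevant, so it is classified as relevant and ranked first for human screening. Real tools rely on much larger labeled sets and more robust models.</p> <p>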
Shallow support vector machine (SVM) models are among the most popular models used for text classification (Schoot et al., [<reflink idref="bib52" id="ref40">52</reflink>]). However, these models are increasingly being replaced by more advanced algorithms. In the last few years, deep neural network models, which use multiple non-linear processing layers to classify data, have rapidly become the state of the art due to their high modeling capacity, generalizability, and performance (Young et al., [<reflink idref="bib63" id="ref41">63</reflink>]). For small review projects, machine learning and deep learning may not make substantial differences. For large review projects, these algorithms can expedite the screening process, and tools that embed deep learning make more reliable predictions on inclusion and exclusion than tools that embed only shallow machine learning models. We code the machine learning feature as 1 or 0 to differentiate whether the tools embed any text-classification machine learning algorithms. Among tools that embed machine learning algorithms, we code the deep learning feature as 1 or 0 to differentiate whether the tools embed any deep neural network models.</p> <hd id="AN0178458564-13">Teamwork and Blind Review</hd> <p>For collaborative systematic reviews, web-based screening tools can support teamwork functions, while desktop- or terminal-based tools are generally limited to local computer use. This feature is coded 1 if the tool has a web-based version that allows users to collaborate on the same project or review. In some cases, different account users are allowed to collaborate on projects only if they pay additional fees. Teamwork is an essential element of high-quality and rigorous systematic reviews. Many guidelines and handbooks (Cooper et al., [<reflink idref="bib15" id="ref42">15</reflink>]; Pigott &amp; Polanin, [<reflink idref="bib43" id="ref43">43</reflink>]) list teamwork as one of the best practices for rigorous systematic reviews. 
Authors of these guidelines and handbooks also advocate or even require that two reviewers screen articles independently and hold regular team meetings to discuss and resolve conflicts (Cooper et al., [<reflink idref="bib15" id="ref44">15</reflink>]; Pigott &amp; Polanin, [<reflink idref="bib43" id="ref45">43</reflink>]). These features can simplify the process of collaboration and independent review.</p> <hd id="AN0178458564-14">Deduplication</hd> <p>Although duplicate removal is available in most citation management software tools, having this function embedded directly in screening tools can save reviewers enormous time and effort. In addition, deduplication helps authors keep a systematic record of screening procedures and track the screening history for any clarifications or modifications. Most importantly, automatic deduplication prevents double counting of studies in research synthesis and prevents reviewers from wasting time screening multiple entries of the same study. Furthermore, reporting the deduplication step in research articles also supports the open science movement.</p> <p>On a side note, users should be aware that the accuracy of deduplication varies among screening software. Perhaps because the development and implementation of advanced tools is a recent phenomenon, there is a paucity of research that investigates and compares sensitivity, specificity, negative predictive value, and positive predictive value among screening software. One study found that Rayyan has higher accuracy and sensitivity than Covidence, but the latter has higher specificity than the former (McKeown &amp; Mir, [<reflink idref="bib31" id="ref46">31</reflink>]). Another study that compared review tools with reference management tools found that Rayyan, Mendeley, and Systematic Review Accelerator are more reliable than EndNote and Zotero (Guimarães et al., [<reflink idref="bib22" id="ref47">22</reflink>]). 
In the future, more research is required to gain a better understanding of the reliability of deduplication functions in screening software.</p> <hd id="AN0178458564-15">Inter-Rater Reliability</hd> <p>Inter-rater reliability (IRR) is defined as the agreement between two independent coders[<reflink idref="bib1" id="ref48">1</reflink>] on inclusion or exclusion decisions. IRR can be calculated at both the title and abstract screening stage and the full-text review stage. Reporting IRR details contributes to the transparency of systematic reviews. Although IRR can be calculated manually, enabling this function with a click of a button in the screening software could save reviewers a tremendous amount of time and effort. When the review team consists of multiple reviewers with different experience levels, IRR is an important metric for monitoring review quality and reviewers' reliability, since it quantifies the amount of error the review is willing to tolerate. Disagreements on screening decisions may arise due to different understandings of the eligibility criteria or reviewers' physical and mental fatigue and subsequent human errors (Belur et al., [<reflink idref="bib3" id="ref49">3</reflink>]). To tackle the problem of "coder drift," Polanin et al. ([<reflink idref="bib45" id="ref50">45</reflink>]) recommend arranging regular review meetings to reconcile disagreements and reduce interpretation inconsistencies. We think that for larger review projects, calculating IRR while screening is still in progress can help start the reconciliation process as early as possible.</p> <hd id="AN0178458564-16">Research Update</hd> <p>Evidence-based reform in education has contributed to a proliferation of well-designed experiments in recent years, which constantly calls for updated systematic reviews. The median update time for a systematic review is more than five years (Bashir et al., [<reflink idref="bib2" id="ref51">2</reflink>]). 
Updating systematic reviews and including the latest evidence on the topic has a practical influence on evidence-based policy-making. Therefore, research update is another feature we compare in our feature analysis. We coded this feature as 1 if the tool retains the review materials and allows researchers to access review materials in the future for updating purposes.</p> <hd id="AN0178458564-17">Reason Label</hd> <p>This is a function in full-text screening that helps reviewers note their reasons for making inclusion or exclusion decisions. Reason labeling is important for reviewers, as it helps them understand frequently occurring limitations in the research they are screening. In some cases, reviewers either relax inclusion criteria or tighten exclusion criteria depending on whether too few or too many studies qualify. In this scenario, reason labeling enables reviewers to re-screen excluded studies by category and adjust in a more efficient and organized way.</p> <hd id="AN0178458564-18">Cost</hd> <p>Software prices differ substantially based on the purpose (e.g., academic or business), user (e.g., student or faculty), and required functions (e.g., number of projects, number of collaborators, access to partial or all functions). Due to this variety, we code this feature qualitatively. Cost is one of the most important features because researchers' decisions to select a screening tool can largely depend on whether they can afford it with research funding or whether access to the software is provided by the researchers' institutions.</p> <hd id="AN0178458564-19">Privacy Policy</hd> <p>Information on privacy policy is collected from software companies' policy documents. This feature is difficult to compare directly. We analyzed privacy policies by adapting the comparison table that Bischoff ([<reflink idref="bib6" id="ref52">6</reflink>]) developed to compare internet companies' privacy policies. 
In addition to the points Bischoff ([<reflink idref="bib6" id="ref53">6</reflink>]) coded, we added five items: regular review of privacy policy, option to delete personal information, contact you for opinions, collects payment details, and collects device information.</p> <hd id="AN0178458564-20">Results</hd> <p></p> <hd id="AN0178458564-21">Objective 1. Identifying Available Screening Tools</hd> <p>The wide web-based search identified 26 available screening tools in the market. Among the 26 identified tools, eight have been used by educational researchers: Abstrackr (Wallace et al., [<reflink idref="bib60" id="ref54">60</reflink>]), Covidence (Covidence Systematic Review Software, [<reflink idref="bib16" id="ref55">16</reflink>]), ASReview (Schoot et al., [<reflink idref="bib52" id="ref56">52</reflink>]), RevMan (Review Manager, [<reflink idref="bib47" id="ref57">47</reflink>]), Rayyan (Ouzzani et al., [<reflink idref="bib39" id="ref58">39</reflink>]), EPPI-Reviewer (Thomas et al., [<reflink idref="bib57" id="ref59">57</reflink>]), DistillerSR ([<reflink idref="bib17" id="ref60">17</reflink>]), and Excel spreadsheets. Table 1 presents detailed basic information about these eight included screening tools and four other tools that are not yet used in education but have the potential to be used. Table 2 presents the features of the other 14 tools and reasons for excluding them.</p> <p>Table 2. 
Other 14 screening tools.</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;td&gt;Name&lt;/td&gt;&lt;td&gt;Reason for exclusion&lt;/td&gt;&lt;td&gt;Information&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody valign="top"&gt;&lt;tr&gt;&lt;td&gt;Colandr&lt;/td&gt;&lt;td&gt;Biodiversity&lt;/td&gt;&lt;td&gt;Colandr is an open-source and open-access machine-learning assisted online platform for conducting systematic reviews and syntheses with a focus in biodiversity research (Cheng et al., &lt;xref ref-type="bibr" rid="bibr12"&gt;2018&lt;/xref&gt;).&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;SyRF&lt;/td&gt;&lt;td&gt;Preclinical&lt;/td&gt;&lt;td&gt;SyRF is a fully integrated online platform for conducting systematic reviews of preclinical studies developed by researchers from The University of Edinburgh. It is completely free for researchers.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RobotAnalyst&lt;/td&gt;&lt;td&gt;Public health&lt;/td&gt;&lt;td&gt;RobotAnalyst was developed to support systematic reviews in public health interventions. It supports literature screening with machine learning and text mining algorithms. It adopts topic modeling and relevance feedback-based text classification models.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;SRDR Plus&lt;/td&gt;&lt;td&gt;Healthcare&lt;/td&gt;&lt;td&gt;SRDR Plus (The Systematic Review Data Repository: Plus) is a free tool for extracting, managing, and archiving data. It was developed by researchers at Brown University to support healthcare research primarily. It contributes to open science through an open systematic review data repository.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;DRAGON&lt;/td&gt;&lt;td&gt;Health and environmental research&lt;/td&gt;&lt;td&gt;DRAGON, launched in 2018, stores qualitative and quantitative data from literature to help scientists implement the elements of systematic review, including problem formulation, literature screening, risk of bias evaluation, and data integration. 
It was developed for health and environmental research.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;SRDB.PRO&lt;/td&gt;&lt;td&gt;Pharmaceutical industry and health, and health economics consultancies&lt;/td&gt;&lt;td&gt;SRDB.PRO is the first enterprise level systematic review and data analysis platform designed specifically for the pharmaceutical industry and health, and health economics consultancies. It integrates the PubMed literature database. It costs $119 for a two-user commercial license and $76,500 for a 5-user commercial license. For an academic license, the cost is $0 for two users and $8,500 for 100 users.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;BioReader (Biomedical Research Article Distiller)&lt;/td&gt;&lt;td&gt;Biomedical research&lt;/td&gt;&lt;td&gt;BioReader is a tool that enables users to classify scientific literature through text mining-based classification of article abstracts. The tool is trained by uploading article corpora for two training categories (e.g., one positive and one negative for content of interest), as well as one corpus of abstracts to be classified and/or a search string to query PubMed for articles. The corpora are submitted as lists of PubMed IDs; the abstracts are automatically downloaded from PubMed and preprocessed, and the unclassified corpus is classified using the best performing classification algorithm out of ten implemented algorithms. BioReader is freely available as a web service.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Parsifal&lt;/td&gt;&lt;td&gt;Software engineering&lt;/td&gt;&lt;td&gt;Parsifal is an online tool designed to support researchers in performing systematic literature reviews within the context of Software Engineering. 
It claims to support geographically distributed researchers.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Thoth&lt;/td&gt;&lt;td&gt;Software engineering&lt;/td&gt;&lt;td&gt;Thoth is a web-based support tool developed to support the SLR process in software engineering.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;StArt&lt;/td&gt;&lt;td&gt;Software engineering&lt;/td&gt;&lt;td&gt;State of the Art through systematic review (StArt) aims to provide support for each stage of the SR process in software engineering. It was developed in Brazil by The Laboratory of Research on Software Engineering (LaPES), the Computing Department of the Federal University of S&amp;#227;o Carlos (DC/UFSCar). Available at &lt;ext-link ext-link-type="url" href="http://lapes.dc.ufscar.br/tools/start%5ftool" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Revtools&lt;/td&gt;&lt;td&gt;R package&lt;/td&gt;&lt;td&gt;Revtools is an R package to support article screening for evidence synthesis. It supports importing, deduplication, title/abstract screening, and visualization of article content using topic models (Westgate, &lt;xref ref-type="bibr" rid="bibr62"&gt;2019&lt;/xref&gt;).&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;METAGEAR&lt;/td&gt;&lt;td&gt;R package&lt;/td&gt;&lt;td&gt;Metagear is an R package published in 2021. It is a comprehensive, multifunctional toolbox with capabilities intended to cover much of the research synthesis taxonomy: from applying a systematic review approach to objectively assemble and screen the literature, to extracting data from studies, and finally to summarizing and analyzing these data with the statistics of meta-analysis. It is available on GitHub (&lt;ext-link ext-link-type="url" href="https://github.com/mjlajeunesse/metagear" /&gt;)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;FASTREAD&lt;/td&gt;&lt;td&gt;Machine learning framework&lt;/td&gt;&lt;td&gt;FASTREAD is one of the state-of-the-art automatic methods to expedite reference screening with study prioritization. 
It is open-source and open-access and is available on GitHub (&lt;ext-link ext-link-type="url" href="https://github.com/fastread/SLR%5fon%5fTCP" /&gt;). It is a machine learning framework.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;DBPedia&lt;/td&gt;&lt;td&gt;Framework&lt;/td&gt;&lt;td&gt;A resource description framework repository to support automated selection of primary studies.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <hd id="AN0178458564-22">Objective 2. Assessing the Tools Educational Researchers Are Already Using through Tool Repor...</hd> <p>Article retrieval returned 191 full-text studies from <emph>RER</emph>. Figure 1 is a PRISMA diagram that shows the record selection process conducted in Covidence (Covidence Systematic Review Software, [<reflink idref="bib16" id="ref61">16</reflink>]). In the first full-text stage, 166 studies were excluded for not reporting any tool usage. Only 25 studies reported the use of any tools. In the second full-text stage, we excluded 12 studies for reporting tools not used for screening or citation management purposes. These 12 studies reported tools used to aid in coding information from studies, which include CMA, NVivo, SAS, SPSS, Google Forms, and Excel. In addition, we excluded five studies that reported the use of citation management tools to conduct screening, with one study using Scopus (Schrijvers et al., [<reflink idref="bib53" id="ref62">53</reflink>]), three studies using EndNote X7 (Braunack-Mayer et al., [<reflink idref="bib9" id="ref63">9</reflink>]; Kyndt et al., [<reflink idref="bib29" id="ref64">29</reflink>]; Surr et al., [<reflink idref="bib56" id="ref65">56</reflink>]), and one study using Zotero (Sabey et al., [<reflink idref="bib50" id="ref66">50</reflink>]).</p> <p>Graph: Figure 1. 
PRISMA diagram.</p> <p>Our research found that only eight (4.19%) studies reported tools used during screening, with one study using DistillerAI (Noetel et al., [<reflink idref="bib37" id="ref67">37</reflink>]), one study using EPPI-Reviewer 4 (Merrill, [<reflink idref="bib33" id="ref68">33</reflink>]), one study using Covidence (Lee et al., [<reflink idref="bib30" id="ref69">30</reflink>]), one study using Abstrackr (Bae et al., [<reflink idref="bib1" id="ref70">1</reflink>]), and others using Excel spreadsheets (Billingsley &amp; Bettini, [<reflink idref="bib5" id="ref71">5</reflink>]; Firestone et al., [<reflink idref="bib19" id="ref72">19</reflink>]; Hallinger &amp; Kovačević, [<reflink idref="bib23" id="ref73">23</reflink>]; Rowan et al., [<reflink idref="bib49" id="ref74">49</reflink>]). Figure 2 presents bar plots of the above findings.</p> <p>Graph: Figure 2. Barplot for citation management tool and screening tool reporting status. Note. The R code used to generate this figure is available on GitHub: https://github.com/qiyangzh/Data-on-Choosing-the-right-tool-for-the-job-Screening-tools-for-systematic-reviews-in-education.git.</p> <hd id="AN0178458564-23">Objective 3. Analyzing Differing Features of Available Screening Tools to Inform Educational...</hd> <p>Figure 3 presents the feature analysis score of included screening tools. The ranking from high to low for the eight tools currently used in educational research is: Covidence (<reflink idref="bib1" id="ref75">1</reflink>), DistillerSR (<reflink idref="bib2" id="ref76">2</reflink>, tied), EPPI-Reviewer (<reflink idref="bib2" id="ref77">2</reflink>, tied), Rayyan (<reflink idref="bib4" id="ref78">4</reflink>), Abstrackr (<reflink idref="bib5" id="ref79">5</reflink>, tied), RevMan (<reflink idref="bib5" id="ref80">5</reflink>, tied), ASReview (<reflink idref="bib7" id="ref81">7</reflink>), and Excel (<reflink idref="bib8" id="ref82">8</reflink>). 
If we include the four tools that are not yet used in education but have the potential to be used, the overall ranking becomes Covidence (<reflink idref="bib1" id="ref83">1</reflink>), DistillerSR (<reflink idref="bib2" id="ref84">2</reflink>, tied), EPPI-Reviewer (<reflink idref="bib2" id="ref85">2</reflink>, tied), CADIMA (<reflink idref="bib4" id="ref86">4</reflink>), Swift-Active (<reflink idref="bib5" id="ref87">5</reflink>), Rayyan (<reflink idref="bib6" id="ref88">6</reflink>, tied), SysRev (<reflink idref="bib6" id="ref89">6</reflink>, tied), Abstrackr (<reflink idref="bib8" id="ref90">8</reflink>, tied), ReLiS (<reflink idref="bib8" id="ref91">8</reflink>, tied), RevMan (<reflink idref="bib8" id="ref92">8</reflink>, tied), ASReview (<reflink idref="bib11" id="ref93">11</reflink>), and Excel (<reflink idref="bib12" id="ref94">12</reflink>). Table 3 presents a detailed comparison of scores among these tools. The Supplementary Materials provide more information (e.g., each tool's features, strengths, and limitations) on the eight included screening tools. The following text provides some additional comments on the feature comparison.</p> <p>Graph: Figure 3. Feature analysis score ranking.</p> <p>Table 3. 
Features of the eight screening tools.</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;td&gt;Name&lt;/td&gt;&lt;td&gt;Platform&lt;/td&gt;&lt;td&gt;General functions&lt;xref ref-type="table-fn" rid="tfn5"&gt;***&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;Title/abstract screening stage&lt;/td&gt;&lt;td&gt;Full-text review stage&lt;/td&gt;&lt;td /&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Bulk import/ export&lt;/td&gt;&lt;td&gt;Team work&lt;/td&gt;&lt;td&gt;Blind review&lt;/td&gt;&lt;td&gt;De- duplicate&lt;/td&gt;&lt;td&gt;Update&lt;/td&gt;&lt;td&gt;Title/abstract screening&lt;/td&gt;&lt;td&gt;ML&lt;/td&gt;&lt;td&gt;Deep learning&lt;/td&gt;&lt;td&gt;IRR&lt;/td&gt;&lt;td&gt;Full-text review&lt;/td&gt;&lt;td&gt;IRR&lt;/td&gt;&lt;td&gt;Reason label&lt;xref ref-type="table-fn" rid="tfn3"&gt;*&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;Decision labels&lt;xref ref-type="table-fn" rid="tfn4"&gt;**&lt;/xref&gt;&lt;/td&gt;&lt;td&gt;Licensing&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody valign="top"&gt;&lt;tr&gt;&lt;td&gt;Abstrackr&lt;/td&gt;&lt;td&gt;Web-based&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td&gt;Relevant, borderline, irrelevant&lt;/td&gt;&lt;td&gt;Free&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Covidence&lt;/td&gt;&lt;td&gt;Web-based&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td&gt;Include, maybe, exclude&lt;/td&gt;&lt;td&gt;Not free 
$240&amp;#8211;$635&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;ASReview&lt;/td&gt;&lt;td&gt;Terminal and Python-based&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td&gt;Relevant, irrelevant&lt;/td&gt;&lt;td&gt;Free&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RevMan Web/RevMan 5&lt;/td&gt;&lt;td&gt;Web-based or desktop software&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td&gt;&amp;#8211;&lt;/td&gt;&lt;td&gt;$72.87&amp;#8211;$120.24 and more&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Rayyan&lt;/td&gt;&lt;td&gt;Web-based&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td&gt;Include, undecide, exclude&lt;/td&gt;&lt;td&gt;Free with limited functions, $48&amp;#8211;$99 and more&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;EPPI-Reviewer&lt;/td&gt;&lt;td&gt;Web-based&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td 
char="."&gt;1&lt;/td&gt;&lt;td&gt;Exclude, include for second opinion, include&lt;/td&gt;&lt;td&gt;Not free $144.54&amp;#8211;$505.89 and more&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;DistillerSR&lt;/td&gt;&lt;td&gt;Web-based&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td&gt;Yes, no, can't tell&lt;/td&gt;&lt;td&gt;Not free $239.4&amp;#8211;$3636&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Excel&lt;/td&gt;&lt;td&gt;Web-based or desktop software&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td&gt;&amp;#8211;&lt;/td&gt;&lt;td&gt;Free&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Sysrev&lt;/td&gt;&lt;td&gt;Web-based&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td /&gt;&lt;td&gt;Not Free $0&amp;#8211;$120 and more&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;SWIFT-Active Screener&lt;/td&gt;&lt;td&gt;Web-based&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td 
char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td&gt;Include, exclude&lt;/td&gt;&lt;td&gt;Not Free Nontransparent pricing&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CADIMA&lt;/td&gt;&lt;td&gt;Web-based&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td&gt;Criteria or comment&lt;/td&gt;&lt;td&gt;Free&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;ReLiS&lt;/td&gt;&lt;td&gt;Web-based&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td&gt;Include, exclude&lt;/td&gt;&lt;td&gt;Free&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>1 IRR: inter-rater reliability; ML: machine learning classifiers.</p> <ulist> <item>2 <emph>Note.</emph> When a feature is labeled 0, it means that the authors did not find the specific function, which does not necessarily mean that the tool lacks this function. In licensing prices, price range is the annual subscription fee in US dollars as per information accessed on March 14, 2023. Detailed information on price plans is provided in the Supplementary Information. 
Raw data used to create this table is available through GitHub: https://github.com/qiyangzh/Data-on-Choosing-the-right-tool-for-the-job-Screening-tools-for-systematic-reviews-in-education.git.</item> <item>3 *Reason label indicates whether the tool allows users to add customized comments on the reasons for their decisions.</item> <item>4 **Decision label means the labels provided by the tool when making a decision.</item> <item>5 ***General functions' scores are counted toward the full-text review score only if the tool enables full-text review.</item> </ulist> <p>More than five screening tools use machine learning algorithms for title and abstract screening. All the tools we include use shallow support vector machine (SVM) models for text classification. Among the tools we included, only ASReview has the option to use deep neural network models for both classification and feature extraction, in addition to a wide range of shallow machine learning algorithms. One thing to note is that ASReview's use of a training model returns a relevancy ranking, which means that reviewers need to decide, somewhat arbitrarily, on a stopping point and on the number of relevant studies to extract from the ranked list.</p> <p>In terms of real-time collaboration, all included tools, except ASReview, are web-based. It is difficult to enable real-time collaboration on ASReview and almost impossible to keep track of screening decisions. ASReview does allow researchers to set up a common server using certain codes in the terminal, but the procedure is not straightforward and often requires assistance from an IT department. As not all researchers have the knowledge or resources to set up custom IT infrastructure, ASReview might preclude the possibility of teamwork. Unlike researchers from STEM disciplines, educational researchers often are not required to have advanced training in Python and other programming languages. 
Since ASReview is Python- and terminal-based, it poses technical obstacles for systematic reviewers in the field of education.</p> <p>In terms of deduplication, Covidence, Rayyan, EPPI-Reviewer, and DistillerSR enable deduplication at the title and abstract screening stage. However, not all deduplication algorithms are reliable. A previous study comparing Covidence and Rayyan (Kellermeyer et al., [<reflink idref="bib25" id="ref95">25</reflink>]) reports that Rayyan missed several known duplicates and thus recommends that reviewers carry out the deduplication process using citation software instead of Rayyan. For tools that cannot automatically detect or remove duplicates, reviewers need to first remove duplicates in other software (e.g., citation management software) before uploading the records into these screening tools.</p> <p>In terms of the cost of the software license, not all tools are free of charge. In fact, some tools are quite expensive. Only Abstrackr and ASReview are completely free of charge at the moment. Rayyan provides a free license with limited access to functions. At this time, DistillerSR is the most expensive software among all included screening tools in this article. Since prices change frequently, readers should be aware that data on pricing were accessed on March 14, 2023. Details on pricing can be found in Supplementary Information.</p> <p>Table 4 compares tools in terms of privacy policies. Unlike many other tools, Abstrackr promises to never share users' personal information with any third parties. Users also have the option to erase all recorded information by emailing the Brown University maintenance team. ASReview has the most secure privacy policy, which states that the software never tries to access user information.</p> <p>Table 4. 
Privacy policy feature analysis.</p> <p> <ephtml> &lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;td&gt;Tool&lt;/td&gt;&lt;td&gt;Adds cookies&lt;/td&gt;&lt;td&gt;Collects personal information&lt;/td&gt;&lt;td&gt;Collects device information&lt;/td&gt;&lt;td&gt;Collects payment details&lt;/td&gt;&lt;td&gt;Contacts you for opinions&lt;/td&gt;&lt;td&gt;Collects location&lt;/td&gt;&lt;td&gt;Uses 1st party ads&lt;/td&gt;&lt;td&gt;Uses 3rd party ads/links&lt;/td&gt;&lt;td&gt;Shares personal info with 3rd party&lt;/td&gt;&lt;td&gt;Stores info in multiple countries&lt;/td&gt;&lt;td&gt;Indefinite retention of information&lt;/td&gt;&lt;td&gt;Regular review of privacy policy&lt;/td&gt;&lt;td&gt;Option to delete personal information&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody valign="top"&gt;&lt;tr&gt;&lt;td&gt;Abstrackr&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Covidence&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;ASReview&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td 
char="."&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RevMan&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Rayyan&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;EPPI-Reviewer&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;DistillerSR&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Excel&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td 
char="."&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;SysRev&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Swift-Active&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CADIMA&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;td char="."&gt;1&lt;/td&gt;&lt;td char="."&gt;0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;ReLiS&lt;/td&gt;&lt;td&gt;NA&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>6 <emph>Note.</emph> When a feature is labeled 0, it means that the authors did not find the specific function, which does not necessarily mean that the tool lacks this function. ReLiS does not have a privacy policy available online.</p> <hd id="AN0178458564-24">Discussion</hd> <p>In total, this article identifies 26 available tools to support screening large numbers of studies for systematic reviews. Not all 26 tools were suitable for the field of education. In Objective 1b, we identified 12 tools: eight screening tools used by educational researchers and four tools that are not yet utilized but have potential for educational researchers. 
In our second objective, we screened published articles in <emph>RER</emph> and found a low rate of transparent tool reporting. In the third objective, we performed feature analysis on the 12 tools identified in Objective 1b. Based on the incorporation of the features documented in this study, these tools are more appropriate than spreadsheet or citation management tools and support higher quality research syntheses. Thus, research teams should make a tool decision that fits their specific set of needs and research questions.</p> <p>Although the rapid development of machine learning algorithms provides efficient ways to classify and rank the relevance of studies, algorithms are not perfect, and researchers need to be aware of the limitations. One such limitation is the potential existence of bias in machine learning. Mehrabi et al. ([<reflink idref="bib32" id="ref96">32</reflink>]) presented a comprehensive list of different biases in machine learning. Among these biases, representation bias, which arises when the training data do not follow the population distribution (Farquhar et al., [<reflink idref="bib18" id="ref97">18</reflink>]), deserves the attention of systematic reviewers. In systematic review tools, supervised machine learning algorithms label unclassified studies based on a small set of studies manually classified by researchers, which is prone to representation bias. A case study on active learning in automated text classification provides evidence for the existence of such bias due to skewed sampling (Varghese et al., [<reflink idref="bib59" id="ref98">59</reflink>]). It is essential for tool developers to incorporate cutting-edge methods, such as novel corrective weights (Farquhar et al., [<reflink idref="bib18" id="ref99">18</reflink>]), to remove or compensate for such biases. However, since this is still an emerging and developing field, existing screening tools may take some time to update their algorithms. 
One study ranked three tools by machine learning performance, in descending order, as Abstrackr, RobotAnalyst, and DistillerSR, although performance depends on the tool and review topic (Gates et al., [<reflink idref="bib20" id="ref100">20</reflink>]). In the future, more research on correcting machine learning bias and comparing such bias among various tools can help with tool development and selection.</p> <p>Given the limitations of machine learning, there are two ways to exercise caution and reduce algorithmic bias. One method researchers can adopt is to double-check the studies excluded automatically by machine learning classifiers. For example, Zhang et al. ([<reflink idref="bib65" id="ref101">65</reflink>]) manually scanned 10% of the studies excluded by ASReview's machine learning algorithm to verify that these studies were indeed irrelevant. Another method is to replace one of the two human screeners with the machine screener to increase reliability since both human and machine screeners are imperfect (Gates et al., [<reflink idref="bib21" id="ref102">21</reflink>]). For human reviewers, an empirical study found a 10.76% total error rate for false inclusion and exclusion (Wang et al., [<reflink idref="bib61" id="ref103">61</reflink>]). With aid from machine learning, human reviewers can focus their efforts on a smaller portion of studies and could reduce their error rate. However, for machines, there is a tradeoff between substantially reducing workload and the risk of missing relevant studies. In general, we are not ready to completely automate the screening process at the moment, and semi-automation with thoughtful design is a more careful approach.</p> <hd id="AN0178458564-25">Implications for Practice</hd> <p>When making the decision on the right screening tools, researchers should first decide on the screening tool functions necessary for their particular set of study needs. 
This will require determining whether the review will include quantitative or qualitative studies. The identified tools may be suitable for both types of research studies, particularly at the title and abstract screening phase, though some researchers may prefer to use qualitative software for the full-text review. Furthermore, researchers should decide on the number of reviewers on their screening team and their geographical locations to proceed with either individual research or team collaboration. Once these decisions are made, some of the tools described above can be excluded, making tool selection easier.</p> <p>To assist reviewers' tool selection process, we present a decision tree (Figure 4) based on our opinions. The first question in the decision tree asks whether reviewers are looking for a tool to review studies' full text. The second question in the decision tree asks researchers to check their research funding and institutional resource support to ensure that the cost of the software tools is covered. Depending on the pool size of literature, time frame, and labor resources, researchers may decide whether machine learning classifiers or deep learning is necessary. Therefore, the third and fourth questions ask whether reviewers are looking for a tool that has machine learning or deep learning support. This decision tree provides one logical flow to select tools. Reviewers may reverse the question order or skip non-essential questions to make informed tool-selecting decisions for their own review team. We understand that there may be other questions of interest, such as the ability to access the tool online (such as certain cloud-based products being banned in mainland China or limited Internet bandwidth in some areas making web-based tools unusable) or the ability of the tool to handle non-Latin characters. As part of this paper, we have posted the coded tools in a csv file so interested readers can rank based on whichever criteria they deem important. 
This decision tree is not meant to be comprehensive but rather to provide a starting point. Once researchers have identified a few potential tools, they can explore those in detail to determine the fit with their project goals.</p> <p>Graph: Figure 4. Decision tree. Note. The R code used to generate this figure is available on GitHub: https://github.com/qiyangzh/Data-on-Choosing-the-right-tool-for-the-job-Screening-tools-for-systematic-reviews-in-education.git.</p> <p>Finally, while it is essential to have good tools, it is equally essential that the tools be used in the right way. After choosing a suitable tool, it is important to train the team to use the tool correctly and systematically.</p> <hd id="AN0178458564-26">Directions for Future Research</hd> <p>One problem this article identifies is that publications seldom report the screening tools they use. Not reporting software tools for the screening procedure means that either the author(s) did not use any software, or they did not report it. If they did not use any assistive tools and culled thousands of articles manually, these researchers could benefit from leveraging existing technology for a less costly and more efficient literature screening process. The latter explanation is probably more common and reasonable, which means that the problem of scientific opacity is haunting the field of educational systematic reviews. This contradicts the PRISMA statement. The PRISMA statement illustrates the importance of reporting by stating that coherent, lucid, and transparent reporting is an important determinant of the value of a systematic review (Moher et al., [<reflink idref="bib36" id="ref104">36</reflink>]). Reporting the screening process in as much detail as possible is essential for future updates, reproducible research, assessment of the search's quality, and display of scientific rigor. 
The current state of screening tool reporting is far from ideal, and more effort is necessary to expedite the adoption of open science practices in the field of systematic reviews.</p> <p>Following the open science movement's call for open materials and open data, researchers should try to report or preregister their systematic reviews in as much detail as possible. The choice of whether to use semi-automation in the screening stage could have a great impact on the cost-effectiveness of the research procedure and the accuracy of the research outcomes. Having the information on the specific screening tools used in each research study could better help us compare the accuracy and reliability of the tools. Future studies can benchmark the machine learning algorithms against human reviewers to compare their performance on the same dataset. Researchers could quantify labor and the software license as the cost and performance as the benefit to calculate a cost-benefit ratio for each tool. This cost-benefit ratio analysis and accuracy comparison could further provide knowledge on tool selection and its impact on systematic review. Apart from reporting the tools used, researchers should also strive to make screening files public and shareable. For instance, attaching a link to screening files in the supplementary section of publications is a common practice to promote open science. Sharing in research facilitates new scientific advances, saves duplicative effort, and helps the field to progress together.</p> <p>In conclusion, this study contributes to the field of systematic reviews by comparing common screening tools used in educational research and providing practical guidance in tool selection. We advocate for researchers to select tools based on their suitability for the research questions, domains, and designs, instead of on the convenience of accessing tools. 
Moreover, we found that published systematic reviews seldom report tools used to assist research, which may create a barrier for replication. We encourage journal reviewers as well as authors to adopt an open science approach and transparently report tool usage.</p> <hd id="AN0178458564-27">Open Research Statements</hd> <p></p> <hd id="AN0178458564-28">Study and Analysis Plan Registration</hd> <p>There is no study and analysis plan registration associated with this manuscript.</p> <hd id="AN0178458564-29">Data, Code, and Materials Transparency</hd> <p>The data and code underlying the results reported in this manuscript are available on GitHub: https://github.com/qiyangzh/Data-on-Choosing-the-right-tool-for-the-job-Screening-tools-for-systematic-reviews-in-education.git.</p> <hd id="AN0178458564-30">Design and Analysis Reporting Guidelines</hd> <p>Not applicable.</p> <hd id="AN0178458564-31">Transparency Declaration</hd> <p>The lead author (the manuscript's guarantor) affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.</p> <hd id="AN0178458564-32">Replication Statement</hd> <p>This manuscript reports an original study.</p> <hd id="AN0178458564-33">Acknowledgments</hd> <p>The authors acknowledge Hunter Gehlbach for constructive suggestions on the final draft. 
We also want to thank Akash Pallath and Zhipeng Hou for useful discussions on machine learning, and Susan Davis, Lisa Nehring, Rohan Arcot, and Katherine Cornwall for their suggestions on the earlier draft.</p> <hd id="AN0178458564-34">Disclosure Statement</hd> <p>No potential conflict of interest was reported by the author(s).</p> <hd id="AN0178458564-35">Open Scholarship</hd> <p>This article has earned the https://osf.io/tvyxz/wiki/1.%20View%20the%20Badges/ badges for Open Data through Open Practices Disclosure. The data are openly accessible at https://osf.io/hckfq.</p> <ref id="AN0178458564-36"> <title> Footnotes </title> <blist> <bibl id="bib1" idref="ref1" type="bt">1</bibl> <bibtext> Coders refer to researchers who document the characteristics of each included study to foster a deeper understanding of the study quality.</bibtext> </blist> <blist> <bibl id="bib2" idref="ref2" type="bt">2</bibl> <bibtext> Supplemental data for this article is available online at https://doi.org/10.1080/19345747.2023.2209079</bibtext> </blist> </ref> <ref id="AN0178458564-37"> <title> References </title> <blist> <bibtext> Bae, C. L., Mills, D. C., Zhang, F., Sealy, M., Cabrera, L., &amp; Sea, M. (2021). A systematic review of science discourse in K–12 urban classrooms in the United States: Accounting for individual, collective, and contextual factors. Review of Educational Research, 91 (6), 831 – 877. https://doi.org/10.3102/00346543211042415</bibtext> </blist> <blist> <bibtext> Bashir, R., Surian, D., &amp; Dunn, A. G. (2018). Time-to-update of systematic reviews relative to the availability of new evidence. Systematic Reviews, 7 (1), 195. https://doi.org/10.1186/s13643-018-0856-9</bibtext> </blist> <blist> <bibl id="bib3" idref="ref49" type="bt">3</bibl> <bibtext> Belur, J., Tompson, L., Thornton, A., &amp; Simon, M. (2021). Interrater reliability in systematic review methodology: Exploring variation in coder decision-making. 
Sociological Methods &amp; Research, 50 (2), 837 – 865. https://doi.org/10.1177/0049124118799372</bibtext> </blist> <blist> <bibl id="bib4" idref="ref4" type="bt">4</bibl> <bibtext> Bigendako, &amp; Syriani (2018). Modeling a tool for conducting systematic reviews iteratively. In Proceedings of the 6th International Conference on Model-Driven Engineering and Software Development (pp. 552 – 559). https://doi.org/10.5220/0006664405520559</bibtext> </blist> <blist> <bibl id="bib5" idref="ref5" type="bt">5</bibl> <bibtext> Billingsley, B., &amp; Bettini, E. (2019). Special education teacher attrition and retention: A review of the literature. Review of Educational Research, 89 (5), 697 – 744. https://doi.org/10.3102/0034654319862495</bibtext> </blist> <blist> <bibl id="bib6" idref="ref6" type="bt">6</bibl> <bibtext> Bischoff, P. (2017, March 20). Comparing the privacy policy of internet giants side-by-side. Comparitech. Retrieved from https://<ulink href="http://www.comparitech.com/blog/vpn-privacy/we-compared-the-privacy-policies-of-internet-giants-side-by-side/">www.comparitech.com/blog/vpn-privacy/we-compared-the-privacy-policies-of-internet-giants-side-by-side/</ulink></bibtext> </blist> <blist> <bibl id="bib7" idref="ref81" type="bt">7</bibl> <bibtext> Bozada, T., Borden, J., Workman, J., Del Cid, M., Malinowski, J., &amp; Luechtefeld, T. (2021). Sysrev: A FAIR platform for data curation and systematic evidence review. Frontiers in Artificial Intelligence, 4, 685298. https://doi.org/10.3389/frai.2021.685298</bibtext> </blist> <blist> <bibl id="bib8" idref="ref8" type="bt">8</bibl> <bibtext> Bradburn, S. (2018, July 2). 13 Best free meta-analysis software to use. Top Tip Bio. Retrieved from https://toptipbio.com/free-meta-analysis-software/</bibtext> </blist> <blist> <bibl id="bib9" idref="ref63" type="bt">9</bibl> <bibtext> Braunack-Mayer, A. J., Street, J. M., Tooher, R., Feng, X., &amp; Scharling-Gamba, K. (2020). 
Student and staff perspectives on the use of big data in the tertiary education sector: A scoping review and reflection on the ethical issues. Review of Educational Research, 90 (6), 788 – 823. https://doi.org/10.3102/0034654320960213</bibtext> </blist> <blist> <bibtext> Car, J., Carlstedt-Duke, J., Car, L. T., Posadzki, P., Whiting, P., Zary, N., Atun, R., Majeed, A., Campbell, J., &amp; Digital Health Education Collaboration (2019). Digital education in health professions: The need for overarching evidence synthesis. Journal of Medical Internet Research, 21 (2), e12913. https://doi.org/10.2196/12913</bibtext> </blist> <blist> <bibtext> Center for Evidence Synthesis in Health (2021). Software | Center for Evidence Synthesis in Health | Brown University. Retrieved from https://<ulink href="http://www.brown.edu/public-health/cesh/resources/software">www.brown.edu/public-health/cesh/resources/software</ulink></bibtext> </blist> <blist> <bibtext> Cheng, S. H., Augustin, C., Bethel, A., Gill, D., Anzaroot, S., Brun, J., DeWilde, B., Minnich, R. C., Garside, R., Masuda, Y. J., Miller, D. C., Wilkie, D., Wongbusarakum, S., &amp; McKinnon, M. C. (2018). Using machine learning to advance synthesis and use of conservation and environmental evidence. Conservation Biology: The Journal of the Society for Conservation Biology, 32 (4), 762 – 764. https://doi.org/10.1111/cobi.13117</bibtext> </blist> <blist> <bibtext> Clinton, V., &amp; Khan, S. (2019). Efficacy of open textbook adoption on learning performance and course withdrawal rates: A meta-analysis. AERA Open, 5 (3), 233285841987221. https://doi.org/10.1177/2332858419872212</bibtext> </blist> <blist> <bibtext> Cohen, A. M., Hersh, W. R., Peterson, K., &amp; Yen, P.-Y. (2006). Reducing workload in systematic review preparation using automated citation classification. Journal of the American Medical Informatics Association, 13 (2), 206 – 219. 
https://doi.org/10.1197/jamia.M1929</bibtext> </blist> <blist> <bibtext> Cooper, H., Hedges, L., &amp; Valentine, J. C. (2019). The handbook of research synthesis and meta-analysis (3rd ed.). Russell Sage Foundation. Retrieved from https://<ulink href="http://www.russellsage.org/publications/handbook-research-synthesis-and-meta-analysis-second-edition">www.russellsage.org/publications/handbook-research-synthesis-and-meta-analysis-second-edition</ulink></bibtext> </blist> <blist> <bibtext> Covidence Systematic Review Software (2013). Veritas health innovation. Retrieved from <ulink href="http://www.covidence.org">www.covidence.org</ulink></bibtext> </blist> <blist> <bibtext> DistillerSR (2021). Evidence Partners (2.35) [Computer software]. Retrieved from https://<ulink href="http://www.evidencepartners.com">www.evidencepartners.com</ulink></bibtext> </blist> <blist> <bibtext> Farquhar, S., Gal, Y., &amp; Rainforth, T. (2021). On statistical bias in active learning: How and when to fix it. arXiv:2101.11665. Retrieved from <ulink href="http://arxiv.org/abs/2101.11665">http://arxiv.org/abs/2101.11665</ulink></bibtext> </blist> <blist> <bibtext> Firestone, A. R., Cruz, R. A., &amp; Rodl, J. E. (2020). Teacher study groups: An integrative literature synthesis. Review of Educational Research, 90 (5), 675 – 709. https://doi.org/10.3102/0034654320938128</bibtext> </blist> <blist> <bibtext> Gates, A., Guitard, S., Pillay, J., Elliott, S. A., Dyson, M. P., Newton, A. S., &amp; Hartling, L. (2019). Performance and usability of machine learning for screening in systematic reviews: A comparative evaluation of three tools. Systematic Reviews, 8 (1), 278. https://doi.org/10.1186/s13643-019-1222-2</bibtext> </blist> <blist> <bibtext> Gates, A., Johnson, C., &amp; Hartling, L. (2018). Technology-assisted title and abstract screening for systematic reviews: A retrospective evaluation of the Abstrackr machine learning tool. Systematic Reviews, 7 (1), 45. 
https://doi.org/10.1186/s13643-018-0707-8</bibtext> </blist> <blist> <bibtext> Guimarães, N. S., Ferreira, A. J. F., Ribeiro Silva, R. d C., de Paula, A. A., Lisboa, C. S., Magno, L., Ichiara, M. Y., &amp; Barreto, M. L. (2022). Deduplicating records in systematic reviews: There are free, accurate automated ways to do so. Journal of Clinical Epidemiology, 152, 110 – 115. https://doi.org/10.1016/j.jclinepi.2022.10.009</bibtext> </blist> <blist> <bibtext> Hallinger, P., &amp; Kovačević, J. (2019). A bibliometric review of research on educational administration: Science mapping the literature, 1960 to 2018. Review of Educational Research, 89 (3), 335 – 369. https://doi.org/10.3102/0034654319830380</bibtext> </blist> <blist> <bibtext> Harrison, H., Griffin, S. J., Kuhn, I., &amp; Usher-Smith, J. A. (2020). Software tools to support title and abstract screening for systematic reviews in healthcare: An evaluation. BMC Medical Research Methodology, 20 (1), 7. https://doi.org/10.1186/s12874-020-0897-3</bibtext> </blist> <blist> <bibtext> Kellermeyer, L., Harnke, B., &amp; Knight, S. (2018). Covidence and Rayyan. Journal of the Medical Library Association, 106 (4), 580 – 583. https://doi.org/10.5195/jmla.2018.513</bibtext> </blist> <blist> <bibtext> Kitchenham, B. (1996). Desmet: A method for evaluating software engineering methods and tools [Technical Report TR96-09]. University of Keele. Retrieved from https://silo.tips/download/desmet-a-method-for-evaluating-software-engineering-methods-and-tools</bibtext> </blist> <blist> <bibtext> Kitchenham, B., &amp; Linkman, S. (2000). DESMET: A method for evaluating Software Engineering methods and tools. 
Retrieved from https://<ulink href="http://www.semanticscholar.org/paper/DESMET-%3A-A-method-for-evaluating-Software-methods-Kitchenham-Linkman/19026a49483bef1b0a68e53743b6e53f4e7a403c">www.semanticscholar.org/paper/DESMET-%3A-A-method-for-evaluating-Software-methods-Kitchenham-Linkman/19026a49483bef1b0a68e53743b6e53f4e7a403c</ulink></bibtext> </blist> <blist> <bibtext> Kohl, C., McIntosh, E. J., Unger, S., Haddaway, N. R., Kecke, S., Schiemann, J., &amp; Wilhelm, R. (2018). Online tools supporting the conduct and reporting of systematic reviews and systematic maps: A case study on CADIMA and review of existing tools. Environmental Evidence, 7 (1), 8. https://doi.org/10.1186/s13750-018-0115-5</bibtext> </blist> <blist> <bibtext> Kyndt, E., Gijbels, D., Grosemans, I., &amp; Donche, V. (2016). Teachers' everyday professional development: Mapping informal learning activities, antecedents, and learning outcomes. Review of Educational Research, 86 (4), 1111 – 1150. https://doi.org/10.3102/0034654315627864</bibtext> </blist> <blist> <bibtext> Lee, J., Sanders, T., Antczak, D., Parker, R., Noetel, M., Parker, P., &amp; Lonsdale, C. (2021). Influences on user engagement in online professional learning: A narrative synthesis and meta-analysis. Review of Educational Research, 91 (4), 518 – 576. https://doi.org/10.3102/0034654321997918</bibtext> </blist> <blist> <bibtext> McKeown, S., &amp; Mir, Z. M. (2021). Considerations for conducting systematic reviews: Evaluating the performance of different methods for de-duplicating references. Systematic Reviews, 10 (1), 38. https://doi.org/10.1186/s13643-021-01583-y</bibtext> </blist> <blist> <bibtext> Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., &amp; Galstyan, A. (2022). A survey on bias and fairness in machine learning. arXiv: 1908.09635. Retrieved from <ulink href="http://arxiv.org/abs/1908.09635">http://arxiv.org/abs/1908.09635</ulink></bibtext> </blist> <blist> <bibtext> Merrill, B. C. (2021). 
Configuring a construct definition of teacher working conditions in the united states: A systematic narrative review of researcher concepts. Review of Educational Research, 91 (2), 163 – 203. https://doi.org/10.3102/0034654320985611</bibtext> </blist> <blist> <bibtext> Microsoft (2021). Microsoft Excel Spreadsheet Software | Microsoft 365. Retrieved from https://<ulink href="http://www.microsoft.com/en-us/microsoft-365/excel">www.microsoft.com/en-us/microsoft-365/excel</ulink></bibtext> </blist> <blist> <bibtext> Miller, K., Howard, B. E., Phillips, J., Shah, M. R., Mav, D., &amp; Shah, R. R. (2016). SWIFT-Active Screener: Reducing literature screening effort through machine learning for systematic reviews. Poster Presentation at the Society of Toxicology's 55th Annual Meeting and ToxExpo, New Orleans, LA, USA.</bibtext> </blist> <blist> <bibtext> Moher, D., Liberati, A., Tetzlaff, J., &amp; Altman, D. G. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. BMJ (Clinical Research ed.), 339, b2535. https://doi.org/10.1136/bmj.b2535</bibtext> </blist> <blist> <bibtext> Noetel, M., Griffith, S., Delaney, O., Sanders, T., Parker, P., del Pozo Cruz, B., &amp; Lonsdale, C. (2021). Video improves learning in higher education: A systematic review. Review of Educational Research, 91 (2), 204 – 236. https://doi.org/10.3102/0034654321990713</bibtext> </blist> <blist> <bibtext> Ødegaard, N. B., Myrhaug, H. T., Dahl-Michelsen, T., &amp; Røe, Y. (2021). Digital learning designs in physiotherapy education: A systematic review and meta-analysis. BMC Medical Education, 21 (1), 48. https://doi.org/10.1186/s12909-020-02483-w</bibtext> </blist> <blist> <bibtext> Ouzzani, M., Hammady, H., Fedorowicz, Z., &amp; Elmagarmid, A. (2016). Rayyan—A web and mobile app for systematic reviews. Systematic Reviews, 5 (1), 210. https://doi.org/10.1186/s13643-016-0384-4</bibtext> </blist> <blist> <bibtext> Pallath, A., &amp; Zhang, Q. (2023). 
Paperfetcher: A tool to automate handsearching and citation searching for systematic reviews. Research Synthesis Methods, 14 (2), 323 – 335. https://doi.org/10.1002/jrsm.1604</bibtext> </blist> <blist> <bibtext> Patall, E. A. (2021). Implications of the open science era for educational psychology research syntheses. Educational Psychologist, 56 (2), 142 – 160. https://doi.org/10.1080/00461520.2021.1897009</bibtext> </blist> <blist> <bibtext> Pei, L., &amp; Wu, H. (2019). Does online learning work better than offline learning in undergraduate medical education? A systematic review and meta-analysis. Medical Education Online, 24 (1), 1666538. https://doi.org/10.1080/10872981.2019.1666538</bibtext> </blist> <blist> <bibtext> Pigott, T. D., &amp; Polanin, J. R. (2020). Methodological guidance paper: High-quality meta-analysis in a systematic review. Review of Educational Research, 90 (1), 24 – 46. https://doi.org/10.3102/0034654319877153</bibtext> </blist> <blist> <bibtext> Polanin, J. R., Espelage, D. L., Grotpeter, J. K., Spinney, E., Ingram, K. M., Valido, A., El Sheikh, A., Torgal, C., &amp; Robinson, L. (2021). A meta-analysis of longitudinal partial correlations between school violence and mental health, school performance, and criminal or delinquent acts. Psychological Bulletin, 147 (2), 115 – 133. https://doi.org/10.1037/bul0000314</bibtext> </blist> <blist> <bibtext> Polanin, J. R., Pigott, T. D., Espelage, D. L., &amp; Grotpeter, J. K. (2019). Best practice guidelines for abstract screening large-evidence systematic reviews and meta-analyses. Research Synthesis Methods, 10 (3), 330 – 342. https://doi.org/10.1002/jrsm.1354</bibtext> </blist> <blist> <bibtext> Rathbone, J., Hoffmann, T., &amp; Glasziou, P. (2015). Faster title and abstract screening? Evaluating Abstrackr, a semi-automated online screening program for systematic reviewers. Systematic Reviews, 4 (1), 80. 
https://doi.org/10.1186/s13643-015-0067-6</bibtext> </blist> <blist> <bibtext> Review Manager (2014). RevMan (Version 5.3) [Computer software]. The Cochrane Collaboration.</bibtext> </blist> <blist> <bibtext> Roth, S. (2021). Research guides: Systematic reviews &amp; other review types: Systematic review tools. Retrieved from https://guides.temple.edu/systematicreviews/SRTools</bibtext> </blist> <blist> <bibtext> Rowan, L., Bourke, T., L'Estrange, L., Lunn Brownlee, J., Ryan, M., Walker, S., &amp; Churchward, P. (2021). How does initial teacher education research frame the challenge of preparing future teachers for student diversity in schools? A systematic review of literature. Review of Educational Research, 91 (1), 112 – 158. https://doi.org/10.3102/0034654320979171</bibtext> </blist> <blist> <bibtext> Sabey, C. V., Charlton, C. T., Pyle, D., Lignugaris-Kraft, B., &amp; Ross, S. W. (2017). A review of classwide or universal social, emotional, behavioral programs for students in kindergarten. Review of Educational Research, 87 (3), 512 – 543. https://doi.org/10.3102/0034654316689307</bibtext> </blist> <blist> <bibtext> Salter, S. M., Karia, A., Sanfilippo, F. M., &amp; Clifford, R. M. (2014). Effectiveness of e-learning in pharmacy education. American Journal of Pharmaceutical Education, 78 (4), 83. https://doi.org/10.5688/ajpe78483</bibtext> </blist> <blist> <bibtext> Schoot, R. van de, Bruin, J. de, Schram, R., Zahedi, P., Boer, J. de, Weijdema, F., Kramer, B., Huijts, M., Hoogerwerf, M., Ferdinands, G., Harkema, A., Willemsen, J., Ma, Y., Fang, Q., Hindriks, S., Tummers, L., &amp; Oberski, D. L. (2021). An open source machine learning framework for efficient and transparent systematic reviews. Nature Machine Intelligence, 3 (2), 125 – 133. https://doi.org/10.1038/s42256-020-00287-7</bibtext> </blist> <blist> <bibtext> Schrijvers, M., Janssen, T., Fialho, O., &amp; Rijlaarsdam, G. (2019). 
Gaining insight into human nature: A review of literature classroom intervention studies. Review of Educational Research, 89 (1), 3 – 45. https://doi.org/10.3102/0034654318812914</bibtext> </blist> <blist> <bibtext> Singh, G., Thomas, J., &amp; Shawe-Taylor, J. (2018). Improving active learning in systematic reviews. arXiv:1801.09496 [cs]. Retrieved from <ulink href="http://arxiv.org/abs/1801.09496">http://arxiv.org/abs/1801.09496</ulink></bibtext> </blist> <blist> <bibtext> Slavin, R. E. (1986). Best-evidence synthesis: An alternative to meta-analytic and traditional reviews. Educational Researcher, 15 (9), 5 – 11. https://doi.org/10.3102/0013189X015009005</bibtext> </blist> <blist> <bibtext> Surr, C. A., Gates, C., Irving, D., Oyebode, J., Smith, S. J., Parveen, S., Drury, M., &amp; Dennison, A. (2017). Effective dementia education and training for the health and social care workforce: A systematic review of the literature. Review of Educational Research, 87 (5), 966 – 1002. https://doi.org/10.3102/0034654317723305</bibtext> </blist> <blist> <bibtext> Thomas, J., Graziosi, S., Brunton, J., Ghouze, Z., O'Driscoll, P., &amp; Bond, M. (2020). EPPI-reviewer: Advanced software for systematic reviews, maps and evidence synthesis. EPPI-Centre Software. Retrieved from https://eppi.ioe.ac.uk/cms/Default.aspx?tabid=2967</bibtext> </blist> <blist> <bibtext> Van der Mierden, S., Tsaioun, K., Bleich, A., &amp; Leenaars, C. H. C. (2019). Software tools for literature screening in systematic reviews in biomedical research. ALTEX, 36 (3), 508 – 517. https://doi.org/10.14573/altex.1902131</bibtext> </blist> <blist> <bibtext> Varghese, A., Hong, T., Hunter, C., Agyeman-Badu, G., &amp; Cawley, M. (2019). Active learning in automated text classification: A case study exploring bias in predicted model performance metrics. Environment Systems and Decisions, 39 (3), 269 – 280. https://doi.org/10.1007/s10669-019-09717-3</bibtext> </blist> <blist> <bibtext> Wallace, B. 
C., Small, K., Brodley, C. E., Lau, J., &amp; Trikalinos, T. A. (2012). Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium (pp. 819 – 824). https://doi.org/10.1145/2110363.2110464</bibtext> </blist> <blist> <bibtext> Wang, Z., Nayfeh, T., Tetzlaff, J., O'Blenis, P., &amp; Murad, M. H. (2020). Error rates of human reviewers during abstract screening in systematic reviews. PLOS ONE, 15 (1), e0227742. https://doi.org/10.1371/journal.pone.0227742</bibtext> </blist> <blist> <bibtext> Westgate, M. J. (2019). revtools: An R package to support article screening for evidence synthesis. Research Synthesis Methods, 10 (4), 606 – 614. https://doi.org/10.1002/jrsm.1374</bibtext> </blist> <blist> <bibtext> Young, T., Hazarika, D., Poria, S., &amp; Cambria, E. (2018). Recent trends in deep learning based natural language processing. arXiv:1708.02709 [cs]. Retrieved from <ulink href="http://arxiv.org/abs/1708.02709">http://arxiv.org/abs/1708.02709</ulink></bibtext> </blist> <blist> <bibtext> Zhang, Q., Grant, A., Pellegrini, M., &amp; Neitzel, A. (2021). A meta-analysis of teacher salary and turnover in the United States and China. Annual Meeting of the American Educational Research Association, Virtual, AERA 2021.</bibtext> </blist> <blist> <bibtext> Zhang, Q., Wang, J., &amp; Neitzel, A. (2023). School-based mental health interventions targeting depression or anxiety: A meta-analysis of rigorous randomized controlled trials for school-aged children and adolescents. Journal of Youth and Adolescence, 52 (1), 195 – 217. 
https://doi.org/10.1007/s10964-022-01684-4</bibtext> </blist> </ref> <aug> <p>By Qiyang Zhang and Amanda Neitzel</p> <p>Reported by Author; Author</p> </aug> |
|---|---|
| Items | – Title: Choosing the Right Tool for the Job: Screening Tools for Systematic Reviews in Education – Language: English – Authors: Qiyang Zhang (ORCID 0000-0001-7474-2435, https://orcid.org/0000-0001-7474-2435); Amanda Neitzel (ORCID 0000-0002-4676-9320, https://orcid.org/0000-0002-4676-9320) – Source: Journal of Research on Educational Effectiveness. 2024 17(3):513-539. – Availability: Routledge. Available from: Taylor & Francis, Ltd. 530 Walnut Street Suite 850, Philadelphia, PA 19106. Tel: 800-354-1420; Tel: 215-625-8900; Fax: 215-207-0050; Web site: http://www.tandf.co.uk/journals – Peer Reviewed: Y – Page Count: 27 – Publication Date: 2024 – Document Type: Journal Articles; Information Analyses – Descriptors: Selection Tools; Educational Resources; Artificial Intelligence; Selection Criteria; Computer Software Selection; Computer Software Evaluation; Literature Reviews – DOI: 10.1080/19345747.2023.2209079 – ISSN: 1934-5747; 1934-5739 – Abstract: In recent years, the rapid development of artificial intelligence has enabled the launch of many new screening tools. This review aims to facilitate screening tool selection through a systematic narrative review and feature analysis. The current adoption rate of transparent tool reporting is low: by screening 191 studies published in the "Review of Educational Research" since 2015, we found that only eight studies reported screening tools. More research is needed to understand the reasons behind this phenomenon. After consulting various sources, 26 available screening tools in the market were found. Among them, we identified and evaluated 12 screening tools for educational reviewers and ranked them in descending order of feature score: Covidence (1), DistillerSR (2, tied), EPPI-Reviewer (2, tied), CADIMA (4), Swift-Active (5), Rayyan (6, tied), SysRev (6, tied), Abstrackr (8, tied), ReLiS (8, tied), RevMan (8, tied), ASReview (11), and Excel (12). In the discussion, we provide insights into the promise and bias in tools' machine learning algorithms. Our results encourage researchers to report their tool usage in publications and select tools based on suitability instead of convenience. – Abstractor: As Provided – Entry Date: 2024 – Accession Number: EJ1431206 |
| PLink | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1431206 |