A Comparison of Machine Learning Methods to Find Clinical Trials for Inclusion in New Systematic Reviews from Their PROSPERO Registrations Prior to Searching and Screening

Bibliographic Details
Title:	A Comparison of Machine Learning Methods to Find Clinical Trials for Inclusion in New Systematic Reviews from Their PROSPERO Registrations Prior to Searching and Screening
Language:	English
Authors:	Shifeng Liu, Florence T. Bourgeois, Claire Narang, Adam G. Dunn (ORCID 0000-0002-1720-8209)
Source:	Research Synthesis Methods. 2024 15(1):73-85.
Availability:	Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www.wiley.com/en-us
Peer Reviewed:	Y
Page Count:	13
Publication Date:	2024
Sponsoring Agency:	National Library of Medicine (DHHS/NIH)
Contract Number:	R01LM012976
Document Type:	Journal Articles; Reports - Research
Descriptors:	Artificial Intelligence, Medical Research, Experimental Groups, Control Groups, Documentation, Computer Software Evaluation, Data Collection, Data Analysis, Journal Articles, Performance Factors
DOI:	10.1002/jrsm.1672
ISSN:	1759-2879; 1759-2887
Abstract:	Searching for trials is a key task in systematic reviews and a focus of automation. Previous approaches required knowing examples of relevant trials in advance, and most methods are focused on published trial articles. To complement existing tools, we compared methods for finding relevant trial registrations given an International Prospective Register of Systematic Reviews (PROSPERO) entry, when no relevant trials have been screened for inclusion in advance. We compared SciBERT-based (extension of Bidirectional Encoder Representations from Transformers) PICO extraction, MetaMap, and term-based representations using an imperfect dataset mined from 3632 PROSPERO entries connected to a subset of 65,662 trial registrations and 65,834 trial articles known to be included in systematic reviews. Performance was measured by the median rank and recall by rank of trials that were eventually included in the published systematic reviews. When ranking trial registrations relative to PROSPERO entries, 296 trial registrations needed to be screened to identify half of the relevant trials, and the best-performing approach used a basic term-based representation. When ranking trial articles relative to PROSPERO entries, 162 trial articles needed to be screened to identify half of the relevant trials, and the best-performing approach used a term-based representation. The results show that MetaMap and term-based representations outperformed approaches that included PICO extraction for this use case. The results suggest that when starting from a PROSPERO entry before any trials have been screened for inclusion, automated methods can reduce workload, but additional processes are still needed to efficiently identify trial registrations or trial articles that meet the inclusion criteria of a systematic review.
Abstractor:	As Provided
Entry Date:	2024
Accession Number:	EJ1405499
Database:	ERIC

<title id="AN0174546113-1">A comparison of machine learning methods to find clinical trials for inclusion in new systematic reviews from their PROSPERO registrations prior to searching and screening </title> <p>Keywords: clinical trials; information retrieval; systematic reviews</p> <p>Highlights</p> <ulist> <item> What is already known: Searching and screening is a key focus of automation for systematic reviews, but few analyses examine how to proactively assign clinical trials to systematic review questions before any trials have been screened for inclusion.</item> <item> What is new: In scenarios where no relevant trials can be screened before or during model training, term‐based and concept‐based methods perform better than methods that restrict terms using PICO extraction.</item> <item> Potential impact: For future scenarios where non‐experts expect to ask a question and receive an immediate summary of the available evidence, the community may wish to consider methods that can learn to identify relevant trials from a summary of a systematic review question, such as a PROSPERO entry.</item> </ulist>
<hd id="AN0174546113-2">BACKGROUND</hd> <p>Systematic reviews are considered among the highest levels of evidence, but they are also labor‐intensive, requiring around 880 h of work to complete and publish for health‐related questions.[<reflink idref="bib1" id="ref1">1</reflink>] Searching and screening for studies that should be included in a systematic review require specialized 
expertise and make up around half of the effort involved in a systematic review. Other challenges that affect systematic reviews of health interventions include redundancy and poor targeting of systematic review questions, which can leave important clinical questions unanswered,[[<reflink idref="bib2" id="ref2">2</reflink>]] and systematic reviews that go out of date soon after they are published.[<reflink idref="bib4" id="ref3">4</reflink>]</p> <p>Clinical trials are represented by a range of documents.[<reflink idref="bib5" id="ref4">5</reflink>] Trial registrations are mostly made available to the public before a trial has recruited participants. Published articles and structured results summaries are mostly made available to the public after a trial is completed. These and other documents related to trials should be linked using identifiers such as digital object identifiers (DOIs) and national clinical trial (NCT) numbers. In what follows, we describe methods that aim to find trials for systematic reviews by their registrations or published articles, which could be used to proactively assign trials to systematic review questions before the trials are completed.[<reflink idref="bib3" id="ref5">3</reflink>]</p> <p>A range of initiatives and applications have been developed to support systematic review processes with combinations of data‐driven tools and crowdsourcing.[[<reflink idref="bib6" id="ref6">6</reflink>], [<reflink idref="bib8" id="ref7">8</reflink>], [<reflink idref="bib10" id="ref8">10</reflink>]] Most of the methods and tools in the area have focused on reducing the time or workload required for individual systematic review processes,[[<reflink idref="bib11" id="ref9">11</reflink>], [<reflink idref="bib13" id="ref10">13</reflink>], [<reflink idref="bib15" id="ref11">15</reflink>], [<reflink idref="bib17" id="ref12">17</reflink>], [<reflink idref="bib19" id="ref13">19</reflink>]] with fewer development efforts directed at changing how we do systematic 
reviews so that the results of clinical trials are incorporated into evidence syntheses as soon as possible after they are first made available.[[<reflink idref="bib21" id="ref14">21</reflink>], [<reflink idref="bib23" id="ref15">23</reflink>], [<reflink idref="bib25" id="ref16">25</reflink>]] To minimize delays between the production of clinical evidence and its uptake in evidence synthesis, we need a better understanding of the limits of assigning studies to systematic review questions, and of where we need to make efficient use of human experts or crowdsourcing.</p> <p>Methods and tools for automating the searching and screening of studies for systematic reviews have focused primarily on ranking trial articles to reduce manual effort or to eliminate a proportion of the records returned by a search in bibliographic databases.[[<reflink idref="bib12" id="ref17">12</reflink>], [<reflink idref="bib16" id="ref18">16</reflink>], [<reflink idref="bib26" id="ref19">26</reflink>]] A key issue is that the best‐performing tools require human expertise before or during their use. Some use active learning, where machine learning models are trained and updated as experts make inclusion and exclusion decisions on published trials. Other approaches train a new model for every systematic review question or systematic review update and require examples of relevant trials to be known in advance. 
This could limit the deployment of these tools to certain use cases and make deployment challenging in scenarios where no experts are available to identify relevant trials in advance (Figure 1).</p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/BDCT/01jan24/jrsm1672-fig-0001.jpg?ephost1=dGJyMMvl7ESepq84yOvsOLCmsE6epq5Srqa4SK6WxWXS" alt="jrsm1672-fig-0001.jpg" title="1 A schematic representation of the traditional systematic review of clinical trials compared to proactively allocating trials to systematic review questions before the trials are completed (left), and the types of tools used to support automation (right), including: (a) active learning approaches where experts label trials during model training; (b) proactive identification of trial registrations and trial articles to update systematic reviews where some included trials are already known; and (c) proactive identification of trial registrations and trial articles for new systematic review questions where experts are not available to identify example trials before or during model training. [Colour figure can be viewed at wileyonlinelibrary.com]" /> </p> <p>Less is known about machine learning methods for prospectively allocating trial registrations to new systematic review questions where no trials have been identified, potentially before the trials are completed and published. 
Some work has examined the number and quality of connections between trial registrations and trial articles.[[<reflink idref="bib28" id="ref20">28</reflink>]] A method that used five or more known relevant trial registrations as a seeding set to identify similar trials was able to identify half of all relevant trials by screening the first 100 registrations.[<reflink idref="bib30" id="ref21">30</reflink>] To our knowledge, no analyses have examined methods that reduce the workload of screening trial registrations for inclusion in new systematic reviews before any trials have been screened, based only on a description of the review question such as an entry in the International Prospective Register of Systematic Reviews (PROSPERO). PROSPERO is an international database of prospectively registered systematic reviews in health and social care. Systematic reviews are expected to be registered on PROSPERO before studies are screened for inclusion, so no trials will have been screened for inclusion in a systematic review when it is newly registered.</p> <p>Concept extraction methods can be used to identify the populations, interventions or exposures, comparators, and outcomes (PICO or PECO) that are often used to describe systematic reviews of interventions such as medications, devices, and other preventive and therapeutic treatments. In the automation of systematic reviews, PICO extraction is used to support information extraction after deciding which studies should be included.[<reflink idref="bib31" id="ref22">31</reflink>] Taking a simplified view of systematic reviews, if all studies could be effectively represented by their PICO elements, it might be possible to reduce or avoid searching and screening by mapping studies directly to the PICO of a systematic review question.</p> <p>We sought to identify trial registrations and trial articles that should be included in a systematic review based on its description in a PROSPERO entry. 
Our aim was to evaluate the performance of a range of document representations for ranking trial registrations and trial articles by their relevance for inclusion, given either a PROSPERO entry or a systematic review article. We compared methods that represent documents using terms with methods that extract concepts related to populations, interventions, and outcomes.</p> <hd id="AN0174546113-4">METHODS</hd> <p>The study is an analysis of methods for identifying trial registrations and trial articles that are relevant for inclusion in systematic reviews based on the text of their PROSPERO entries. We also tested the same methods for identifying relevant trial registrations and trial articles based on the titles and abstracts of published systematic review articles. Input data included text extracted from trial articles, trial registrations on ClinicalTrials.gov, PROSPERO entries, and systematic review articles.</p> <p>All data associated with this study, including manually curated data, are publicly available.[<reflink idref="bib32" id="ref23">32</reflink>] Data are sourced from Medline, PROSPERO, and ClinicalTrials.gov. Synthesized records can be accessed via the ES<sups>3</sups> platform,[<reflink idref="bib25" id="ref24">25</reflink>] and can be extracted using systematic review PubMed identifiers and code from a publicly available repository.[<reflink idref="bib33" id="ref25">33</reflink>]</p> <hd id="AN0174546113-5">Data mined from bibliographic databases and registries</hd> <p>A large but imperfect dataset was extracted using the ES<sups>3</sups> system (https://es3-bidh.sydney.edu.au/), an ongoing surveillance project that automatically mines bibliographic data from published systematic review articles and trial registrations on ClinicalTrials.gov.[<reflink idref="bib25" id="ref26">25</reflink>] The data are incomplete because ES<sups>3</sups> mines data from reference lists on CrossRef. 
Reference lists do not always include all of the studies that systematic reviews use in their analyses (false negatives), and they may include other clinical trials that were not used in the analyses (false positives). To examine the potential impact of this limitation on performance, we also used a separate manually curated dataset, described in the following subsection.</p> <p>Included systematic reviews were published between April 22, 1996 and July 2, 2021. PROSPERO entries linked to the published systematic review articles were identified by searching the abstracts of the systematic review articles for the word "PROSPERO" or for identifiers matching the PROSPERO format. This was followed by removal of duplicates and manual correction of malformed identifiers.</p> <p>For systematic review articles, PubMed identifiers (PMIDs), titles, and abstracts were extracted. For PROSPERO entries, we extracted the title, review question, condition or domain being studied, intervention or exposure, comparator or control, participants or population, main outcomes, and additional outcomes.</p> <p>We followed a standard automated process to identify connections between systematic reviews (all PROSPERO entries were linked to a systematic review article) and their included trial registrations and trial articles. First, we extracted reference lists for the systematic reviews from CrossRef, wherever they were available. Second, we searched PubMed using the DOIs from CrossRef and identified the subset of cited articles for which the publication type was a clinical trial. Third, we used the metadata on PubMed to find links to trial registrations on ClinicalTrials.gov. 
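The PROSPERO matching and the three linking steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it works on in-memory records instead of calling CrossRef or PubMed, the record fields (`pmid`, `publication_types`, `nct_ids`) and all function names are hypothetical, and both the PROSPERO identifier pattern ("CRD" followed by 11 digits) and the set of PubMed publication types counted as clinical trials are our assumptions.

```python
import re

# Assumption: PROSPERO identifiers look like "CRD" followed by 11 digits
# (e.g. CRD42019123456), optionally with a space after "CRD".
PROSPERO_ID = re.compile(r"\bCRD\s?(\d{11})\b", re.IGNORECASE)
PROSPERO_WORD = re.compile(r"\bPROSPERO\b", re.IGNORECASE)

# Assumption: PubMed publication types treated as "clinical trial" (lower-cased).
TRIAL_TYPES = {
    "clinical trial",
    "clinical trial, phase i",
    "clinical trial, phase ii",
    "clinical trial, phase iii",
    "clinical trial, phase iv",
    "controlled clinical trial",
    "randomized controlled trial",
    "pragmatic clinical trial",
}


def find_prospero_ids(abstract):
    """Return PROSPERO-style identifiers in an abstract, normalized to 'CRD' + digits."""
    return {"CRD" + digits for digits in PROSPERO_ID.findall(abstract)}


def mentions_prospero(abstract):
    """True if the abstract names PROSPERO or contains a PROSPERO-style identifier."""
    return bool(PROSPERO_WORD.search(abstract) or find_prospero_ids(abstract))


def link_review_to_trials(reference_dois, pubmed_by_doi):
    """Follow a review's reference list to trial articles and trial registrations.

    reference_dois: DOIs from the review's CrossRef reference list (step 1).
    pubmed_by_doi: hypothetical lookup from lower-cased DOI to a PubMed record
        with keys "pmid", "publication_types", and "nct_ids".
    Returns (trial_pmids, nct_ids) as sets.
    """
    trial_pmids, nct_ids = set(), set()
    for doi in reference_dois:
        record = pubmed_by_doi.get(doi.lower())
        if record is None:
            continue  # cited item not found in PubMed
        # Step 2: keep only cited articles with a clinical trial publication type.
        types = {pt.lower() for pt in record["publication_types"]}
        if types.isdisjoint(TRIAL_TYPES):
            continue
        trial_pmids.add(record["pmid"])
        # Step 3: registry links (NCT numbers) recorded in the PubMed metadata.
        nct_ids.update(record["nct_ids"])
    return trial_pmids, nct_ids
```

In practice the dictionary lookup would be replaced by queries to CrossRef and PubMed; keeping the steps as pure functions makes the filtering logic easy to test separately from network access.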
If a systematic review article included at least one connection via a trial article to a trial registration on ClinicalTrials.gov, the PROSPERO entry (if it existed), systematic review article (identified by its PMID), connected clinical trial articles (identified by their PMIDs) and clinical trial registrations (identified by their NCT number) were included in the dataset.</p> <p>For clinical trials identified using this process, data extracted from trial articles reporting their results included the NCT number, PMID, title, and abstract. Data extracted from the trial registrations included text from the brief title, the official title, the intervention, the eligibility, the primary outcome, and the secondary outcome sections.</p> <hd id="AN0174546113-6">Manually curated data</hd> <p>A second evaluation dataset was manually curated from a smaller set of systematic reviews. We randomly selected 100 systematic review articles from the mined data described above, then simulated a realistic scenario where each was considered a new systematic review for which trials needed to be identified by searching for trial registrations and trial articles. To identify published systematic reviews with clinical trials that were connected to a reasonable number of trial registrations and to avoid scoping reviews and network meta‐analyses, we selected systematic review articles with at least 5 and at most 10 known links to ClinicalTrials.gov trial registrations (a constraint based on our previous experience with mined links). This left us with 6856 systematic review articles, from which we then sampled 100 using a uniform distribution.</p> <p>One of the co‐authors (C.N.) then annotated the selected connections for these systematic reviews. 
This followed a process similar to that of previous studies that identified complete lists of studies included in systematic reviews and links between sources,[[<reflink idref="bib34" id="ref27">34</reflink>], [<reflink idref="bib36" id="ref28">36</reflink>]] and made use of the existing connections and recommendations from ES<sups>3</sups>. The result was a manually curated set of systematic reviews (some with linked PROSPERO entries) connected to a set of trial articles and trial registrations.</p> <hd id="AN0174546113-7">Comparison methods</hd> <p>We defined the identification of trial registrations and trial articles that are relevant for inclusion in a systematic review as a ranking problem. The aim was to rank all candidate trial registrations or trial articles by a document similarity score calculated from the concatenated text of each trial registration (selected fields), trial article (title and abstract), PROSPERO entry (selected sections), and systematic review article (title and abstract).</p> <p>We compared four different document representations: (a) a basic document similarity method using term frequency–inverse document frequency (TF‐IDF); (b) SciBERT‐based PICO extraction, retaining only terms that correspond to PICO elements[<reflink idref="bib37" id="ref29">37</reflink>]; (c) SciBERT‐based PICO extraction followed by concept extraction via the MetaMap application[<reflink idref="bib38" id="ref30">38</reflink>]; and (d) concept extraction from the entire document via MetaMap.</p> <p>The TF‐IDF method weights each term in a document by how often it appears in that document and how rare it is across the corpus. The term frequency is the number of times the term appears in the document, and the inverse document frequency is the logarithmically scaled inverse of the fraction of documents in the corpus that contain the term. 
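As a worked illustration of the textbook weighting, tfidf(t, d) = tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t, the sketch below computes TF‐IDF weights for pre‐tokenized documents. It is a minimal sketch, not the authors' implementation: scikit‐learn's TfidfVectorizer, which the authors used, by default smooths the IDF as ln((1 + N) / (1 + df(t))) + 1 and L2‐normalizes each vector, so its weights differ from these. The function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Textbook TF-IDF weights for a corpus of tokenized documents.

    weight(t, d) = tf(t, d) * log(N / df(t)), where tf is the raw count of
    term t in document d, N is the number of documents, and df(t) is the
    number of documents containing t. Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [
        {term: count * idf[term] for term, count in Counter(doc).items()}
        for doc in docs
    ]


docs = [
    ["statin", "stroke", "trial"],
    ["statin", "trial", "trial"],
    ["exercise", "trial"],
]
vectors = tfidf_vectors(docs)
# "trial" occurs in every document, so log(3 / 3) = 0 and its weight is 0.0;
# "stroke" occurs in one document, so its weight in the first document is log(3).
print(vectors[0])
```

The example shows why very common tokens such as "trial" contribute nothing to the similarity between documents, while tokens that occur in few documents dominate it.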
SciBERT for PICO extraction is a method developed by Liu et al.,[<reflink idref="bib37" id="ref31">37</reflink>] which extracts PICO terms from biomedical documents by representing them with SciBERT, a variant of Bidirectional Encoder Representations from Transformers (BERT), in a way that does not require span annotations and achieves a high level of recall. The MetaMap application[<reflink idref="bib38" id="ref32">38</reflink>] is a tool that uses a range of natural language processing methods to identify Unified Medical Language System (UMLS) concepts in a document. Applying PICO extraction, MetaMap, or MetaMap normalization of PICO‐extracted terms produces a sparser representation of each document; the underlying assumption was that sparser representations might improve methods for finding trials relevant to a PROSPERO registration because they focus on the elements of the document that relate to the inclusion criteria of the systematic review.</p> <p>We used scikit‐learn (https://scikit-learn.org/) to process each corpus of text from systematic reviews, PROSPERO entries, trial articles, and trial registrations. We extracted tokens in one of four ways: (a) terms, (b) SciBERT‐extracted PICO terms, (c) MetaMap‐normalized SciBERT PICO concepts, and (d) MetaMap‐extracted concepts. We then calculated the frequency of each token (term or concept) and its inverse document frequency across the corpus. The representation of each document in the corpus was formed by multiplying its token frequencies by the corresponding inverse document frequencies. When we needed to calculate TF‐IDF for a new document not already included in the corpus, we used the pre‐calculated IDF values. 
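A minimal sketch of this set‐up with scikit‐learn follows, assuming each document has already been converted to a list of tokens (plain terms, PICO terms, or concept identifiers). Passing a callable as `analyzer` makes TfidfVectorizer use those lists unchanged. The placeholder tokens and the `identity` helper are hypothetical, and the vectorizer's default settings (smoothed IDF, L2 normalization) are assumptions, since the paper does not report its parameters. Calling `transform` on a new document reuses the IDF values learned by `fit_transform`, matching the pre‐calculated IDF step described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def identity(tokens):
    """Analyzer that returns pre-extracted tokens unchanged."""
    return tokens


# Hypothetical corpus: each document is already a list of tokens, e.g. concept
# identifiers from a tool such as MetaMap (placeholders used here).
corpus = [
    ["concept_a", "concept_b", "concept_c"],
    ["concept_a", "concept_c"],
    ["concept_d", "concept_c"],
]

# Default settings (smooth_idf=True, norm="l2") are assumptions, not reported values.
vectorizer = TfidfVectorizer(analyzer=identity)
corpus_matrix = vectorizer.fit_transform(corpus)  # learns vocabulary and IDF

# A new document, such as a PROSPERO entry outside the corpus, is weighted with
# the IDF values learned above; tokens not in the vocabulary are ignored.
new_matrix = vectorizer.transform([["concept_b", "concept_c", "concept_z"]])

print(corpus_matrix.shape, new_matrix.shape)
```

Because every document type is transformed by the same fitted vectorizer, PROSPERO entries, review articles, and trials all end up as vectors over one shared vocabulary.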
We transformed the text of PROSPERO entries, systematic review articles, trial registrations, and trial articles into TF‐IDF vector representations, such that the representations of all four document types had the same dimensionality.</p> <p>We then calculated the pairwise cosine similarity scores between the systematic review representations and each of the trial representations. Cosine similarity was chosen as the most appropriate scoring method based on the results of previous experiments.[[<reflink idref="bib30" id="ref33">30</reflink>], [<reflink idref="bib35" id="ref34">35</reflink>]] We then ranked all candidate studies by score, from highest to lowest similarity, and used these rankings to evaluate the performance of each representation.</p> <hd id="AN0174546113-8">Experiments and evaluation metrics</hd> <p>To compare the performance of different ranking methods, we used Recall@K as the evaluation metric. Recall@K is the proportion of included (connected) trial articles or trial registrations that appear among the top K ranked candidates. For example, a Recall@100 value represents the proportion of included trial registrations or trial articles that have been identified after checking the top 100 ranked candidates. Recall@K is calculated by pooling the ranks of all included trial registrations or trial articles, rather than by calculating a value for each PROSPERO entry or systematic review article. Recall@K is commonly used in information retrieval studies and has been used in analyses of methods for automating searching and screening for systematic reviews.[[<reflink idref="bib39" id="ref35">39</reflink>]] We report Recall@100 and visualize all values of Recall@K from 1 to the total number of candidate trial reports. 
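The cosine ranking and pooled Recall@K can be sketched as follows. This is a hedged illustration with hypothetical function names and toy vectors: it assumes one matrix row per review (PROSPERO entry or review article) and one row per candidate trial, breaks similarity ties by candidate order (the paper does not say how ties were handled), and pools the ranks of all included trials across reviews before computing Recall@K, as described in the text.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def pooled_ranks(review_vectors, trial_vectors, included):
    """Pooled 1-based ranks of included trials across all reviews.

    review_vectors: matrix with one row per PROSPERO entry or review article.
    trial_vectors: matrix with one row per candidate trial (same features).
    included: list with, for each review, the set of row indices of its
        included trials.
    """
    scores = cosine_similarity(review_vectors, trial_vectors)
    ranks = []
    for review_scores, relevant in zip(scores, included):
        # Highest similarity first; ties keep candidate order (an assumption).
        order = np.argsort(-review_scores, kind="stable")
        rank_of = np.empty(len(order), dtype=int)
        rank_of[order] = np.arange(1, len(order) + 1)
        ranks.extend(int(rank_of[j]) for j in relevant)
    return np.array(ranks)


def recall_at_k(ranks, k):
    """Proportion of pooled included trials ranked in the top k."""
    return float(np.mean(ranks <= k))


reviews = np.array([[1.0, 0.0], [0.0, 1.0]])
trials = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
ranks = pooled_ranks(reviews, trials, included=[{1}, {2, 3}])
print(sorted(ranks.tolist()), recall_at_k(ranks, 1))
```

Pooling means that reviews with many included trials contribute more ranks, and therefore more weight, to Recall@K than reviews with few.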
To determine whether the differences between methods were significant, we used a Mann–Whitney <emph>U</emph> test to compare the ranks produced by each method with those of the best‐performing method (the reference).</p> <p>Median rank is calculated from the pooled ranks of all included trial registrations or trial articles across a set of PROSPERO entries or systematic review articles, rather than by calculating a median rank for each PROSPERO entry or systematic review article. Median rank is reported with an interquartile range (IQR).</p> <p>We first compared the performance of the four representations on the mined data and then on the manually curated data. For each of the four representations, we evaluated the performance for ranking (a) trial registrations against PROSPERO entries (Figure 2); (b) trial articles against PROSPERO entries; (c) trial registrations against systematic review articles; and (d) trial articles against systematic review articles. We implemented all methods in Python 3.7. All experiments were run on a Linux machine with a V100 GPU. We used MetaMap 2018 with the 2016AB USAbase Strict data model for normalization.</p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/BDCT/01jan24/jrsm1672-fig-0002.jpg?ephost1=dGJyMMvl7ESepq84yOvsOLCmsE6epq5Srqa4SK6WxWXS" alt="jrsm1672-fig-0002.jpg" title="2 An example of the comparison framework using mined data for ranking trial registrations to PROSPERO entries, including the steps of processing (gray), numbers of trial registrations (green), PROSPERO entries (blue), and known connections (red). Other combinations of PROSPERO entries, systematic review articles, trial articles, and trial registrations are included in the Supporting Information. 
[Colour figure can be viewed at wileyonlinelibrary.com]" /> </p> <hd id="AN0174546113-10">RESULTS</hd> <p>The mined study data included 3,362 PROSPERO entries, 47,371 systematic review articles, 65,662 trial registrations, and 65,834 trial articles. The manually curated dataset included 78 systematic reviews (6 with PROSPERO entries), linked to 478 trial registrations and 1466 trial articles.</p> <p>The tokenized data included 42,609 unique tokens (20,654 appearing once) for PROSPERO entries, 203,136 unique tokens (122,192 appearing once) for systematic review articles, 286,627 unique tokens (163,506 appearing once) for trial registrations, and 300,356 unique tokens (173,097 appearing once) for trial articles. The most common tokens remaining after pre‐processing were "studies," appearing in 79.4% of systematic review articles, "patients," appearing in 53.6% of PROSPERO entries and 65.7% of trial articles, and "study," appearing in 83.5% of trial registrations.</p> <p>After manually verifying the lists of included trials for the 100 sampled systematic reviews, we excluded 22 systematic reviews that were found to have fewer than 5 or more than 50 included trials (under the assumption that reviews including more than 50 studies are likely to be scoping reviews or network meta‐analyses with broad questions) and included the remaining 78 systematic reviews. The systematic reviews had 1573 connections, including 1465 to a trial article (PMID) and 479 to a trial registration (via NCT number). For 465 of the connections, both a PMID and an NCT number were available; none of the included trials had an NCT number but no PMID. For 94 trials included in the systematic reviews, we found no NCT number or PMID, and these could not be included in the analyses. 
On average, systematic reviews had 20 connections, including 19 to a trial article and 6 to a trial registration (some trials had both an article and a registration). For 6 of the 78 systematic reviews, we identified a PROSPERO entry, and these were associated with 155 connections to trials, including 148 connections to a trial article and 39 connections to a trial registration.</p> <hd id="AN0174546113-11">Comparison methods using mined data</hd> <p>Among the four representations, the MetaMap and basic term‐based representations produced high and generally similar levels of performance. The SciBERT PICO extraction representation produced the lowest performance, with some improvement achieved by normalizing the extracted PICO terms using MetaMap (Figure 3).</p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/BDCT/01jan24/jrsm1672-fig-0003.jpg?ephost1=dGJyMMvl7ESepq84yOvsOLCmsE6epq5Srqa4SK6WxWXS" alt="jrsm1672-fig-0003.jpg" title="3 The scatter plot of the TF‐IDF methods using different targeted terms with cosine similarity as ranking score in the mined data for: (a) PROSPERO entry to trial registration; (b) PROSPERO entry to trial article; (c) systematic review article to trial registration; and (d) systematic review article to trial article. [Colour figure can be viewed at wileyonlinelibrary.com]" /> </p> <p>When connecting PROSPERO entries to trial registrations and trial articles, the basic term‐based representation produced the best performance for both trial registrations (median rank 268, Recall@100 35.6%) and trial articles (median rank 96, Recall@100 50.4%), but the performance difference was not significant compared with the MetaMap extracted concept representation (Table 1). 
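The median ranks, IQRs, and p‐values reported in Table 1 can be computed from pooled ranks with NumPy and SciPy, as in the sketch below. The function names and the example ranks are hypothetical, and the use of a two‐sided test without any adjustment for multiple comparisons is our assumption; the paper does not report these details.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def median_rank_iqr(ranks):
    """Median and interquartile range (Q1, Q3) of pooled ranks; lower is better."""
    q1, median, q3 = np.percentile(ranks, [25, 50, 75])
    return float(median), (float(q1), float(q3))


def p_value_vs_reference(reference_ranks, other_ranks):
    """Two-sided Mann-Whitney U test comparing a method's ranks with the reference."""
    result = mannwhitneyu(reference_ranks, other_ranks, alternative="two-sided")
    return float(result.pvalue)


# Hypothetical pooled ranks for a reference method and a weaker method.
reference = [3, 10, 25, 40, 120, 300, 900]
weaker = [500, 1200, 4000, 9000, 15000, 21000, 30000]
print(median_rank_iqr(reference))
print(p_value_vs_reference(reference, weaker))
```

With the default linear interpolation in `np.percentile`, the reference ranks above give a median of 40 with an IQR of 17.5 to 210.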
When connecting systematic review articles to trial registrations and trial articles, the MetaMap extracted concept representation produced the highest performance for trial registrations (median rank 331, Recall@100 34.0%) and trial articles (median rank 88, Recall@100 51.6%), but the performance difference was not significant compared to the term‐based representation. Neither of the PICO extraction representations produced comparable results.</p> <p>1 TABLE Comparison of performance metrics over four data sources using all mined connections.</p> <p> <ephtml> &lt;table&gt;&lt;thead valign="bottom"&gt;&lt;tr&gt;&lt;th align="left"&gt;Model&lt;/th&gt;&lt;th align="left"&gt;Median rank (IQR)&lt;/th&gt;&lt;th align="left"&gt;&lt;italic&gt;p&lt;/italic&gt;&amp;#8208;Values by Mann&amp;#8211;Whitney &lt;italic&gt;U&lt;/italic&gt; test&lt;/th&gt;&lt;th align="left"&gt;Recall@100 (%)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody valign="top"&gt;&lt;tr&gt;&lt;td align="left"&gt;PROSPERO entry to trial registration&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Tokens&lt;/td&gt;&lt;td align="char" char="("&gt;268 (43&amp;#8211;1830)&lt;/td&gt;&lt;td align="char" char="."&gt;&amp;#8208;&lt;/td&gt;&lt;td align="char" char="."&gt;35.6&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;16619.5 (6179&amp;#8211;25,433)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;6.69&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap norm. 
SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;1451 (155&amp;#8211;15,377)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;20.6&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap terms&lt;/td&gt;&lt;td align="char" char="("&gt;449 (63&amp;#8211;4121.5)&lt;/td&gt;&lt;td align="char" char="."&gt;0.1936&lt;/td&gt;&lt;td align="char" char="."&gt;30.5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;PROSPERO entry to trial article&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Tokens&lt;/td&gt;&lt;td align="char" char="("&gt;96 (16&amp;#8211;583)&lt;/td&gt;&lt;td align="char" char="."&gt;&amp;#8208;&lt;/td&gt;&lt;td align="char" char="."&gt;50.4&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;11,307 (185&amp;#8211;39,793)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;21.3&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap norm. 
SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;312 (38&amp;#8211;2602)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;35.2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap terms&lt;/td&gt;&lt;td align="char" char="("&gt;103.5 (16&amp;#8211;673)&lt;/td&gt;&lt;td align="char" char="."&gt;1.0000&lt;/td&gt;&lt;td align="char" char="."&gt;49.4&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Systematic review article to trial registration&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Tokens&lt;/td&gt;&lt;td align="char" char="("&gt;418 (57&amp;#8211;2812)&lt;/td&gt;&lt;td align="char" char="."&gt;1.0000&lt;/td&gt;&lt;td align="char" char="."&gt;31.2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;17,775 (4955&amp;#8211;25,002)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;9.65&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap norm. 
SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;1311 (126&amp;#8211;14922.5)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;22.8&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap terms&lt;/td&gt;&lt;td align="char" char="("&gt;331 (56&amp;#8211;3408)&lt;/td&gt;&lt;td align="char" char="."&gt;&amp;#8208;&lt;/td&gt;&lt;td align="char" char="."&gt;34.0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Systematic review article to trial article&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Tokens&lt;/td&gt;&lt;td align="char" char="("&gt;97 (12&amp;#8211;721)&lt;/td&gt;&lt;td align="char" char="."&gt;0.9994&lt;/td&gt;&lt;td align="char" char="."&gt;50.4&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;4025 (55&amp;#8211;30,511)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;29.3&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap norm. SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;212 (25&amp;#8211;1795)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;40.4&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap terms&lt;/td&gt;&lt;td align="char" char="("&gt;88 (11&amp;#8211;665)&lt;/td&gt;&lt;td align="char" char="."&gt;&amp;#8208;&lt;/td&gt;&lt;td align="char" char="."&gt;51.6&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>Trial articles were generally more closely connected to the PROSPERO entries and systematic review articles than trial registrations were (Table 1). 
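</p>
<p>The median rank, interquartile range (IQR), and Recall@100 values reported in Table 1 summarize where the relevant trials fall in each ranked candidate list. The sketch below shows one way to compute them, assuming 1‐based ranks that are pooled across reviews before summarizing, with linear interpolation for the quartiles; the function names and toy data are hypothetical and are not taken from the authors' released code.</p>

```python
import statistics
from typing import Dict, List, Sequence, Set


def ranks_of_relevant(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> List[int]:
    """Return the 1-based position of every relevant candidate in a ranked list."""
    return [pos for pos, doc_id in enumerate(ranked_ids, start=1) if doc_id in relevant_ids]


def summarize_ranks(ranks: Sequence[int], k: int = 100) -> Dict[str, float]:
    """Median rank, quartiles (IQR bounds), and Recall@k in percent for pooled ranks."""
    # method="inclusive" interpolates linearly between sorted ranks (the same
    # convention as NumPy's default percentile). Python < 3.13 needs >= 2 ranks.
    q1, median, q3 = statistics.quantiles(ranks, n=4, method="inclusive")
    # Recall@k: share of relevant trials found within the first k candidates.
    recall_at_k = 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)
    return {"median": median, "q1": q1, "q3": q3, f"recall@{k}": recall_at_k}


# Hypothetical toy data: two reviews, each with its own ranked candidate list.
review_a = ranks_of_relevant(["t1", "t2", "t3", "t4", "t5"], {"t2", "t5"})  # [2, 5]
review_b = ranks_of_relevant(["t9", "t7", "t8"], {"t9"})  # [1]
pooled = review_a + review_b
# Median 2.0, IQR 1.5 to 3.5, Recall@3 of about 66.7%.
print(summarize_ranks(pooled, k=3))
```

<p>Pooling ranks gives reviews with many included trials more weight; summarizing ranks within each review first would be a reasonable alternative, and this sketch does not establish which convention produced the published figures.</p>
<p>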
For both systematic review articles and PROSPERO entries, over half of the relevant trial articles were identified within the first 100 ranked trial articles, and just over a third of the relevant trial registrations were identified within the first 100 ranked trial registrations.</p> <hd id="AN0174546113-13">Comparison of methods using manually annotated data</hd> <p>When comparing methods using the manually annotated data, the pattern of performance across the representations matched the pattern in the mined data (Figure 4): the term‐based approach outperformed the concept‐based approach for PROSPERO entries, and the concept‐based approach outperformed the term‐based approach for systematic review articles (Table 2).</p> <p> <img src="https://imageserver.ebscohost.com/img/embimages/rdk/BDCT/01jan24/jrsm1672-fig-0004.jpg?ephost1=dGJyMMvl7ESepq84yOvsOLCmsE6epq5Srqa4SK6WxWXS" alt="jrsm1672-fig-0004.jpg" title="FIGURE 4 Scatter plots of the TF‐IDF methods using different targeted terms, with cosine similarity as the ranking score, in manually curated data for: (a) PROSPERO entry to trial registration; (b) PROSPERO entry to trial article; (c) systematic review article to trial registration; and (d) systematic review article to trial article." /> </p> <p>TABLE 2 Comparison of performance metrics over four data sources using all manually annotated connections.</p> <p> <ephtml> &lt;table&gt;&lt;thead valign="bottom"&gt;&lt;tr&gt;&lt;th align="left"&gt;Model&lt;/th&gt;&lt;th align="left"&gt;Median rank (IQR)&lt;/th&gt;&lt;th align="left"&gt;&lt;italic&gt;p&lt;/italic&gt;&amp;#8208;Value by Mann&amp;#8211;Whitney &lt;italic&gt;U&lt;/italic&gt; test&lt;/th&gt;&lt;th align="left"&gt;Recall@100 (%)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody valign="top"&gt;&lt;tr&gt;&lt;td align="left"&gt;PROSPERO entry to trial registration&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Tokens&lt;/td&gt;&lt;td align="char" char="("&gt;296 (41&amp;#8211;1503)&lt;/td&gt;&lt;td align="char" char="."&gt;&amp;#8208;&lt;/td&gt;&lt;td align="char" char="."&gt;38.2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;17,115 (9345&amp;#8211;21,942)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;5.88&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap norm. 
SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;17076.5 (9198&amp;#8211;21866.25)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;5.88&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap terms&lt;/td&gt;&lt;td align="char" char="("&gt;943 (176&amp;#8211;8858.25)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0209&lt;/td&gt;&lt;td align="char" char="."&gt;23.5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;PROSPERO entry to trial article&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Tokens&lt;/td&gt;&lt;td align="char" char="("&gt;161.5 (37.5&amp;#8211;832.5)&lt;/td&gt;&lt;td align="char" char="."&gt;&amp;#8208;&lt;/td&gt;&lt;td align="char" char="."&gt;41.2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;44455.5 (44055.75&amp;#8211;44774.25)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;6.08&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap norm. 
SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;44,301 (34061.5&amp;#8211;44711.25)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;7.43&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap terms&lt;/td&gt;&lt;td align="char" char="("&gt;228 (38.75&amp;#8211;1725.5)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0209&lt;/td&gt;&lt;td align="char" char="."&gt;41.9&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Systematic review article to trial registration&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Tokens&lt;/td&gt;&lt;td align="char" char="("&gt;228 (55&amp;#8211;1494.5)&lt;/td&gt;&lt;td align="char" char="."&gt;0.8896&lt;/td&gt;&lt;td align="char" char="."&gt;35.2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;17,150 (5230.5&amp;#8211;24776.5)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0067&lt;/td&gt;&lt;td align="char" char="."&gt;10.9&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap norm. 
SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;15,859 (3353.5&amp;#8211;24182.5)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;12.8&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap terms&lt;/td&gt;&lt;td align="char" char="("&gt;194 (50&amp;#8211;1659)&lt;/td&gt;&lt;td align="char" char="."&gt;&amp;#8208;&lt;/td&gt;&lt;td align="char" char="."&gt;39.5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Systematic review article to trial article&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;Tokens&lt;/td&gt;&lt;td align="char" char="("&gt;139 (26&amp;#8211;803)&lt;/td&gt;&lt;td align="char" char="."&gt;0.3971&lt;/td&gt;&lt;td align="char" char="."&gt;43.3&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;44,360 (30916.5&amp;#8211;44,790)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;11.5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap norm. SciBERT PICOs&lt;/td&gt;&lt;td align="char" char="("&gt;44,325 (30779.5&amp;#8211;44752.5)&lt;/td&gt;&lt;td align="char" char="."&gt;0.0000&lt;/td&gt;&lt;td align="char" char="."&gt;9.80&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="left"&gt;MetaMap terms&lt;/td&gt;&lt;td align="char" char="("&gt;129 (23&amp;#8211;800.5)&lt;/td&gt;&lt;td align="char" char="."&gt;&amp;#8208;&lt;/td&gt;&lt;td align="char" char="."&gt;46.4&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt; </ephtml> </p> <p>However, the results comparing the performance between the manually curated and mined data differed across the sources of data in important ways. 
Relative to the mined data, performance in the manually curated data improved for trial registrations: screening the first 100 trial registrations ranked against a PROSPERO entry identified 38.2% of relevant trials in the manually curated data, compared to 35.6% in the mined data. Performance decreased for trial articles: screening the first 100 trial articles ranked against a PROSPERO entry identified 41.2% of relevant trial articles in the manually curated data, compared to 50.4% in the mined data. The results indicate that under realistic conditions, as represented by the manually curated data, performance is relatively similar for trial registrations and trial articles.</p> <hd id="AN0174546113-15">Post hoc analysis of performance and errors</hd> <p>We compared the automatically extracted connections with the manually curated connections for the subset of manually curated systematic reviews. On average, a systematic review had around 20 manually annotated connections, of which six had NCT IDs (ClinicalTrials.gov registration numbers). The mined systematic reviews included an average of 6.6 trial registrations. Of those, 36.5% were not included in the manually curated data (false positives), and on average two connections found only in the manually curated data were missing from the mined data (false negatives). This means that for every true positive mined connection, the mined data contained approximately 0.6 false positive connections (0.365/0.635 ≈ 0.57). These false positives were typically studies cited to support the background of the systematic review, or studies cited despite being excluded from the synthesis.</p> <p>Overall, we found that relevant trial articles were ranked higher than relevant trial registrations. For PROSPERO entries, the highest‐performing approaches found 50.4% of relevant trial articles within the first 100 candidates, compared to 35.6% of relevant trial registrations. 
In the manually curated data, this difference narrowed to 41.2% Recall@100 for trial articles versus 38.2% for trial registrations, which suggests that the mined data may overestimate performance for trial articles and that performance is similar for the two document types.</p> <p>Methods that extract PICO elements to map all types of trial and systematic review documents to a single consistent structure were less effective than representations that simply used more of the available text. There were 689,458 unique SciBERT PICO terms for trial articles and 470,065 for trial registrations, and 55,994 unique concepts for trial articles and 51,869 for trial registrations. The state‐of‐the‐art PICO extraction model reported a macro F1‐score of 0.72 on a manually annotated PICO extraction dataset built from PubMed articles.[<reflink idref="bib37" id="ref37">37</reflink>] However, that evaluation was at the token level, whereas we used the PICO terms as spans. When we evaluated the model at the span level, the macro F1‐score was 0.44, indicating that a large share of PICO spans were missed or incorrectly extracted by the model. This error propagates through the document similarity process for both terms and concepts.</p> <hd id="AN0174546113-16">DISCUSSION</hd> <p>In an analysis of methods for ranking trial registrations and trial articles against PROSPERO registrations of systematic reviews, we found that simple term‐based and concept‐based representations outperformed approaches that extract PICO elements as terms or concepts.
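</p>
<p>As a rough illustration of a term‐based representation that needs no labelled examples or per‐review training, the following sketch ranks candidate trial texts against a PROSPERO entry by the cosine similarity of TF‐IDF vectors, the ranking score named in the Figure 4 caption. It is a minimal, standard‐library sketch under assumptions not stated in this section: lower‐cased alphanumeric tokens, raw term counts, and a smoothed inverse document frequency of the form used by scikit‐learn's TfidfVectorizer, fitted on the candidate set only. The study's actual preprocessing and weighting may differ, and all function names, IDs, and texts are hypothetical.</p>

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    # Assumed tokenization: lower-cased runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(corpus: Sequence[List[str]]) -> Dict[str, float]:
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1, so common terms keep some weight.
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log((1 + n_docs) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    # Raw term count times IDF; terms unseen in the candidate corpus are dropped.
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    # Cosine similarity of two sparse vectors; 0.0 if either vector is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(query: str, candidates: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank candidate documents (id -> text) by TF-IDF cosine similarity to the query."""
    ids = list(candidates)
    docs = [tokenize(candidates[i]) for i in ids]
    idf = fit_idf(docs)
    q_vec = tfidf(tokenize(query), idf)
    scored = [(i, cosine(q_vec, tfidf(d, idf))) for i, d in zip(ids, docs)]
    # Highest similarity first; ties broken by id for a deterministic order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical PROSPERO-style question and candidate trial texts.
prospero_entry = "Exercise therapy for chronic low back pain in adults"
trials = {
    "trial_A": "Randomized trial of exercise therapy for chronic low back pain",
    "trial_B": "Statin therapy for cardiovascular prevention in adults",
    "trial_C": "Insulin dosing in type 2 diabetes",
}
for trial_id, score in rank_candidates(prospero_entry, trials):
    print(trial_id, round(score, 3))  # trial_A ranks first, trial_C last
```

<p>Because the scores depend only on the PROSPERO entry and the candidate texts, the same function can be applied to each new review without training a new model.</p>
<p>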
The best‐performing approaches may help reduce the workload of identifying relevant trial registrations and trial articles without requiring example trials or a new model for every new systematic review, but on their own they cannot replace traditional searching and screening for a comprehensive review of all relevant studies.</p> <p>We tested whether it might be useful to extract the PICO elements of clinical trials and match them directly to systematic review questions that are also represented by their PICO elements (as either terms or concepts). In both the mined and the manually curated datasets, state‐of‐the‐art PICO extraction methods did not improve the ranking of relevant trial registrations relative to a PROSPERO entry, compared with simpler document representations based on terms or concepts. Including PICO extraction in the pipeline for automating trial registration and trial article screening makes intuitive sense, given that systematic reviews define a PICO and identify trials using search strategies and inclusion criteria based on that PICO specification. However, it appears that using PICO extraction to build the document representation removes contextual information that otherwise helps rank relevant trials.</p> <p>Other analyses, methods, and tools have been proposed to reduce the workload of screening articles for inclusion in systematic reviews. Some use active learning, in which experts label new examples to iteratively train machine learning models,[[<reflink idref="bib9" id="ref38">9</reflink>], [<reflink idref="bib16" id="ref39">16</reflink>], [<reflink idref="bib26" id="ref40">26</reflink>]] which can reduce the number of articles that need to be screened. 
Two studies have investigated the use of seeding sets to find trial registrations for systematic review updates.[[<reflink idref="bib30" id="ref41">30</reflink>], [<reflink idref="bib36" id="ref42">36</reflink>]] The analysis presented here differs in that it requires no additional expert effort and no new model for each new systematic review, although we found that this comes at a cost in performance.</p> <p>This analysis has implications for systematic review practice and systematic review automation.[<reflink idref="bib41" id="ref43">41</reflink>] If the goal of research in this area is to create simple ways to prospectively allocate new trials to relevant systematic review communities when the trials are first developed, before they are complete and have results available,[<reflink idref="bib21" id="ref44">21</reflink>] then expert evaluation or crowdsourced screening will likely still be needed.[<reflink idref="bib6" id="ref45">6</reflink>] A key lesson from recent work in this area is that there is now a spectrum of methods requiring different amounts of expert involvement in training a model for a specific systematic review question: none (this study), before training (seeding sets for review updates), or during training (active learning). A promising approach might deploy all these methods together in tools or platforms that support systematic review processes, data sharing, or crowdsourcing.</p> <p>This study has several limitations. Data automatically mined from bibliographic databases were imperfect and included false positives and false negatives. To address this limitation, we checked for consistent results in a smaller manually curated dataset.
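</p>
<p>Checking mined connections against manually curated ones, as in the post hoc analysis, amounts to set comparisons between (review, trial) pairs. The sketch below is a minimal illustration that treats the curated set as ground truth and pools connections rather than averaging per review; the function name and toy data are hypothetical. The last line reproduces the arithmetic behind the reported ratio of about 0.6 false positives per true positive, given a 36.5% false‐positive share.</p>

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: (systematic review id, trial registration id).
Connection = Tuple[str, str]


def compare_connections(mined: Set[Connection], curated: Set[Connection]) -> Dict[str, float]:
    """Score mined connections against curated connections treated as ground truth."""
    true_pos = mined & curated
    false_pos = mined - curated  # mined links absent from the curated data
    false_neg = curated - mined  # curated links that mining missed
    return {
        "tp": len(true_pos),
        "fp": len(false_pos),
        "fn": len(false_neg),
        "fp_share": len(false_pos) / len(mined) if mined else 0.0,
        "fp_per_tp": len(false_pos) / len(true_pos) if true_pos else float("inf"),
    }


# Hypothetical toy data for a single review.
mined = {("SR1", "T1"), ("SR1", "T2"), ("SR1", "T3")}
curated = {("SR1", "T1"), ("SR1", "T2"), ("SR1", "T4"), ("SR1", "T5")}
print(compare_connections(mined, curated))  # tp 2, fp 1, fn 2, fp_per_tp 0.5

# A 36.5% false-positive share implies 0.365 / 0.635 false positives per true positive.
print(round(0.365 / (1 - 0.365), 2))  # 0.57
```

<p>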
After we found that the SciBERT PICO approach performed much worse than the other approaches, we did not pursue additional training methods to improve the matching of PICO terms and concepts between systematic reviews and study articles. The imperfect mined data may partly explain the model's relatively poor performance, because false positives in the training data may have introduced biases into the model. We evaluated only the scenario in which no relevant trials have been identified in advance; PICO extraction may be more useful in methods that have access to seeding sets of trials or that are embedded in the screening process via active learning.</p> <hd id="AN0174546113-17">CONCLUSION</hd> <p>Rapid and robust integration of new evidence into systematic reviews can help ensure that medical interventions are effective and safe. Traditionally, systematic reviews have relied on time‐consuming searching and screening of relevant published studies, which limits opportunities to develop new ways of getting trial results into evidence synthesis as quickly as possible. To advance systematic review automation, we assessed a set of methods for ranking trial registrations relative to PROSPERO entries and found that methods that extract and match PICO information from trial registrations and PROSPERO entries do not perform as well as methods that measure document similarity across terms or concepts. We also found that in manually curated data, ranking trial registrations relative to PROSPERO entries performed similarly to ranking trial articles relative to systematic review abstracts.
Based on the results here and in prior studies, we think that the full range of automation approaches could be combined with crowdsourcing to create a system where trial registrations are proactively allocated to PROSPERO entries as a step toward fully automated approaches to synthesizing evidence in robust ways.</p> <hd id="AN0174546113-18">AUTHOR CONTRIBUTIONS</hd> <p> <bold>Shifeng Liu:</bold> Conceptualization; data curation; formal analysis; methodology; writing—original draft. <bold>Florence Bourgeois:</bold> Conceptualization; funding acquisition; methodology; project administration; supervision; writing—review and editing. <bold>Adam Dunn:</bold> Conceptualization; funding acquisition; investigation; methodology; project administration; supervision; writing—review and editing. <bold>Claire Narang:</bold> Data curation; critically revised the manuscript.</p> <hd id="AN0174546113-19">ACKNOWLEDGMENT</hd> <p>We acknowledge Jason Dalmazzo for support with data access and management of the ES<sups>3</sups> project. Open access publishing facilitated by The University of Sydney, as part of the Wiley ‐ The University of Sydney agreement via the Council of Australian University Librarians.</p> <hd id="AN0174546113-20">FUNDING INFORMATION</hd> <p>National Library of Medicine, National Institutes of Health R01LM012976.</p> <hd id="AN0174546113-21">CONFLICT OF INTEREST STATEMENT</hd> <p>The authors declare no conflict of interest.</p> <hd id="AN0174546113-22">DATA AVAILABILITY STATEMENT</hd> <p>The data that support the findings of this study are openly available in Harvard Dataverse at https://doi.org/10.7910/DVN/VQOFCW</p> <p>Data S1: Supporting Information.</p> <ref id="AN0174546113-23"> <title> REFERENCES </title> <blist> <bibl id="bib1" idref="ref1" type="bt">1</bibl> <bibtext> Pham B, Bagheri E, Rios P, et al. Improving the conduct of systematic reviews: a process mining perspective. J Clin Epidemiol. 
2018 ; 103 : 101 ‐ 111.</bibtext> </blist> <blist> <bibl id="bib2" idref="ref2" type="bt">2</bibl> <bibtext> Page MJ, Moher D. Mass production of systematic reviews and meta‐analyses: an exercise in mega‐silliness?: commentary: mass production of systematic reviews and meta‐analyses. Milbank Q. 2016 ; 94 (3): 515 ‐ 519.</bibtext> </blist> <blist> <bibl id="bib3" idref="ref5" type="bt">3</bibl> <bibtext> Dunn AG, Bourgeois FT. Is it time for computable evidence synthesis? J Am Med Inform Assoc. 2020 ; 27 (6): 972 ‐ 975.</bibtext> </blist> <blist> <bibl id="bib4" idref="ref3" type="bt">4</bibl> <bibtext> Pieper D, Antoine SL, Neugebauer EAM, Eikermann M. Up‐to‐dateness of reviews is often neglected in overviews: a systematic review. J Clin Epidemiol. 2014 ; 67 (12): 1302 ‐ 1308.</bibtext> </blist> <blist> <bibl id="bib5" idref="ref4" type="bt">5</bibl> <bibtext> Zarin DA, Tse T. Sharing Individual Participant Data (IPD) within the context of the Trial Reporting System (TRS). PLoS Med. 2016 ; 13 (1): e1001946.</bibtext> </blist> <blist> <bibl id="bib6" idref="ref6" type="bt">6</bibl> <bibtext> Mortensen ML, Adam GP, Trikalinos TA, Kraska T, Wallace BC. An exploration of crowdsourcing citation screening for systematic reviews. Res Syn Meth. 2017 ; 8 (3): 366 ‐ 386.</bibtext> </blist> <blist> <bibl id="bib7" type="bt">7</bibl> <bibtext> Pianta MJ, Makrai E, Verspoor KM, Cohn TA, Downie LE. Crowdsourcing critical appraisal of research evidence (CrowdCARE) was found to be a valid approach to assessing clinical research quality. J Clin Epidemiol. 2018 ; 104 : 8 ‐ 14.</bibtext> </blist> <blist> <bibl id="bib8" idref="ref7" type="bt">8</bibl> <bibtext> Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan—a web and mobile app for systematic reviews. Syst Rev. 2016 ; 5 (1): 210.</bibtext> </blist> <blist> <bibl id="bib9" idref="ref38" type="bt">9</bibl> <bibtext> Wallace BC, Small K, Brodley CE, Lau J, Trikalinos TA. 
Deploying an interactive machine learning system in an evidence‐based practice center: abstrackr. In: Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium [Internet]. ACM; 2012:819‐824. doi: 10.1145/2110363.2110464</bibtext> </blist> <blist> <bibtext> Smalheiser NR, Swanson DR. Using ARROWSMITH: a computer‐assisted approach to formulating and assessing scientific hypotheses. Comput Methods Programs Biomed. 1998 ; 57 (3): 149 ‐ 153.</bibtext> </blist> <blist> <bibtext> Shekelle PG, Shetty K, Newberry S, Maglione M, Motala A. Machine learning versus standard techniques for updating searches for systematic reviews: a diagnostic accuracy study. Ann Intern Med. 2017 ; 167 (3): 213 ‐ 215.</bibtext> </blist> <blist> <bibtext> Tsafnat G, Glasziou P, Choong MK, Dunn A, Galgani F, Coiera E. Systematic review automation technologies. Syst Rev. 2014 ; 3 (1): 74.</bibtext> </blist> <blist> <bibtext> Beller E, Clark J, Tsafnat G, et al. Making progress with the automation of systematic reviews: principles of the International Collaboration for the Automation of Systematic Reviews (ICASR). Syst Rev. 2018 ; 7 (1): 77.</bibtext> </blist> <blist> <bibtext> O'Connor AM, Tsafnat G, Gilbert SB, et al. Still moving toward automation of the systematic review process: a summary of discussions at the third meeting of the International Collaboration for Automation of Systematic Reviews (ICASR). Syst Rev. 2019 ; 8 (1): 57.</bibtext> </blist> <blist> <bibtext> Arno A, Thomas J, Wallace B, Marshall IJ, McKenzie JE, Elliott JH. Accuracy and efficiency of machine learning–assisted risk‐of‐bias assessments in "real‐world" systematic reviews: a noninferiority randomized controlled trial. Ann Intern Med. 2022 ; 175 (7): 1001 ‐ 1009.</bibtext> </blist> <blist> <bibtext> Adam GP, Wallace BC, Trikalinos TA. Semi‐automated tools for systematic searches. In: Evangelou E, Veroniki AA, eds. Meta‐Research [Internet]. Springer US ; 2022 : 17 ‐ 40. 
doi: 10.1007/978-1-0716-1566-9_2</bibtext> </blist> <blist> <bibtext> de Bruijn B, Carini S, Kiritchenko S, Martin J, Sim I. Automated information extraction of key trial design elements from clinical trial publications. AMIA Annu Symp Proc. 2008 ; 2008 : 141 ‐ 145.</bibtext> </blist> <blist> <bibtext> Kiritchenko S, de Bruijn B, Carini S, Martin J, Sim I. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Med Inform Decis Mak. 2010 ; 10 (1): 56.</bibtext> </blist> <blist> <bibtext> Marshall IJ, Wallace BC. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst Rev. 2019 ; 8 (1): 163.</bibtext> </blist> <blist> <bibtext> Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev. 2015 ; 4 (1): 78.</bibtext> </blist> <blist> <bibtext> Nakagawa S, Dunn AG, Lagisz M, et al. A new ecosystem for evidence synthesis. Nat Ecol Evol. 2020 ; 4 (4): 498 ‐ 501.</bibtext> </blist> <blist> <bibtext> Elliott JH, Turner T, Clavisi O, et al. Living systematic reviews: an emerging opportunity to narrow the evidence‐practice gap. PLoS Med. 2014 ; 11 (2): e1001603.</bibtext> </blist> <blist> <bibtext> Millard T, Synnot A, Elliott J, Green S, McDonald S, Turner T. Feasibility and acceptability of living systematic reviews: results from a mixed‐methods evaluation. Syst Rev. 2019 ; 8 (1): 325.</bibtext> </blist> <blist> <bibtext> Thomas J, Noel‐Storr A, Marshall I, et al. Living systematic reviews: 2. Combining human and machine effort. J Clin Epidemiol. 2017 ; 91 : 31 ‐ 37.</bibtext> </blist> <blist> <bibtext> Martin P, Surian D, Bashir R, Bourgeois FT, Dunn AG. Trial2rev: combining machine learning and crowd‐sourcing to create a shared space for updating systematic reviews. JAMIA Open. 2019 ; 2 (1): 15 ‐ 22.</bibtext> </blist> <blist> <bibtext> O'Mara‐Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. 
Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015 ; 4 (1): 5.</bibtext> </blist> <blist> <bibtext> Wallace BC, Trikalinos TA, Lau J, Brodley C, Schmid CH. Semi‐automated screening of biomedical citations for systematic reviews. BMC Bioinformatics. 2010 ; 11 (1): 55.</bibtext> </blist> <blist> <bibtext> Pradhan R, Hoaglin DC, Cornell M, Liu W, Wang V, Yu H. Automatic extraction of quantitative data from ClinicalTrials.gov to conduct meta‐analyses. J Clin Epidemiol. 2019 ; 105 : 92 ‐ 100.</bibtext> </blist> <blist> <bibtext> Huser V, Cimino JJ. Linking ClinicalTrials.gov and PubMed to track results of interventional human clinical trials. PLoS One. 2013 ; 8 (7): e68409.</bibtext> </blist> <blist> <bibtext> Surian D, Dunn AG, Orenstein L, Bashir R, Coiera E, Bourgeois FT. A shared latent space matrix factorisation method for recommending new trial evidence for systematic review updates. J Biomed Inform. 2018 ; 79 : 32 ‐ 40.</bibtext> </blist> <blist> <bibtext> Wallace BC, Kuiper J, Sharma A, Zhu MB, Marshall IJ. Extracting PICO sentences from clinical trial reports using supervised distant supervision. J Mach Learn Res. 2016 ; 17 : 132.</bibtext> </blist> <blist> <bibtext> Liu S. Replication data for: a comparison of machine learning methods for recommending clinical trials for inclusion in new systematic reviews [Internet]. Harvard Dataverse ; 2022. doi: 10.7910/DVN/VQOFCW</bibtext> </blist> <blist> <bibtext> Liu S. Clinical trial recommendation for systematic reviews [Internet]. Zenodo ; 2022. doi: 10.5281/zenodo.7384447</bibtext> </blist> <blist> <bibtext> Chen KY, Borglund EM, Postema EC, Dunn AG, Bourgeois FT. Reporting of clinical trial safety results in ClinicalTrials.gov for FDA‐approved drugs: a cross‐sectional analysis. Clin Trials. 2022 ; 19 (4): 442 ‐ 451.</bibtext> </blist> <blist> <bibtext> Liu S, Bourgeois FT, Dunn AG. 
Identifying unreported links between ClinicalTrials.gov trial registrations and their published results. Res Synth Methods. 2022 ; 13 (3): 342 ‐ 352.</bibtext> </blist> <blist> <bibtext> Surian D, Bourgeois FT, Dunn AG. The automation of relevant trial registration screening for systematic review updates: an evaluation study on a large dataset of ClinicalTrials.gov registrations. BMC Med Res Methodol. 2021 ; 21 (1): 281.</bibtext> </blist> <blist> <bibtext> Liu S, Sun Y, Li B, Wang W, Bourgeois FT, Dunn AG. Sent2Span: span detection for PICO extraction in the biomedical text without span annotations. In: Findings of the Association for Computational Linguistics: EMNLP 2021 [Internet]. Association for Computational Linguistics ; 2021. https://aclanthology.org/2021.findings-emnlp.147</bibtext> </blist> <blist> <bibtext> Aronson AR, Lang FM. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 2010 ; 17 (3): 229 ‐ 236.</bibtext> </blist> <blist> <bibtext> Bui DDA, Jonnalagadda S, Del Fiol G. Automatically finding relevant citations for clinical guideline development. J Biomed Inform. 2015 ; 57 : 436 ‐ 445.</bibtext> </blist> <blist> <bibtext> Wang S, Scells H, Mourad A, Zuccon G. Seed‐driven document ranking for systematic reviews: a reproducibility study. In: Hagen M, Verberne S, Macdonald C, et al., eds. Advances in Information Retrieval [Internet]. Springer International Publishing ; 2022 : 686 ‐ 700. doi: 10.1007/978-3-030-99736-6_46</bibtext> </blist> <blist> <bibtext> Tsafnat G, Dunn A, Glasziou P, Coiera E. The automation of systematic reviews. BMJ. 2013 ; 346 : f139.</bibtext> </blist> </ref> <aug> <p>By Shifeng Liu; Florence T. Bourgeois; Claire Narang and Adam G. 
Dunn</p> </aug> <nolink nlid="nl1" bibid="bib10" firstref="ref8"></nolink> <nolink nlid="nl2" bibid="bib11" firstref="ref9"></nolink> <nolink nlid="nl3" bibid="bib13" firstref="ref10"></nolink> <nolink nlid="nl4" bibid="bib15" firstref="ref11"></nolink> <nolink nlid="nl5" bibid="bib17" firstref="ref12"></nolink> <nolink nlid="nl6" bibid="bib19" firstref="ref13"></nolink> <nolink nlid="nl7" bibid="bib21" firstref="ref14"></nolink> <nolink nlid="nl8" bibid="bib23" firstref="ref15"></nolink> <nolink nlid="nl9" bibid="bib25" firstref="ref16"></nolink> <nolink nlid="nl10" bibid="bib12" firstref="ref17"></nolink> <nolink nlid="nl11" bibid="bib16" firstref="ref18"></nolink> <nolink nlid="nl12" bibid="bib26" firstref="ref19"></nolink> <nolink nlid="nl13" bibid="bib28" firstref="ref20"></nolink> <nolink nlid="nl14" bibid="bib30" firstref="ref21"></nolink> <nolink nlid="nl15" bibid="bib31" firstref="ref22"></nolink> <nolink nlid="nl16" bibid="bib32" firstref="ref23"></nolink> <nolink nlid="nl17" bibid="bib33" firstref="ref25"></nolink> <nolink nlid="nl18" bibid="bib34" firstref="ref27"></nolink> <nolink nlid="nl19" bibid="bib36" firstref="ref28"></nolink> <nolink nlid="nl20" bibid="bib37" firstref="ref29"></nolink> <nolink nlid="nl21" bibid="bib38" firstref="ref30"></nolink> <nolink nlid="nl22" bibid="bib35" firstref="ref34"></nolink> <nolink nlid="nl23" bibid="bib39" firstref="ref35"></nolink> <nolink nlid="nl24" bibid="bib173" firstref="ref36"></nolink> <nolink nlid="nl25" bibid="bib41" firstref="ref43"></nolink>
Header	DbId: eric DbLabel: ERIC An: EJ1405499 AccessLevel: 3 PubType: Academic Journal PubTypeId: academicJournal PreciseRelevancyScore: 0
IllustrationInfo
Abstractor:	As Provided
Entry Date:	2024
Permalink:	https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=eric&AN=EJ1405499
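The abstract's headline figures (296 trial registrations and 162 trial articles "needed to be screened to identify half of the relevant trials") are median ranks, and its second measure is recall by rank. The sketch below shows one way to compute both from ranked lists. It is not the authors' evaluation code: it assumes ranks are pooled across all reviews and that every relevant trial appears somewhere in its review's ranking. The review IDs (`CRD1`, `CRD2`), trial IDs, and helper names (`relevant_ranks`, `pooled_ranks`, `recall_at_k`) are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence, Set


def relevant_ranks(ranking: Sequence[str], relevant: Set[str]) -> List[int]:
    """Return the 1-based rank of each relevant trial within one ranked list."""
    position = {trial_id: i for i, trial_id in enumerate(ranking, start=1)}
    # Assumes the full candidate set was ranked, so every relevant trial has a
    # position; a missing trial raises KeyError rather than being skipped silently.
    return sorted(position[trial_id] for trial_id in relevant)


def pooled_ranks(rankings: Dict[str, Sequence[str]],
                 relevant: Dict[str, Set[str]]) -> List[int]:
    """Pool the relevant-trial ranks from every review's ranking into one list."""
    ranks: List[int] = []
    for review_id, ranking in rankings.items():
        ranks.extend(relevant_ranks(ranking, relevant.get(review_id, set())))
    return sorted(ranks)


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of relevant trials found by screening the top k of each ranking."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical toy data: two registered reviews, each with its own ranked list.
rankings = {
    "CRD1": ["t3", "t7", "t1", "t9", "t4", "t2"],
    "CRD2": ["t5", "t8", "t6"],
}
relevant = {"CRD1": {"t1", "t4"}, "CRD2": {"t5"}}

ranks = pooled_ranks(rankings, relevant)
print(ranks)                  # → [1, 3, 5]
print(median(ranks))          # → 3
print(recall_at_k(ranks, 3))  # 2 of 3 relevant trials are in the top 3
```

Under this pooled reading, a median rank of 296 means half of all relevant registrations sat at rank 296 or better in their review's list; evaluating `recall_at_k` over a range of cut-offs traces the recall-by-rank curve.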