Active learning framework leveraging transcriptomics identifies modulators of disease phenotypes.

Bibliographic Details
Title: Active learning framework leveraging transcriptomics identifies modulators of disease phenotypes.
Authors: DeMeo, Benjamin; Nesbitt, Charlotte; Miller, Samuel A.; Burkhardt, Daniel B.; Lipchina, Inna; Fu, Doris; Holderrieth, Peter; Kim, David; Kolchenko, Sergey; Szalata, Artur; Gupta, Ishan; Kerr, Christine; Pfefer, Thomas; Rojas-Rodriguez, Raziel; Kuppassani, Sunil; Kruidenier, Laurens; Doshi, Parul B.; Zamanighomi, Mahdi; Collins, James J.; Shalek, Alex K.
Source: Science. 11/27/2025, Vol. 390 Issue 6776, p1-13. 13p.
Subjects: Transcriptomes, Drug discovery, Active learning, Machine learning, Therapeutics, Phenotypes, Drugs
Abstract: Phenotypic drug screening remains constrained by the vastness of chemical space and the technical challenges of scaling experimental workflows. To overcome these barriers, computational methods have been developed to prioritize compounds, but they rely on either single-task models lacking generalizability or heuristic-based genomic proxies that resist optimization. We designed an active deep learning framework that leverages omics to enable scalable, optimizable identification of compounds that induce complex phenotypes. Our generalizable algorithm outperformed state-of-the-art models on classical recall, translating to a 13- to 17-fold increase in phenotypic hit rate across two hematological discovery campaigns. Combining this algorithm with a lab-in-the-loop signature refinement step, we achieved an additional twofold increase in hit rate along with molecular insights. In sum, our framework enables efficient phenotypic hit identification campaigns, with broad potential to accelerate drug discovery.

Editor's summary: Despite recent advances in the methods used for high-throughput drug discovery, it remains challenging to identify effective drugs because of the vast number of potential candidate molecules and the difficulty of identifying useful drug targets. Instead of focusing on individual target proteins, DeMeo et al. took the approach of phenotypic screening, looking for compounds that induce the desired changes in cells' transcriptomic profiles, even if their specific targets are unknown. The authors developed a machine learning algorithm to effectively prioritize compounds for screening, and showed that this approach was more effective than existing methods. They also provided a demonstration of how their technique can be used to discover compounds for improving platelet production and identify their mechanisms of action.
—Yevgeniya Nusinovich

INTRODUCTION: Phenotypic drug discovery, powered by high-dimensional omics perturbation readouts and machine learning, could provide a practical way to address common, yet complex, multitarget responses often missed by single-target–centric strategies. However, generalizable, refinable frameworks are needed to efficiently convert omics signatures into actionable chemical leads. Our study addressed this unmet need.

RATIONALE: Single-cell transcriptomic atlases capture disease states with high granularity. In parallel, transcriptomic signatures from perturbations catalog the gene-level impact of chemical interventions. We reasoned that matching perturbation-induced gene changes to gene expression differences associated with desired cellular state shifts could effectively prioritize compounds for phenotypic screening, reducing costs and allowing the use of high-fidelity, but low-throughput, assays in a drug discovery campaign. To this end, we developed DrugReflector, a deep learning model trained on the Connectivity Map (CMap; 9597 perturbations across 52 cell lines) that ranks molecules by their likelihood of inducing a user-defined change in gene expression, using gene signatures derived from single-cell atlases to prioritize chemical interventions. Additionally, we implemented an active learning approach to iteratively refine target signatures through paired transcriptional and phenotypic readouts, further potentiating hit identification. Together, these methods form a generalizable closed lab-in-the-loop cycle that could power drug discovery for complex cellular diseases.

RESULTS: We introduced a perturbational single-cell RNA sequencing (scRNA-seq) dataset with 1.2 million cells spanning 88 perturbations across 10 primary and cancer cell lines. Using this dataset along with public perturbational omics data (held-out CMap and SciPlex signatures), we showed that DrugReflector robustly prioritizes compounds from transcriptional signatures even outside of its training context, consistently outperforming state-of-the-art approaches. Through two hematopoietic campaigns using single-cell atlas–defined cell state transitions as model inputs, we identified inducers of megakaryocyte and erythroid differentiation, achieving hit rates >10-fold higher than a random baseline. To assess generalizability, we additionally deployed DrugReflector in two distinct oncology indications, recovering clinical standards of care and modulators of known indication-specific pathways. To further characterize and leverage the transcriptional drivers of megakaryocyte induction, we created a time-course scRNA-seq dataset of hematopoietic stem and progenitor cells with paired flow cytometry readouts for a range of transcriptionally and phenotypically active compounds. By relating transcriptional and phenotypic changes, we refined the input megakaryopoiesis signature, yielding a further twofold hit rate improvement, in addition to a time-resolved understanding of the gene expression changes driving megakaryopoiesis and the cellular states most conducive to intervention. Follow-up CRISPR and chemogenetic studies confirmed two mechanistic classes of megakaryopoiesis inducers and offered evidence that partial inhibition of cholesterol biosynthesis alone can direct human progenitors toward the megakaryocyte lineage.

CONCLUSION: Deep learning models trained on transcriptomics can efficiently bridge disease biology and chemical interventions, powering identification of molecules modulating complex cellular processes. We introduce a lab-in-the-loop active reinforcement learning framework combining machine learning with transcriptomic and phenotypic screening to iteratively improve efficiency in drug discovery campaigns. By embracing the full complexity of cellular disease, this framework uncovers druggable nodes and enables the discovery of compounds that modulate phenotype through complex mechanisms.

A lab-in-the-loop framework improves efficiency in drug discovery campaigns and uncovers unexplored drug targets. We applied deep learning to connect disease biology to chemistry using transcriptomics, markedly improving hit rates in complex phenotypic screens. Combining phenotypic measurements with transcription, we then used active learning to further improve our phenotypic hit rate. Ery, erythroid; MEP, megakaryocyte erythroid progenitor; Mk, megakaryocyte. [ABSTRACT FROM AUTHOR]
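The prioritization idea summarized in this abstract (matching perturbation-induced expression changes to a desired cell-state shift, then measuring hit rate against a random baseline) can be illustrated with a minimal sketch. This is NOT DrugReflector, which the authors describe as a deep learning model trained on CMap; the sketch swaps in a simple cosine-similarity connectivity score, the kind of heuristic genomic proxy the abstract contrasts the model against. All function names (`cosine`, `rank_compounds`, `fold_enrichment`, `refine_signature`), the three-gene vectors, compound names, hit set and baseline rate are hypothetical toy values. `refine_signature` is likewise a hypothetical, much-simplified stand-in for the lab-in-the-loop refinement step: it only correlates each gene's induced change with a phenotypic readout across compounds.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two gene-level vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_compounds(
    target: Sequence[float], perturbations: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank compounds by how closely their induced change matches the desired shift.

    Hypothetical heuristic connectivity-style baseline, NOT DrugReflector.
    """
    scored = [(name, cosine(target, sig)) for name, sig in perturbations.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def fold_enrichment(selected: Sequence[str], hits: Set[str], baseline_hit_rate: float) -> float:
    """Hit rate among the prioritized compounds divided by a random-screening hit rate."""
    if not selected or baseline_hit_rate <= 0:
        raise ValueError("need a non-empty selection and a positive baseline rate")
    observed = sum(1 for name in selected if name in hits) / len(selected)
    return observed / baseline_hit_rate


def refine_signature(
    signatures: Dict[str, Sequence[float]], phenotype: Dict[str, float]
) -> List[float]:
    """Hypothetical refinement: per-gene Pearson r between induced change and phenotype.

    Genes whose induced change tracks the phenotypic readout get large |r|,
    so the returned vector could serve as an updated target signature.
    """
    names = [n for n in signatures if n in phenotype]
    if len(names) < 2:
        raise ValueError("need paired readouts for at least two compounds")
    y = [phenotype[n] for n in names]
    mean_y = sum(y) / len(y)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    refined = []
    for g in range(len(signatures[names[0]])):
        x = [signatures[n][g] for n in names]
        mean_x = sum(x) / len(x)
        sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        # Zero variance in either variable means the correlation is undefined; use 0.
        refined.append(cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0)
    return refined


if __name__ == "__main__":
    # Toy 3-gene signatures (hypothetical); positive = up-regulated.
    target = [1.0, -1.0, 0.5]
    perturbations = {
        "cmpd_A": [0.9, -0.8, 0.4],
        "cmpd_B": [-1.0, 1.0, -0.5],
        "cmpd_C": [0.1, 0.0, 0.9],
    }
    ranked = rank_compounds(target, perturbations)
    print([name for name, _ in ranked])  # ['cmpd_A', 'cmpd_C', 'cmpd_B']
    top = [name for name, _ in ranked[:2]]
    print(fold_enrichment(top, {"cmpd_A"}, baseline_hit_rate=0.125))  # 4.0
    readout = {"cmpd_A": 1.0, "cmpd_B": 0.0, "cmpd_C": 0.5}
    print(refine_signature(perturbations, readout))
```

The fold-enrichment ratio is the same kind of quantity the abstract reports as "13- to 17-fold" and ">10-fold higher than a random baseline": hits per screened compound after prioritization, divided by hits per compound under random selection. The toy numbers here are illustrative only and do not reproduce any result from the paper.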
Copyright of Science is the property of American Association for the Advancement of Science and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Psychology and Behavioral Sciences Collection
ISSN: 0036-8075
DOI: 10.1126/science.adi8577