2025 · npj Drug Discovery · Nature · added 2026-04-21
Structure-based drug design is rapidly evolving, driven by advances in both physics-based and knowledge-based methods. These computational approaches are increasingly integrated across all stages of drug discovery. Despite remarkable progress, challenges remain in achieving accuracy, generalizability, computational efficiency, and chemical synthesizability. In this review, we provide a critical overview of advances, strengths, and limitations of recent methods. We also discuss synergies between the two concepts that hold promise for future advancements towards their practical applicability.
Computational drug discovery is essential for screening
potential treatments and reducing the costs and time associated with
proposing or combining drugs for disease management. Despite the
extensive research conducted in this field, it remains an emerging area,
particularly with the advent of machine learning, deep learning, and large
language models (LLMs). This systematic review examines the
integration of machine learning and deep learning techniques in drug
discovery, concentrating on three critical areas: drug-drug interactions
(DDIs), drug-target interactions (DTIs), and adverse drug reactions
(ADRs). The review analyzes over 100 papers published between 2020
and 2025, categorizing the methods into deep learning, machine learning,
graph learning, and hybrid models. It highlights the transformative impact
of natural language processing (NLP) and LLMs in extracting meaningful
insights from biomedical literature and chemical data. Furthermore, this work introduces key databases and data sets widely utilized
in drug discovery. Additionally, this review identifies gaps in the existing research, such as the lack of comprehensive studies that
simultaneously address DDI, DTI, and ADR extraction, and it proposes a more holistic approach to fill these gaps. The paper
concludes by thoroughly evaluating various models, underscoring their performance metrics.
Over the past decade, collective intelligence, i.e., the intelligence that emerges from collective efforts, has transformed complex problem-solving and decision-making. In drug discovery, decision-making often relies on medicinal chemistry intuition. The present study explores the application of collective intelligence in drug discovery, focusing on lead optimization. Ninety-two Sanofi researchers with diverse expertise participated anonymously in an exercise centered on ADMET-related questions. Their feedback was used to build a collective intelligence agent, which was compared to an artificial intelligence model. The study led to three major conclusions. First, collective intelligence improves decision-making in optimizing ADMET endpoints, compared to individual decisions. Second, collective intelligence outperforms artificial intelligence for all endpoints except hERG inhibition. Finally, we observe complementarity between collective human and artificial intelligence. Overall, this research highlights the potential of collective intelligence in drug discovery and the importance of a synergistic approach combining human and artificial intelligence in project decision-making.
Computational metabolomics is set to become established in drug discovery and in research on complex biological networks. This field of research enhances the detection of metabolic biomarkers and the prediction of molecular interactions by combining multiscale analysis with in silico and molecular docking methods. The methods involved include nuclear magnetic resonance, mass spectrometry, and innovative bioinformatics, which enable the accurate generation and characterization of metabolomes. Molecular docking is a crucial tool for simulating the interaction between ligands and receptors, thereby facilitating the identification of potential therapeutics. This review also discusses the potential of metabolomics to inform drug modes of action, from pharmacokinetics to forecasting toxicity, thereby streamlining drug development pipelines. We highlight applications in anticancer, antimicrobial, and antiviral drug discovery and explain how these computational models can accelerate target validation and enhance the accuracy of therapeutic strategies. In addition, this review addresses the current challenges and future directions for computational techniques in conjunction with experimental data to advance personalized medicine. In conclusion, this review aims to highlight prospective approaches in computational metabolomics and molecular docking that, used synergistically, identify evolutionary adaptive metabolisms of multiscale biological systems and help overcome the key hurdles in both drug discovery and metabolomic research.
PandaOmics is a cloud-based software platform that applies artificial intelligence and bioinformatics techniques to multimodal omics and biomedical text data for therapeutic target and biomarker discovery. PandaOmics generates novel and repurposed therapeutic target and biomarker hypotheses with the desired properties and is available through licensing or collaboration. Targets and biomarkers generated by the platform were previously validated in both in vitro and in vivo studies. PandaOmics is a core component of Insilico Medicine's Pharma.ai drug discovery suite, which also includes Chemistry42 for the de novo generation of novel small molecules, and inClinico, a data-driven multimodal platform that forecasts a clinical trial's probability of successful transition from phase 2 to phase 3. In this paper, we demonstrate how the PandaOmics platform can efficiently identify novel molecular targets and biomarkers for various diseases.
ChEMBL (https://www.ebi.ac.uk/chembl/) is a manually curated, high-quality, large-scale, open, FAIR and Global Core Biodata Resource of bioactive molecules with drug-like properties, previously described in the 2012, 2014, 2017 and 2019 Nucleic Acids Research Database Issues. Since its introduction in 2009, ChEMBL's content has changed dramatically in size and diversity of data types. Through incorporation of multiple new datasets from depositors since the 2019 update, ChEMBL now contains slightly more bioactivity data from deposited data vs data extracted from literature. In collaboration with the EUbOPEN consortium, chemical probe data is now regularly deposited into ChEMBL. Release 27 made curated data available for compounds screened for potential anti-SARS-CoV-2 activity from several large-scale drug repurposing screens. In addition, new patent bioactivity data have been added to the latest ChEMBL releases, and various new features have been incorporated, including a Natural Product likeness score, updated flags for Natural Products, a new flag for Chemical Probes, and the initial annotation of the action type for ∼270 000 bioactivity measurements.
2024 · Current Drug Targets · Bentham Science · added 2026-04-21
Background: Drug discovery is a complex and expensive procedure involving several time-consuming and costly phases through which new potential pharmaceutical compounds must pass to get approved. One of these critical steps is the identification and optimization of lead compounds, which has been made more accessible by the introduction of computational methods, including deep learning (DL) techniques. Diverse DL model architectures have been put forward to learn the vast landscape of interaction between proteins and ligands and predict their affinity, helping in the identification of lead compounds.
Objective: This survey fills a gap in previous research by comprehensively analyzing the most commonly used datasets and discussing their quality and limitations. It also offers a comprehensive classification of the most recent DL methods in the context of protein-ligand binding affinity prediction (BAP), providing a fresh perspective on this evolving field.
Methods: We thoroughly examine commonly used datasets for BAP and their inherent characteristics. Our exploration extends to various preprocessing steps and DL techniques, including graph neural networks, convolutional neural networks, and transformers, which are found in the literature. We conducted extensive literature research to ensure that the most recent deep learning approaches for BAP were included by the time of writing this manuscript.
Results: The systematic approach used for the present study highlighted inherent challenges to BAP via DL, such as data quality, model interpretability, and explainability, and proposed considerations for future research directions. We present valuable insights to accelerate the development of more effective and reliable DL models for BAP within the research community.
Conclusion: The present study can considerably enhance future research on predicting affinity between protein and ligand molecules, hence further improving the overall drug development process.
Article history: Received June 07, 2024; Revised August 11, 2024; Accepted August 19, 2024.
DOI: 10.2174/0113894501330963240905083020
2023 · Bioinformatics · Oxford University Press · added 2026-04-21
Motivation: Screening new drug–target interactions (DTIs) by traditional experimental methods is costly and time-consuming. Recent advances in knowledge graphs, chemical linear notations, and genomic data enable researchers to develop computational DTI models, which play a pivotal role in drug repurposing and discovery. However, a multimodal fusion DTI model that integrates the available heterogeneous data into a unified framework still needs to be developed. Results: We developed MDTips, a multimodal-data-based DTI prediction system, by fusing the knowledge graphs, gene expression profiles, and …
Chemistry42 is a software platform for de novo small molecule design and optimization that integrates Artificial Intelligence (AI) techniques with computational and medicinal chemistry methodologies. Chemistry42 efficiently generates novel molecular structures with optimized properties validated in both in vitro and in vivo studies and is available through licensing or collaboration. Chemistry42 is the core component of Insilico Medicine's Pharma.ai drug discovery suite. Pharma.ai also includes PandaOmics for target discovery and multiomics data analysis, and inClinico, a data-driven multimodal forecast of a clinical trial's probability of success (PoS). In this paper, we demonstrate how the platform can be used to efficiently find novel molecular structures against DDR1 and CDK20.
Screening new drug-target interactions (DTIs) by traditional experimental methods is costly and time-consuming. Recent advances in knowledge graphs, chemical linear notations, and genomic data enable researchers to develop computational DTI models, which play a pivotal role in drug repurposing and discovery. However, a multimodal fusion DTI model that integrates the available heterogeneous data into a unified framework still needs to be developed.
Drug discovery (DD) is a time-consuming and expensive process. Thus, the industry
employs strategies such as drug repositioning and drug repurposing, which allow the application of
already approved drugs to treat a different disease, as occurred in the first months of 2020, during the
COVID-19 pandemic. The prediction of drug–target interactions (DTIs) is an essential part of the DD process
because it can accelerate it and reduce the required costs. In silico DTI prediction has used
approaches based on molecular docking simulations, as well as similarity-based and network- and
graph-based ones. This paper presents MPS2IT-DTI, a DTI prediction model obtained from research
conducted in the following steps: the definition of a new method for encoding molecule and protein
sequences onto images; the definition of a deep-learning approach based on a convolutional neural
network in order to create a new method for DTI prediction. Training results conducted with the
Davis and KIBA datasets show that MPS2IT-DTI is viable compared to other state-of-the-art (SOTA)
approaches in terms of performance and complexity of the neural network model. With the Davis
dataset, we obtained 0.876 for the concordance index and 0.276 for the MSE; with the KIBA dataset,
we obtained 0.836 and 0.226 for the concordance index and the MSE, respectively. Moreover, the
MPS2IT-DTI model represents molecule and protein sequences as images, instead of treating them as
an NLP task, and as such, does not employ an embedding layer, which is present in other models.
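The two metrics reported above for the Davis and KIBA datasets can be sketched in a few lines. This is a minimal illustration, not the MPS2IT-DTI evaluation code: the function names and the toy affinity values are assumptions made here, and the concordance index follows the common pairwise definition (fraction of comparable pairs ranked in the right order, with tied predictions counted as half).

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true values carry no ranking information
            comparable += 1
            diff_true = y_true[i] - y_true[j]
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5  # tied prediction counts as half-correct
            elif (diff_true > 0) == (diff_pred > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


# Toy affinities (hypothetical values, not taken from Davis or KIBA).
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.5, 5.0, 7.5, 8.5]
ci = concordance_index(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"CI = {ci:.3f}, MSE = {mse:.4f}")
```

A higher concordance index and a lower MSE are better; a concordance index of 0.5 corresponds to random ranking.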
Academic Editors: Kyriakos
Kachrimanis, David Barlow, Jakub …
Glioblastoma (GBM) is a highly malignant brain tumor characterized by a heterogeneous population of genetically unstable and highly infiltrative cells that are resistant to chemotherapy. Although substantial efforts have been invested in the field of anti-GBM drug discovery in the past decade, success has primarily been confined to the preclinical level, and clinical studies have often been hampered due to efficacy-, selectivity-, or physicochemical property-related issues. Thus, expansion of the list of molecular targets coupled with a pragmatic design of new small-molecule inhibitors with central nervous system (CNS)-penetrating ability is required to steer the wheels of anti-GBM drug discovery endeavors. This Perspective presents various aspects of drug discovery (challenges in GBM drug discovery and delivery, therapeutic targets, and agents under clinical investigation). The comprehensively covered sections include the recent medicinal chemistry campaigns embarked upon to validate the potential of numerous enzymes/proteins/receptors as therapeutic targets in GBM.
2022 · RSC Chemical Biology · Royal Society of Chemistry · added 2026-04-21
This review summarises different data, data resources and methods for computational mechanism of action (MoA) analysis, and highlights some case studies where integration of data types and methods enabled MoA elucidation on the systems-level.
Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, this search has taken on an important computer science component, with the use of machine learning techniques skyrocketing as they have become democratized. Given the objectives set by the Precision Medicine initiative and the new challenges it has generated, it is necessary to establish robust, standard and reproducible computational methodologies to achieve those objectives. Currently, predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies. This stage manages to drastically reduce costs and research times in the discovery of new drugs. This review article focuses on how these new methodologies have been used in recent years of research. Analyzing the state of the art in this field will give us an idea of where cheminformatics will develop in the short term, the limitations it presents and the positive results it has achieved. This review will focus mainly on the methods used to model the molecular data, as well as the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years.
The emergence and continued global spread of the current COVID-19 pandemic have highlighted the need for methods to identify novel or repurposed therapeutic drugs in a fast and effective way. Despite the availability of methods for the discovery of antiviral drugs, the majority tend to focus on the effects of such drugs on a given virus, its constituent proteins, or enzymatic activity, often neglecting the consequences on host cells. This may lead to partial assessment of the efficacy of the tested antiviral compounds, as potential toxicity impacting the overall physiology of host cells may mask the effects of both viral infection and drug candidates. Here we present a method able to assess the general health of host cells based on morphological profiling, for untargeted phenotypic drug screening against viral infections.
To be effective as a drug, a potent molecule must reach its target in the body in sufficient concentration, and stay there in a bioactive form long enough for the expected biologic events to occur. Drug development involves assessment of absorption, distribution, metabolism and excretion (ADME) increasingly earlier in the discovery process, at a stage when considered compounds are numerous but access to the physical samples is limited. In that context, computer models constitute valid alternatives to experiments. Here, we present the new SwissADME web tool that gives free access to a pool of fast yet robust predictive models for physicochemical properties, pharmacokinetics, drug-likeness and medicinal chemistry friendliness, including proficient in-house methods such as the BOILED-Egg, iLOGP and Bioavailability Radar. Easy, efficient input and interpretation are ensured thanks to a user-friendly interface through the login-free website http://www.swissadme.ch. Specialists, but also nonexperts in cheminformatics or computational chemistry, can rapidly predict key parameters for a collection of molecules to support their drug discovery endeavours.
2016 · Nucleic Acids Research · Oxford University Press · added 2026-04-21
IID (Integrated Interactions Database) is the first database providing tissue-specific protein–protein interactions (PPIs) for model organisms and human. IID covers six species (S. cerevisiae (yeast), C. elegans (worm), D. melanogaster (fly), R. norvegicus (rat), M. musculus (mouse) and H. sapiens (human)) and up to 30 tissues per species. Users query IID by providing a set of proteins or PPIs from any of these organisms, and specifying species and tissues where IID should search for interactions. If query proteins are not from the selected species, IID enables …
Identifying drug-target interactions is an important task in drug discovery. To reduce the heavy time and financial cost of experimental approaches, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug-target interactions of new candidate drugs or targets.
Analysis of the origins of new drugs approved by the US Food and Drug Administration (FDA) from 1999 to 2008 suggested that phenotypic screening strategies had been more productive than target-based approaches in the discovery of first-in-class small-molecule drugs. However, given the relatively recent introduction of target-based approaches in the context of the long time frames of drug development, their full impact might not yet have become apparent. Here, we present an analysis of the origins of all 113 first-in-class drugs approved by the FDA from 1999 to 2013, which shows that the majority (78) were discovered through target-based approaches (45 small-molecule drugs and 33 biologics). In addition, of 33 drugs identified in the absence of a target hypothesis, 25 were found through a chemocentric approach in which compounds with known pharmacology served as the starting point, with only eight coming from what we define here as phenotypic screening: testing a large number of compounds in a target-agnostic assay that monitors phenotypic changes. We also discuss the implications for drug discovery strategies, including viewing phenotypic screening as a novel discipline rather than as a neoclassical approach.
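The counts quoted in this abstract can be cross-checked with a short tally. This is only a bookkeeping sketch using the figures stated above; the variable names are chosen here for illustration, and the abstract itself does not say how the drugs outside the two itemized groups were classified.

```python
# Figures as stated in the abstract (first-in-class FDA approvals, 1999 to 2013).
total_first_in_class = 113
target_based = {"small-molecule drugs": 45, "biologics": 33}
no_target_hypothesis = {"chemocentric": 25, "phenotypic screening": 8}

target_based_total = sum(target_based.values())
no_target_total = sum(no_target_hypothesis.values())

# Drugs not covered by either itemized group in the abstract.
not_itemized = total_first_in_class - target_based_total - no_target_total

target_based_share = target_based_total / total_first_in_class
phenotypic_share = no_target_hypothesis["phenotypic screening"] / total_first_in_class

print(f"target-based: {target_based_total} ({target_based_share:.0%})")
print(f"no target hypothesis: {no_target_total}")
print(f"phenotypic screening share: {phenotypic_share:.0%}")
print(f"not itemized in the abstract: {not_itemized}")
```

The subtotals match the stated 78 and 33, which together account for 111 of the 113 drugs.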
The significant reduction in the number of newly approved drugs in the past decade has been partially attributed to failures in discovery and validation of new targets. Evaluation of recently approved new drugs has revealed that the number of approved drugs discovered through phenotypic screens, an original drug screening paradigm, has exceeded those discovered through the molecular target-based approach. Phenotypic screening is thus gaining new momentum in drug discovery with the hope that this approach may revitalize drug discovery and improve the success rate of drug approval through the discovery of viable lead compounds and identification of novel drug targets.
Computational modeling has been adopted in all aspects of drug research and
development, from the early phases of target identification and drug discovery to the late-stage clinical
trials. The differe…
Gregori-Puigjané E, Setola V, Hert J+6 more · 2012 · Proceedings of the National Academy of Sciences of the United States of America · National Academy of Sciences · added 2026-04-20
Notwithstanding their key roles in therapy and as biological probes, 7% of approved drugs are purported to have no known primary target, and up to 18% lack a well-defined mechanism of action. Using a chemoinformatics approach, we sought to "de-orphanize" drugs that lack primary targets. Surprisingly, targets could be easily predicted for many: Whereas these targets were not known to us nor to the common databases, most could be confirmed by literature search, leaving only 13 Food and Drug Administration-approved drugs with unknown targets; the number of drugs without molecular targets likely is far fewer than reported. The number of worldwide drugs without reasonable molecular targets similarly dropped, from 352 (25%) to 44 (4%). Nevertheless, there remained at least seven drugs for which reasonable mechanism-of-action targets were unknown but could be predicted, including the antitussives clemastine, cloperastine, and nepinalone; the antiemetic benzquinamide; the muscle relaxant cyclobenzaprine; the analgesic nefopam; and the immunomodulator lobenzarit. For each, predicted targets were confirmed experimentally, with affinities within their physiological concentration ranges. Turning this question on its head, we next asked which drugs were specific enough to act as chemical probes. Over 100 drugs met the standard criteria for probes, and 40 did so by more stringent criteria. A chemical information approach to drug-target association can guide therapeutic development and reveal applications to probe biology, a focus of much current interest.
Progress in understanding the genetic basis of cancer, coupled with the molecular pharmacology of potential new anticancer drugs, calls for new approaches that are able to address key issues in the drug development process, including pharmacokinetic (PK) and pharmacodynamic (PD) relationships. The incorporation of predictive preclinical PK/PD models into rationally designed early-stage clinical trials offers a promising way to relieve a significant bottleneck in the drug discovery pipeline. The aim of the current review is to discuss some considerations for how quantitative PK and PD analyses for anticancer drugs may be conducted and integrated into a global translational effort, and the importance of examining drug disposition and dynamics in target tissues to support the development of preclinical PK/PD models that can be subsequently extrapolated to predict pharmacologic characteristics in patients. In this article, we describe three different physiologically based (PB) PK modeling approaches, i.e., the whole-body PBPK model, the hybrid PBPK model, and the two-pore model for macromolecules, as well as their applications. General conclusions are that greater effort should be made to generate more clinical data that could validate scaled preclinical PB-PK/PD tumor-based models and, thus, stimulate a framework for preclinical to clinical translation. Finally, given the innovative techniques to measure tissue drug concentrations and associated biomarkers of drug responses, development of predictive PK/PD models will become a standard approach for drug discovery and development.
Guy W. Bemis, Mark A. Murcko · 1996 · Journal of Medicinal Chemistry · ACS Publications · added 2026-04-20
In order to better understand the common features present in drug molecules, we use shape description methods to analyze a database of commercially available drugs and prepare a list of common drug shapes. A useful way of organizing this structural data is to group the atoms of each drug molecule into ring, linker, framework, and side chain atoms. On the basis of the two-dimensional molecular structures (without regard to atom type, hybridization, and bond order), there are 1179 different frameworks among the 5120 compounds analyzed. However, the shapes of half of the drugs in the database are described by the 32 most frequently occurring frameworks. This suggests that the diversity of shapes in the set of known drugs is extremely low. In our second method of analysis, in which atom type, hybridization, and bond order are considered, more diversity is seen; there are 2506 different frameworks among the 5120 compounds in the database, and the most frequently occurring 42 frameworks account for only one-fourth of the drugs. We discuss the possible interpretations of these findings and the way they may be used to guide future drug discovery research.
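The first, purely graph-based decomposition described above (ring and linker atoms kept, side chain atoms dropped, atom and bond types ignored) can be approximated by repeatedly stripping terminal atoms from the molecular graph. The sketch below is an illustration under stated assumptions, not the authors' implementation: the molecule is assumed to be given as a list of bonded atom-index pairs with hydrogens omitted, and the function name graph_framework is hypothetical.

```python
def graph_framework(bonds):
    """Return the atoms that remain after peeling off all side-chain atoms.

    bonds: iterable of (atom_index, atom_index) pairs for heavy-atom bonds.
    Atoms with at most one remaining neighbor are removed repeatedly, so
    only ring atoms and the linker atoms connecting rings survive.
    """
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    leaves = [atom for atom, nbrs in neighbors.items() if len(nbrs) <= 1]
    while leaves:
        atom = leaves.pop()
        if atom not in neighbors:
            continue  # already removed via another path
        for nbr in neighbors.pop(atom):
            neighbors[nbr].discard(atom)
            if len(neighbors[nbr]) <= 1:
                leaves.append(nbr)  # neighbor has become terminal
    return set(neighbors)


# Six-membered ring (atoms 0-5) carrying a one-atom substituent (atom 6).
ring_with_substituent = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]

# Two rings (0-5 and 7-12) joined by linker atom 6, plus a two-atom side chain (13, 14).
two_rings = ring_with_substituent[:6] + [
    (0, 6), (6, 7),
    (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 7),
    (10, 13), (13, 14),
]

print(sorted(graph_framework(ring_with_substituent)))
print(sorted(graph_framework(two_rings)))
```

An acyclic molecule has no framework under this procedure, so the function returns an empty set for it.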
Polypharmacology has emerged as a novel means in drug discovery for improving treatment response in clinical use. However, to really capitalize on the polypharmacological effects of drugs, there is a critical need to better model and understand how the complex interactions between drugs and their cellular targets contribute to drug efficacy and possible side effects. Network graphs provide a convenient modeling framework for dealing with the fact that most drugs act on cellular systems by targeting multiple proteins through both on-target and off-target binding. Network pharmacology models aim at addressing questions such as how and where in the disease network one should target to inhibit disease phenotypes, such as cancer growth, ideally leading to therapies that are less vulnerable to drug resistance and side effects by attacking the disease network at the systems level through synergistic and synthetic lethal interactions. Since the exponentially increasing number of potential drug target combinations quickly makes a purely experimental approach infeasible, this review depicts a number of computational models and algorithms that can effectively reduce the search space for determining the most promising combinations for experimental evaluation. Such computational-experimental strategies are geared toward realizing the full potential of multi-target treatments in different disease phenotypes. Our specific focus is on system-level network approaches to polypharmacology designs in anticancer drug discovery, where we give representative examples of how network-centric modeling may offer systematic strategies toward better understanding and even predicting the phenotypic responses to multi-target therapies.
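The growth in the number of target combinations mentioned above is easy to make concrete. The sketch below simply counts unordered target sets of a given size; the pool of 100 candidate targets and the function name are assumptions chosen for illustration and do not come from the review.

```python
from math import comb


def count_target_combinations(n_targets, max_size):
    """Number of distinct target sets of each size from 2 up to max_size."""
    return {k: comb(n_targets, k) for k in range(2, max_size + 1)}


# Hypothetical pool of 100 candidate targets.
counts = count_target_combinations(100, 4)
for size, n in counts.items():
    print(f"{size}-target combinations among 100 targets: {n:,}")
```

Even before doses or drug choices per target are considered, the count rises from thousands of pairs to millions of four-target sets, which is why computational pruning of the search space precedes experimental evaluation.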