2026 · European Journal of Applied Physiology · Springer · added 2026-04-21
Proteomics has matured into a discipline capable of quantifying nearly every protein encoded by the genome, yet it remains largely blind to the true operational units of physiology: proteoforms. Each proteoform, defined by a specific sequence and post-translationally modified state, represents a unique molecular identity with distinct chemical, functional, and structural properties. This review proposes the proteoform functor: a mathematical map between the abstract proteoform state space and the realised physiological space of biological function, and ultimately complex phenotypes.
Drug-drug interactions (DDIs) are a significant source of morbidity and adverse drug events (ADEs), particularly in situations of polypharmacy and complex medication regimens. While rules-based software integrated in electronic health records (EHRs) has demonstrated proficiency in identifying DDIs present in medication regimens, large language model (LLM) based identification requires thorough benchmarking and performance evaluation using high-quality datasets for safe use. The purpose of this study was to develop a series of performance benchmarking experiments specifically for LLM performance in identification and management of DDIs using a specifically curated clinician-annotated dataset of clinically-relevant DDIs.
2026 · Nucleic acids research · Oxford University Press · added 2026-04-21
Biomedical research benefits from the rapid growth and diversity of experimentally detected protein–protein interactions (PPIs) by gaining important biological insights. However, increasingly dense PPI networks can be challenging to interpret and apply. The 2025 update of the Integrated Interactions Database (IID) enhances accessibility and utility through several new features. We identify and incorporate network structural components from co-purified protein sets, as well as curated and predicted complexes, enabling users to explore network organization
Rare diseases affect over 300 million people worldwide and are often caused by genetic variants. While variant detection has become cost-effective, interpreting these variants, particularly collecting literature-based evidence like ACMG/AMP PM3, remains complex and time-consuming.
Computational metabolomics will be established in drug discovery and research on complex biological networks. This field of research enhances the detection of metabolic biomarkers and the prediction of molecular interactions by combining multiscale analysis with in silico and molecular docking methods. These include nuclear magnetic resonance, mass spectrometry, and innovative bioinformatics, which enable the accurate generation and characterization of metabolomes. Molecular docking is a crucial tool for simulating the interaction between ligands and receptors, thereby facilitating the identification of potential therapeutics. It also discusses the potential of metabolomics to inform drug modes of action, from pharmacokinetics to forecasting toxicity, thereby streamlining drug development pipelines. We highlight applications in anticancer, antimicrobial, and antiviral drug discovery and explain how these computational models can accelerate target validation and enhance the accuracy of therapeutic strategies. In addition, this review addresses the current challenges and future directions for computational techniques in conjunction with experimental data to advance personalized medicine. In conclusion, this review aims to highlight the prospective approaches of computational metabolomics and molecular docking that identify evolutionary adaptive metabolisms of multiscale biological systems through their synergistic utilization to overcome the key hurdles involved in both drug discovery and metabolomic research.
2025 · Therapeutic advances in drug safety · SAGE Publications · added 2026-04-21
Background: Adverse drug reactions (ADRs) are harmful side effects of medications. Social media provides real-time, patient-generated data, though its unstructured format presents challenges. Natural language processing and transfer learning offer promising solutions. Objective: This study aimed to evaluate whether transformer-based models fine-tuned on a general ADR dataset can effectively classify ADRs from tweets related to glucagon-like peptide-1 (GLP-1) receptor agonists and to benchmark their performance against state-of-the-art large language models (LLMs).
2025 · Nucleic acids research · Oxford University Press · added 2026-04-21
LitSense 2.0 (https://www.ncbi.nlm.nih.gov/research/litsense2/) is an advanced biomedical search system enhanced with dense vector semantic retrieval, designed for accessing literature on sentence and paragraph levels. It provides unified access to 38 million PubMed abstracts and 6.6
2025 · Frontiers in pharmacology · Frontiers · added 2026-04-21
Background/Objectives: New computational methods, based on statistical, machine learning, and deep learning techniques using drug-related entities (e.g., genes, protein bindings, etc.), help reduce the costs of in-vitro experiments through drug-drug interaction prediction (DDIp). This review examines recent advances in DDIp. It presents an in-depth review of the state-of-the-art studies relating to semi-supervised, supervised, self-supervised learning, and other techniques such as graph-based learning and matrix factorization methods for predicting DDIs. All possible interactions between drugs are not known, and accurately predicting interactions is even more difficult due to the complex nature of drug-drug interactions (DDI). Methods: Of the 49 papers published in Web of Science in the last 6 years, 24 papers were considered relevant based on information presented in their titles and abstracts. The included articles focus specifically on predicting DDIs using a type of machine learning algorithm. Excluded articles focused on drug discovery, drug repurposing, molecular representation, or the extraction of biomedical interactions. The methodology, results, limitations, and future research directions were studied for each paper. Common challenges, limitations, and future research directions were analyzed. Results and conclusion: The main limitations are class imbalance, poor performance on new drugs, limited explainability, and the need for additional data sources.
2025 · Cui et al. BioData Mining · BioMed Central · added 2026-04-21
Deep learning, a cornerstone of artificial intelligence, is driving rapid advancements in computational biology. Protein-protein interactions (PPIs) are fundamental regulators of biological functions. With the inclusion of deep learning in PPI research, the field is undergoing transformative changes. Therefore, there is an urgent need for a comprehensive review and assessment of recent developments to improve analytical methods and open up a wider range of biomedical applications. This review meticulously assesses deep learning progress in PPI prediction from 2021
The use of multiple medications increases the risk of harmful drug-drug interactions (DDIs). Conventional DDI screening databases vary in coverage and often trigger low-relevance alerts, contributing to alert fatigue. Large language models (LLMs) have emerged as potential tools for DDI identification; however, their performance compared to established databases using real-world patient data remains under-explored.
Computational drug discovery is essential for screening potential treatments and reducing the costs and time associated with proposing or combining drugs for disease management. Despite the extensive research conducted in this field, it remains an emerging area, particularly with the advent of machine learning, deep learning, and large language models (LLMs). This systematic review examines the integration of machine learning and deep learning techniques in drug discovery, concentrating on three critical areas: drug-drug interactions (DDIs), drug-target interactions (DTIs), and adverse drug reactions (ADRs). The review analyzes over 100 papers published between 2020 and 2025, categorizing the methods into deep learning, machine learning, graph learning, and hybrid models. It highlights the transformative impact of natural language processing (NLP) and LLMs in extracting meaningful insights from biomedical literature and chemical data. Furthermore, this work introduces key databases and data sets widely utilized in drug discovery. Additionally, this review identifies gaps in the existing research, such as the lack of comprehensive studies that simultaneously address DDI, DTI, and ADR extraction, and it proposes a more holistic approach to fill these gaps. The paper concludes by thoroughly evaluating various models, underscoring their performance metrics.
2025 · International journal of molecular sciences · MDPI · added 2026-04-21
Academic Editor: Sabrina Venditti · Received: 22 May 2025 · Revised: 5 July 2025
N6-methyladenosine (m6A) represents the most common and thoroughly investigated RNA modification and exerts essential functions in regulating gene expression through influencing the RNA stability, the translation efficiency, alternative splicing, and nuclear export processes. The rapid development of high-throughput sequencing approaches, including miCLIP and MeRIP-seq, has profoundly transformed epitranscriptomics research.
2025 · Li et al. BMC Genomics · BioMed Central · added 2026-04-21
Predicting protein‒protein interactions (PPIs) plays a crucial role in understanding biological processes. Although biological experimental methods can identify PPIs, they are costly, time-consuming, labor-intensive, and often lack stability. In contrast, computational approaches for PPI prediction, particularly deep learning methods, can efficiently learn representations from protein sequences. However, the generalizability, robustness, and stability of computational PPI prediction models still need improvement, especially for species with limited verified PPI
2025 · Nucleic acids research · Oxford University Press · added 2026-04-21
One of the major challenges in precision oncology is the identification of pathogenic, actionable variants and the selection of personalized treatments. We present Onkopus, a variant interpretation framework based on a modular architecture, for interpreting and prioritizing genetic alterations in cancer patients. A multitude of tools and databases are integrated into Onkopus to provide a comprehensive overview about the consequences of a variant, each with its own semantic, including pathogenicity predictions, allele frequency, biochemical and protein features,
Despite the vast number of enzymatic kinetic measurements reported across decades of biochemical literature, the majority of relational enzyme kinetic data (linking amino acid sequence, substrate identity, kinetic parameters, and assay conditions) remains uncollected and inaccessible in structured form. This constitutes a significant portion of the “dark matter” of enzymology. Unlocking these hidden data through automated extraction offers an opportunity to expand enzyme dataset diversity and size, critical
2025 · Bioinformatics · Oxford University Press · added 2026-04-21
Motivation: Rare diseases affect over 300 million people worldwide and are often caused by genetic variants. While variant detection has become cost-effective, interpreting these variants, particularly collecting literature-based evidence like ACMG/AMP PM3, remains complex and time-consuming. Results: We present AutoPM3, a method that automates PM3 evidence extraction from the literature using open-source large language models (LLMs). AutoPM3 combines a Text2SQL-based variant extractor and a retrieval-augmented generation (RAG) module, enhanced by a variant-specific retriever and fine-tuned LLM, to separately process tables and text. We curated PM3-Bench, a dataset of 1027 variant-publication
2025 · Bioinformatics · Oxford University Press · added 2026-04-21
Motivation: Proteins are of great significance in living organisms. However, understanding their functions encounters numerous challenges, such as insufficient integration of multimodal information, a large number of training parameters, limited flexibility of classification-based methods, and the lack of systematic evaluation metrics for protein question answering systems. To tackle these issues, we propose the Prot2Chat framework. Results: We modified ProteinMPNN to encode protein sequence and structural information in a unified way. We used a large language model
PandaOmics is a cloud-based software platform that applies artificial intelligence and bioinformatics techniques to multimodal omics and biomedical text data for therapeutic target and biomarker discovery. PandaOmics generates novel and repurposed therapeutic target and biomarker hypotheses with the desired properties and is available through licensing or collaboration. Targets and biomarkers generated by the platform were previously validated in both in vitro and in vivo studies. PandaOmics is a core component of Insilico Medicine's Pharma.ai drug discovery suite, which also includes Chemistry42 for the de novo generation of novel small molecules, and inClinico, a data-driven multimodal platform that forecasts a clinical trial's probability of successful transition from phase 2 to phase 3. In this paper, we demonstrate how the PandaOmics platform can efficiently identify novel molecular targets and biomarkers for various diseases.
2024 · Bioinformatics · Oxford University Press · added 2026-04-21
Motivation: Drug–target interaction (DTI) prediction is a relevant but challenging task in the drug repurposing field. In-silico approaches have drawn particular attention as they can reduce associated costs and time commitment of traditional methodologies. Yet, current state-of-the-art methods present several limitations: existing DTI prediction approaches are computationally expensive, thereby hindering the ability to use large networks and exploit available datasets, and the generalization to unseen datasets of DTI prediction methods remains unexplored, which could
Drug-drug interactions (DDIs) can produce unpredictable pharmacological effects and lead to adverse events that have the potential to cause irreversible damage to the organism. Traditional methods to detect DDIs through biological or pharmacological analysis are time-consuming and expensive; therefore, there is an urgent need to develop computational methods to effectively predict drug-drug interactions. Currently, deep learning and knowledge graph techniques, which can effectively extract features of entities, have been widely utilized to develop DDI prediction methods. In this research, we aim to systematically review DDI prediction research applying deep learning and knowledge graphs. The available biomedical data and public databases related to drugs are first summarized in this review. Then, we discuss the existing drug-drug interaction prediction methods which have utilized deep learning and knowledge graph techniques and group them into three main classes: deep learning-based methods, knowledge graph-based methods, and methods that combine deep learning with knowledge graphs. We comprehensively analyze the commonly used drug-related data and various DDI prediction methods, and compare these prediction methods on benchmark datasets. Finally, we briefly discuss the challenges related to drug-drug interaction prediction, including asymmetric DDI prediction and high-order DDI prediction.
2024 · Current Drug Targets · Bentham Science · added 2026-04-21
Background: Drug discovery is a complex and expensive procedure involving several time-consuming and costly phases through which new potential pharmaceutical compounds must pass to get approved. One of these critical steps is the identification and optimization of lead compounds, which has been made more accessible by the introduction of computational methods, including deep learning (DL) techniques. Diverse DL model architectures have been put forward to learn the vast landscape of interaction between proteins and ligands and predict their affinity, helping in the identification of lead compounds.
Objective: This survey fills a gap in previous research by comprehensively analyzing the most commonly used datasets and discussing their quality and limitations. It also offers a comprehensive classification of the most recent DL methods in the context of protein-ligand binding affinity prediction (BAP), providing a fresh perspective on this evolving field.
Methods: We thoroughly examine commonly used datasets for BAP and their inherent characteristics. Our exploration extends to various preprocessing steps and DL techniques, including graph neural networks, convolutional neural networks, and transformers, which are found in the literature. We conducted extensive literature research to ensure that the most recent deep learning approaches for BAP were included by the time of writing this manuscript.
Results: The systematic approach used for the present study highlighted inherent challenges to BAP via DL, such as data quality, model interpretability, and explainability, and proposed considerations for future research directions. We present valuable insights to accelerate the development of more effective and reliable DL models for BAP within the research community.
Conclusion: The present study can considerably enhance future research on predicting affinity between protein and ligand molecules, hence further improving the overall drug development process.
Article history: Received June 07, 2024; Revised August 11, 2024; Accepted August 19, 2024. DOI: 10.2174/0113894501330963240905083020
Pretrained using over 33 million single-cell RNA-sequencing profiles, scGPT is a foundation model facilitating a broad spectrum of downstream single-cell analysis tasks by transfer learning.
The versatility of cellular response arises from the communication, or crosstalk, of signaling pathways in a complex network of signaling and transcriptional regulatory interactions. Understanding the various mechanisms underlying crosstalk on a global scale requires untargeted computational approaches. We present a network-based statistical approach, MuXTalk, that uses high-dimensional edges called multilinks to model the unique ways in which signaling and regulatory interactions can interface. We demonstrate that the signaling-regulatory interface is located primarily in the intermediary region between signaling pathways where crosstalk occurs, and that multilinks can differentiate between distinct signaling-transcriptional mechanisms. Using statistically over-represented multilinks as proxies of crosstalk, we infer crosstalk among 60 signaling pathways, expanding currently available crosstalk databases by more than five-fold. MuXTalk surpasses existing methods in terms of model performance metrics, identifies additions to manual curation efforts, and pinpoints potential mediators of crosstalk. Moreover, it accommodates the inherent context-dependence of crosstalk, allowing future applications to cell type- and disease-specific crosstalk.
2024 · Scientific Data · Nature · added 2026-04-21
11,571 — — NER
2008 SCAI33 1,206 — — NER
2012 ADE39 300 case reports 5,063 drugs — 6,821 drug adverse effects 279 drug dosage RE
2013 DDI43 1,025, including texts from DrugBank and 18,502 drugs — 5,028 drug-drug interactions RE
2015 CHEMDNER34 84,355 chemicals — — NER
2016 BC5CDR 1,500 articles 15,935 chemicals 12,850 diseases 4,409 MeSH chemically induced diseases NER, NEN, RE
2017 N-ary drug-gene-mutation 35 — — — 137,469 drug–gene 3,192 drug–mutation RE
2017 40 ChemProt 32,514 chemicals 30,922 genes
2024 · Bioinformatics · Oxford University Press · added 2026-04-21
Motivation: Thousands of genomes are publicly available; however, most genes in those genomes have poorly defined functions. This is partly due to a gap between previously published, experimentally characterized protein activities and activities deposited in databases. This activity deposition is bottlenecked by the time-consuming biocuration process. The emergence of large language models presents an opportunity to speed up the text-mining of protein activities for biocuration. Results: We developed FuncFetch, a workflow that integrates NCBI E-Utilities, OpenAI’s GPT-4, and Zotero, to screen thousands of manu
2024 · Nucleic acids research · Oxford University Press · added 2026-04-21
In the era of high throughput sequencing, special software is required for the clinical evaluation of genetic variants. We developed REEV (Review, Evaluate and Explain Variants), a user-friendly platform for clinicians and researchers in the field of rare disease genetics. Supporting data was aggregated from public data sources. We compared REEV with seven other tools for clinical variant evaluation. REEV (semi-)automatically fills individual ACMG criteria facilitating variant interpretation. REEV can store disease and phenotype data related to a case to use these for phenotype
PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.
Proteins and their assemblies are fundamental for living cells to function. Their complex three-dimensional architecture and its stability are attributed to the combined effect of various noncovalent interactions. It is critical to scrutinize these noncovalent interactions to understand their role in the energy landscape in folding, catalysis, and molecular recognition. This Review presents a comprehensive summary of unconventional noncovalent interactions, beyond conventional hydrogen bonds and hydrophobic interactions, which have gained prominence over the past decade. The noncovalent interactions discussed include low-barrier hydrogen bonds, C5 hydrogen bonds, C-H···π interactions, sulfur-mediated hydrogen bonds, n → π* interactions, London dispersion interactions, halogen bonds, chalcogen bonds, and tetrel bonds. This Review focuses on their chemical nature, interaction strength, and geometrical parameters obtained from X-ray crystallography, spectroscopy, bioinformatics, and computational chemistry. Also highlighted are their occurrence in proteins or their complexes and recent advances made toward understanding their role in biomolecular structure and function. Probing the chemical diversity of these interactions, we determined that the variable frequency of occurrence in proteins and the ability to synergize with one another are important not only for ab initio structure prediction but also to design proteins with new functionalities. A better understanding of these interactions will promote their utilization in designing and engineering ligands with potential therapeutic value.
Screening new drug-target interactions (DTIs) by traditional experimental methods is costly and time-consuming. Recent advances in knowledge graphs, chemical linear notations, and genomic data enable researchers to develop computation-based DTI models, which play a pivotal role in drug repurposing and discovery. However, there is still a need to develop a multimodal fusion DTI model that integrates available heterogeneous data into a unified framework.
2023 · Bioinformatics · Oxford University Press · added 2026-04-21
Motivation: Screening new drug–target interactions (DTIs) by traditional experimental methods is costly and time-consuming. Recent advances in knowledge graphs, chemical linear notations, and genomic data enable researchers to develop computation-based DTI models, which play a pivotal role in drug repurposing and discovery. However, there is still a need to develop a multimodal fusion DTI model that integrates available heterogeneous data into a unified framework. Results: We developed MDTips, a multimodal-data-based DTI prediction system, by fusing the knowledge graphs, gene expression profiles, and