2025 · International journal of molecular sciences · MDPI · added 2026-04-21
Academic Editor: Sabrina Venditti. Received: 22 May 2025. Revised: 5 July 2025. N6-methyladenosine (m6A) is the most common and most thoroughly investigated RNA modification and plays essential roles in regulating gene expression by influencing RNA stability, translation efficiency, alternative splicing, and nuclear export. The rapid development of high-throughput sequencing approaches, including miCLIP and MeRIP-seq, has profoundly transformed epitranscriptomics research.
2025 · Li et al. BMC Genomics · BioMed Central · added 2026-04-21
Predicting protein‒protein interactions (PPIs) plays a crucial role in understanding biological processes. Although biological experimental methods can identify PPIs, they are costly, time-consuming, labor-intensive, and often lack stability. In contrast, computational approaches for PPI prediction, particularly deep learning methods, can efficiently learn representations from protein sequences. However, the generalizability, robustness, and stability of computational PPI prediction models still need improvement, especially for species with limited verified PPI
2025 · Cui et al. BioData Mining · BioMed Central · added 2026-04-21
Deep learning, a cornerstone of artificial intelligence, is driving rapid advancements in computational biology. Protein-protein interactions (PPIs) are fundamental regulators of biological functions. With the inclusion of deep learning in PPI research, the field is undergoing transformative changes. Therefore, there is an urgent need for a comprehensive review and assessment of recent developments to improve analytical methods and open up a wider range of biomedical applications. This review meticulously assesses deep learning progress in PPI prediction from 2021
BACKGROUND: Drug repositioning is a pivotal strategy in pharmaceutical research, offering accelerated and cost-effective therapeutic discovery. However, biomedical information relevant to drug repositioning is often complex, dispersed, and underutilized due to limitations in traditional extraction methods, such as reliance on annotated data and poor generalizability. Large language models (LLMs) show promise but face challenges such as hallucinations and interpretability issues.
OBJECTIVE: This study proposed long chain-of-thought for drug repositioning knowledge extraction (LCoDR-KE), a lightweight and domain-specific framework to enhance LLMs' accuracy and adaptability in extracting structured biomedical knowledge for drug repositioning.
METHODS: A domain-specific schema defined 11 entities (eg, drug, disease) and 18 relationships (eg, treats, is biomarker of). Following the established schema architecture, we constructed automatic annotation based on 10,000 PubMed abstracts via chain-of-thought prompt engineering. A total of 1000 expert-validated abstracts were curated into a drug repositioning corpus, a high-quality specialized corpus, while the remaining entries were allocated for model training purposes. Then, the proposed LCoDR-KE framework combined supervised fine-tuning of the Qwen2.5-7B-Instruct model with reinforcement learning and dual-reward mechanisms. Performance was evaluated against state-of-the-art models (eg, conditional random fields, Bidirectional Encoder Representations From Transformers, BioBERT, Qwen2.5, DeepSeek-R1, OpenBioLLM-70B, and model variants) using precision, recall, and F1-score. In addition, the convergence of the training method was assessed by analyzing performance progression across iteration steps.
RESULTS: LCoDR-KE achieved an entity F1 of 81.46% (eg, drug 95.83%, disease 90.52%) and triplet F1 of 69.04%, outperforming traditional models and rivaling larger LLMs (DeepSeek-R1: entity F1=84.64%, triplet F1=69.02%). Ablation studies confirmed the contributions of supervised fine-tuning (8.61% and 20.70% F1 drop if removed) and reinforcement learning (6.09% and 14.09% F1 drop if removed). The training process demonstrated stable convergence, validated through iterative performance monitoring. Qualitative analysis of the model's chain-of-thought outputs showed that LCoDR-KE performed structured and schema-aware reasoning by validating entity types, rejecting incompatible relations, enforcing constraints, and generating compliant JSON. Error analysis revealed 4 main types of mistakes and challenges for further improvement.
CONCLUSIONS: LCoDR-KE enhances LLMs' domain-specific adaptability for drug repositioning by offering an open-source drug repositioning corpus and a long chain-of-thought framework based on a lightweight LLM model. This framework supports drug discovery and knowledge reasoning while providing scalable, interpretable solutions applicable to broader biomedical knowledge extraction tasks.
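The abstract above reports precision, recall, and F1 for extracted triplets but does not state the matching criterion. As a minimal sketch, assuming exact-match comparison of (head, relation, tail) triplets and micro-averaging across abstracts, the metrics could be computed as follows. The function name `micro_prf` and the toy triplets are hypothetical illustrations, not the authors' evaluation code or data; only the relation names "treats" and "is biomarker of" come from the abstract.

```python
from typing import Iterable, Set, Tuple

# A triplet is (head entity, relation, tail entity); exact match is an assumption.
Triplet = Tuple[str, str, str]


def micro_prf(
    gold: Iterable[Set[Triplet]], pred: Iterable[Set[Triplet]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-abstract triplet sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # predicted and present in the gold annotation
        fp += len(p - g)  # predicted but absent from the gold annotation
        fn += len(g - p)  # in the gold annotation but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical toy data: two abstracts, one wrong and one missed triplet.
gold = [
    {("metformin", "treats", "type 2 diabetes")},
    {("aspirin", "treats", "pain"), ("CRP", "is biomarker of", "inflammation")},
]
pred = [
    {("metformin", "treats", "type 2 diabetes")},
    {("aspirin", "treats", "pain"), ("aspirin", "treats", "inflammation")},
]
precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under exact matching, a triplet with a correct entity pair but the wrong relation counts as both a false positive and a false negative, which is one reason triplet F1 tends to sit below entity F1.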
Despite the vast number of enzymatic kinetic measurements reported across decades of biochemical literature, the majority of relational enzyme kinetic data (linking amino acid sequence, substrate identity, kinetic parameters, and assay conditions) remains uncollected and inaccessible in structured form. This constitutes a significant portion of the “dark matter” of enzymology. Unlocking these hidden data through automated extraction offers an opportunity to expand enzyme dataset diversity and size, critical
2025 · npj Drug Discovery · Nature · added 2026-04-21
Structure-based drug design is rapidly evolving, driven by advances in both physics-based and knowledge-based methods. These computational approaches are increasingly integrated across all stages of drug discovery. Despite remarkable progress, challenges remain in achieving accuracy, generalizability, computational efficiency, and chemical synthesizability. In this review, we provide a critical overview of advances, strengths, and limitations of recent methods. We also discuss synergies between the two concepts that hold promise for future advancements towards their practical applicability.
2025 · Therapeutic advances in drug safety · SAGE Publications · added 2026-04-21
Background: Adverse drug reactions (ADRs) are harmful side effects of medications. Social media provides real-time, patient-generated data, though its unstructured format presents challenges. Natural language processing and transfer learning offer promising solutions. Objective: This study aimed to evaluate whether transformer-based models fine-tuned on a general ADR dataset can effectively classify ADRs from tweets related to glucagon-like peptide-1 (GLP-1) receptor agonists and to benchmark their performance against state-of-the-art large language models (LLMs).
2025 · Frontiers in pharmacology · Frontiers · added 2026-04-21
Background/Objectives: New computational methods, based on statistical, machine learning, and deep learning techniques using drug-related entities (e.g., genes, protein bindings, etc.), help reduce the costs of in-vitro experiments through drug-drug interaction prediction (DDIp). This review examines recent advances in DDIp. It presents an in-depth review of the state-of-the-art studies relating to semi-supervised, supervised, self-supervised learning, and other techniques such as graph-based learning and matrix factorization methods for predicting DDIs. Not all possible interactions between drugs are known, and accurately predicting interactions is even more difficult due to the complex nature of drug-drug interactions (DDIs). Methods: Of the 49 papers indexed in Web of Science in the last 6 years, 24 papers were considered relevant based on information presented in their titles and abstracts. The included articles focus specifically on predicting DDIs using a type of machine learning algorithm. Excluded articles focused on drug discovery, drug repurposing, molecular representation, or the extraction of biomedical interactions. The methodology, results, limitations, and future research directions were studied for each paper. Common challenges, limitations, and future research directions were analyzed. Results and conclusion: The main limitations are class imbalance, poor performance on new drugs, limited explainability, and the need for additional data sources.
In the healthcare industry, the ever-increasing volume of clinical trial data presents challenges for ensuring drug safety and detecting adverse drug reactions (ADRs). This study aims to address the challenge of accurately detecting Serious Adverse Events (SAEs) in pharmacovigilance, a critical component in ensuring drug safety during and after clinical trials. The key problem lies in the underreporting and delayed detection of ADRs due to the heterogeneous nature of medical data, class imbalance, and the limited scope of traditional monitoring techniques. This study proposes a hybrid AI-driven framework that integrates structured data (e.g., patient demographics, lab results) and unstructured data (e.g., clinical notes) to detect ADRs using advanced deep learning and NLP methods. The objective is to outperform traditional signal detection methods and provide interpretable predictions to aid clinicians in real time. By leveraging advanced Machine Learning (ML) and Deep Learning (DL) techniques, including Random Forests, Gradient Boosting Machines, and Convolutional Neural Networks (CNNs), our model aims to identify potential ADRs across different patient subgroups. Through meticulous feature engineering and the application of techniques to address data imbalance, our model demonstrates improved accuracy and interpretability in predicting ADRs. The CNN model achieved an accuracy of 85%, outperforming traditional models, such as Logistic Regression (78%) and Support Vector Machines (80%). These findings suggest that specific demographic and clinical factors significantly influence the likelihood of adverse reactions, offering valuable insights for targeted monitoring and risk mitigation strategies [11].
This research underscores the potential of predictive modeling to enhance pharmacovigilance efforts and ensure safer clinical trial outcomes.
• The research methodology includes a comparison of supervised learning algorithms, such as Logistic Regression, Random Forest, Gradient Boost, CNN, and genetic algorithms, to identify patterns and anomalies in clinical trial data. BERT and GPT were also employed to provide the functionality of textual interactions over medical data.
• Performance metrics such as accuracy, precision, recall, and F1-score were systematically applied to evaluate each model's performance. Among the models tested, the CNN model with BERT achieved the highest accuracy, providing valuable insights into the potential of deep learning for enhancing pharmacovigilance practices.
• These findings suggest that supplying diverse clinical data to advanced ML and NLP techniques can significantly improve the detection of ADRs, leading to better alignment with the fundamental principles of Good Clinical Practice (GCP).
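The abstract above names class imbalance as a core difficulty but does not say which technique was used to address it. As a minimal sketch of one common option, assuming inverse-frequency ("balanced") class weighting, the weights can be derived from label counts as shown below. The function `balanced_class_weights`, the label names, and the 90/10 split are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced label set: 90 reports without an ADR, 10 with one.
labels = ["no_adr"] * 90 + ["adr"] * 10
weights = balanced_class_weights(labels)

# Accuracy of a trivial model that always predicts the majority class.
majority_accuracy = max(Counter(labels).values()) / len(labels)

print(weights)
print(majority_accuracy)
```

In this toy split a model that never flags an ADR still reaches 90% accuracy, which is why the abstract's use of precision, recall, and F1-score alongside accuracy matters for rare-event detection.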
Computational drug discovery is essential for screening potential treatments and reducing the costs and time associated with proposing or combining drugs for disease management. Despite the extensive research conducted in this field, it remains an emerging area, particularly with the advent of machine learning, deep learning, and large language models (LLMs). This systematic review examines the integration of machine learning and deep learning techniques in drug discovery, concentrating on three critical areas: drug-drug interactions (DDIs), drug-target interactions (DTIs), and adverse drug reactions (ADRs). The review analyzes over 100 papers published between 2020 and 2025, categorizing the methods into deep learning, machine learning, graph learning, and hybrid models. It highlights the transformative impact of natural language processing (NLP) and LLMs in extracting meaningful insights from biomedical literature and chemical data. Furthermore, this work introduces key databases and data sets widely utilized in drug discovery. Additionally, this review identifies gaps in the existing research, such as the lack of comprehensive studies that simultaneously address DDI, DTI, and ADR extraction, and it proposes a more holistic approach to fill these gaps. The paper concludes by thoroughly evaluating various models, underscoring their performance metrics.
2024 · Bioinformatics · Oxford University Press · added 2026-04-21
Motivation: Drug–target interaction (DTI) prediction is a relevant but challenging task in the drug repurposing field. In-silico approaches have drawn particular attention as they can reduce associated costs and time commitment of traditional methodologies. Yet, current state-of-the-art methods present several limitations: existing DTI prediction approaches are computationally expensive, thereby hindering the ability to use large networks and exploit available datasets and, the generalization to unseen datasets of DTI prediction methods remains unexplored, which could
2023 · Bioinformatics · Oxford University Press · added 2026-04-21
Motivation: Screening new drug–target interactions (DTIs) by traditional experimental methods is costly and time-consuming. Recent advances in knowledge graphs, chemical linear notations, and genomic data enable researchers to develop computational DTI models, which play a pivotal role in drug repurposing and discovery. However, there is still a need for a multimodal fusion DTI model that integrates available heterogeneous data into a unified framework. Results: We developed MDTips, a multimodal-data-based DTI prediction system, by fusing the knowledge graphs, gene expression profiles, and
2022 · RSC Chemical Biology · Royal Society of Chemistry · added 2026-04-21
This review summarises different data, data resources and methods for computational mechanism of action (MoA) analysis, and highlights some case studies where integration of data types and methods enabled MoA elucidation on the systems-level.
Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, the approach used in this search has acquired an important computer science component, with machine learning techniques skyrocketing due to their democratization. Given the objectives set by the Precision Medicine initiative and the new challenges they generate, it is necessary to establish robust, standard and reproducible computational methodologies to achieve them. Currently, predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies. This stage manages to drastically reduce costs and research times in the discovery of new drugs. This review article focuses on how these new methodologies have been used in recent years of research. Analyzing the state of the art in this field gives an idea of where cheminformatics is heading in the short term, the limitations it presents and the positive results it has achieved. This review focuses mainly on the methods used to model the molecular data, as well as the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years.
With the rapidly growing biomedical literature, automatically indexing biomedical articles by Medical Subject Heading (MeSH), namely MeSH indexing, has become increasingly important for facilitating hypothesis generation and knowledge discovery. Over the past years, many large-scale MeSH indexing approaches have been proposed, such as Medical Text Indexer, MeSHLabeler, DeepMeSH and MeSHProbeNet. However, the performance of these methods is hampered by using limited information, i.e. only the title and abstract of biomedical articles.
Background: Drug-drug interactions (DDIs) are a significant source of morbidity and adverse drug events (ADEs), particularly in situations of polypharmacy and complex medication regimens. While rules-based software integrated in electronic health records (EHRs) has demonstrated proficiency in identifying DDIs present in medication regimens, large language model (LLM) based identification requires thorough benchmarking and performance evaluation using high-quality datasets for safe use. The purpose of this study was to develop a series of