📚 Article Archive

AI-driven pharmacovigilance: Enhancing adverse drug reaction detection with deep learning and NLP

2025 · MethodsX 15 · Elsevier · added 2026-04-21

In the healthcare industry, the ever-increasing volume of clinical trial data presents challenges for ensuring drug safety and detecting adverse drug reactions (ADRs). This study aims to address the challenge of accurately detecting Serious Adverse Events (SAEs) in pharmacovigilance, a critical component in ensuring drug safety during and after clinical trials. The key problem lies in the underreporting and delayed detection of ADRs due to the heterogeneous nature of medical data, class imbalance, and the limited scope of traditional monitoring techniques. This study proposes a hybrid AI-driven framework that integrates structured data (e.g., patient demographics, lab results) and unstructured data (e.g., clinical notes) to detect ADRs using advanced deep learning and NLP methods. The objective is to outperform traditional signal detection methods and provide interpretable predictions to aid clinicians in real time. By leveraging Machine Learning (ML) and Deep Learning (DL) techniques, including Random Forests, Gradient Boosting Machines, and Convolutional Neural Networks (CNNs), our model aims to identify potential ADRs across different patient subgroups. Through meticulous feature engineering and the application of techniques to address data imbalance, our model demonstrates improved accuracy and interpretability in predicting ADRs. The CNN model achieved an accuracy of 85%, outperforming traditional models such as Logistic Regression (78%) and Support Vector Machines (80%). These findings suggest that specific demographic and clinical factors significantly influence the likelihood of adverse reactions, offering valuable insights for targeted monitoring and risk mitigation strategies [11]. This research underscores the potential of predictive modeling to enhance pharmacovigilance efforts and ensure safer clinical trial outcomes.
• The research methodology includes a comparison of supervised learning algorithms, such as Logistic Regression, Random Forest, Gradient Boosting, CNN, and genetic algorithms, to identify patterns and anomalies in clinical trial data. BERT and GPT were also employed to enable textual interaction with medical data.
• Performance metrics such as accuracy, precision, recall, and F1-score were systematically applied to evaluate each model's performance. Among the models tested, the CNN model with BERT achieved the highest accuracy, providing valuable insights into the potential of deep learning for enhancing pharmacovigilance practices.
• These findings suggest that supplying diverse clinical data to advanced ML and NLP techniques can significantly improve the detection of ADRs, leading to better alignment with the fundamental principles of Good Clinical Practice (GCP).

📄 PDF DOI: 10.1016/j.mex.2025.103460 📎 SI

adverse drug reaction detection adverse drug reactions artificial intelligence bert clinical notes clinical trial data clinical trial data processing convolutional neural networks

LitSense 2.0: AI-powered biomedical information retrieval with sentence and passage level knowledge discovery

2025 · Nucleic acids research · Oxford University Press · added 2026-04-21

LitSense 2.0 (https://www.ncbi.nlm.nih.gov/research/litsense2/) is an advanced biomedical search system enhanced with dense vector semantic retrieval, designed for accessing literature on sentence and paragraph levels. It provides unified access to 38 million PubMed abstracts and 6.6

📄 PDF DOI: 10.1093/nar/gkaf417

bioinformatics biomedical information retrieval biomedical literature search information retrieval natural language processing semantic retrieval text encoder

Transformer-based models for ADR detection: cross-drug validation and benchmarking against large language models.

Kim M, Kim KE, Kwon JH, Han JY, Kim JH, Kim MG · 2025 · Therapeutic advances in drug safety · SAGE Publications · added 2026-04-20

Adverse drug reactions (ADRs) are harmful side effects of medications. Social media provides real-time, patient-generated data, though its unstructured format presents challenges. Natural language processing and transfer learning offer promising solutions.

📄 PDF DOI: 10.1177/20420986251405082 📎 SI

adr detection adverse drug reactions benchmarking cross-validation natural language processing transfer learning

Transformer-based models for ADR detection: cross-drug validation and benchmarking against large language models

2025 · Therapeutic advances in drug safety · SAGE Publications · added 2026-04-21

Background: Adverse drug reactions (ADRs) are harmful side effects of medications. Social media provides real-time, patient-generated data, though its unstructured format presents challenges. Natural language processing and transfer learning offer promising solutions. Objective: This study aimed to evaluate whether transformer-based models fine-tuned on a general ADR dataset can effectively classify ADRs from tweets related to glucagon-like peptide-1 (GLP-1) receptor agonists and to benchmark their performance against state-of-the-art large language models (LLMs).

📄 PDF DOI: 10.1177/20420986251405082

adr detection adverse drug reactions artificial intelligence bert bioinformatics fine-tuning glp-1 receptor agonists gpt-2

A Systematic Review of Drug-Related Interactions Utilizing Deep

2025 · ACS Omega · ACS Publications · added 2026-04-21

Computational drug discovery is essential for screening potential treatments and reducing the costs and time associated with proposing or combining drugs for disease management. Despite the extensive research conducted in this field, it remains an emerging area, particularly with the advent of machine learning, deep learning, and large language models (LLMs). This systematic review examines the integration of machine learning and deep learning techniques in drug discovery, concentrating on three critical areas: drug-drug interactions (DDIs), drug-target interactions (DTIs), and adverse drug reactions (ADRs). The review analyzes over 100 papers published between 2020 and 2025, categorizing the methods into deep learning, machine learning, graph learning, and hybrid models. It highlights the transformative impact of natural language processing (NLP) and LLMs in extracting meaningful insights from biomedical literature and chemical data. Furthermore, this work introduces key databases and data sets widely utilized in drug discovery. Additionally, this review identifies gaps in the existing research, such as the lack of comprehensive studies that simultaneously address DDI, DTI, and ADR extraction, and it proposes a more holistic approach to fill these gaps. The paper concludes by thoroughly evaluating various models, underscoring their performance metrics.

📄 PDF DOI: 10.1021/acsomega.5c04997

bioinformatic techniques bioinformatics biological target biological testing clinical trials computational drug discovery computational modeling deep learning

Large Language Model-Enhanced Drug Repositioning Knowledge Extraction via Long Chain-of-Thought: Development and Evaluation Study.

Hongyu Kang, Jiao Li, Li Hou, Xiaowei Xu, Si Zheng, Qin Li · 2025 · JMIR medical informatics · added 2026-04-20

BACKGROUND: Drug repositioning is a pivotal strategy in pharmaceutical research, offering accelerated and cost-effective therapeutic discovery. However, biomedical information relevant to drug repositioning is often complex, dispersed, and underutilized due to limitations in traditional extraction methods, such as reliance on annotated data and poor generalizability. Large language models (LLMs) show promise but face challenges such as hallucinations and interpretability issues. OBJECTIVE: This study proposed long chain-of-thought for drug repositioning knowledge extraction (LCoDR-KE), a lightweight and domain-specific framework to enhance LLMs' accuracy and adaptability in extracting structured biomedical knowledge for drug repositioning. METHODS: A domain-specific schema defined 11 entities (eg, drug, disease) and 18 relationships (eg, treats, is biomarker of). Following the established schema architecture, we constructed automatic annotation based on 10,000 PubMed abstracts via chain-of-thought prompt engineering. A total of 1000 expert-validated abstracts were curated into a drug repositioning corpus, a high-quality specialized corpus, while the remaining entries were allocated for model training purposes. Then, the proposed LCoDR-KE framework combined supervised fine-tuning of the Qwen2.5-7B-Instruct model with reinforcement learning and dual-reward mechanisms. Performance was evaluated against state-of-the-art models (eg, conditional random fields, Bidirectional Encoder Representations From Transformers, BioBERT, Qwen2.5, DeepSeek-R1, OpenBioLLM-70B, and model variants) using precision, recall, and F1-score. In addition, the convergence of the training method was assessed by analyzing performance progression across iteration steps. RESULTS: LCoDR-KE achieved an entity F1 of 81.46% (eg, drug 95.83%, disease 90.52%) and triplet F1 of 69.04%, outperforming traditional models and rivaling larger LLMs (DeepSeek-R1: entity F1=84.64%, triplet F1=69.02%). Ablation studies confirmed the contributions of supervised fine-tuning (8.61% and 20.70% F1 drop if removed) and reinforcement learning (6.09% and 14.09% F1 drop if removed). The training process demonstrated stable convergence, validated through iterative performance monitoring. Qualitative analysis of the model's chain-of-thought outputs showed that LCoDR-KE performed structured and schema-aware reasoning by validating entity types, rejecting incompatible relations, enforcing constraints, and generating compliant JSON. Error analysis revealed 4 main types of mistakes and challenges for further improvement. CONCLUSIONS: LCoDR-KE enhances LLMs' domain-specific adaptability for drug repositioning by offering an open-source drug repositioning corpus and a long chain-of-thought framework based on a lightweight LLM model. This framework supports drug discovery and knowledge reasoning while providing scalable, interpretable solutions applicable to broader biomedical knowledge extraction tasks.

no PDF DOI: 10.2196/77837 📎 SI

biomedical informatics chain-of-thought prompt engineering drug repositioning knowledge extraction large language model machine learning medicinal chemistry natural language processing

PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge.

Wei CH, Allot A, Lai PT, Leaman R, Tian S, Luo L, Jin Q, Wang Z, Chen Q, Lu Z · 2024 · Nucleic acids research · Oxford University Press · added 2026-04-20

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

📄 PDF DOI: 10.1093/nar/gkae235 📎 SI

api artificial intelligence bioinformatics biomedical literature chemicals data mining entity recognition genetic variants

EnzChemRED, a rich enzyme chemistry relation extraction dataset

2024 · Scientific Data · Nature · added 2026-04-21

11,571 — — NER
2008 SCAI33 1,206 — — NER
2012 ADE39 300 case reports 5,063 drugs — 6,821 drug adverse effects 279 drug dosage RE
2013 DDI43 1,025, including texts from DrugBank and 18,502 drugs — 5,028 drug-drug interactions RE
2015 CHEMDNER34 84,355 chemicals — — NER
2016 BC5CDR 1,500 articles 15,935 chemicals 12,850 diseases 4,409 MeSH chemically induced diseases NER, NEN, RE
2017 N-ary drug-gene-mutation 35 — — — 137,469 drug–gene 3,192 drug–mutation RE
2017 40 ChemProt 32,514 chemicals 30,922 genes

📄 PDF DOI: 10.1038/s41597-024-03835-7

bioinformatics chebi chemical ontology chemical reactions cheminformatics coordination chemistry enzyme curation enzymes

PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge

2024 · Nucleic acids research · Oxford University Press · added 2026-04-21

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0’s online interface and API utilize these precomputed entity relations and synonyms to provide advanced

📄 PDF DOI: 10.1093/nar/gkae235

ai api biomedical literature biomedical research chatgpt chemicals diseases entity annotation

FullMeSH: improving large-scale MeSH indexing with full text.

Dai S, You R, Lu Z, Huang X, Mamitsuka H, Zhu S · 2020 · Bioinformatics · Oxford University Press · added 2026-04-20

With the rapidly growing biomedical literature, automatically indexing biomedical articles by Medical Subject Heading (MeSH), namely MeSH indexing, has become increasingly important for facilitating hypothesis generation and knowledge discovery. Over the past years, many large-scale MeSH indexing approaches have been proposed, such as Medical Text Indexer, MeSHLabeler, DeepMeSH and MeSHProbeNet. However, the performance of these methods is hampered by using limited information, i.e. only the title and abstract of biomedical articles.

📄 PDF DOI: 10.1093/bioinformatics/btz756 📎 SI

bioinformatics biomedical literature information retrieval machine learning medical subject heading mesh indexing natural language processing text analysis

MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing.

Shengwen Peng, Hiroshi Mamitsuka, Shanfeng Zhu · 2018 · Methods in molecular biology (Clifton, N.J.) · Springer · added 2026-04-20

The US National Library of Medicine (NLM) uses the Medical Subject Headings (MeSH) (see Note 1) to index almost all 24 million citations in MEDLINE, which greatly facilitates the application of biomedical information retrieval and text mining. Large-scale automatic MeSH indexing has two challenging aspects: the MeSH side and the citation side. On the MeSH side, each citation is annotated by only 12 (on average) out of all 28,000 MeSH terms. On the citation side, all existing methods, including Medical Text Indexer (MTI) by NLM, represent text as a bag of words, which cannot capture semantic and context-dependent information well. To solve these two challenges, we developed MeSHLabeler and DeepMeSH. By utilizing the "learning to rank" (LTR) framework, MeSHLabeler integrates multiple types of information to solve the challenge on the MeSH side, while DeepMeSH integrates deep semantic representation to solve the challenge on the citation side. MeSHLabeler achieved first place in both BioASQ2 and BioASQ3, and DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges. DeepMeSH is available at http://datamining-iip.fudan.edu.cn/deepmesh.

no PDF DOI: 10.1007/978-1-4939-8561-6_15

bioinformatics deep learning information retrieval learning to rank natural language processing text analysis text mining

Drug-drug interaction identification using large language models

· added 2026-04-20

Background: Drug-drug interactions (DDIs) are a significant source of morbidity and adverse drug events (ADEs), particularly in situations of polypharmacy and complex medication regimens. While rules-based software integrated in electronic health records (EHRs) has demonstrated proficiency in identifying DDIs present in medication regimens, large language model (LLM) based identification requires thorough benchmarking and performance evaluation using high-quality datasets for safe use. The purpose of this study was to develop a series of

📄 PDF DOI: 10.64898/2025.12.03.25341549 📎 SI

adverse drug reactions drug interactions drug safety assessment drug-drug interaction identification large language models machine learning medication safety natural language processing
