📚 Article Archive

AutoPM3: enhancing variant interpretation via LLM-driven PM3 evidence extraction from scientific literature

2025 · Bioinformatics · Oxford University Press · added 2026-04-21

Motivation: Rare diseases affect over 300 million people worldwide and are often caused by genetic variants. While variant detection has become cost-effective, interpreting these variants, particularly collecting literature-based evidence like ACMG/AMP PM3, remains complex and time-consuming. Results: We present AutoPM3, a method that automates PM3 evidence extraction from the literature using open-source large language models (LLMs). AutoPM3 combines a Text2SQL-based variant extractor and a retrieval-augmented generation (RAG) module, enhanced by a variant-specific retriever and a fine-tuned LLM, to process tables and text separately. We curated PM3-Bench, a dataset of 1027 variant-publication…

📄 PDF DOI: 10.1093/bioinformatics/btaf382

bioinformatics computational biology evidence extraction evidence-based medicine genetic variants genomics large language models literature-based evidence

FuncFetch: an LLM-assisted workflow enables mining thousands of enzyme–substrate interactions from published manuscripts

2024 · Bioinformatics · Oxford University Press · added 2026-04-21

Motivation: Thousands of genomes are publicly available; however, most genes in those genomes have poorly defined functions. This is partly due to a gap between previously published, experimentally characterized protein activities and activities deposited in databases. This activity deposition is bottlenecked by the time-consuming biocuration process. The emergence of large language models presents an opportunity to speed up the text-mining of protein activities for biocuration. Results: We developed FuncFetch, a workflow that integrates NCBI E-Utilities, OpenAI’s GPT-4, and Zotero, to screen thousands of manu…

📄 PDF DOI: 10.1093/bioinformatics/btae756

biocuration bioinformatics data mining enzyme enzyme activity large language models ncbi e-utilities protein

PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge.

Wei CH, Allot A, Lai PT, Leaman R, Tian S, Luo L, Jin Q, Wang Z, Chen Q, Lu Z · 2024 · Nucleic Acids Research · Oxford University Press · added 2026-04-20

PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text articles from the PMC open access subset, updated weekly. PubTator 3.0's online interface and API utilize these precomputed entity relations and synonyms to provide advanced search capabilities and enable large-scale analyses, streamlining many complex information needs. We showcase the retrieval quality of PubTator 3.0 using a series of entity pair queries, demonstrating that PubTator 3.0 retrieves a greater number of articles than either PubMed or Google Scholar, with higher precision in the top 20 results. We further show that integrating ChatGPT (GPT-4) with PubTator APIs dramatically improves the factuality and verifiability of its responses. In summary, PubTator 3.0 offers a comprehensive set of features and tools that allow researchers to navigate the ever-expanding wealth of biomedical literature, expediting research and unlocking valuable insights for scientific discovery.

📄 PDF DOI: 10.1093/nar/gkae235 📎 SI

api artificial intelligence bioinformatics biomedical literature chemicals data mining entity recognition genetic variants

MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing.

Shengwen Peng, Hiroshi Mamitsuka, Shanfeng Zhu · 2018 · Methods in molecular biology (Clifton, N.J.) · Springer · added 2026-04-20

The US National Library of Medicine (NLM) uses the Medical Subject Headings (MeSH) (see Note 1) to index almost all 24 million citations in MEDLINE, which greatly facilitates the application of biomedical information retrieval and text mining. Large-scale automatic MeSH indexing has two challenging aspects: the MeSH side and the citation side. On the MeSH side, each citation is annotated by only 12 (on average) out of all 28,000 MeSH terms. On the citation side, all existing methods, including the Medical Text Indexer (MTI) by NLM, treat text as a bag of words, which cannot capture semantic and context-dependent information well. To address these two challenges, we developed MeSHLabeler and DeepMeSH. By utilizing a "learning to rank" (LTR) framework, MeSHLabeler integrates multiple types of information to address the challenge on the MeSH side, while DeepMeSH integrates deep semantic representation to address the challenge on the citation side. MeSHLabeler achieved first place in both the BioASQ2 and BioASQ3 challenges, and DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges. DeepMeSH is available at http://datamining-iip.fudan.edu.cn/deepmesh.

no PDF DOI: 10.1007/978-1-4939-8561-6_15

bioinformatics deep learning information retrieval learning to rank natural language processing text analysis text mining
