Understanding ligand properties is essential for computational high-throughput screening of transition metal complexes. However, ligand properties such as net charge and other information such as their application area are often absent or inconsistently recorded in crystallographic datasets. Here, we construct a ligand dataset from 126,985 mononuclear transition metal complexes curated from the Cambridge Structural Database. Using an iterative charge-balancing workflow that combines complex charges, metal oxidation states, and consensus across crystallographic observations, we confidently assign net charges to 66,810 ligands among 94,581 identified unique ligand structures to curate the Boston Open-Shell Ligand (BOS-Lig) dataset. The workflow assigns ligand charges in homoleptic complexes first and then iteratively propagates these assignments across heteroleptic environments, allowing charges to be inferred even when direct charge information is unavailable. We analyze cases where simple heuristics such as the octet rule would have failed and introduce a purity metric to identify when our charge assignments may be incorrect. Each ligand is also classified in terms of its metal coordinating atoms and whether there are multiple variants (i.e., hemilability). We then link complexes to their associated journal abstracts and apply a topic-modeling workflow to link 25,146 ligands with functional application areas spanning reactivity, redox chemistry, biological chemistry, and photophysical chemistry. Together, we provide an experimentally grounded dataset of ligand chemical space that connects charge and functional application as a foundation for computational screening and data-driven ligand design.
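The charge-balancing idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `assign_ligand_charges`, the input layout, the `min_purity` threshold, and the definition of purity as the fraction of observations that agree with the majority charge are all assumptions made for illustration. The sketch relies only on the balance that the ligand charges in a complex must sum to the complex charge minus the metal oxidation state.

```python
from collections import Counter, defaultdict


def assign_ligand_charges(complexes, min_purity=0.6):
    """Illustrative iterative charge balancing (hypothetical helper).

    complexes: list of (complex_charge, metal_oxidation_state, ligand_ids).
    Returns (charges, purity), both keyed by ligand id.
    """
    charges = {}  # ligand id -> assigned net charge
    purity = {}   # ligand id -> share of votes agreeing with that charge
    while True:
        votes = defaultdict(Counter)
        for total, ox_state, ligands in complexes:
            counts = Counter(ligands)
            unknown = [lig for lig in counts if lig not in charges]
            # Only complexes with exactly one unassigned ligand type can be
            # solved. In the first pass these are the homoleptic complexes.
            if len(unknown) != 1:
                continue
            target = unknown[0]
            known_sum = sum(charges[lig] * n
                            for lig, n in counts.items() if lig != target)
            remainder = total - ox_state - known_sum
            # Keep only observations that give an integer charge per ligand.
            if remainder % counts[target] == 0:
                votes[target][remainder // counts[target]] += 1
        newly_assigned = {}
        for lig, tally in votes.items():
            charge, n_agree = tally.most_common(1)[0]
            frac = n_agree / sum(tally.values())
            if frac >= min_purity:  # assumed consensus threshold
                newly_assigned[lig] = (charge, frac)
        if not newly_assigned:
            break  # nothing new could be inferred; stop iterating
        for lig, (charge, frac) in newly_assigned.items():
            charges[lig] = charge
            purity[lig] = frac
    return charges, purity


# Toy observations (made up for illustration, not taken from the dataset).
example = [
    (2, 2, ["bpy", "bpy", "bpy"]),      # homoleptic, implies bpy = 0
    (-2, 2, ["Cl", "Cl", "Cl", "Cl"]),  # homoleptic, implies Cl = -1
    (0, 2, ["Cl", "Cl"]),               # homoleptic, implies Cl = -1
    (-1, 3, ["Cl", "Cl"]),              # inconsistent record, implies Cl = -2
    (2, 3, ["bpy", "bpy", "acac"]),     # heteroleptic, solved once bpy is known
]
charges, purity = assign_ligand_charges(example)
```

In this toy input, the heteroleptic complex cannot be solved in the first pass because two ligand types are unassigned; it becomes solvable in the second pass once `bpy` has been fixed from the homoleptic observation. The deliberately inconsistent chloride record lowers the purity for `Cl` below 1, which is the kind of signal that could flag an assignment for review.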
Computational drug discovery is essential for screening
potential treatments and reducing the costs and time associated with
proposing or combining drugs for disease management. Despite the
extensive research conducted in this field, it remains an emerging area,
particularly with the advent of machine learning, deep learning, and large
language models (LLMs). This systematic review examines the
integration of machine learning and deep learning techniques in drug
discovery, concentrating on three critical areas: drug-drug interactions
(DDIs), drug-target interactions (DTIs), and adverse drug reactions
(ADRs). The review analyzes over 100 papers published between 2020
and 2025, categorizing the methods into deep learning, machine learning,
graph learning, and hybrid models. It highlights the transformative impact
of natural language processing (NLP) and LLMs in extracting meaningful
insights from biomedical literature and chemical data. Furthermore, this work introduces key databases and data sets widely utilized
in drug discovery. Additionally, this review identifies gaps in the existing research, such as the lack of comprehensive studies that
simultaneously address DDI, DTI, and ADR extraction, and it proposes a more holistic approach to fill these gaps. The paper
concludes by thoroughly evaluating various models, underscoring their performance metrics.