A comprehensive large-scale biomedical knowledge graph for AI-powered data-driven biomedical research

Zhang, Yuan; Sui, Xin; Pan, Feng; Yu, Kaixian; Li, Keqiao; Tian, Shubo; Erdengasileng, Arslan; Han, Qing; Wang, Wanjing; Wang, Jianan; Wang, Jian; Sun, Donghu; Chung, Henry; Zhou, Jun; Zhou, Eric; Lee, Ben; Zhang, Peili; Qiu, Xing; Zhao, Tingting; Zhang, Jinfeng

doi:10.1038/s42256-025-01014-w

Citation Details

To address the rapid growth of scientific publications and data in biomedical research, knowledge graphs (KGs) have become a critical tool for integrating large volumes of heterogeneous data to enable efficient information retrieval and automated knowledge discovery. However, transforming unstructured scientific literature into KGs remains a significant challenge, with previous methods unable to achieve human-level accuracy. Here we used an information extraction pipeline that won first place in the LitCoin Natural Language Processing Challenge (2022) to construct a large-scale KG named iKraph using all PubMed abstracts. The extracted information matches human expert annotations and significantly exceeds the content of manually curated public databases. To enhance the KG’s comprehensiveness, we integrated relation data from 40 public databases and relation information inferred from high-throughput genomics data. This KG facilitates rigorous performance evaluation of automated knowledge discovery, which was infeasible in previous studies. We designed an interpretable, probabilistic-based inference method to identify indirect causal relations and applied it to real-time COVID-19 drug repurposing from March 2020 to May 2023. Our method identified around 1,200 candidate drugs in the first 4 months, with one-third of those discovered in the first 2 months later supported by clinical trials or PubMed publications. These outcomes are very challenging to attain through alternative approaches that lack a thorough understanding of the existing literature. A cloud-based platform (https://biokde.insilicom.com) was developed for academic users to access this rich structured data and associated tools.

Award ID(s): 2335357

PAR ID: 10589820

Author(s) / Creator(s): Zhang, Yuan; Sui, Xin; Pan, Feng; Yu, Kaixian; Li, Keqiao; Tian, Shubo; Erdengasileng, Arslan; Han, Qing; Wang, Wanjing; Wang, Jianan; Wang, Jian; Sun, Donghu; Chung, Henry; Zhou, Jun; Zhou, Eric; Lee, Ben; Zhang, Peili; Qiu, Xing; Zhao, Tingting; Zhang, Jinfeng

Corporate Creator(s): Insilicom_LLC

Publisher / Repository: Springer Nature

Date Published: 2025-04-01

Journal Name: Nature Machine Intelligence

Volume: 7

Issue: 4

ISSN: 2522-5839

Page Range / eLocation ID: 602 to 614

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Journal Article:
https://doi.org/10.1038/s42256-025-01014-w
