NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

A review on knowledge graphs for healthcare: Resources, applications, and promises

https://doi.org/10.1016/j.jbi.2025.104861

Cui, Hejie; Lu, Jiaying; Xu, Ran; Wang, Shiyu; Ma, Wenjing; Yu, Yue; Yu, Shaojun; Kan, Xuan; Ling, Chen; Zhao, Liang; et al. (September 2025, Journal of Biomedical Informatics)

Free, publicly-accessible full text available September 1, 2026
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations

https://doi.org/10.1609/aaai.v39i1.32013

Jiang, Pengcheng; Xiao, Cao; Fu, Tianfan; Bhatia, Parminder; Kass-Hout, Taha; Sun, Jimeng; Han, Jiawei (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Molecular representation learning is vital for various downstream applications, including the analysis and prediction of molecular properties and side effects. While Graph Neural Networks (GNNs) have been a popular framework for modeling molecular data, they often struggle to capture the full complexity of molecular representations. In this paper, we introduce a novel method called Gode, which accounts for the dual-level structure inherent in molecules. Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph. Gode integrates individual molecular graph representations with multi-domain biochemical data from knowledge graphs. By pre-training two GNNs on different graph structures and employing contrastive learning, Gode effectively fuses molecular structures with their corresponding knowledge graph substructures. This fusion yields a more robust and informative representation, enhancing molecular property predictions by leveraging both chemical and biological information. When fine-tuned across 11 chemical property tasks, our model significantly outperforms existing benchmarks, achieving an average ROC-AUC improvement of 12.7% for classification tasks and an average RMSE/MAE improvement of 34.4% for regression tasks. Notably, Gode surpasses the current leading model in property prediction, with advancements of 2.2% in classification and 7.2% in regression tasks.
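The fusion step described above relies on contrastive learning between two views of each molecule. The snippet below is a minimal sketch that assumes a standard symmetric InfoNCE objective. This is a common choice for cross-view alignment and is not necessarily Gode's exact loss. It scores a batch in which row i of the molecular-graph embeddings and row i of the knowledge-graph embeddings form a positive pair. The embeddings are plain lists standing in for the outputs of the two pre-trained GNNs. The function names and the temperature value are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _row_cross_entropy(logits: List[Vector]) -> float:
    """Mean cross-entropy where the correct 'class' for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(mol_emb: List[Vector], kg_emb: List[Vector], temperature: float = 0.1) -> float:
    """Symmetric InfoNCE (assumed objective, not Gode's published code):
    (mol_emb[i], kg_emb[i]) is a positive pair; every other pairing is a negative."""
    if len(mol_emb) != len(kg_emb) or not mol_emb:
        raise ValueError("need two equally sized, non-empty batches")
    a = [l2_normalize(v) for v in mol_emb]
    b = [l2_normalize(v) for v in kg_emb]
    # Cosine similarities scaled by the temperature: logits[i][j] = cos(a_i, b_j) / T
    logits = [[sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b] for ai in a]
    transposed = [list(col) for col in zip(*logits)]
    # Average the molecule->KG and KG->molecule directions
    return 0.5 * (_row_cross_entropy(logits) + _row_cross_entropy(transposed))


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

When the two toy batches are perfectly aligned, the loss is close to zero. Swapping the knowledge-graph rows raises it to roughly 1/temperature (about 10 here). This gap is what pulls matched views together during training.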
Free, publicly-accessible full text available April 11, 2026
Graph Adversarial Diffusion Convolution

Liu, Songtao; Chen, Jinghui; Fu, Tianfan; Lin, Lu; Zitnik, Marinka; Wu, Dinghao (July 2024, Proceedings of the 41st International Conference on Machine Learning (ICML))

This paper introduces a min-max optimization formulation for the Graph Signal Denoising (GSD) problem. In this formulation, we first maximize the second term of GSD by introducing perturbations to the graph structure based on Laplacian distance and then minimize the overall loss of the GSD. By solving the min-max optimization problem, we derive a new variant of the Graph Diffusion Convolution (GDC) architecture, called Graph Adversarial Diffusion Convolution (GADC). GADC differs from GDC by incorporating an additional term that enhances robustness against adversarial attacks on the graph structure and noise in node features. Moreover, GADC improves the performance of GDC on heterophilic graphs. Extensive experiments demonstrate the effectiveness of GADC across various datasets. Code is available at https://github.com/SongtaoLiu0823/GADC.
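Some readers may not be familiar with GSD. One common formulation, assumed here (the paper's exact notation may differ), recovers smoothed node features F from noisy features X by minimizing ||F - X||_F^2 + lambda * tr(F^T L F), where L is the graph Laplacian. The second term is the Laplacian smoothness penalty, which appears to be the term that GADC's inner maximization perturbs. The sketch below covers only the plain minimization and its closed-form solution. It does not reproduce GADC's adversarial graph perturbation, and all function names are hypothetical.

```python
import numpy as np


def laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of an undirected (symmetric) adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj


def gsd_objective(f: np.ndarray, x: np.ndarray, adj: np.ndarray, lam: float) -> float:
    """||F - X||_F^2 + lam * tr(F^T L F): fidelity term plus Laplacian smoothness term."""
    lap = laplacian(adj)
    return float(np.sum((f - x) ** 2) + lam * np.trace(f.T @ lap @ f))


def gsd_denoise(x: np.ndarray, adj: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Closed-form minimizer F = (I + lam * L)^{-1} X of the assumed GSD objective.

    Setting the gradient 2(F - X) + 2 * lam * L F to zero gives (I + lam * L) F = X;
    for lam >= 0 the system matrix is positive definite, so the solve is well posed.
    """
    n = adj.shape[0]
    return np.linalg.solve(np.eye(n) + lam * laplacian(adj), x)


adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph 0-1-2
x = np.array([[1.0], [5.0], [1.0]])  # noisy spike on the middle node
f = gsd_denoise(x, adj, lam=1.0)  # f -> [[2.], [3.], [2.]]
```

On the three-node path graph, the spike on the middle node is pulled toward its neighbors, giving [2, 3, 2]. The column sum stays at 7 because the all-ones vector lies in the null space of L.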
Full Text Available
Artificial intelligence foundation for therapeutic science

https://doi.org/10.1038/s41589-022-01131-2

Huang, Kexin; Fu, Tianfan; Gao, Wenhao; Zhao, Yue; Roohani, Yusuf; Leskovec, Jure; Coley, Connor W.; Xiao, Cao; Sun, Jimeng; Zitnik, Marinka (October 2022, Nature Chemical Biology)

Full Text Available
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

https://doi.org/10.1561/2200000115

Zhang, Xuan; Wang, Limei; Helwig, Jacob; Luo, Youzhi; Fu, Cong; Xie, Yaochen; Liu, Meng; Lin, Yuchao; Xu, Zhao; Yan, Keqiang; et al. (January 2025, Foundations and Trends® in Machine Learning)

Full Text Available
DeepPurpose: a deep learning library for drug–target interaction prediction

https://doi.org/10.1093/bioinformatics/btaa1005

Huang, Kexin; Fu, Tianfan; Glass, Lucas M; Zitnik, Marinka; Xiao, Cao; Sun, Jimeng (December 2020, Bioinformatics)
Wren, Jonathan (Ed.)
Summary: Accurate prediction of drug–target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models have shown promising performance for DTI prediction. However, these models can be difficult to use for both computer scientists entering the biomedical field and bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use DL library for DTI prediction. DeepPurpose supports training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with providing many other useful features. We demonstrate state-of-the-art performance of DeepPurpose on several benchmark datasets. Availability and implementation: https://github.com/kexinhuang12345/DeepPurpose. Supplementary information: Supplementary data are available at Bioinformatics online.
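The abstract describes DeepPurpose as combining compound encoders and protein encoders with neural architectures. The sketch below illustrates that general encoder-pair pattern with deliberately simple, hand-crafted encoders: character frequencies for SMILES strings and amino-acid composition for protein sequences, both feeding a logistic-regression head. It is a hypothetical illustration under those assumptions. It does not use or mirror DeepPurpose's API, and the vocabularies, function names, and weights were made up for this example.

```python
import math
from collections import Counter
from typing import List

# Hypothetical toy vocabularies chosen for illustration only.
SMILES_VOCAB = "CNOSPFIcnos()=#123456"  # 21 single characters
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def encode_compound(smiles: str) -> List[float]:
    """Toy compound encoder: relative frequency of each SMILES vocabulary character."""
    counts = Counter(smiles)
    total = max(len(smiles), 1)
    return [counts[ch] / total for ch in SMILES_VOCAB]


def encode_protein(sequence: str) -> List[float]:
    """Toy protein encoder: amino-acid composition of the sequence."""
    counts = Counter(sequence.upper())
    total = max(len(sequence), 1)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def predict_interaction(smiles: str, sequence: str, weights: List[float], bias: float = 0.0) -> float:
    """Concatenate both encodings and apply a logistic head (probability of interaction)."""
    features = encode_compound(smiles) + encode_protein(sequence)
    if len(weights) != len(features):
        raise ValueError(f"expected {len(features)} weights, got {len(weights)}")
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


p = predict_interaction("CC(=O)O", "MKTAYIAK", weights=[0.0] * 41)  # 0.5 with all-zero weights
```

With all-zero weights, the head returns 0.5 for any input. A trained model would learn the weights from labeled drug–target pairs, and a real library would replace these toy encoders with learned ones.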
Full Text Available
Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development

Huang, Kexin; Fu, Tianfan; Gao, Wenhao; Zhao, Yue; Roohani, Yusuf; Leskovec, Jure; Coley, Connor; Xiao, Cao; Sun, Jimeng; Zitnik, Marinka (January 2021, Advances in Neural Information Processing Systems)
Therapeutics machine learning is an emerging field with incredible opportunities for innovation and impact. However, advancement in this field requires formulation of meaningful learning tasks and careful curation of datasets. Here, we introduce Therapeutics Data Commons (TDC), the first unifying platform to systematically access and evaluate machine learning across the entire range of therapeutics. To date, TDC includes 66 AI-ready datasets spread across 22 learning tasks and spanning the discovery and development of safe and effective medicines. TDC also provides an ecosystem of tools and community resources, including 33 data functions and types of meaningful data splits, 23 strategies for systematic model evaluation, 17 molecule generation oracles, and 29 public leaderboards. All resources are integrated and accessible via an open Python library. We carry out extensive experiments on selected datasets, demonstrating that even the strongest algorithms fall short of solving key therapeutics challenges, including distributional shifts in real datasets, multi-scale modeling of heterogeneous data, and robust generalization to novel data points. We envision that TDC can facilitate algorithmic and scientific advances and considerably accelerate machine-learning model development, validation, and transition into biomedical and clinical implementation. TDC is an open-science initiative available at https://tdcommons.ai.
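The abstract stresses meaningful data splits and robust generalization to novel data points. One common way to test this in drug–target settings is a cold split. A cold split holds out entire drugs, so the test set contains only compounds that were never seen during training. The sketch below shows that idea in plain Python. It is an assumed, simplified illustration and not TDC's split implementation; the function names and toy identifiers are hypothetical.

```python
import random
from typing import List, Set, Tuple

# (drug_id, target_id, label); identifiers here are hypothetical placeholders.
Pair = Tuple[str, str, float]


def cold_drug_split(pairs: List[Pair], test_frac: float = 0.2, seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Split drug-target pairs so that no drug in the test set appears in training."""
    drugs = sorted({drug for drug, _, _ in pairs})  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(drugs)
    n_test = max(1, round(test_frac * len(drugs)))  # hold out at least one drug
    test_drugs: Set[str] = set(drugs[:n_test])
    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test


def drugs_in(split: List[Pair]) -> Set[str]:
    """Set of drug identifiers that occur in a split."""
    return {drug for drug, _, _ in split}


pairs = [(f"drug{i}", f"target{j}", float((i + j) % 2)) for i in range(5) for j in range(3)]
train, test = cold_drug_split(pairs, test_frac=0.2)
```

A random split over individual pairs would usually place the same drug in both sets. That kind of leakage tends to overestimate how well a model handles genuinely new compounds.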
Full Text Available
