NSF PAR Search | NSF Public Access Repository

Generalized Protein Pocket Generation with Prior-Informed Flow Matching

Zhang, Zaixi; Zitnik, Marinka; Liu, Qi (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
Designing ligand-binding proteins, such as enzymes and biosensors, is essential in bioengineering and protein biology. One critical step in this process is designing protein pockets, the protein interface that binds the ligand. Current approaches to pocket generation often rely on time-intensive physical computations or template-based methods, and their generation quality suffers because they overlook domain knowledge. To tackle these challenges, we propose PocketFlow, a flow-matching-based generative model that incorporates protein-ligand interaction priors. During training, PocketFlow learns to model key types of protein-ligand interactions, such as hydrogen bonds. During sampling, PocketFlow leverages multi-granularity guidance (overall binding affinity and interaction geometry constraints) to generate high-affinity and valid pockets. Extensive experiments show that PocketFlow outperforms baselines on multiple benchmarks, e.g., achieving an average improvement of 1.29 in Vina Score and 0.05 in scRMSD. Moreover, modeling interactions makes PocketFlow a generalized generative model across multiple ligand modalities, including small molecules, peptides, and RNA.
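As a rough illustration of the flow-matching idea this abstract builds on, the sketch below uses the common linear-path formulation in plain Euclidean space: a point on the path is x_t = (1 - t) x0 + t x1, and the regression target is the constant velocity x1 - x0. This is a generic textbook setup, not PocketFlow's actual model, which also involves interaction priors and guidance; all function names are hypothetical.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from x0 (noise) to x1 (data) at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity a model would be trained to regress for the linear path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(x0, velocity_fn, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check: with the exact (constant) velocity, sampling moves x0 onto x1.
x0 = [0.0, 0.0, 0.0]
x1 = [1.0, -2.0, 0.5]
v = target_velocity(x0, x1)
sample = euler_sample(x0, lambda x, t: v, steps=10)
```

In a trained model, the lambda passed to `euler_sample` would be replaced by a learned network that predicts the velocity from the current state and time.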
Full Text Available
Efficient generation of protein pockets with PocketGen

https://doi.org/10.1038/s42256-024-00920-9

Zhang, Zaixi; Shen, Wan Xiang; Liu, Qi; Zitnik, Marinka (November 2024, Nature Machine Intelligence)

Designing protein-binding proteins is critical for drug discovery. However, artificial-intelligence-based design of such proteins is challenging due to the complexity of protein–ligand interactions, the flexibility of ligand molecules and amino acid side chains, and sequence–structure dependencies. We introduce PocketGen, a deep generative model that produces residue sequence and atomic structure of the protein regions in which ligand interactions occur. PocketGen promotes consistency between protein sequence and structure by using a graph transformer for structural encoding and a sequence refinement module based on a protein language model. The graph transformer captures interactions at multiple scales, including atom, residue and ligand levels. For sequence refinement, PocketGen integrates a structural adapter into the protein language model, ensuring that structure-based predictions align with sequence-based predictions. PocketGen can generate high-fidelity protein pockets with enhanced binding affinity and structural validity. It operates ten times faster than physics-based methods and achieves a 97% success rate, defined as the percentage of generated pockets with higher binding affinity than reference pockets. Additionally, it attains an amino acid recovery rate exceeding 63%.
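The success-rate metric defined in this abstract can be computed as in the sketch below. It assumes affinity is reported as a docking-style score where lower means stronger binding (an assumption; the abstract does not name the scoring function). The function name and the numbers are illustrative only.

```python
def success_rate(generated_scores, reference_score):
    """Percentage of generated pockets that bind better than the reference.

    Assumes lower score = higher affinity (docking-style convention).
    """
    if not generated_scores:
        raise ValueError("need at least one generated pocket")
    wins = sum(1 for s in generated_scores if s < reference_score)
    return 100.0 * wins / len(generated_scores)


# Made-up scores for four generated pockets against one reference pocket.
rate = success_rate([-8.1, -7.4, -6.9, -9.0], reference_score=-7.0)
```

Here three of the four made-up scores beat the reference, so `rate` is 75.0.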
Full Text Available
Evaluating generalizability of artificial intelligence models for molecular datasets

https://doi.org/10.1038/s42256-024-00931-6

Ektefaie, Yasha; Shen, Andrew; Bykova, Daria; Marin, Maximillian G; Zitnik, Marinka; Farhat, Maha (December 2024, Nature Machine Intelligence)

Full Text Available
Contextual AI models for context-specific prediction in biology

https://doi.org/10.1038/s41592-024-02342-2

Li, Michelle; Zitnik, Marinka (August 2024, Nature Methods)

Full Text Available
Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation

Yan, Keqiang; Li, Xiner; Ling, Hongyi; Ashen, Kenna; Edwards, Carl; Arróyave, Raymundo; Zitnik, Marinka; Ji, Heng; Qian, Xiaofeng; Qian, Xiaoning; et al (December 2024, ArXiv)

We consider the problem of crystal materials generation using language models (LMs). A key step is to convert 3D crystal structures into 1D sequences to be processed by LMs. Prior studies used the crystallographic information framework (CIF) file stream, which fails to ensure SE(3) and periodic invariance and may not lead to unique sequence representations for a given crystal structure. Here, we propose a novel method, known as Mat2Seq, to tackle this challenge. Mat2Seq converts 3D crystal structures into 1D sequences and ensures that different mathematical descriptions of the same crystal are represented in a single unique sequence, thereby provably achieving SE(3) and periodic invariance. Experimental results show that, with language models, Mat2Seq achieves promising performance in crystal structure generation as compared with prior methods.
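To see why a raw coordinate listing is not a unique representation, consider the toy sketch below. It handles only a one-dimensional periodic cell with atoms on an integer grid, and canonicalizes over atom ordering and choice of origin. It is not the Mat2Seq algorithm, which addresses full 3D SE(3) and periodic invariance; all names are hypothetical.

```python
def naive_sequence(atoms):
    """Serialize atoms in file order: the result depends on ordering and origin."""
    return " ".join(f"{el}{pos}" for el, pos in atoms)


def canonical_sequence(atoms, period):
    """Serialize so that reordering atoms or shifting the origin gives the same string.

    Tries every atom as the origin, sorts the shifted atoms, and keeps the
    lexicographically smallest result.
    """
    candidates = []
    for _, origin in atoms:
        shifted = sorted(((pos - origin) % period, el) for el, pos in atoms)
        candidates.append(shifted)
    best = min(candidates)
    return " ".join(f"{el}{pos}" for pos, el in best)


# Two descriptions of the same 1D crystal (period 12): the second is
# translated by 5 grid units and lists the atoms in a different order.
a = [("Na", 0), ("Cl", 6), ("Na", 3)]
b = [("Na", 8), ("Na", 5), ("Cl", 11)]
```

The naive strings for `a` and `b` differ, while `canonical_sequence(a, 12)` and `canonical_sequence(b, 12)` agree, which is the property a language model needs in order to see one crystal as one sequence.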
Full Text Available
Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation

Yan, Keqiang; Li, Xiner; Ling, Hongyi; Ashen, Kenna; Edwards, Carl; Arróyave, Raymundo; Zitnik, Marinka; Ji, Heng; Qian, Xiaofeng; Qian, Xiaoning; et al (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))
Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
Full Text Available
A foundation model for clinician-centered drug repurposing

https://doi.org/10.1038/s41591-024-03233-x

Huang, Kexin; Chandak, Payal; Wang, Qianwen; Havaldar, Shreyas; Vaid, Akhil; Leskovec, Jure; Nadkarni, Girish N; Glicksberg, Benjamin S; Gehlenborg, Nils; Zitnik, Marinka (December 2024, Nature Medicine)

Full Text Available
Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation

Yan, Keqiang; Li, Xiner; Ling, Hongyi; Ashen, Kenna; Edwards, Carl; Arróyave, Raymundo; Zitnik, Marinka; Ji, Heng; Qian, Xiaofeng; Qian, Xiaoning (November 2024, NeurIPS Foundation/OpenReview)

Full Text Available
Graph Adversarial Diffusion Convolution

Liu, Songtao; Chen, Jinghui; Fu, Tianfan; Lin, Lu; Zitnik, Marinka; Wu, Dinghao (July 2024, Proceedings of the 41st International Conference on Machine Learning (ICML))

This paper introduces a min-max optimization formulation for the Graph Signal Denoising (GSD) problem. In this formulation, we first maximize the second term of GSD by introducing perturbations to the graph structure based on Laplacian distance and then minimize the overall loss of the GSD. By solving the min-max optimization problem, we derive a new variant of the Graph Diffusion Convolution (GDC) architecture, called Graph Adversarial Diffusion Convolution (GADC). GADC differs from GDC by incorporating an additional term that enhances robustness against adversarial attacks on the graph structure and noise in node features. Moreover, GADC improves the performance of GDC on heterophilic graphs. Extensive experiments demonstrate the effectiveness of GADC across various datasets. Code is available at https://github.com/SongtaoLiu0823/GADC.
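The abstract refers to the "second term" of GSD without writing out the objective. Assuming the standard graph signal denoising form, a fidelity term ||f - x||^2 plus a Laplacian smoothness term lam * f^T L f, the minimization half can be sketched as below for a scalar signal. The adversarial perturbation of the graph structure, which is what GADC adds, is not implemented here; names and values are illustrative.

```python
def laplacian_quadratic(L, f):
    """Smoothness term f^T L f for a scalar signal f on the graph."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


def gsd_objective(L, x, f, lam):
    """Fidelity term ||f - x||^2 plus lam times the smoothness term."""
    fidelity = sum((fi - xi) ** 2 for fi, xi in zip(f, x))
    return fidelity + lam * laplacian_quadratic(L, f)


def denoise(L, x, lam, step=0.1, iters=500):
    """Minimize the assumed GSD objective by gradient descent, starting from x."""
    n = len(x)
    f = list(x)
    for _ in range(iters):
        Lf = [sum(L[i][j] * f[j] for j in range(n)) for i in range(n)]
        grad = [2.0 * (f[i] - x[i]) + 2.0 * lam * Lf[i] for i in range(n)]
        f = [f[i] - step * grad[i] for i in range(n)]
    return f


# Path graph on three nodes (0 - 1 - 2) with a noisy middle value.
L = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
x = [1.0, 5.0, 1.0]
f = denoise(L, x, lam=1.0)
```

For this small example the minimizer also solves (I + lam * L) f = x, so the iterate converges to [2, 3, 2], pulling the outlier toward its neighbors.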
Full Text Available
