

Title: Diffusion models in protein structure and docking
Abstract

Generative AI is rapidly transforming the frontier of research in computational structural biology, and recent successes have substantially advanced protein design and drug discovery. One of the key methodologies underlying these advances is the diffusion model (DM). Diffusion models originated in computer vision, where they rapidly took over image generation by offering superior quality and performance. They were subsequently extended and modified for use in other areas, including computational structural biology. DMs are well equipped to model high‐dimensional, geometric data while exploiting key strengths of deep learning. In structural biology, for example, they have achieved state‐of‐the‐art results on protein 3D structure generation and small‐molecule docking. This review covers the basics of diffusion models, the associated modeling choices regarding molecular representations, generation capabilities, and prevailing heuristics, as well as key limitations and forthcoming refinements. We also provide best practices for evaluation procedures to help establish rigorous benchmarking. The review is intended to provide a fresh view of the state of the art and to highlight the potential and current challenges of recent generative techniques in computational structural biology.

This article is categorized under:

Data Science > Artificial Intelligence/Machine Learning

Structure and Mechanism > Molecular Structures

Software > Molecular Modeling

 
NSF-PAR ID:
10499204
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
WIREs Computational Molecular Science
Volume:
14
Issue:
2
ISSN:
1759-0876
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Cryo‐electron microscopy (cryo‐EM) has become a major experimental technique for determining the structures of large protein complexes and molecular assemblies, as evidenced by the 2017 Nobel Prize. Although cryo‐EM has improved drastically and now generates high‐resolution three‐dimensional maps that contain detailed structural information about macromolecules, the computational methods for automatically building structure models from these data lag far behind. The traditional cryo‐EM model building approach is template‐based homology modeling. Manual de novo modeling is very time‐consuming when no template model is found in the database. In recent years, de novo cryo‐EM modeling using machine learning (ML) and deep learning (DL) has ranked among the top‐performing methods in macromolecular structure modeling. DL‐based de novo cryo‐EM modeling is an important application of artificial intelligence, with impressive results and great potential for the next generation of molecular biomedicine. Accordingly, we systematically review the representative ML/DL‐based de novo cryo‐EM modeling methods. Their significance is discussed from both practical and methodological viewpoints. We also briefly describe the background of the cryo‐EM data processing workflow. Overall, this review provides an introductory guide to modern research on artificial intelligence for de novo molecular structure modeling and to future directions in this emerging field.

    This article is categorized under:

    Structure and Mechanism > Molecular Structures

    Structure and Mechanism > Computational Biochemistry and Biophysics

    Data Science > Artificial Intelligence/Machine Learning

     
  2. Abstract

    Brownian dynamics (BD) is a computational method to simulate molecular diffusion processes. Although the BD method has been developed over several decades and is well established, new methodological developments are improving its accuracy, widening its scope, and increasing its application. In biological applications, BD is used to investigate the diffusive behavior of molecules subject to forces due to intermolecular interactions or interactions with material surfaces. BD can be used to compute rate constants for diffusional association, generate structures of encounter complexes for molecular binding partners, and examine the transport properties of geometrically complex molecules. Often, a series of simulations is performed, for example, for different protein mutants or environmental conditions, so that the effects of the changes on diffusional properties can be estimated. While biomolecules are commonly described at atomic resolution and internal molecular motions are typically neglected, coarse‐graining and the treatment of conformational flexibility are increasingly employed. Software packages for BD simulations of biomolecules are growing in capabilities, with several new packages providing novel features that expand the range of questions that can be addressed. These advances, when used in concert with experiment or other simulation methods, such as molecular dynamics, open new opportunities for application to biochemical and biological systems. Here, we review some of the latest developments in the theory, methods, software, and applications of BD simulations to study biomolecular diffusional association processes and provide a perspective on their future use and application to outstanding challenges in biology, bioengineering, and biomedicine.

    This article is categorized under:

    Structure and Mechanism > Computational Biochemistry and Biophysics

    Molecular and Statistical Mechanics > Molecular Dynamics and Monte‐Carlo Methods

    Software > Simulation Methods

     
  3. Abstract

    Motivation

    Modeling the structural plasticity of protein molecules remains challenging. Most research has focused on obtaining one biologically active structure. This includes the recent AlphaFold2 that has been hailed as a breakthrough for protein modeling. Computing one structure does not suffice to understand how proteins modulate their interactions and even evade our immune system. Revealing the structure space available to a protein remains challenging. Data-driven approaches that learn to generate tertiary structures are increasingly garnering attention. These approaches exploit the ability to represent tertiary structures as contact or distance maps and make direct analogies with images to harness convolution-based generative adversarial frameworks from computer vision. Since such opportunistic analogies do not allow capturing highly structured data, current deep models struggle to generate physically realistic tertiary structures.

    Results

    We present novel deep generative models that build upon the graph variational autoencoder framework. In contrast to existing literature, we represent tertiary structures as ‘contact’ graphs, which allow us to leverage graph-generative deep learning. Our models are able to capture rich local and distal constraints and additionally compute disentangled latent representations that reveal the impact of individual latent factors. This elucidates what the factors control and makes our models more interpretable. Rigorous comparative evaluation along various metrics shows that the models we propose advance the state of the art. While there is still much ground to cover, the work presented here is an important first step, and graph-generative frameworks promise to get us to our goal of unraveling the exquisite structural complexity of protein molecules.

    Availability and implementation

    Code is available at https://github.com/anonymous1025/CO-VAE.

    Supplementary information

    Supplementary data are available at Bioinformatics Advances online.

     
  4. Abstract

    The Institute for Foundations of Machine Learning (IFML) focuses on core foundational tools to power the next generation of machine learning models. Its research underpins the algorithms and data sets that make generative artificial intelligence (AI) more accurate and reliable. Headquartered at The University of Texas at Austin, IFML researchers collaborate across an ecosystem that spans the University of Washington, Stanford, UCLA, Microsoft Research, the Santa Fe Institute, and Wichita State University. Over the past year, we have witnessed incredible breakthroughs in AI on topics that are at the heart of IFML's agenda, such as foundation models, LLMs, fine‐tuning, and diffusion, with game‐changing applications influencing almost every area of science and technology. In this article, we seek to highlight the application of foundational machine learning research on key use‐inspired topics:

    Fairness in Imaging with Deep Learning: designing the correct metrics and algorithms to make deep networks less biased.

    Deep proteins: using foundational machine learning techniques to advance protein engineering and launch a biomanufacturing revolution.

    Sounds and Space for Audio‐Visual Learning: building agents capable of audio‐visual navigation in complex 3D environments via new data augmentations.

    Improving Speed and Robustness of Magnetic Resonance Imaging: using deep learning algorithms to develop fast and robust MRI methods for clinical diagnostic imaging.

    IFML is also responding to explosive industry demand for an AI‐capable workforce. We have launched an accessible, affordable, and scalable new degree program—the MSAI—that looks to wholly reshape the AI/ML workforce pipeline.

     
  5. Abstract

    The potential energy of molecular species and their conformers can be computed with a wide range of computational chemistry methods, from molecular mechanics to ab initio quantum chemistry. However, choosing a computational approach that balances computational cost against the reliability of the calculated energies is a dilemma, especially for large molecules. This dilemma is even more problematic for studies that require hundreds or thousands of calculations, such as drug discovery. At the same time, driven by their pattern recognition capabilities, neural networks have gained popularity in the computational chemistry community. During the last decade, many neural network potentials have been developed to predict a variety of chemical information for different systems. Neural network potentials have been shown to predict chemical properties with accuracy comparable to quantum mechanical approaches but at a cost approaching that of molecular mechanics calculations. As a result, the development of more reliable, transferable, and extensible neural network potentials has become an attractive field of study. In this review, we give an overview of the status of current neural network potentials and strategies to improve their accuracy. We provide recent examples of studies that demonstrate the applicability of these potentials. We also discuss the capabilities and shortcomings of the current models and the challenges and future aspects of their development and applications. We expect that this review will provide guidance for the development of neural network potentials and the exploitation of their applicability.

    This article is categorized under:

    Data Science > Artificial Intelligence/Machine Learning

    Molecular and Statistical Mechanics > Molecular Interactions

    Software > Molecular Modeling

     