Title: Diffusion models in protein structure and docking
Abstract

Generative AI is rapidly transforming the frontier of research in computational structural biology, and recent successes have substantially advanced protein design and drug discovery. Diffusion models (DMs) are one of the key methodologies underlying these advances. They originated in computer vision, where they rapidly took over image generation by offering superior quality and performance, and were subsequently extended and modified for use in other areas, including computational structural biology. DMs are well equipped to model high‐dimensional geometric data while exploiting key strengths of deep learning. In structural biology, for example, they have achieved state‐of‐the‐art results on protein 3D structure generation and small molecule docking. This review covers the basics of diffusion models, associated modeling choices regarding molecular representations, generation capabilities, prevailing heuristics, as well as key limitations and forthcoming refinements. We also provide best practices for evaluation procedures to help establish rigorous benchmarking. The review is intended to provide a fresh view of the state of the art and to highlight the potential and current challenges of recent generative techniques in computational structural biology.

This article is categorized under:

Data Science > Artificial Intelligence/Machine Learning

Structure and Mechanism > Molecular Structures

Software > Molecular Modeling

 
PAR ID:
10499204
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
WIREs Computational Molecular Science
Volume:
14
Issue:
2
ISSN:
1759-0876
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Cryo‐electron microscopy (cryo‐EM) has become a major experimental technique for determining the structures of large protein complexes and molecular assemblies, as evidenced by the 2017 Nobel Prize. Although cryo‐EM has improved drastically and now generates high‐resolution three‐dimensional maps that contain detailed structural information about macromolecules, the computational methods for automatically building structure models from these data lag far behind. The traditional cryo‐EM model building approach is template‐based homology modeling. Manual de novo modeling is very time‐consuming when no template model is found in the database. In recent years, de novo cryo‐EM modeling using machine learning (ML) and deep learning (DL) has ranked among the top‐performing methods in macromolecular structure modeling. DL‐based de novo cryo‐EM modeling is an important application of artificial intelligence, with impressive results and great potential for the next generation of molecular biomedicine. Accordingly, we systematically review the representative ML/DL‐based de novo cryo‐EM modeling methods. Their significance is discussed from both practical and methodological viewpoints. We also briefly describe the background of the cryo‐EM data processing workflow. Overall, this review provides an introductory guide to modern research on artificial intelligence for de novo molecular structure modeling and to future directions in this emerging field.

    This article is categorized under:

    Structure and Mechanism > Molecular Structures

    Structure and Mechanism > Computational Biochemistry and Biophysics

    Data Science > Artificial Intelligence/Machine Learning

     
  2. Designing molecules with specific structural and functional properties (e.g., drug-likeness and water solubility) is central to advancing drug discovery and material science, but it poses outstanding challenges in both wet and dry laboratories. The search space is vast and rugged. Recent advances in deep generative models are motivating new computational approaches that build on deep learning to tackle the molecular space. Despite rapid advancements, state-of-the-art deep generative models for molecule generation have many limitations, including a lack of interpretability. In this paper we address this limitation by proposing a generic framework for interpretable molecule generation based on novel disentangled deep graph generative models with property control. Specifically, we propose a disentanglement enhancement strategy for graphs. We also propose a new deep neural architecture that achieves this learning objective, enabling efficient inference and generation for variable-size graphs. Extensive experimental evaluation demonstrates the superiority of our approach in various critical aspects, such as accuracy, novelty, and disentanglement.
  3. Abstract

    The rapid development of modeling techniques has brought many opportunities for data‐driven discovery and prediction. However, it also creates the challenge of selecting the most appropriate model for any particular data task. Information criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), have been developed as a general class of model selection methods with profound connections to foundational ideas in statistics and information theory. Many perspectives and theoretical justifications have been developed to understand when and how to use information criteria, which often depend on particular data circumstances. This review article revisits information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent advancements, and common misconceptions to enrich the understanding of model selection in general.

    This article is categorized under:

    Data: Types and Structure > Traditional Statistical Data

    Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods

    Statistical and Graphical Methods of Data Analysis > Information Theoretic Methods

    Statistical Models > Model Selection

     
  4. Abstract

    The potential energy of molecular species and their conformers can be computed with a wide range of computational chemistry methods, from molecular mechanics to ab initio quantum chemistry. However, choosing a computational approach that balances computational cost against the reliability of the calculated energies is a dilemma, especially for large molecules. This dilemma proves even more problematic for studies that require hundreds or thousands of calculations, such as drug discovery. On the other hand, driven by their pattern recognition capabilities, neural networks have started to gain popularity in the computational chemistry community. During the last decade, many neural network potentials have been developed to predict a variety of chemical information for different systems. Neural network potentials have been shown to predict chemical properties with accuracy comparable to quantum mechanical approaches but at a cost approaching that of molecular mechanics calculations. As a result, the development of more reliable, transferable, and extensible neural network potentials has become an attractive field of study for researchers. In this review, we give an overview of the status of current neural network potentials and strategies to improve their accuracy. We provide recent examples of studies that demonstrate the applicability of these potentials. We also discuss the capabilities and shortcomings of the current models and the challenges and future aspects of their development and applications. It is expected that this review will provide guidance for the development of neural network potentials and the exploitation of their applicability.

    This article is categorized under:

    Data Science > Artificial Intelligence/Machine Learning

    Molecular and Statistical Mechanics > Molecular Interactions

    Software > Molecular Modeling

     
  5. Abstract

    Brownian dynamics (BD) is a computational method to simulate molecular diffusion processes. Although the BD method has been developed over several decades and is well established, new methodological developments are improving its accuracy, widening its scope, and increasing its application. In biological applications, BD is used to investigate the diffusive behavior of molecules subject to forces due to intermolecular interactions or interactions with material surfaces. BD can be used to compute rate constants for diffusional association, generate structures of encounter complexes for molecular binding partners, and examine the transport properties of geometrically complex molecules. Often, a series of simulations is performed, for example, for different protein mutants or environmental conditions, so that the effects of the changes on diffusional properties can be estimated. While biomolecules are commonly described at atomic resolution and internal molecular motions are typically neglected, coarse‐graining and the treatment of conformational flexibility are increasingly employed. Software packages for BD simulations of biomolecules are growing in capabilities, with several new packages providing novel features that expand the range of questions that can be addressed. These advances, when used in concert with experiment or other simulation methods, such as molecular dynamics, open new opportunities for application to biochemical and biological systems. Here, we review some of the latest developments in the theory, methods, software, and applications of BD simulations to study biomolecular diffusional association processes and provide a perspective on their future use and application to outstanding challenges in biology, bioengineering, and biomedicine.

    This article is categorized under:

    Structure and Mechanism > Computational Biochemistry and Biophysics

    Molecular and Statistical Mechanics > Molecular Dynamics and Monte‐Carlo Methods

    Software > Simulation Methods

     