Title: Artificial intelligence advances for de novo molecular structure modeling in cryo‐electron microscopy
Abstract

Cryo‐electron microscopy (cryo‐EM) has become a major experimental technique to determine the structures of large protein complexes and molecular assemblies, as recognized by the 2017 Nobel Prize in Chemistry. Although cryo‐EM has improved drastically and now generates high‐resolution three‐dimensional maps that contain detailed structural information about macromolecules, the computational methods for using the data to automatically build structure models are lagging far behind. The traditional cryo‐EM model building approach is template‐based homology modeling. Manual de novo modeling is very time‐consuming when no template model is found in the database. In recent years, de novo cryo‐EM modeling using machine learning (ML) and deep learning (DL) has ranked among the top‐performing methods in macromolecular structure modeling. DL‐based de novo cryo‐EM modeling is an important application of artificial intelligence, with impressive results and great potential for the next generation of molecular biomedicine. Accordingly, we systematically review the representative ML/DL‐based de novo cryo‐EM modeling methods. Their significance is discussed from both practical and methodological viewpoints. We also briefly describe the background of the cryo‐EM data processing workflow. Overall, this review provides an introductory guide to modern research on artificial intelligence for de novo molecular structure modeling and to future directions in this emerging field.

This article is categorized under:

Structure and Mechanism > Molecular Structures

Structure and Mechanism > Computational Biochemistry and Biophysics

Data Science > Artificial Intelligence/Machine Learning

 
Award ID(s):
2030381
NSF-PAR ID:
10363811
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
WIREs Computational Molecular Science
Volume:
12
Issue:
2
ISSN:
1759-0876
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Generative AI is rapidly transforming the frontier of research in computational structural biology. Indeed, recent successes have substantially advanced protein design and drug discovery. One of the key methodologies underlying these advances is the diffusion model (DM). Diffusion models originated in computer vision, where they rapidly took over image generation by offering superior quality and performance. These models were subsequently extended and modified for use in other areas, including computational structural biology. DMs are well equipped to model high‐dimensional, geometric data while exploiting key strengths of deep learning. In structural biology, for example, they have achieved state‐of‐the‐art results on protein 3D structure generation and small molecule docking. This review covers the basics of diffusion models, associated modeling choices regarding molecular representations, generation capabilities, prevailing heuristics, as well as key limitations and forthcoming refinements. We also provide best practices for evaluation procedures to help establish rigorous benchmarking. The review is intended to provide a fresh view of the state of the art and to highlight the potential and current challenges of recent generative techniques in computational structural biology.

    This article is categorized under:

    Data Science > Artificial Intelligence/Machine Learning

    Structure and Mechanism > Molecular Structures

    Software > Molecular Modeling

     
  2. Abstract

    The potential energy of molecular species and their conformers can be computed with a wide range of computational chemistry methods, from molecular mechanics to ab initio quantum chemistry. However, choosing a computational approach that balances computational cost against the reliability of the calculated energies is a dilemma, especially for large molecules. This dilemma is even more problematic for studies that require hundreds or thousands of calculations, such as drug discovery. Meanwhile, driven by their pattern recognition capabilities, neural networks have gained popularity in the computational chemistry community. During the last decade, many neural network potentials have been developed to predict a variety of chemical information for different systems. Neural network potentials have been shown to predict chemical properties with accuracy comparable to quantum mechanical approaches but at a cost approaching that of molecular mechanics calculations. As a result, the development of more reliable, transferable, and extensible neural network potentials has become an attractive field of study. In this review, we give an overview of the status of current neural network potentials and of strategies to improve their accuracy. We provide recent examples of studies that demonstrate the applicability of these potentials. We also discuss the capabilities and shortcomings of the current models and the challenges and future aspects of their development and applications. We expect this review to provide guidance for the development of neural network potentials and the exploitation of their applicability.

    This article is categorized under:

    Data Science > Artificial Intelligence/Machine Learning

    Molecular and Statistical Mechanics > Molecular Interactions

    Software > Molecular Modeling

     
  3.
    Information about the macromolecular structure of protein complexes and the related cellular and molecular mechanisms can assist the search for vaccines and the drug development process. To obtain such structural information, we present DeepTracer, a fully automated deep learning-based method for fast de novo multichain protein complex structure determination from high-resolution cryoelectron microscopy (cryo-EM) maps. We applied DeepTracer to a previously published set of 476 raw experimental cryo-EM maps and compared the results with a current state-of-the-art method. The residue coverage increased by over 30% using DeepTracer, and the RMSD value improved from 1.29 Å to 1.18 Å. Additionally, we applied DeepTracer to a set of 62 coronavirus-related cryo-EM maps, among them 10 with no deposited structure available in EMDataResource. We observed an average residue match of 84% with the deposited structures and an average RMSD of 0.93 Å. Additional tests with related methods further exemplify DeepTracer’s competitive accuracy and efficiency of structure modeling. DeepTracer allows for exceptionally fast computations, making it possible to trace around 60,000 residues in 350 chains within only 2 h. The web service is globally accessible at https://deeptracer.uw.edu.
  4. Abstract

    ChemML is an open machine learning (ML) and informatics program suite that is designed to support and advance the data‐driven research paradigm that is currently emerging in the chemical and materials domain. ChemML allows its users to perform various data science tasks and execute ML workflows that are adapted specifically for the chemical and materials context. Key features are automation, general‐purpose utility, versatility, and user‐friendliness in order to make the application of modern data science a viable and widely accessible proposition in the broader chemistry and materials community. ChemML is also designed to facilitate methodological innovation, and it is one of the cornerstones of the software ecosystem for data‐driven in silico research.

    This article is categorized under:

    Software > Simulation Methods

    Computer and Information Science > Chemoinformatics

    Structure and Mechanism > Computational Materials Science

    Software > Molecular Modeling

     
  5. Jez, Joseph M. ; Topp, Christopher N. (Ed.)
    Structural biologists rely on X-ray crystallography as the main technique for determining the three-dimensional structures of macromolecules; however, in recent years, new methods that go beyond X-ray-based technologies have broadened the selection of tools for understanding molecular structure and function. Simultaneously, national facilities are developing programming tools and maintaining personnel to aid novice structural biologists in de novo structure determination. The combination of X-ray free electron lasers (XFELs) and serial femtosecond crystallography (SFX) now enables time-resolved structure determination that captures dynamic processes, such as reaction mechanisms and conformational flexibility. XFEL and SFX, along with microcrystal electron diffraction (MicroED), help sidestep the need for large crystals in structural studies. Moreover, advances in cryogenic electron microscopy (cryo-EM) as a tool for structure determination are revolutionizing how difficult-to-crystallize macromolecules and/or complexes can be visualized at the atomic scale. This review aims to provide a broad overview of these new methods and to guide readers to more in-depth literature on them.