skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Improving Protein–Ligand Interaction Modeling with cryo-EM Data, Templates, and Deep Learning in 2021 Ligand Model Challenge
Elucidating protein–ligand interaction is crucial for studying the function of proteins and compounds in an organism and critical for drug discovery and design. The problem of protein–ligand interaction is traditionally tackled by molecular docking and simulation, which is based on physical forces and statistical potentials and cannot effectively leverage cryo-EM data and existing protein structural information in the protein–ligand modeling process. In this work, we developed a deep learning bioinformatics pipeline (DeepProLigand) to predict protein–ligand interactions from cryo-EM density maps of proteins and ligands. DeepProLigand first uses a deep learning method to predict the structure of proteins from cryo-EM maps, which is averaged with a reference (template) structure of the proteins to produce a combined structure to add ligands. The ligands are then identified and added into the structure to generate a protein–ligand complex structure, which is further refined. The method based on the deep learning prediction and template-based modeling was blindly tested in the 2021 EMDataResource Ligand Challenge and was ranked first in fitting ligands to cryo-EM density maps. These results demonstrate that the deep learning bioinformatics approach is a promising direction for modeling protein–ligand interactions on cryo-EM data using prior structural information.  more » « less
Award ID(s):
1759934 1763246
PAR ID:
10469166
Author(s) / Creator(s):
;
Publisher / Repository:
MDPI
Date Published:
Journal Name:
Biomolecules
Volume:
13
Issue:
1
ISSN:
2218-273X
Page Range / eLocation ID:
132
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Cryo‐electron microscopy (cryo‐EM) has become a major experimental technique to determine the structures of large protein complexes and molecular assemblies, as evidenced by the 2017 Nobel Prize. Although cryo‐EM has been drastically improved to generate high‐resolution three‐dimensional maps that contain detailed structural information about macromolecules, the computational methods for using the data to automatically build structure models are lagging far behind. The traditional cryo‐EM model building approach is template‐based homology modeling. Manual de novo modeling is very time‐consuming when no template model is found in the database. In recent years, de novo cryo‐EM modeling using machine learning (ML) and deep learning (DL) has ranked among the top‐performing methods in macromolecular structure modeling. DL‐based de novo cryo‐EM modeling is an important application of artificial intelligence, with impressive results and great potential for the next generation of molecular biomedicine. Accordingly, we systematically review the representative ML/DL‐based de novo cryo‐EM modeling methods. Their significances are discussed from both practical and methodological viewpoints. We also briefly describe the background of cryo‐EM data processing workflow. Overall, this review provides an introductory guide to modern research on artificial intelligence for de novo molecular structure modeling and future directions in this emerging field. This article is categorized under:Structure and Mechanism > Molecular StructuresStructure and Mechanism > Computational Biochemistry and BiophysicsData Science > Artificial Intelligence/Machine Learning 
    more » « less
  2. CryoDRGN is a machine learning system for heterogenous cryo-EM reconstruction of proteins and protein complexes from single particle cryo-EM data. Central to this approach is a deep generative model for heterogeneous cryo-EM density maps, which we empirically find effectively models both discrete and continuous forms of structural variability. Once trained, cryoDRGN is capable of generating an arbitrary number of 3D density maps, and thus interpreting the resulting ensemble is a challenge. Here, we showcase interactive and automated processing approaches for analyzing cryoDRGN results. Specifically, we detail a step-by-step protocol for analysis of the assembling 50S ribosome dataset (Davis et al., EMPIAR-10076), including preparation of inputs, network training, and visualization of the resulting ensemble of density maps. Additionally, we describe and implement methods to comprehensively analyze and interpret the distribution of volumes with the assistance of an associated atomic model. This protocol is appropriate for structural biologists familiar with processing single particle cryo-EM datasets and with moderate experience navigating Python and Jupyter notebooks. It requires 3-4 days to complete. 
    more » « less
  3. Using cryo-electron microscopy (cryo-EM), we determined the structure of the Escherichia coli 70S ribosome with a global resolution of 2.0 Å. The maps reveal unambiguous positioning of protein and RNA residues, their detailed chemical interactions, and chemical modifications. Notable features include the first examples of isopeptide and thioamide backbone substitutions in ribosomal proteins, the former likely conserved in all domains of life. The maps also reveal extensive solvation of the small (30S) ribosomal subunit, and interactions with A-site and P-site tRNAs, mRNA, and the antibiotic paromomycin. The maps and models of the bacterial ribosome presented here now allow a deeper phylogenetic analysis of ribosomal components including structural conservation to the level of solvation. The high quality of the maps should enable future structural analysis of the chemical basis for translation and aid the development of robust tools for cryo-EM structure modeling and refinement. 
    more » « less
  4. Abstract New drug production, from target identification to marketing approval, takes over 12 years and can cost around $2.6 billion. Furthermore, the COVID-19 pandemic has unveiled the urgent need for more powerful computational methods for drug discovery. Here, we review the computational approaches to predicting protein–ligand interactions in the context of drug discovery, focusing on methods using artificial intelligence (AI). We begin with a brief introduction to proteins (targets), ligands (e.g. drugs) and their interactions for nonexperts. Next, we review databases that are commonly used in the domain of protein–ligand interactions. Finally, we survey and analyze the machine learning (ML) approaches implemented to predict protein–ligand binding sites, ligand-binding affinity and binding pose (conformation) including both classical ML algorithms and recent deep learning methods. After exploring the correlation between these three aspects of protein–ligand interaction, it has been proposed that they should be studied in unison. We anticipate that our review will aid exploration and development of more accurate ML-based prediction strategies for studying protein–ligand interactions. 
    more » « less
  5. null (Ed.)
    Information about macromolecular structure of protein complexes and related cellular and molecular mechanisms can assist the search for vaccines and drug development processes. To obtain such structural information, we present DeepTracer, a fully automated deep learning-based method for fast de novo multichain protein complex structure determination from high-resolution cryoelectron microscopy (cryo-EM) maps. We applied DeepTracer on a previously published set of 476 raw experimental cryo-EM maps and compared the results with a current state of the art method. The residue coverage increased by over 30% using DeepTracer, and the rmsd value improved from 1.29 Å to 1.18 Å. Additionally, we applied DeepTracer on a set of 62 coronavirus-related cryo-EM maps, among them 10 with no deposited structure available in EMDataResource. We observed an average residue match of 84% with the deposited structures and an average rmsd of 0.93 Å. Additional tests with related methods further exemplify DeepTracer’s competitive accuracy and efficiency of structure modeling. DeepTracer allows for exceptionally fast computations, making it possible to trace around 60,000 residues in 350 chains within only 2 h. The web service is globally accessible at https://deeptracer.uw.edu . 
    more » « less