Title: Electron microscopy holdings of the Protein Data Bank: the impact of the resolution revolution, new validation tools, and implications for the future

As a discipline, structural biology has been transformed by the three-dimensional electron microscopy (3DEM) “Resolution Revolution” made possible by convergence of robust cryo-preservation of vitrified biological materials, sample handling systems, and measurement stages operating at liquid nitrogen temperature, improvements in electron optics that preserve phase information at the atomic level, direct electron detectors (DEDs), high-speed computing with graphics processing units, and rapid advances in data acquisition and processing software. 3DEM structure information (atomic coordinates and related metadata) is archived in the open-access Protein Data Bank (PDB), which currently holds more than 11,000 3DEM structures of proteins and nucleic acids, and their complexes with one another and small-molecule ligands (~6% of the archive). Underlying experimental data (3DEM density maps and related metadata) are stored in the Electron Microscopy Data Bank (EMDB), which currently holds more than 21,000 3DEM density maps. After describing the history of the PDB and the Worldwide Protein Data Bank (wwPDB) partnership, which jointly manages both the PDB and EMDB archives, this review examines the origins of the resolution revolution and analyzes its impact on structural biology viewed through the lens of PDB holdings. Six areas of focus exemplifying the impact of 3DEM across the biosciences are discussed in detail (icosahedral viruses, ribosomes, integral membrane proteins, SARS-CoV-2 spike proteins, cryogenic electron tomography, and integrative structure determination combining 3DEM with complementary biophysical measurement techniques), followed by a review of 3DEM structure validation by the wwPDB that underscores the importance of community engagement.

Award ID(s):
1756248 2112966
Publisher / Repository:
Springer Science + Business Media
Journal Name:
Biophysical Reviews
Page Range / eLocation ID:
p. 1281-1301
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Analyses of publicly available structural data reveal interesting insights into the impact of the three‐dimensional (3D) structures of protein targets important for discovery of new drugs (e.g., G‐protein‐coupled receptors, voltage‐gated ion channels, ligand‐gated ion channels, transporters, and E3 ubiquitin ligases). The Protein Data Bank (PDB) archive currently holds >155,000 atomic‐level 3D structures of biomolecules experimentally determined using crystallography, nuclear magnetic resonance spectroscopy, and electron microscopy. The PDB was established in 1971 as the first open‐access, digital‐data resource in biology, and is now managed by the Worldwide PDB partnership (wwPDB); US PDB operations are the responsibility of the Research Collaboratory for Structural Bioinformatics PDB (RCSB PDB). The RCSB PDB serves millions of RCSB.org users worldwide by delivering PDB data integrated with ∼40 external biodata resources, providing rich structural views of fundamental biology, biomedicine, and energy sciences. Recently published work showed that the PDB archival holdings facilitated discovery of ∼90% of the 210 new drugs approved by the US Food and Drug Administration 2010–2016. We review user‐driven development of RCSB PDB services, examine growth of the PDB archive in terms of size and complexity, and present examples and opportunities for structure‐guided drug discovery for challenging targets (e.g., integral membrane proteins).

  2. Abstract

    The Protein Data Bank (PDB) is one of two archival resources for experimental data central to biomedical research and education worldwide (the other key Primary Data Archive in biology being the International Nucleotide Sequence Database Collaboration). The PDB currently houses >134,000 atomic level biomolecular structures determined by crystallography, NMR spectroscopy, and 3D electron microscopy. It was established in 1971 as the first open‐access, digital‐data resource in biology, and is managed by the Worldwide Protein Data Bank partnership (wwPDB). US PDB operations are conducted by the RCSB Protein Data Bank (RCSB PDB; Rutgers University and UC San Diego) and funded by NSF, NIH, and DoE. The RCSB PDB serves as the global Archive Keeper for the wwPDB. During calendar 2016, >591 million structure data files were downloaded from the PDB by Data Consumers working in every sovereign nation recognized by the United Nations. During this same period, the RCSB PDB processed >5300 new atomic level biomolecular structures plus experimental data and metadata coming into the archive from Data Depositors working in the Americas and Oceania. In addition, RCSB PDB served >1 million users worldwide with PDB data integrated with ∼40 external data resources providing rich structural views of fundamental biology, biomedicine, and energy sciences, and >600,000 educational website users around the globe. RCSB PDB resources are described in detail together with metrics documenting the impact of access to PDB data on basic and applied research, clinical medicine, education, and the economy.

  3. Abstract

    The Electron Microscopy Data Bank (EMDB) is the global public archive of three-dimensional electron microscopy (3DEM) maps of biological specimens derived from transmission electron microscopy experiments. As of 2021, EMDB is managed by the Worldwide Protein Data Bank consortium (wwPDB) as a wwPDB Core Archive, and the EMDB team is a core member of the consortium. Today, EMDB houses over 30 000 entries with maps containing macromolecules, complexes, viruses, organelles and cells. Herein, we provide an overview of the rapidly growing EMDB archive, including its current holdings, recent updates, and future plans.

  4. Abstract

    The Protein Data Bank (PDB) archive is a rich source of information in the form of atomic‐level three‐dimensional (3D) structures of biomolecules experimentally determined using macromolecular crystallography, nuclear magnetic resonance (NMR) spectroscopy, and electron microscopy (3DEM). Originally established in 1971 as a resource for protein crystallographers to freely exchange data, today PDB data drive research and education across scientific disciplines. In 2011, the online portal PDB‐101 was launched to support teachers, students, and the general public in PDB archive exploration. Maintained by the Research Collaboratory for Structural Bioinformatics PDB, PDB‐101 aims to help train the next generation of PDB users and to promote the overall importance of structural biology and protein science to nonexperts. Regularly published features include the highly popular Molecule of the Month series, 3D model activities, molecular animation videos, and educational curricula. Materials are organized into various categories (Health and Disease, Molecules of Life, Biotech and Nanotech, and Structures and Structure Determination) and searchable by keyword. A biennial health focus frames new resource creation and provides topics for annual video challenges for high school students. Web analytics document that PDB‐101 materials relating to fundamental topics (e.g., hemoglobin, catalase) are highly accessed year‐on‐year. In addition, PDB‐101 materials created in response to topical health matters (e.g., Zika, measles, coronavirus) are well received. PDB‐101 shows how learning about the diverse shapes and functions of PDB structures promotes understanding of all aspects of biology, from the central dogma of biology to health and disease to biological energy.

  5. Abstract

    An increasing number of protein structures are determined by cryo‐electron microscopy (cryo‐EM) and stored in the Electron Microscopy Data Bank (EMDB). To interpret determined cryo‐EM maps, several methods have been developed that model the tertiary structure of biomolecules, particularly proteins. Here we show how to use two such methods, VESPER and MAINMAST, which were developed in our group. VESPER is a method mainly for two purposes: fitting protein structure models into an EM map and aligning two EM maps locally or globally to capture their similarity. VESPER represents each EM map as a set of vectors pointing toward denser points. By considering matching the directions of vectors, in general, VESPER aligns maps better than conventional methods that only consider local densities of maps. MAINMAST is a de novo protein modeling tool designed for EM maps with resolution of 3–5 Å or better. MAINMAST builds a protein main chain directly from a density map by tracing dense points in an EM map and connecting them using a tree‐graph structure. This article describes how to use these two tools using three illustrative modeling examples. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.

    Basic Protocol 1: Protein structure model fitting using VESPER

    Alternate Protocol: Atomic model fitting using VESPER web server

    Basic Protocol 2: Protein de novo modeling using MAINMAST
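
The idea of "tracing dense points in an EM map and connecting them using a tree‐graph structure" can be illustrated with a toy sketch. The code below is not MAINMAST and does not use its code or parameters: the function names, the fixed density threshold, the nested-list map layout, and the choice of a minimum spanning tree (built with Prim's algorithm) as the tree structure are all assumptions made for illustration only.

```python
import math


def dense_points(density, threshold):
    """Return (x, y, z) grid coordinates whose density is >= threshold.

    `density` is a nested list indexed as density[x][y][z] (a hypothetical
    stand-in for a real EM map; real maps would be read from a map file).
    """
    points = []
    for x, plane in enumerate(density):
        for y, row in enumerate(plane):
            for z, value in enumerate(row):
                if value >= threshold:
                    points.append((x, y, z))
    return points


def minimum_spanning_tree(points):
    """Connect points with Prim's algorithm.

    Returns a list of (i, j, length) edges, where i and j index `points`.
    A minimum spanning tree is one possible tree structure, assumed here.
    """
    if not points:
        return []
    # best[i] = (distance from point i to the tree, nearest tree point index)
    best = {i: (math.dist(points[0], points[i]), 0)
            for i in range(1, len(points))}
    edges = []
    while best:
        # Pick the point outside the tree that is closest to the tree.
        i = min(best, key=lambda k: best[k][0])
        length, j = best.pop(i)
        edges.append((j, i, length))
        # Newly added point i may offer shorter links to remaining points.
        for k in best:
            d = math.dist(points[i], points[k])
            if d < best[k][0]:
                best[k] = (d, i)
    return edges


if __name__ == "__main__":
    # Toy 3x3x3 map with four dense voxels along a bent path.
    toy = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    for x, y, z in [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)]:
        toy[x][y][z] = 1.0
    pts = dense_points(toy, threshold=0.5)
    for i, j, length in minimum_spanning_tree(pts):
        print(pts[i], "->", pts[j], f"length={length:.2f}")
```

In this toy case the tree simply follows the bent path of dense voxels. A real tool would need to handle noise, branching side-chain density, and the choice of which path through the tree is the main chain, none of which this sketch attempts.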
