

Title: RCSB Protein Data Bank 1D3D module: displaying positional features on macromolecular assemblies
Abstract

Motivation

Mapping positional features from one-dimensional (1D) sequences onto three-dimensional (3D) structures of biological macromolecules is a powerful tool to show geometric patterns of biochemical annotations and provide a better understanding of the mechanisms underpinning protein and nucleic acid function at the atomic level.

Results

We present a new library designed to display fully customizable interactive views between 1D positional features of protein and/or nucleic acid sequences and their 3D structures as isolated chains or components of macromolecular assemblies.
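The core idea of projecting a 1D positional feature onto a 3D structure can be illustrated with a minimal sketch. It does not use the library's actual API: the `Feature` class, the `map_feature_to_residues` function, and the example mapping are all hypothetical names and values chosen for illustration. The sketch assumes a precomputed lookup from sequence positions to structure residue numbers, with `None` marking positions that are not modeled in the structure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Feature:
    """A 1D positional feature: an inclusive, 1-based range on a sequence."""
    name: str
    begin: int
    end: int


def map_feature_to_residues(
    feature: Feature, seq_to_struct: Dict[int, Optional[int]]
) -> List[int]:
    """Return the structure residue numbers covered by a 1D feature.

    Sequence positions that are absent from the mapping, or mapped to None
    (not modeled in the 3D structure), are skipped.
    """
    residues = []
    for position in range(feature.begin, feature.end + 1):
        residue = seq_to_struct.get(position)
        if residue is not None:
            residues.append(residue)
    return residues


# Hypothetical mapping (assumption): six sequence positions, structure
# numbering offset by 100, and position 3 unobserved in the structure.
seq_to_struct = {1: 101, 2: 102, 3: None, 4: 104, 5: 105, 6: 106}

site = Feature(name="binding site", begin=2, end=5)
selection = map_feature_to_residues(site, seq_to_struct)
print(selection)
```

In an interactive viewer, a residue list like `selection` would typically drive highlighting or focus in the 3D panel when the user clicks the feature in the 1D panel; how the library wires this up is not described in the abstract.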

Availability and implementation

https://github.com/rcsb/rcsb-saguaro-3d.

Supplementary information

Supplementary data are available at Bioinformatics online.

 
Award ID(s):
1832184
PAR ID:
10406888
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
38
Issue:
12
ISSN:
1367-4803
Format(s):
Medium: X Size: p. 3304-3305
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Motivation

    Tools for pairwise alignments between 3D structures of proteins are of fundamental importance for structural biology and bioinformatics, enabling visual exploration of evolutionary and functional relationships. However, the absence of a user-friendly, browser-based tool for creating alignments and visualizing them at both 1D sequence and 3D structural levels makes this process unnecessarily cumbersome.

    Results

    We introduce a novel pairwise structure alignment tool (rcsb.org/alignment) that seamlessly integrates into the RCSB Protein Data Bank (RCSB PDB) research-focused RCSB.org web portal. Our tool and its underlying application programming interface (alignment.rcsb.org) empower users to align several protein chains with a reference structure by providing access to established alignment algorithms (FATCAT, CE, TM-align, or Smith–Waterman 3D). The user-friendly interface simplifies parameter setup and input selection. Within seconds, our tool enables visualization of results in both sequence (1D) and structural (3D) perspectives through the RCSB PDB RCSB.org Sequence Annotations viewer and Mol* 3D viewer, respectively. Users can effortlessly compare structures deposited in the PDB archive alongside more than a million incorporated Computed Structure Models coming from the ModelArchive and AlphaFold DB. Moreover, this tool can be used to align custom structure data by providing a link/URL or uploading atomic coordinate files directly. Importantly, alignment results can be bookmarked and shared with collaborators. By bridging the gap between 1D sequences and 3D structures of proteins, our tool facilitates deeper understanding of complex evolutionary relationships among proteins through comprehensive sequence and structural analyses.

    Availability and implementation

    The alignment tool is part of the RCSB PDB research-focused RCSB.org web portal and available at rcsb.org/alignment. Programmatic access is available via alignment.rcsb.org. Frontend code has been published at github.com/rcsb/rcsb-pecos-app. Visualization is powered by the open-source Mol* viewer (github.com/molstar/molstar and github.com/molstar/rcsb-molstar) plus the Sequence Annotations in 3D Viewer (github.com/rcsb/rcsb-saguaro-3d).

     
  2. Abstract

    Motivation

    Membrane proteins are encoded by approximately one fifth of human genes but account for more than half of all US FDA approved drug targets. Thanks to new technological advances, the number of membrane proteins archived in the PDB is growing rapidly. However, automatic identification of membrane proteins or inference of membrane location is not a trivial task.

    Results

    We present recent improvements to the RCSB Protein Data Bank web portal (RCSB PDB, rcsb.org) that provide a wealth of new membrane protein annotations integrated from four external resources: OPM, PDBTM, MemProtMD and mpstruc. We have substantially enhanced the presentation of data on membrane proteins. The number of membrane proteins with annotations available on rcsb.org was increased by ∼80%. Users can search for these annotations, explore corresponding tree hierarchies, display membrane segments at the 1D amino acid sequence level, and visualize the predicted location of the membrane layer in 3D.

    Availability and implementation

    Annotations, search, tree data and visualization are available at our rcsb.org web portal. Membrane visualization is supported by the open-source Mol* viewer (molstar.org and github.com/molstar/molstar).

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Recent advances in Artificial Intelligence and Machine Learning (e.g., AlphaFold, RosettaFold, and ESMFold) enable prediction of three-dimensional (3D) protein structures from amino acid sequences alone at accuracies comparable to lower-resolution experimental methods. These tools have been employed to predict structures across entire proteomes and the results of large-scale metagenomic sequence studies, yielding an exponential increase in available biomolecular 3D structural information. Given the enormous volume of this newly computed biostructure data, there is an urgent need for robust tools to manage, search, cluster, and visualize large collections of structures. Equally important is the capability to efficiently summarize and visualize metadata, biological/biochemical annotations, and structural features, particularly when working with vast numbers of protein structures, both experimentally determined structures from the Protein Data Bank (PDB) and computationally predicted models. Moreover, researchers require advanced visualization techniques that support interactive exploration of multiple sequences and structural alignments. This paper introduces a suite of tools provided on the RCSB PDB research-focused web portal RCSB.org, tailor-made for efficient management, search, organization, and visualization of this burgeoning corpus of 3D macromolecular structure data.

     
  4. Abstract

    Summary

    We developed a new software tool, BioJava-ModFinder, for identifying protein modifications observed in 3D structures archived in the Protein Data Bank (PDB). Information on more than 400 types of protein modifications was collected and curated from annotations in PDB, RESID, and PSI-MOD. We divided these modifications into three categories: modified residues, attachment modifications, and cross-links. We have developed a systematic method to identify these modifications in 3D protein structures. We have integrated this package with the RCSB PDB web application and added protein modification annotations to the sequence diagram and structure display. By scanning all 3D structures in the PDB using BioJava-ModFinder, we identified more than 30 000 structures with protein modifications, which can be searched, browsed, and visualized on the RCSB PDB website.

    Availability and Implementation

    BioJava-ModFinder is available as open source (LGPL license) at https://github.com/biojava/biojava/tree/master/biojava-modfinder. The RCSB PDB can be accessed at http://www.rcsb.org.

     
  5. Abstract

    Motivation

    Computational methods for compound–protein affinity and contact (CPAC) prediction aim at facilitating rational drug discovery by simultaneous prediction of the strength and the pattern of compound–protein interactions. Although the desired outputs are highly structure-dependent, the lack of protein structures often makes structure-free methods rely on protein sequence inputs alone. The scarcity of compound–protein pairs with affinity and contact labels further limits the accuracy and the generalizability of CPAC models.

    Results

    To overcome the aforementioned challenges of structure naivety and labeled-data scarcity, we introduce cross-modality and self-supervised learning, respectively, for structure-aware and task-relevant protein embedding. Specifically, protein data are available in two modalities, 1D amino-acid sequences and predicted 2D contact maps, which are separately embedded with recurrent and graph neural networks, respectively, as well as jointly embedded with two cross-modality schemes. Furthermore, both protein modalities are pre-trained under various self-supervised learning strategies by leveraging massive amounts of unlabeled protein data. Our results indicate that individual protein modalities differ in their strengths in predicting affinities or contacts. Proper cross-modality protein embedding combined with self-supervised learning improves model generalizability when predicting both affinities and contacts for unseen proteins.

    Availability and implementation

    Data and source codes are available at https://github.com/Shen-Lab/CPAC.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     