Title: GlyMDB: Glycan Microarray Database and analysis toolset
Abstract

Motivation

Glycan microarrays can reveal the interactions of glycan-binding proteins (GBPs) with hundreds of defined glycan structures, and have revolutionized the investigation of protein–carbohydrate interactions underlying numerous critical biological activities. However, it is difficult to interpret microarray data and identify the structural determinants that promote glycan binding to GBPs, because of the ambiguity in microarray fluorescence intensity and the complexity of branched glycan structures. To facilitate analysis of glycan microarray data alongside protein structure, we have built the Glycan Microarray Database (GlyMDB), a web-based resource including a searchable database of glycan microarray samples and a toolset for data/structure analysis.

Results

The current GlyMDB provides data visualization and glycan-binding motif discovery for 5203 glycan microarray samples collected from the Consortium for Functional Glycomics. A unique feature of GlyMDB is that it links microarray data to PDB structures. GlyMDB provides different options for database query, and allows users to upload their own microarray data for analysis. After a search or upload is complete, users can choose the criterion for binder versus non-binder classification. They can then view the signal intensity graph, which marks the binder/non-binder threshold, followed by a list of glycan-binding motifs. Users can also compare the fluorescence intensity data from two different microarray samples. A protein sequence-based search is performed using BLAST to match microarray data with all available PDB structures containing glycans. The glycan ligand information is displayed, and links are provided for structural visualization and redirection to other modules in GlycanStructure.ORG for further investigation of glycan-binding sites and glycan structures.
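
As an illustration of the binder versus non-binder step described above, the following is a minimal sketch that splits glycans by a fluorescence-intensity threshold. The abstract does not state which criteria GlyMDB offers, so the rule used here (a fixed fraction of the maximum intensity), the function name `classify_binders`, the parameter `fraction_of_max` and the sample glycan labels are all assumptions made for illustration, not the GlyMDB implementation.

```python
def classify_binders(intensities, fraction_of_max=0.1):
    """Split glycans into binders and non-binders.

    Assumed criterion (not taken from GlyMDB): a glycan counts as a
    binder when its intensity is at least `fraction_of_max` times the
    highest intensity observed on the array.
    """
    if not intensities:
        raise ValueError("intensities must not be empty")

    threshold = fraction_of_max * max(intensities.values())

    # Rank glycans from strongest to weakest signal.
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    binders = [g for g in ranked if intensities[g] >= threshold]
    non_binders = [g for g in ranked if intensities[g] < threshold]
    return threshold, binders, non_binders


if __name__ == "__main__":
    # Hypothetical fluorescence intensities for four glycans.
    sample = {
        "glycan_A": 12000.0,
        "glycan_B": 7500.0,
        "glycan_C": 800.0,
        "glycan_D": 150.0,
    }
    threshold, binders, non_binders = classify_binders(sample, fraction_of_max=0.5)
    print("threshold:", threshold)
    print("binders:", binders)
    print("non-binders:", non_binders)
```

A relative cutoff is only one possible choice; a z-score or a fixed absolute intensity could be swapped in by changing how `threshold` is computed.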

Availability and implementation

http://www.glycanstructure.org/glymdb.

Contact

wonpil@lehigh.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

 
NSF-PAR ID:
10131003
Publisher / Repository:
Oxford University Press
Journal Name:
Bioinformatics
ISSN:
1367-4803
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Motivation

    Membrane proteins are encoded by approximately one fifth of human genes but account for more than half of all US FDA approved drug targets. Thanks to new technological advances, the number of membrane proteins archived in the PDB is growing rapidly. However, automatic identification of membrane proteins or inference of membrane location is not a trivial task.

    Results

    We present recent improvements to the RCSB Protein Data Bank web portal (RCSB PDB, rcsb.org) that provide a wealth of new membrane protein annotations integrated from four external resources: OPM, PDBTM, MemProtMD and mpstruc. We have substantially enhanced the presentation of data on membrane proteins. The number of membrane proteins with annotations available on rcsb.org was increased by ∼80%. Users can search for these annotations, explore corresponding tree hierarchies, display membrane segments at the 1D amino acid sequence level, and visualize the predicted location of the membrane layer in 3D.

    Availability and implementation

    Annotations, search, tree data and visualization are available at our rcsb.org web portal. Membrane visualization is supported by the open-source Mol* viewer (molstar.org and github.com/molstar/molstar).

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Abstract

    Motivation

    Binding-induced conformational changes challenge current computational docking algorithms by exponentially increasing the conformational space to be explored. To restrict this search to relevant space, some computational docking algorithms exploit the inherent flexibility of the protein monomers to simulate conformational selection from pre-generated ensembles. As the ensemble size expands with increased flexibility, these methods struggle with efficiency and high false positive rates.

    Results

    Here, we develop and benchmark RosettaDock 4.0, which efficiently samples large conformational ensembles of flexible proteins and docks them using a novel, six-dimensional, coarse-grained score function. A strong discriminative ability allows an eight-fold higher enrichment of near-native candidate structures in the coarse-grained phase compared to RosettaDock 3.2. It adaptively samples 100 conformations each of the ligand and the receptor backbone while increasing computational time by only 20–80%. In local docking of a benchmark set of 88 proteins of varying degrees of flexibility, the expected success rate (defined as cases with ≥50% chance of achieving 3 near-native structures in the 5 top-ranked ones) for blind predictions after resampling is 77% for rigid complexes, 49% for moderately flexible complexes and 31% for highly flexible complexes. These success rates on flexible complexes are a substantial step forward from all existing methods. Additionally, for highly flexible proteins, we demonstrate that when a suitable conformer generation method exists, the method successfully docks the complex.

    Availability and implementation

    As a part of the Rosetta software suite, RosettaDock 4.0 is available at https://www.rosettacommons.org to all non-commercial users for free and to commercial users for a fee.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Abstract

    Motivation

    Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, the computational cost of performing pairwise structural alignment against all structures in PDB is prohibitively expensive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and only computing similarity between vectors. As a notable example, FragBag represents each protein by a ‘bag of fragments’, which is a vector of frequencies of contiguous short backbone fragments from a predetermined library. Despite being efficient, the accuracy of FragBag is unsatisfactory because its backbone fragment library may not be optimally constructed and long-range interacting patterns are omitted.

    Results

    Here we present a new approach to learning effective structural motif representations using deep learning. We develop DeepFold, a deep convolutional neural network model to extract structural motif features of a protein structure. We demonstrate that DeepFold substantially outperforms FragBag on protein structural search on a non-redundant protein structure database and a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs.

    Availability and implementation

    https://github.com/largelymfs/DeepFold

     
  4. Abstract

    Water and ligand binding play critical roles in the structure and function of proteins, yet their binding sites and significance are difficult to predict a priori. Multiple solvent crystal structures (MSCS) is a method where several X‐ray crystal structures are solved, each in a unique solvent environment, with organic molecules that serve as probes of the protein surface for sites evolved to bind ligands, while the first hydration shell is essentially maintained. When superimposed, these structures contain a vast amount of information regarding hot spots of protein‐protein or protein‐ligand interactions, as well as conserved water‐binding sites retained with the change in solvent properties. Optimized mining of this information requires reliable structural data and a consistent, objective analysis tool. Detection of related solvent positions (DRoP) was developed to automatically organize and rank the water or small organic molecule binding sites within a given set of structures. It is a flexible tool that can also be used in conserved water analysis given multiple structures of any protein independent of the MSCS method. The DRoP output is an HTML format list of the solvent sites ordered by conservation rank in its population within the set of structures, along with renumbered and recolored PDB files for visualization and facile analysis. Here, we present a previously unpublished set of MSCS structures of bovine pancreatic ribonuclease A (RNase A) and use it together with published structures to illustrate the capabilities of DRoP.

     
  5. Abstract

    The Membranome database provides comprehensive structural information on single‐pass (i.e., bitopic) membrane proteins from six evolutionarily distant organisms, including protein–protein interactions, complexes, mutations, experimental structures, and models of transmembrane α‐helical dimers. We present a new version of this database, Membranome 3.0, which was significantly updated by revising the set of 5,758 bitopic proteins and incorporating models generated by AlphaFold 2 in the database. The AlphaFold models were parsed into structural domains located at the different membrane sides, modified to exclude low‐confidence unstructured terminal regions and signal sequences, validated through comparison with available experimental structures, and positioned with respect to membrane boundaries. Membranome 3.0 was re‐developed to facilitate visualization and comparative analysis of multiple 3D structures of proteins that belong to a specified family, complex, biological pathway, or membrane type. New tools for advanced search and analysis of proteins, their interactions, complexes, and mutations were included. The database is freely accessible at https://membranome.org.

     