
Title: BioLiP2: an updated structure database for biologically relevant ligand–protein interactions

With the progress of structural biology, the Protein Data Bank (PDB) has witnessed rapid accumulation of experimentally solved protein structures. Since many structures are determined with purification and crystallization additives that are unrelated to a protein's in vivo function, it is nontrivial to identify the subset of protein–ligand interactions that are biologically relevant. We developed the BioLiP2 database to extract biologically relevant protein–ligand interactions from the PDB. BioLiP2 assesses the functional relevance of the ligands by geometric rules and experimental literature validations. The ligand binding information is further enriched with other function annotations, including Enzyme Commission numbers, Gene Ontology terms, catalytic sites, and binding affinities collected from other databases and a manual literature survey. Compared to its predecessor BioLiP, BioLiP2 offers significantly greater coverage of nucleic acid–protein interactions and of interactions involving large complexes that are unavailable in PDB format. BioLiP2 also integrates cutting-edge structural alignment algorithms with state-of-the-art structure prediction techniques, which for the first time enables composite protein structure and sequence-based searching and significantly enhances the usefulness of the database in structure-based function annotations. With these new developments, BioLiP2 will continue to be an important and comprehensive database for docking, virtual screening, and structure-based protein function analyses.

Publisher / Repository:
Oxford University Press
Journal Name:
Nucleic Acids Research
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    Glycan microarrays are capable of illuminating the interactions of glycan-binding proteins (GBPs) against hundreds of defined glycan structures, and have revolutionized the investigations of protein–carbohydrate interactions underlying numerous critical biological activities. However, it is difficult to interpret microarray data and identify structural determinants promoting glycan binding to glycan-binding proteins due to the ambiguity in microarray fluorescence intensity and complexity in branched glycan structures. To facilitate analysis of glycan microarray data alongside protein structure, we have built the Glycan Microarray Database (GlyMDB), a web-based resource including a searchable database of glycan microarray samples and a toolset for data/structure analysis.


    The current GlyMDB provides data visualization and glycan-binding motif discovery for 5203 glycan microarray samples collected from the Consortium for Functional Glycomics. The unique feature of GlyMDB is to link microarray data to PDB structures. The GlyMDB provides different options for database query, and allows users to upload their microarray data for analysis. After search or upload is complete, users can choose the criterion for binder versus non-binder classification. They can view the signal intensity graph including the binder/non-binder threshold followed by a list of glycan-binding motifs. One can also compare the fluorescence intensity data from two different microarray samples. A protein sequence-based search is performed using BLAST to match microarray data with all available PDB structures containing glycans. The glycan ligand information is displayed, and links are provided for structural visualization and redirection to other modules in GlycanStructure.ORG for further investigation of glycan-binding sites and glycan structures.


    Supplementary information

    Supplementary data are available at Bioinformatics online.

  2. Abstract

    Water and ligand binding play critical roles in the structure and function of proteins, yet their binding sites and significance are difficult to predict a priori. Multiple solvent crystal structures (MSCS) is a method where several X‐ray crystal structures are solved, each in a unique solvent environment, with organic molecules that serve as probes of the protein surface for sites evolved to bind ligands, while the first hydration shell is essentially maintained. When superimposed, these structures contain a vast amount of information regarding hot spots of protein‐protein or protein‐ligand interactions, as well as conserved water‐binding sites retained with the change in solvent properties. Optimized mining of this information requires reliable structural data and a consistent, objective analysis tool. Detection of related solvent positions (DRoP) was developed to automatically organize and rank the water or small organic molecule binding sites within a given set of structures. It is a flexible tool that can also be used in conserved water analysis given multiple structures of any protein independent of the MSCS method. The DRoP output is an HTML format list of the solvent sites ordered by conservation rank in its population within the set of structures, along with renumbered and recolored PDB files for visualization and facile analysis. Here, we present a previously unpublished set of MSCS structures of bovine pancreatic ribonuclease A (RNase A) and use it together with published structures to illustrate the capabilities of DRoP.
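The idea of grouping solvent positions across superimposed structures and ranking them by conservation can be illustrated with a minimal sketch. This is not DRoP's actual algorithm: the greedy clustering, the 1.5 Å cutoff, the function names, and the toy coordinates are all assumptions made for illustration only.

```python
import math

def cluster_solvent_sites(structures, cutoff=1.5):
    """Greedily group solvent positions from superimposed structures.

    structures: dict mapping a structure name to a list of (x, y, z) tuples.
    cutoff: assumed distance threshold in angstroms (hypothetical value).
    """
    clusters = []  # each cluster: {"center": (x, y, z), "members": [(name, xyz)]}
    for name, coords in structures.items():
        for xyz in coords:
            best, best_d = None, cutoff
            for c in clusters:
                d = math.dist(xyz, c["center"])
                if d <= best_d:
                    best, best_d = c, d
            if best is None:
                clusters.append({"center": xyz, "members": [(name, xyz)]})
            else:
                best["members"].append((name, xyz))
                n = len(best["members"])
                # Recompute the center as the mean of all member positions.
                best["center"] = tuple(
                    sum(m[1][i] for m in best["members"]) / n for i in range(3)
                )
    return clusters

def rank_by_conservation(clusters, n_structures):
    """Rank sites by the fraction of structures that contribute a member."""
    ranked = []
    for c in clusters:
        names = {name for name, _ in c["members"]}
        ranked.append((len(names) / n_structures, c["center"]))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked

# Toy, made-up water positions in three already-superimposed structures.
toy_structures = {
    "s1": [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    "s2": [(0.3, 0.0, 0.0), (10.2, 0.0, 0.0), (20.0, 0.0, 0.0)],
    "s3": [(0.0, 0.4, 0.0)],
}
toy_clusters = cluster_solvent_sites(toy_structures)
toy_ranking = rank_by_conservation(toy_clusters, len(toy_structures))
for fraction, center in toy_ranking:
    print(f"conservation={fraction:.2f} center={center}")
```

In this toy input, the site near the origin is present in all three structures and therefore ranks first. A real analysis would also need the structural superposition step, which is omitted here.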

  3.
    The structural and regulatory elements in therapeutically relevant RNAs offer many opportunities for targeting by small molecules, yet fundamental understanding of what drives selectivity in small molecule:RNA recognition has been a recurrent challenge. In particular, RNAs tend to be more dynamic and offer less chemical functionality than proteins, and biologically active ligands must compete with the highly abundant and highly structured RNA of the ribosome. Indeed, the only small molecule drug targeting RNA other than the ribosome was just approved in August 2020, and our recent survey of the literature revealed fewer than 150 reported chemical probes that target non-ribosomal RNA in biological systems. This Feature outlines our efforts to improve small molecule targeting strategies and gain fundamental insights into small molecule:RNA recognition by analyzing patterns in both RNA-biased small molecule chemical space and RNA topological space privileged for differentiation. First, we synthesized libraries based on RNA binding scaffolds that allowed us to reveal general principles in small molecule:RNA recognition and to ask precise chemical questions about drivers of affinity and selectivity. Elaboration of these scaffolds has led to recognition of medicinally relevant RNA targets, including viral and long noncoding RNA structures. More globally, we identified physicochemical, structural, and spatial properties of biologically active RNA ligands that are distinct from those of protein-targeted ligands, and we have provided the dataset and associated analytical tools as part of a publicly available online platform to facilitate RNA ligand discovery. At the same time, we used pattern recognition protocols to identify RNA topologies that can be differentially recognized by small molecules and have elaborated this technique to visualize conformational changes in RNA secondary structure. These fundamental insights into the drivers of RNA recognition in vitro have led to functional targeting of RNA structures in biological systems. We hope that these initial guiding principles, as well as the approaches and assays developed in their pursuit, will enable rapid progress toward the development of RNA-targeted chemical probes and ultimately new therapeutic approaches to a wide range of deadly human diseases.
  4. Abstract

    Extensive efforts invested in understanding the rules of protein folding are now being applied, with good effect, in de novo design of proteins/peptides. For proteins containing standard α‐amino acids alone, knowledge derived from experimentally determined three‐dimensional (3D) structures of proteins and biologically active peptides is available from the Protein Data Bank (PDB) and the Cambridge Structural Database (CSD). This knowledge helps predict and design protein structures with reasonable confidence. However, our knowledge of 3D structures of biomolecules containing backbone‐modified amino acids is still evolving. A major challenge in de novo protein/peptide design concerns the engineering of conformationally constrained molecules with specific structural elements and chemical groups appropriately positioned for biological activity. This review explores four classes of amino acid modifications that constrain protein/peptide backbone structure. Systematic analyses of peptidic molecule structures (e.g., bioactive peptides, inhibitors, antibiotics, and designed molecules) containing these backbone‐modified amino acids, found in the PDB and CSD, are discussed. The review aims to provide structure–function insights that will guide future design of proteins/peptides.

  5. Abstract Motivation

    Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, the computational cost of performing pairwise structural alignment against all structures in PDB is prohibitively expensive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and only computing similarity between vectors. As a notable example, FragBag represents each protein by a ‘bag of fragments’, which is a vector of frequencies of contiguous short backbone fragments from a predetermined library. Despite being efficient, the accuracy of FragBag is unsatisfactory because its backbone fragment library may not be optimally constructed and long-range interacting patterns are omitted.
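The alignment-free comparison described above can be illustrated with a minimal sketch: each protein becomes a vector of fragment frequencies, and two proteins are compared by a vector similarity. The fragment assignments, the library size of 5, the function names, and the choice of cosine similarity are assumptions for illustration; this is not the FragBag implementation, and the step that maps backbone segments to library fragments is omitted.

```python
import math
from collections import Counter

def bag_of_fragments(fragment_ids, library_size):
    """Return a frequency vector with one count per library fragment."""
    counts = Counter(fragment_ids)
    return [counts.get(i, 0) for i in range(library_size)]

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

LIBRARY_SIZE = 5  # hypothetical; real fragment libraries are much larger

# Made-up fragment labels along each backbone (index into the library).
protein_a = [0, 2, 2, 3, 1, 2]
protein_b = [2, 2, 0, 3, 2, 1]  # same fragments as protein_a, different order
protein_c = [4, 4, 4, 1, 4, 4]

vec_a = bag_of_fragments(protein_a, LIBRARY_SIZE)
vec_b = bag_of_fragments(protein_b, LIBRARY_SIZE)
vec_c = bag_of_fragments(protein_c, LIBRARY_SIZE)

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

Because a bag discards the order of fragments, `protein_a` and `protein_b` produce identical vectors even though their fragments are arranged differently. This shows why such a representation is fast to compare but cannot capture how distant parts of the chain interact.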


    Here we present a new approach to learning effective structural motif presentations using deep learning. We develop DeepFold, a deep convolutional neural network model to extract structural motif features of a protein structure. We demonstrate that DeepFold substantially outperforms FragBag on protein structural search on a non-redundant protein structure database and a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs.

