Title: BindingDB in 2024: a FAIR knowledgebase of protein-small molecule binding data
Abstract BindingDB (bindingdb.org) is a public, web-accessible database of experimentally measured binding affinities between small molecules and proteins, which supports diverse applications including medicinal chemistry, biochemical pathway annotation, training of artificial intelligence models and computational chemistry methods development. This update reports significant growth and enhancements since our last review in 2016. Of note, the database now contains 2.9 million binding measurements spanning 1.3 million compounds and thousands of protein targets. This growth is largely attributable to our unique focus on curating data from US patents, which has yielded a substantial influx of novel binding data. Recent improvements include a remake of the website following responsive web design principles, enhanced search and filtering capabilities, new data download options and webservices and establishment of a long-term data archive replicated across dispersed sites. We also discuss BindingDB’s positioning relative to related resources, its open data sharing policies, insights gleaned from the dataset and plans for future growth and development.
Award ID(s):
2321666
PAR ID:
10665967
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
Oxford Academic
Date Published:
Journal Name:
Nucleic Acids Research
Volume:
53
Issue:
D1
ISSN:
0305-1048
Page Range / eLocation ID:
D1633 to D1644
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Proteins' flexibility is a feature in communicating changes in cell signaling instigated by binding with secondary messengers, such as calcium ions, associated with the coordination of muscle contraction, neurotransmitter release, and gene expression. When binding with the disordered parts of a protein, calcium ions must balance their charge states with the shape of calcium‐binding proteins and their versatile pool of partners depending on the circumstances they transmit. Accurately determining the ionic charges of those ions is essential for understanding their role in such processes. However, it is unclear whether the limited experimental data available can be effectively used to train models to accurately predict the charges of calcium‐binding protein variants. Here, we developed a chemistry‐informed, machine‐learning algorithm that implements a game theoretic approach to explain the output of a machine‐learning model without the prerequisite of an excessively large database for high‐performance prediction of atomic charges. We used the ab initio electronic structure data representing calcium ions and the structures of the disordered segments of calcium‐binding peptides with surrounding water molecules to train several explainable models. Network theory was used to extract the topological features of atomic interactions in the structurally complex data dictated by the coordination chemistry of a calcium ion, a potent indicator of its charge state in protein. Our design created a computational tool of CaXML, which provided a framework of explainable machine learning model to annotate ionic charges of calcium ions in calcium‐binding proteins in response to the chemical changes in an environment. Our framework will provide new insights into protein design for engineering functionality based on the limited size of scientific data in a genome space. 
  2. Data-driven interatomic potentials (IPs) trained on large collections of first principles calculations are rapidly becoming essential tools in the fields of computational materials science and chemistry for performing atomic-scale simulations. Despite this, apart from a few notable exceptions, there is a distinct lack of well-organized, public datasets in common formats available for use with IP development. This deficiency precludes the research community from implementing widespread benchmarking, which is essential for gaining insight into model performance and transferability, and also limits the development of more general, or even universal, IPs. To address this issue, we introduce the ColabFit Exchange, the first database providing open access to a large collection of systematically organized datasets from multiple domains that is especially designed for IP development. The ColabFit Exchange is publicly available at https://colabfit.org, providing a web-based interface for exploring, downloading, and contributing datasets. Composed of data collected from the literature or provided by community researchers, the ColabFit Exchange currently (September 2023) consists of 139 datasets spanning nearly 70 000 unique chemistries, and is intended to continuously grow. In addition to outlining the software framework used for constructing and accessing the ColabFit Exchange, we also provide analyses of the data, quantifying the diversity of the database and proposing metrics for assessing the relative diversity of multiple datasets. Finally, we demonstrate an end-to-end IP development pipeline, utilizing datasets from the ColabFit Exchange, fitting tools from the KLIFF software package, and validation tests provided by the OpenKIM framework. 
  3. A new approach to web application development is presented, in which an application is constructed by configuring and composing concepts drawn from a catalog developed by experts. A concept is a self-contained, reusable increment of functionality. Each concept includes both front-end and back-end functionality, and exports a collection of components—full-stack GUI elements, backed by application logic and database storage. To build an app, the developer imports concepts from the catalog, tunes them to fit the application’s particular needs via configuration variables, and links concept components together to create pages. Components of different concepts may be executed independently, or bound together declaratively with dataflows and synchronization. The instantiation, configuration, linking and binding of components is all expressed in a simple template language that extends HTML. The approach has been implemented in a platform called Déjà Vu, which we outline and compare to conventional web application architectures. We describe a case study in which a collection of applications previously built as team projects for a web programming course were replicated in Déjà Vu. Preliminary results validate our hypothesis, suggesting that a variety of non-trivial applications can be built from a repository of generic concepts. 
  4. Abstract FatPlants, an open-access, web-based database, consolidates data, annotations, analysis results, and visualizations of lipid-related genes, proteins, and metabolic pathways in plants. Serving as a minable resource, FatPlants offers a user-friendly interface for facilitating studies into the regulation of plant lipid metabolism and supporting breeding efforts aimed at increasing crop oil content. This web resource, developed using data derived from our own research, curated from public resources, and gleaned from academic literature, comprises information on known fatty-acid-related proteins, genes, and pathways in multiple plants, with an emphasis on Glycine max, Arabidopsis thaliana, and Camelina sativa. Furthermore, the platform includes machine-learning based methods and navigation tools designed to aid in characterizing metabolic pathways and protein interactions. Comprehensive gene and protein information cards, a Basic Local Alignment Search Tool search function, similar structure search capacities from AlphaFold, and ChatGPT-based query for protein information are additional features. Database URL: https://www.fatplants.net/
  5. Abstract Cucurbit[n]urils (CB[n]s) are cyclic macrocycles with rich host‐guest chemistry. In many cases, guest binding in CB[n]s results in host structural deformations. Unfortunately, measuring such deformations remains a major challenge, with only a handful of manual estimations reported in the literature. To address this challenge, we have developed the public program ElliptiCB[n], which is available on GitHub, that provides a robust and automated method for measuring the elliptical deformations in CB[n] hosts. We outline the development and validation of this approach, apply ElliptiCB[n] to measure the ellipticity of the 1113 available CB[n] structures from the Cambridge Structural Database (CSD), and directly investigate the structural deformations of CB[5], CB[6], CB[7], CB[8], and CB[10] hosts. We also report the general landscape of accessible CB[n] elliptical deformations and compare ellipticity distributions across CB[n] hosts and host‐guest complexes. We found that in almost all cases guest binding significantly impacts the host ellipticity distribution and that these distributions are dissimilar across host‐guest complexes of differently sized CB[n]s. We anticipate that this work will provide a useful approach for understanding the flexibility of CB[n] hosts and will also enable future measurement and standardization of ellipticity measurements of CB[n]s.