Recently, machine learning (ML) has established itself in worldwide benchmarking competitions in computational biology, including the Critical Assessment of Structure Prediction (CASP) and the Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithms, efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify biomolecular structural complexity and reduce ML dimensionality have emerged as top performers in the D3R Grand Challenges. This review is devoted to recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches: algebraic topology, differential geometry, and graph theory. We elucidate how physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. In this review, we focus the performance analysis on protein–ligand binding predictions, although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the predictions of solubility, solvation free energies, toxicity, partition coefficients, protein folding stability changes upon mutation, …
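Graph-theoretic representations of the kind the review discusses compress a protein–ligand complex into a small, fixed set of features. As a rough illustration only, the sketch below builds element-pair features by summing a generalized exponential kernel exp(-(d/η)^κ) over protein–ligand atom pairs closer than a cutoff. The function name, the 12 Å cutoff, and the kernel parameters are assumptions chosen for illustration; this is not presented as the laboratory's exact formulation.

```python
import math
from collections import defaultdict


def element_pair_features(protein_atoms, ligand_atoms, cutoff=12.0, eta=3.0, kappa=2.0):
    """Hypothetical graph-style descriptor for a protein-ligand complex.

    Each atom is (element, (x, y, z)). For every protein-ligand atom pair
    within `cutoff`, add the kernel weight exp(-(d / eta) ** kappa) to the
    feature indexed by (protein element, ligand element).
    """
    features = defaultdict(float)
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)  # Euclidean distance in angstroms
            if d <= cutoff:
                features[(p_elem, l_elem)] += math.exp(-((d / eta) ** kappa))
    return dict(features)


# Toy coordinates (hypothetical): the oxygen lies beyond the cutoff and is ignored.
protein = [("C", (0.0, 0.0, 0.0)), ("N", (1.5, 0.0, 0.0)), ("O", (20.0, 0.0, 0.0))]
ligand = [("C", (0.0, 0.0, 3.0))]
print(element_pair_features(protein, ligand))
```

Each (protein element, ligand element) pair becomes one feature, so the vector length depends on the number of element types rather than on the size of the complex. That fixed, small dimension is the property the review emphasizes.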
This content will become publicly available on July 1, 2023
What Makes GPCRs from Different Families Bind to the Same Ligand?
G protein-coupled receptors (GPCRs) are the largest class of cell-surface receptor proteins, with important functions in signal transduction, and they often serve as therapeutic drug targets. With the rapid growth of public data on the three-dimensional (3D) structures of GPCRs and GPCR–ligand interactions, computational prediction of GPCR–ligand binding has become a compelling alternative to high-throughput screening and other experimental approaches in the early phases of ligand discovery. In this work, we set out to computationally uncover and understand the binding of a single ligand to GPCRs from several different families. Three-dimensional structural comparisons of GPCRs that bind the same ligand revealed local structural similarities, and these regions often overlap with the locations of binding pockets. These pockets were found to be similar (based on backbone geometry and side-chain orientation, assessed with APoc), and their similarity correlates positively with the electrostatic properties of the pockets. Moreover, the more similar the pockets, the more likely a ligand binding to them is to interact with similar residues, adopt similar conformations, and produce similar binding affinities across the pockets. These findings can be exploited to improve protein function inference, drug repurposing, and drug toxicity prediction, and to accelerate the development of new drugs.
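The claim that more similar pockets yield more similar binding behavior is, at its core, a correlation across receptor pairs. A minimal sketch of how one might quantify it, assuming a pocket-similarity score (for example, from APoc) and an absolute affinity difference are already available for each receptor pair. The function name and the toy values are hypothetical and are not data from the study; a negative coefficient here would mean "more similar pockets, smaller affinity gap".

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values, one entry per pair of receptors binding the same ligand.
pocket_similarity = [0.2, 0.4, 0.6, 0.8]  # pocket-similarity score per pair
affinity_gap = [2.0, 1.5, 1.0, 0.5]       # |difference in pKd| per pair
print(round(pearson(pocket_similarity, affinity_gap), 3))  # -1.0
```

With real data, a rank-based coefficient (Spearman) may be the safer choice if the similarity scores are not linearly related to the affinity differences.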
- Sponsoring Org: National Science Foundation
More like this
The lack of biologically relevant protein structures can hinder the rational design of small molecules targeting G protein-coupled receptors (GPCRs). While ensemble docking using multiple models of the protein target is a promising technique for structure-based drug discovery, model clustering and selection require further investigation to achieve both high accuracy and efficiency. In this work, we developed an original ensemble docking approach that identifies the most relevant conformations based on the essential dynamics of the protein pocket. We apply this approach to the study of small-molecule antagonists for the PAC1 receptor, a class B GPCR and a regulator of stress. As few as four representative PAC1 models are selected from simulations of a homology model and then used to screen three million compounds from the ZINC database, together with 23 experimentally validated PAC1-targeting compounds. Our essential dynamics ensemble docking (EDED) approach can effectively reduce the number of false negatives in virtual screening and improve accuracy in identifying potent compounds. Given the cost and difficulty of determining membrane protein structures for all relevant states, our methodology may be useful for the future discovery of small molecules that target other GPCRs, with or without experimental structures.
Recently, molecular fingerprints extracted from three-dimensional (3D) structures using advanced mathematics, such as algebraic topology, differential geometry, and graph theory, have been paired with efficient machine learning algorithms, especially deep learning, to outperform other methods in drug discovery applications and competitions. This raises the question of whether classical 2D fingerprints are still valuable in computer-aided drug discovery. This work considers 23 datasets associated with four typical problems, namely, protein–ligand binding, toxicity, solubility, and partition coefficient, to assess the performance of eight 2D fingerprints. Advanced machine learning algorithms, including random forest, gradient boosted decision tree, single-task deep neural network, and multitask deep neural network, are employed to construct efficient 2D-fingerprint-based models. Additionally, appropriate consensus models are built to further enhance the performance of 2D-fingerprint-based methods. It is demonstrated that 2D-fingerprint-based models perform as well as state-of-the-art 3D structure-based models for predicting toxicity, solubility, partition coefficient, and protein–ligand binding affinity from ligand information alone. However, 3D structure-based models outperform 2D-fingerprint-based methods in complex-based protein–ligand binding affinity predictions.
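The consensus idea in this abstract can be illustrated by averaging the predictions of several base models. A minimal sketch, assuming equal weights (the abstract does not say how its consensus models weight their members); the function names and prediction values are hypothetical.

```python
import math


def consensus(predictions_by_model):
    """Equal-weight average of per-sample predictions from several models."""
    lengths = {len(p) for p in predictions_by_model}
    if len(lengths) != 1:
        raise ValueError("every model must predict the same samples")
    n_models = len(predictions_by_model)
    return [sum(vals) / n_models for vals in zip(*predictions_by_model)]


def rmse(pred, truth):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


# Hypothetical predictions (e.g., RF, GBDT, DNN) for three ligands.
rf = [5.0, 6.2, 7.0]
gbdt = [5.4, 5.8, 6.6]
dnn = [4.6, 6.0, 7.4]
truth = [5.0, 6.0, 7.0]
combined = consensus([rf, gbdt, dnn])
print(rmse(rf, truth), rmse(combined, truth))
```

Averaging helps only to the extent that the base models' errors are partly uncorrelated; the toy numbers were chosen so that the errors cancel, which real models will not do so neatly.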
In Silico Prediction and Validation of CB2 Allosteric Binding Sites to Aid the Design of Allosteric Modulators
Although the 3D structures of active and inactive cannabinoid receptor type 2 (CB2) are available, neither an X-ray crystal structure nor a cryo-EM structure of a CB2–orthosteric ligand–modulator complex has been resolved, which hampers the discovery and development of CB2 allosteric modulators (AMs). In the present work, we focused mainly on investigating the potential allosteric binding site(s) of CB2. We applied several algorithms and tools to predict the potential allosteric binding sites of CB2 bound to existing agonists. Seven potential allosteric sites were observed for either the CB2-CP55940 or the CB2-WIN 55,212-2 complex; among them, sites B, C, G, and K are supported by reported 3D structures of Class A GPCRs in complex with AMs. Using our novel algorithm toolset, MCCS, we docked three known CB2 AMs, Ec2la (C-2), trans-β-caryophyllene (TBC), and cannabidiol (CBD), to each site for further comparison and quantified the potential binding residues in each allosteric binding site. Subsequently, we selected the most promising binding pose of C-2 at five allosteric sites for molecular dynamics (MD) simulations. Based on the docking and MD results, we suggest that site H is the most promising allosteric binding site. We plan to conduct bioassay validations in the future.
There is a need for new in vitro systems that enable pharmaceutical companies to collect more physiologically relevant information on drug response in a low-cost, high-throughput manner. For this purpose, three-dimensional (3D) spheroidal models have been established as more effective than two-dimensional models. Current commercial techniques, however, rely heavily on self-aggregation of dissociated cells and cannot replicate key features of the native tumor microenvironment, largely because of a lack of control over extracellular matrix components and heterogeneity in shape, size, and aggregate-forming tendencies. In this study, we overcome these challenges by coupling tissue engineering toolsets with microfluidics technologies to create engineered cancer microspheres. Specifically, we employ biosynthetic hydrogels composed of poly(ethylene glycol) (PEG) conjugated with fibrinogen protein (PEG-Fb) to create engineered breast and colorectal cancer tissue microspheres for 3D culture, tumorigenic characterization, and examination of their potential for high-throughput screening (HTS). MCF7 and MDA-MB-231 cell lines were used to create breast cancer microspheres, and the HT29 cell line and cells from a stage II patient-derived xenograft (PDX) were encapsulated to produce colorectal cancer (CRC) microspheres. Using our previously developed microfluidic system, highly uniform cancer microspheres (intra-batch coefficient of variation (CV) ≤ 5%, inter-batch CV < 2%) with …
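The uniformity figures in this abstract are coefficients of variation, CV = 100% × (standard deviation / mean). A minimal sketch, assuming CV is computed from measured microsphere diameters with the sample (n − 1) standard deviation; the abstract does not state which estimator or measurement was used, and the diameters below are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical diameters (micrometers) of microspheres from one batch.
batch = [500.0, 510.0, 490.0, 505.0, 495.0]
print(round(coefficient_of_variation(batch), 2))  # 1.58
```

Under the same assumption, an inter-batch CV would apply the same function to the list of per-batch mean diameters rather than to individual microspheres.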