Title: Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks
Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D-CNNs can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.
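The idea of treating a structure as a multi-channel 3D image can be illustrated with a minimal sketch. It is not ThermoNet's actual featurization: the function name `voxelize`, the grid size, the voxel width, the binary occupancy rule, and the assignment of atoms to property channels are all assumptions made for illustration. Each atom is binned into the voxel that contains it, in the channel given by its (hypothetical) property label.

```python
import numpy as np


def voxelize(coords, channels, center, n_channels, grid_size=16, voxel=1.0):
    """Bin atoms into an (n_channels, grid_size, grid_size, grid_size) occupancy grid.

    coords   : (N, 3) atom coordinates in angstroms
    channels : (N,) integer property-channel index per atom (hypothetical labels)
    center   : (3,) point the cubic grid is centred on, e.g. the mutated residue
    grid_size, voxel : illustrative defaults, not values taken from the paper
    """
    coords = np.asarray(coords, dtype=float)
    channels = np.asarray(channels, dtype=int)
    grid = np.zeros((n_channels, grid_size, grid_size, grid_size), dtype=np.float32)

    # Shift coordinates so the lower corner of the grid is the origin,
    # then convert each position to an integer voxel index.
    origin = np.asarray(center, dtype=float) - 0.5 * grid_size * voxel
    idx = np.floor((coords - origin) / voxel).astype(int)

    # Keep only atoms that fall inside the grid.
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    i, j, k = idx[inside].T

    # Binary occupancy: a voxel is 1 if any atom of that channel lies in it.
    grid[channels[inside], i, j, k] = 1.0
    return grid


if __name__ == "__main__":
    atoms = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [100.0, 0.0, 0.0]]
    labels = [0, 1, 0]
    g = voxelize(atoms, labels, center=[0.0, 0.0, 0.0], n_channels=2)
    print(g.shape, int(g.sum()))
```

A grid built this way can be passed to any 3D convolutional layer that expects a channels-first tensor. A smoother occupancy function, such as a distance-dependent density around each atom, is a common alternative to the binary rule used here.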
Award ID(s):
1660648
PAR ID:
10381983
Author(s) / Creator(s):
; ; ;
Editor(s):
Fariselli, Piero
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
16
Issue:
11
ISSN:
1553-7358
Page Range / eLocation ID:
e1008291
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Structural information of biological macromolecules is crucial and necessary to deliver predictions about the effects of mutations—whether polymorphic or deleterious (i.e., disease causing), wherein, thermodynamic parameters, namely, folding and binding free energies potentially serve as effective biomarkers. It may be emphasized that the effect of a mutation depends on various factors, including the type of protein (globular, membrane or intrinsically disordered protein) and the structural context in which it occurs. Such information may positively aid drug-design. Furthermore, due to the intrinsic plasticity of proteins, even mutations involving radical change of the structural and physico–chemical properties of the amino acids (native vs. mutant) can still have minimal effects on protein thermodynamics. However, if a mutation causes significant perturbation by either folding or binding free energies, it is quite likely to be deleterious. Mitigating such effects is a promising alternative to the traditional approaches of designing inhibitors. This can be done by structure-based in silico screening of small molecules for which binding to the dysfunctional protein restores its wild type thermodynamics. In this review we emphasize the effects of mutations on two important biophysical properties, stability and binding affinity, and how structures can be used for structure-based drug design to mitigate the effects of disease-causing variants on the above biophysical properties. 
  2. Cancer results from an evolutionary process that typically yields multiple clones with varying sets of mutations within the same tumor. Accurately modeling this process is key to understanding and predicting cancer evolution. Here, we introduce clone to mutation (CloMu), a flexible and low-parameter tree generative model of cancer evolution. CloMu uses a two-layer neural network trained via reinforcement learning to determine the probability of new mutations based on the existing mutations on a clone. CloMu supports several prediction tasks, including the determination of evolutionary trajectories, tree selection, causality and interchangeability between mutations, and mutation fitness. Importantly, previous methods support only some of these tasks, and many suffer from overfitting on data sets with a large number of mutations. Using simulations, we show that CloMu either matches or outperforms current methods on a wide variety of prediction tasks. In particular, for simulated data with interchangeable mutations, current methods are unable to uncover causal relationships as effectively as CloMu. On breast cancer and leukemia cohorts, we show that CloMu determines similarities and causal relationships between mutations as well as the fitness of mutations. We validate CloMu's inferred mutation fitness values for the leukemia cohort by comparing them to clonal proportion data not used during training, showing high concordance. In summary, CloMu's low-parameter model facilitates a wide range of prediction tasks regarding cancer evolution on increasingly available cohort-level data sets. 
  3. Advances in genome sequencing and annotation have eased the difficulty of identifying new gene sequences. Predicting the functions of these newly identified genes remains challenging. Genes descended from a common ancestral sequence are likely to have common functions. As a result, homology is widely used for gene function prediction. This means functional annotation errors also propagate from one species to another. Several approaches based on machine learning classification algorithms were evaluated for their ability to accurately predict gene function from non‐homology gene features. Among the eight supervised classification algorithms evaluated, random‐forest‐based prediction consistently provided the most accurate gene function prediction. Non‐homology‐based functional annotation provides complementary strengths to homology‐based annotation, with higher average performance in Biological Process GO terms, the domain where homology‐based functional annotation performs the worst, and weaker performance in Molecular Function GO terms, the domain where the accuracy of homology‐based functional annotation is highest. GO prediction models trained with homology‐based annotations were able to successfully predict annotations from a manually curated “gold standard” GO annotation set. Non‐homology‐based functional annotation based on machine learning may ultimately prove useful both as a method to assign predicted functions to orphan genes which lack functionally characterized homologs, and to identify and correct functional annotation errors which were propagated through homology‐based functional annotations.
  4. Antibody therapeutics and vaccines are among our last resort to end the raging COVID-19 pandemic. They, however, are prone to over 5000 mutations on the spike (S) protein uncovered by a Mutation Tracker based on over 200 000 genome isolates. It is imperative to understand how mutations will impact vaccines and antibodies in development. In this work, we first study the mechanism, frequency, and ratio of mutations on the S protein which is the common target of most COVID-19 vaccines and antibody therapies. Additionally, we build a library of 56 antibody structures and analyze their 2D and 3D characteristics. Moreover, we predict the mutation-induced binding free energy (BFE) changes for the complexes of S protein and antibodies or ACE2. By integrating genetics, biophysics, deep learning, and algebraic topology, we reveal that most of the 462 mutations on the receptor-binding domain (RBD) will weaken the binding of S protein and antibodies and disrupt the efficacy and reliability of antibody therapies and vaccines. A list of 31 antibody disrupting mutants is identified, while many other disruptive mutations are detailed as well. We also unveil that about 65% of the existing RBD mutations, including those variants recently found in the United Kingdom (UK) and South Africa, will strengthen the binding between the S protein and human angiotensin-converting enzyme 2 (ACE2), resulting in more infectious COVID-19 variants. We discover the disparity between the extreme values of RBD mutation-induced BFE strengthening and weakening of the bindings with antibodies and angiotensin-converting enzyme 2 (ACE2), suggesting that SARS-CoV-2 is at an advanced stage of evolution for human infection, while the human immune system is able to produce optimized antibodies. This discovery, unfortunately, implies the vulnerability of current vaccines and antibody drugs to new mutations. Our predictions were validated by comparison with more than 1400 deep mutations on the S protein RBD. Our results show the urgent need to develop new mutation-resistant vaccines and antibodies and to prepare for seasonal vaccinations.
  5. During ϕX174 morphogenesis, 240 copies of the external scaffolding protein D organize 12 pentameric assembly intermediates into procapsids, a reaction reconstituted in vitro. In previous studies, ϕX174 strains resistant to exogenously expressed dominant lethal D genes were experimentally evolved. Resistance was achieved by the stepwise acquisition of coat protein mutations. Once resistance was established, a stimulatory D protein mutation that greatly increased strain fitness arose. In this study, in vitro biophysical and biochemical methods were utilized to elucidate the mechanistic details and evolutionary trade-offs created by the resistance mutations. The kinetics of procapsid formation was analyzed in vitro using wild-type, inhibitory, and experimentally evolved coat and scaffolding proteins. Our data suggest that viral fitness is correlated with in vitro assembly kinetics and demonstrate that in vivo experimental evolution can be analyzed within an in vitro biophysical context. IMPORTANCE Experimental evolution is an extremely valuable tool. Comparisons between ancestral and evolved genotypes suggest hypotheses regarding adaptive mechanisms. However, it is not always possible to rigorously test these hypotheses in vivo. We applied in vitro biophysical and biochemical methods to elucidate the mechanistic details that allowed an experimentally evolved virus to become resistant to an antiviral protein and then evolve a productive use for that protein. Moreover, our results indicate that the respective roles of scaffolding and coat proteins may have been redistributed during the evolution of a two-scaffolding-protein system. In one-scaffolding-protein virus assembly systems, coat proteins promiscuously interact to form heterogeneous aberrant structures in the absence of scaffolding proteins. Thus, the scaffolding protein controls fidelity. During ϕX174 assembly, the external scaffolding protein acts like a coat protein, self-associating into large aberrant spherical structures in the absence of coat protein, whereas the coat protein appears to control fidelity.