Title: Characterizing and explaining the impact of disease-associated mutations in proteins without known structures or structural homologs
Abstract: Mutations in human proteins lead to diseases. The structure of these proteins can help us understand the mechanism of such diseases and develop therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Data Bank. We found that model quality was higher, and the root mean square deviation (RMSD) between AlphaFold and RoseTTAFold models lower, for domains that could be assigned to CATH families than for those that could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein–protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. Compared with polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, and predicted to be destabilizing and pathogenic. Using models from the two state-of-the-art techniques provides greater confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models that could not be explained based solely on AlphaFold models.
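The abstract counts a mutation as explained when at least one of three criteria holds: proximity to a predicted functional site, predicted structural destabilization, or predicted pathogenicity. The following is a minimal sketch of that "any of three" rule, not the authors' pipeline. The cutoff values, the `Mutation` fields, the ddG sign convention and the function names are all assumptions made for illustration; the abstract does not state the thresholds or tools used.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoffs for illustration only; the abstract does not give them.
SITE_DISTANCE_CUTOFF = 8.0  # angstroms to the nearest predicted functional site
DDG_CUTOFF = 1.0            # kcal/mol; larger ddG = more destabilizing (assumed convention)


@dataclass
class Mutation:
    name: str                   # label for the missense mutation
    distance_to_site: float     # angstroms to the nearest predicted functional site
    ddg: float                  # predicted change in folding free energy
    predicted_pathogenic: bool  # output of some pathogenicity predictor


def explanations(m: Mutation) -> List[str]:
    """Return every criterion that the mutation satisfies."""
    reasons = []
    if m.distance_to_site <= SITE_DISTANCE_CUTOFF:
        reasons.append("near functional site")
    if m.ddg >= DDG_CUTOFF:
        reasons.append("destabilizing")
    if m.predicted_pathogenic:
        reasons.append("predicted pathogenic")
    return reasons


def fraction_explained(mutations: List[Mutation]) -> float:
    """Fraction of mutations that satisfy at least one criterion."""
    if not mutations:
        return 0.0
    explained = sum(1 for m in mutations if explanations(m))
    return explained / len(mutations)
```

Applied to the per-mutation annotations from each model set separately, a rule like this would also show which mutations are explained only by one of the two structure predictors.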
Award ID(s):
1937533
PAR ID:
10392064
Author(s) / Creator(s):
Date Published:
Journal Name:
Briefings in Bioinformatics
Volume:
23
Issue:
4
ISSN:
1467-5463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Neurexin-1 (NRXN1) is a membrane protein essential in synapse formation and cell signaling as a cell-adhesion molecule and cell-surface receptor. NRXN1 and its binding partner neuroligin have been associated with deficits in cognition. Recent genetics research has linked NRXN1 missense mutations to increased risk for brain disorders, including schizophrenia (SCZ) and autism spectrum disorder (ASD). Investigation of the structure–function relationship in NRXN1 has proven difficult because no experimental structure of the full-length membrane protein is available. AlphaFold, a deep learning-based predictor, succeeds in high-quality protein structure prediction and offers a solution for membrane protein model construction. In this study, we applied a computational saturation mutagenesis method to analyze the systemic effects of missense mutations on protein functions in a human NRXN1 structure predicted by AlphaFold and an experimental Bos taurus structure. The folding energy changes were calculated to estimate the effects of the 29,540 mutations in the AlphaFold model on protein stability. The comparison of the experimental and computationally predicted structures shows that these energy changes are highly correlated, demonstrating the reliability of the AlphaFold structure for downstream bioinformatics analysis. The energy calculation revealed that some target mutations associated with SCZ and ASD could make the protein unstable. The study can provide helpful information for characterizing the disease-causing mutations and elucidating the molecular mechanisms by which the variations cause SCZ and ASD. This methodology could provide a bioinformatics protocol to investigate the effects of target mutations on multiple AlphaFold structures.
  2. Wallqvist, Anders (Ed.)
    Many pathogenic missense mutations are found in protein positions that are neither well-conserved nor fall in any known functional domains. Consequently, we lack any mechanistic underpinning of dysfunction caused by such mutations. We explored the disruption of allosteric dynamic coupling between these positions and the known functional sites as a possible mechanism for pathogenesis. In this study, we present an analysis of 591 pathogenic missense variants in 144 human enzymes that suggests that allosteric dynamic coupling of mutated positions with known active sites is a plausible biophysical mechanism and evidence of their functional importance. We illustrate this mechanism in a case study of β-Glucocerebrosidase (GCase) in which a vast majority of 94 sites harboring Gaucher disease-associated missense variants are located some distance away from the active site. An analysis of the conformational dynamics of GCase suggests that mutations on these distal sites cause changes in the flexibility of active site residues despite their distance, indicating a dynamic communication network throughout the protein. The disruption of the long-distance dynamic coupling caused by missense mutations may provide a plausible general mechanistic explanation for biological dysfunction and disease. 
  3. Myeloperoxidase (MPO) is a heme peroxidase with microbicidal properties. MPO plays a role in the host’s innate immunity by producing reactive oxygen species inside the cell against foreign organisms. However, there is little functional evidence linking missense mutations to human diseases. We utilized in silico saturation mutagenesis to generate and analyze the effects of 10,811 potential missense mutations on MPO stability. Our results showed that ~71% of the potential missense mutations destabilize MPO, and ~8% stabilize the MPO protein. We showed that G402W, G402Y, G361W, G402F, and G655Y would have the highest destabilizing effect on MPO. Meanwhile, D264L, G501M, D264H, D264M, and G501L have the highest stabilizing effect on the MPO protein. Our computational predictions showed destabilizing effects for 13 of the 14 MPO missense mutations that cause diseases in humans. We also analyzed putative post-translational modification (PTM) sites on the MPO protein and mapped the PTM sites to disease-associated missense mutations for further analysis. Our analysis showed that R327H, associated with frontotemporal dementia, and R548W, which causes generalized pustular psoriasis, are near these PTM sites. Our results will aid further research into MPO as a biomarker for human complex diseases and a candidate for drug target discovery.
  4. Control of eukaryotic cellular function is heavily reliant on the phosphorylation of proteins at specific amino acid residues, such as serine, threonine, tyrosine, and histidine. Protein kinases that are responsible for this process comprise one of the largest families of evolutionarily related proteins. Dysregulation of protein kinase signaling pathways is a frequent cause of a large variety of human diseases including cancer, autoimmune, neurodegenerative, and cardiovascular disorders. In this study, we mapped all pathogenic mutations in 497 human protein kinase domains from the ClinVar database to the reference structure of Aurora kinase A (AURKA) and grouped them by the relevance to the disease type. Our study revealed that the majority of mutation hotspots associated with cancer are situated within the catalytic and activation loops of the kinase domain, whereas non‐cancer‐related hotspots tend to be located outside of these regions. Additionally, we identified a hotspot at residue R371 of the AURKA structure that has the highest number of exclusively non‐cancer‐related pathogenic mutations (21) and has not been previously discussed.
  5. Mutations in the cardiac splicing factor RBM20 lead to malignant dilated cardiomyopathy (DCM). To understand the mechanism of RBM20-associated DCM, we engineered isogenic iPSCs with DCM-associated missense mutations in RBM20 as well as RBM20 knockout (KO) iPSCs. iPSC-derived engineered heart tissues made from these cell lines recapitulate contractile dysfunction of RBM20-associated DCM and reveal greater dysfunction with missense mutations than KO. Analysis of RBM20 RNA binding by eCLIP reveals a gain-of-function preference of mutant RBM20 for 3′ UTR sequences that are shared with amyotrophic lateral sclerosis (ALS) and processing-body associated RNA binding proteins (FUS, DDX6). Deep RNA sequencing reveals that the RBM20 R636S mutant has unique gene, splicing, polyadenylation and circular RNA defects that differ from RBM20 KO. Super-resolution microscopy verifies that mutant RBM20 maintains very limited nuclear localization potential; rather, the mutant protein associates with cytoplasmic processing bodies (DDX6) under basal conditions, and with stress granules (G3BP1) following acute stress. Taken together, our results highlight a pathogenic mechanism in cardiac disease through splicing-dependent and -independent pathways.