Title: Biomolecular simulation based machine learning models accurately predict sites of tolerability to the unnatural amino acid acridonylalanine
Abstract: The incorporation of unnatural amino acids (Uaas) has provided an avenue for novel chemistries to be explored in biological systems. However, the successful application of Uaas is often hampered by site-specific impacts on protein yield and solubility. Although previous efforts to identify features that accurately capture these site-specific effects have been unsuccessful, we have developed a set of novel Rosetta Custom Score Functions and alternative Empirical Score Functions that accurately predict the effects of acridon-2-yl-alanine (Acd) incorporation on protein yield and solubility. Acd-containing mutants were simulated in PyRosetta, and machine learning (ML) was performed using either the decomposed values of the Rosetta energy function or changes in residue contacts and bioinformatics. Using these feature sets, which represent Rosetta score function specific and bioinformatics-derived terms, ML models were trained to predict highly abstract experimental parameters such as mutant protein yield and solubility, and they displayed robust performance on well-balanced holdouts. Model feature importance analyses demonstrated that terms corresponding to hydrophobic interactions, desolvation, and amino acid angle preferences played a pivotal role in predicting tolerance of mutation to Acd. Overall, this work provides evidence that the application of ML to features extracted from simulated structural models allows for the accurate prediction of diverse and abstract biological phenomena, beyond the predictivity of traditional modeling and simulation approaches.
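As a rough illustration of the kind of feature construction the abstract describes, the sketch below builds a per-site feature vector from the change in each decomposed energy term between a wild-type model and a mutant model. The term names follow common Rosetta score terms, but the numeric values, the function names `delta_features` and `to_vector`, and the dictionary layout are hypothetical placeholders, not data or code from the paper.

```python
from typing import Dict, List


def delta_features(wild_type: Dict[str, float],
                   mutant: Dict[str, float]) -> Dict[str, float]:
    """Return mutant-minus-wild-type differences for terms present in both."""
    shared_terms = sorted(set(wild_type) & set(mutant))
    return {term: mutant[term] - wild_type[term] for term in shared_terms}


def to_vector(features: Dict[str, float], term_order: List[str]) -> List[float]:
    """Flatten a feature dict into a fixed-order list (missing terms become 0.0)."""
    return [features.get(term, 0.0) for term in term_order]


# Placeholder per-term energies for one site (illustrative values only).
wild_type_terms = {"fa_atr": -10.0, "fa_sol": 6.0, "rama_prepro": 0.5}
mutant_terms = {"fa_atr": -12.5, "fa_sol": 7.5, "rama_prepro": 1.0}

deltas = delta_features(wild_type_terms, mutant_terms)
vector = to_vector(deltas, ["fa_atr", "fa_sol", "rama_prepro"])
```

A vector like this, computed for every candidate site, could then be passed to any standard classifier or regressor; which model the authors used is not stated in the abstract.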
Award ID(s): 1708759
PAR ID: 10381586
Journal Name: Scientific Reports
Volume: 11
Issue: 1
ISSN: 2045-2322
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Acridonylalanine (Acd) is a fluorescent amino acid that is highly photostable, with a high quantum yield and long fluorescence lifetime in water. These properties make it superior to existing genetically encodable fluorescent amino acids for monitoring protein interactions and conformational changes through fluorescence polarization or lifetime experiments, including fluorescence lifetime imaging microscopy (FLIM). Here, we report the genetic incorporation of Acd using engineered pyrrolysine tRNA synthetase (RS) mutants that allow for efficient Acd incorporation in both E. coli and mammalian cells. We compare protein yields and amino acid specificity for these Acd RSs to identify an optimal construct. We also demonstrate the use of Acd in FLIM, where its long lifetime provides strong contrast compared to endogenous fluorophores and engineered fluorescent proteins, which have lifetimes less than 5 ns. 
  2. The ability to accurately predict protein–protein interactions is critically important for understanding major cellular processes. However, current experimental and computational approaches for identifying them are technically challenging and have had limited success. We propose a new computational method for predicting protein–protein interactions using only primary sequence information. It uses the concept of physicochemical similarity to determine which interactions are most likely to occur. In our approach, the physicochemical features of proteins are extracted using bioinformatics tools for different organisms. These features are then used in a machine-learning method to identify successful protein–protein interactions via correlation analysis. The property that correlates most strongly with protein–protein interactions for all studied organisms was found to be dipeptide amino acid composition (the frequency of specific amino acid pairs in a protein sequence). While current approaches often overlook the specificity of protein–protein interactions in different organisms, our method yields context-specific features that determine protein–protein interactions. The analysis is applied to the bacterial two-component system that includes histidine kinase and transcriptional response regulators, as well as to the barnase–barstar complex, demonstrating the method’s versatility across different biological systems. Our approach can be applied to predict protein–protein interactions in any biological system, providing an important tool for investigating the mechanisms of complex biological processes.
  3. Site-specific placement of unnatural amino acids, particularly those responsive to light, offers an elegant approach to control protein function and capture their fleeting ‘interactome’. Herein, we have resurrected 4-(trifluoromethyldiazirinyl)-phenylalanine, an underutilized photo-crosslinker, by introducing several key features including easy synthetic access, site-specific incorporation by ‘privileged’ synthetases and superior crosslinking efficiency, to develop photo-crosslinkable bromodomains suitable for ‘interactome’ profiling. 
  4. The conformational heterogeneity and dynamics of protein side chains contribute to function, but investigating exactly how is hindered by experimental challenges arising from the fast timescales involved and the spatial heterogeneity of protein structures. The potential of two-dimensional infrared (2D IR) spectroscopy for measuring conformational heterogeneity and dynamics with unprecedented spatial and temporal resolution has motivated extensive effort to develop amino acids with functional groups that have frequency-resolved absorptions to serve as probes of their protein microenvironments. We demonstrate the full advantage of the approach by selective incorporation of the probe p-cyanophenylalanine at six distinct sites in a Src homology 3 domain and the application of 2D IR spectroscopy to site-specifically characterize heterogeneity and dynamics and their contribution to cognate ligand binding. The approach revealed a wide range of microenvironments and distinct responses to ligand binding, including at the three adjacent, conserved aromatic residues that form the recognition surface of the protein. Molecular dynamics simulations performed for all the labeled proteins provide insight into the underlying heterogeneity and dynamics. Similar application of 2D IR spectroscopy and site-selective probe incorporation will allow for the characterization of heterogeneity and dynamics of other proteins, how heterogeneity and dynamics are affected by solvation and local structure, and how they might contribute to biological function.
  5. Structural bioinformatics analyzes protein structural models with the goal of uncovering molecular drivers of food functionality. This field aims to develop tools that can rapidly extract relevant information from protein databases as well as organize this information for researchers interested in studying protein functionality. Food bioinformaticians take advantage of millions of protein amino acid sequences and structures contained within these databases, extracting features such as surface hydrophobicity that are then used to model functionality, including solubility, thermostability, and emulsification. This work is aided by a protein structure–function relationship framework, in which bioinformatic properties are linked to physicochemical experimentation. Strong bioinformatic correlations exist for protein secondary structure, electrostatic potential, and surface hydrophobicity. Modeling changes in protein structures through molecular mechanics is an increasingly accessible field that will continue to propel food science research. 
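The dipeptide amino acid composition mentioned in item 2 (the frequency of specific amino acid pairs in a protein sequence) can be sketched in a few lines. This is a minimal illustration that assumes overlapping adjacent pairs and normalization by the total number of pairs; the function name `dipeptide_composition` and the example sequence are hypothetical and not taken from that work.

```python
from collections import Counter
from typing import Dict


def dipeptide_composition(sequence: str) -> Dict[str, float]:
    """Return the relative frequency of each adjacent residue pair.

    Pairs are taken with an overlapping sliding window of width 2, so a
    sequence of length n contributes n - 1 pairs (an assumption of this sketch).
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        # Sequences shorter than two residues contain no dipeptides.
        return {}
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: count / total for pair, count in counts.items()}


# Hypothetical short sequence used only to exercise the function.
composition = dipeptide_composition("MKKLLKK")
```

Here the pair "KK" occurs twice among six overlapping pairs, so its frequency is one third. Restricting the output to the 400 pairs of the 20 standard amino acids would give a fixed-length feature vector suitable for a machine-learning model.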