

Title: Designing proteins: Mimicking natural protein sequence heterogeneity
This study presents an enhanced protein design algorithm that aims to emulate the natural heterogeneity of protein sequences. Initial analysis revealed that natural proteins exhibit a permutation composition lower than the theoretical maximum, suggesting selective use of the 20-letter amino acid alphabet. Rather than constraining the amino acid composition of the designed sequence, the algorithm allows the composition to be randomly reshuffled. The resulting sequences maintain lower permutation compositions at equilibrium, closely matching natural proteins. Folding free energy computations showed that the designed sequences refold to their native structures with high precision, except for proteins with large disordered regions. In addition, direct coupling analysis showed a strong correlation between predicted and actual protein contacts, with accuracy exceeding 82% for a large number of top-ranked pairs (more than 4L, where L is the sequence length). The algorithm also resolves biases present in previous designs, giving a more accurate representation of protein interactions. Overall, the algorithm not only mimics the natural heterogeneity of proteins but also ensures correct folding, a significant advance in protein design and engineering.
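The abstract does not define its two headline metrics. The following is a minimal sketch under stated assumptions. "Permutation composition" is taken to be the logarithm of the number of distinct sequences that share a given amino acid composition, ln(N!/∏ n_a!), whose theoretical maximum occurs at the most even composition. Contact accuracy is taken to be the precision of the top k·L ranked residue pairs, with k = 4 and L the sequence length. The function names, the pair-score input format, and the minimum sequence separation of 6 residues are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20-letter amino acid alphabet


def log_permutations(sequence):
    """ln(N! / prod_a n_a!): log of the number of distinct sequences sharing
    this sequence's amino acid composition (assumed definition)."""
    counts = Counter(sequence)
    return math.lgamma(len(sequence) + 1) - sum(math.lgamma(c + 1) for c in counts.values())


def max_log_permutations(length, alphabet_size=len(AMINO_ACIDS)):
    """Theoretical maximum for a given length: the multinomial count is largest
    when residues are spread as evenly as possible over the alphabet."""
    base, extra = divmod(length, alphabet_size)
    counts = [base + 1] * extra + [base] * (alphabet_size - extra)
    return math.lgamma(length + 1) - sum(math.lgamma(c + 1) for c in counts)


def top_pair_precision(scores, true_contacts, length, factor=4, min_separation=6):
    """Fraction of the top factor*L ranked residue pairs that are true contacts.

    scores maps (i, j) with i < j to a coupling score (e.g. from DCA);
    pairs closer than min_separation along the chain are skipped (assumed cutoff).
    """
    ranked = sorted(
        ((s, i, j) for (i, j), s in scores.items() if j - i >= min_separation),
        reverse=True,
    )[: factor * length]
    if not ranked:
        return 0.0
    hits = sum(1 for _, i, j in ranked if (i, j) in true_contacts)
    return hits / len(ranked)


# Usage on an arbitrary example sequence: a ratio below 1 means the
# composition is less even than the theoretical maximum allows.
seq = "MKTAYIAKQRQISFVKSHFSRQ"
print(log_permutations(seq) / max_log_permutations(len(seq)))
```

Dividing `log_permutations` by `max_log_permutations` gives a normalized value that equals 1 only for a maximally even composition. Under this assumed definition, the abstract's observation corresponds to natural sequences scoring below 1.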
Award ID(s):
2019745 2210291 1943442
PAR ID:
10597044
Publisher / Repository:
AIP Publishing
Journal Name:
The Journal of Chemical Physics
Volume:
161
Issue:
19
ISSN:
0021-9606
Sponsoring Org:
National Science Foundation
More Like this
  1. Protein sequence matching presently fails to identify many structures that are highly similar, even when they are known to have the same function. The high packing densities in globular proteins lead to interdependent substitutions, which have not previously been considered in amino acid similarity measures. At present, sequence matching compares sequences based only on the similarities of single amino acids. This ignores the fact that densely packed proteins admit additional conservative substitutions that exchange two interacting amino acids, such as a small-large pair changing to a large-small pair, even though these substitutions are not individually so conservative. Here we show that including information on such paired substitutions yields improved sequence matches. These matches in turn yield significant gains in the agreement between sequence alignments and structure matches of the same protein pair: sequence segments are matched where structure segments are aligned. There are gains for all 2002 collected cases in which the sequence alignments were not previously congruent with the structure matches. Our results also demonstrate a significant gain in detecting homology for “twilight zone” protein sequences. The derived amino acid substitution metrics have many other potential applications, including annotation, protein design, mutagenesis design, and empirical potential derivation.
  2. The relation between amino acid (AA) sequence and biologically active conformation controls how polypeptide chains fold into three-dimensional (3d) protein structures. Recent improvements in the resolution of cryo-electron microscopy, coupled with advances in computational methodology, have accelerated the analysis of protein structures and properties. However, the detailed interactions between AAs have not been fully elucidated. Herein, we present a de novo method to evaluate inter-amino-acid interactions by accurately evaluating amino acid bond pairs (AABP). The results enabled the identification of a complex, long-range, interconnected 3d network of interacting AAs in proteins. The method is applied to the receptor binding domain (RBD) of the SARS-CoV-2 spike protein. We show that although AAs that are nearest neighbors in the primary sequence have large AABP, other nonlocal AAs make substantial contributions to AABP, with significant participation of both covalent and hydrogen bonding. Detailed analysis of AABP in the RBD reveals the pivotal role they play in sequence conservation, with profound implications for residue mutations and therapeutic drug design. This approach could be readily applied to many other proteins of biomedical interest in the life sciences.
  3. The amino acid composition of proteins plays an important role in biology, medicine, and nutrition. Here, we introduce and validate a protein analysis technique that quickly estimates amino acid composition and secondary structure across a range of protein sizes while keeping the proteins in their native states. The method combines multivariate statistics with the thermostable Raman interaction profiling (TRIP) technique, eliminating the need for complex sample preparation. To validate the approach, Raman spectra of seven proteins of varying sizes are constructed from their amino acid frequencies and the Raman spectra of the individual amino acids. These constructed spectra closely resemble the measured Raman spectra. Specific vibrational modes tied to the free amino and carboxyl termini of the amino acids disappear as signals linked to secondary structures emerge under TRIP conditions. The technique is also applied in reverse to estimate the amino acid compositions and secondary structures of unknown proteins across a range of sizes, with root mean square errors (RMSE) between 1.47% and 5.77%. These results extend the uses of TRIP beyond interaction profiling to probing amino acid composition and structure.
  4. Here we report the first recovery, sequencing, and identification of fossil biomineral proteins from a Pleistocene fossil invertebrate, the stony coral Orbicella annularis. This fossil retains total hydrolysable amino acids of a composition roughly similar to that of extracts from modern O. annularis skeletons, rich in Asx (Asp + Asn) and Glx (Glu + Gln), as is typical of invertebrate skeletal proteins. It also retains several proteins known from modern coral skeletal proteomes, including a highly acidic protein, which we sequenced by LC–MS/MS over multiple trials on the best-preserved fossil coral specimen. Degradation and/or inhibition of trypsin digestion by amino acid racemization appear to limit further recovery. Nevertheless, our workflow identifies optimal samples for effective sequencing of fossil coral proteins, allows comparison of modern and fossil invertebrate protein sequences, and will likely lead to further improvements in the methods. Sequencing endogenous organic molecules in fossil invertebrate biominerals provides an ancient record of composition, potentially clarifying evolutionary changes and biotic responses to paleoenvironments.
  5. Many large proteins suffer from slow or inefficient folding in vitro. It has long been known that this problem can be alleviated in vivo if proteins start folding cotranslationally. However, the molecular mechanisms underlying this improvement have not been well established. To address this question, we use an all-atom simulation-based algorithm to compute the folding properties of various large protein domains as a function of nascent chain length. We find that for certain proteins, there exists a narrow window of lengths that confers both thermodynamic stability and fast folding kinetics. Beyond these lengths, folding is drastically slowed by nonnative interactions involving C-terminal residues. Thus, cotranslational folding is predicted to be beneficial because it allows proteins to take advantage of this optimal window of lengths and thus avoid kinetic traps. Interestingly, many of these proteins’ sequences contain conserved rare codons that may slow down synthesis at this optimal window, suggesting that synthesis rates may be evolutionarily tuned to optimize folding. Using kinetic modeling, we show that under certain conditions, such a slowdown indeed improves cotranslational folding efficiency by giving these nascent chains more time to fold. In contrast, other proteins are predicted not to benefit from cotranslational folding due to a lack of significant nonnative interactions, and indeed these proteins’ sequences lack conserved C-terminal rare codons. Together, these results shed light on the factors that promote proper protein folding in the cell and how biomolecular self-assembly may be optimized evolutionarily. 
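In the third related abstract, the forward step builds a protein's Raman spectrum from its amino acid frequencies and the spectra of individual amino acids. The inverse step recovers composition from a measured spectrum. The following is a minimal sketch of both steps. It assumes a linear mixing model on a shared wavenumber grid and uses nonnegative least squares, solved by projected gradient descent, as a simple stand-in for the multivariate statistics the authors used. The function names and toy spectra are hypothetical.

```python
import math


def construct_spectrum(frequencies, reference_spectra):
    """Linear mixing model (assumed): the protein spectrum is the
    frequency-weighted sum of single-amino-acid spectra on a shared grid."""
    n_points = len(next(iter(reference_spectra.values())))
    spectrum = [0.0] * n_points
    for aa, fraction in frequencies.items():
        for k, intensity in enumerate(reference_spectra[aa]):
            spectrum[k] += fraction * intensity
    return spectrum


def rmse(a, b):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def estimate_composition(spectrum, reference_spectra, iterations=5000):
    """Invert the mixing model by nonnegative least squares (projected
    gradient descent), then renormalize to fractions summing to 1."""
    names = list(reference_spectra)
    columns = [reference_spectra[n] for n in names]
    # 1 / ||S||_F^2 never exceeds 1 / (largest eigenvalue of S^T S), so the step is stable.
    step = 1.0 / sum(v * v for col in columns for v in col)
    x = [0.0] * len(names)
    for _ in range(iterations):
        model = [sum(x[j] * col[k] for j, col in enumerate(columns)) for k in range(len(spectrum))]
        residual = [m - s for m, s in zip(model, spectrum)]
        # Gradient of 0.5 * ||S x - y||^2 is S^T (S x - y); clip at zero to keep x nonnegative.
        x = [max(0.0, xj - step * sum(r * c for r, c in zip(residual, col)))
             for xj, col in zip(x, columns)]
    total = sum(x)
    return {n: (v / total if total else 0.0) for n, v in zip(names, x)}


# Toy example with made-up four-point "spectra" (hypothetical data).
refs = {"A": [1.0, 0.0, 0.0, 0.0], "G": [0.0, 1.0, 0.0, 0.0], "L": [0.0, 0.0, 1.0, 1.0]}
measured = construct_spectrum({"A": 0.5, "G": 0.3, "L": 0.2}, refs)
print(estimate_composition(measured, refs))
```

`rmse` can compare a constructed spectrum with a measured one, or estimated fractions with known ones. Real spectra would need baseline correction and noise handling that this sketch omits.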