Title: Key aspects of the past 30 years of protein design
Abstract Proteins are the workhorses of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins’ most remarkable feature is their modularity. The large amount of information required to specify each protein’s function is analogically encoded with an alphabet of just ∼20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Award ID(s):
2019745
PAR ID:
10416883
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Reports on Progress in Physics
Volume:
85
Issue:
8
ISSN:
0034-4885
Page Range / eLocation ID:
086601
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract According to the Principle of Minimal Frustration, folded proteins can only have a minimal number of strong energetic conflicts in their native states. However, not all interactions are energetically optimized for folding but some remain in energetic conflict, i.e. they are highly frustrated. This remaining local energetic frustration has been shown to be statistically correlated with distinct functional aspects such as protein-protein interaction sites, allosterism and catalysis. Fuelled by the recent breakthroughs in efficient protein structure prediction that have made available good quality models for most proteins, we have developed a strategy to calculate local energetic frustration within large protein families and quantify its conservation over evolutionary time. Based on this evolutionary information we can identify how stability and functional constraints have appeared at the common ancestor of the family and have been maintained over the course of evolution. Here, we present FrustraEvo, a web server tool to calculate and quantify the conservation of local energetic frustration in protein families. 
  2. Abstract Proteins that drive processes like clathrin-mediated endocytosis (CME) are expressed at copy numbers within a cell and across cell types varying from hundreds (e.g. auxilin) to millions (e.g. clathrin). These variations contain important information about function, but without integration with the interaction network, they cannot capture how supply and demand for each protein depends on binding to shared and distinct partners. Here we construct the interface-resolved network of 82 proteins involved in CME and establish a metric, a stoichiometric balance ratio (SBR), that quantifies whether each protein in the network has an abundance that is sub- or super-stoichiometric dependent on the global competition for binding. We find that highly abundant proteins (like clathrin) are super-stoichiometric, but that not all super-stoichiometric proteins are highly abundant, across three cell populations (HeLa, fibroblast, and neuronal synaptosomes). Most strikingly, within all cells there is significant competition to bind shared sites on clathrin and the central AP-2 adaptor by other adaptor proteins, resulting in most being in excess supply. Our network and systematic analysis, including response to perturbations of network components, show how competition for shared binding sites results in functionally similar proteins having widely varying stoichiometries, due to variations in both abundance and their unique network of binding partners. 
  3. Abstract Motivation: As fewer than 1% of proteins have protein function information determined experimentally, computationally predicting the function of proteins is critical for obtaining functional information for most proteins and has been a major challenge in protein bioinformatics. Despite the significant progress made in protein function prediction by the community in the last decade, the general accuracy of protein function prediction is still not high, particularly for rare function terms associated with few proteins in the protein function annotation database such as the UniProt. Results: We introduce TransFew, a new transformer model, to learn the representations of both protein sequences and function labels [Gene Ontology (GO) terms] to predict the function of proteins. TransFew leverages a large pre-trained protein language model (ESM2-t48) to learn function-relevant representations of proteins from raw protein sequences and uses a biological natural language model (BioBert) and a graph convolutional neural network-based autoencoder to generate semantic representations of GO terms from their textual definition and hierarchical relationships, which are combined together to predict protein function via the cross-attention. Integrating the protein sequence and label representations not only enhances overall function prediction accuracy, but delivers a robust performance of predicting rare function terms with limited annotations by facilitating annotation transfer between GO terms. Availability and implementation: https://github.com/BioinfoMachineLearning/TransFew.
  4. Chloroviruses are large, plaque-forming, dsDNA viruses that infect chlorella-like green algae that live in a symbiotic relationship with protists. Chloroviruses have genomes from 290 to 370 kb, and they encode as many as 400 proteins. One interesting feature of chloroviruses is that they encode a potassium ion (K+) channel protein named Kcv. The Kcv protein encoded by SAG chlorovirus ATCV-1 is one of the smallest known functional K+ channel proteins consisting of 82 amino acids. The KcvATCV-1 protein has similarities to the family of two transmembrane domain K+ channel proteins; it consists of two transmembrane α-helices with a pore region in the middle, making it an ideal model for studying K+ channels. To assess their genetic diversity, kcv genes were sequenced from 103 geographically distinct SAG chlorovirus isolates. Of the 103 kcv genes, there were 42 unique DNA sequences that translated into 26 new Kcv channels. The new predicted Kcv proteins differed from KcvATCV-1 by 1 to 55 amino acids. The most conserved region of the Kcv protein was the filter, the turret and the pore helix were fairly well conserved, and the outer and the inner transmembrane domains of the protein were the most variable. Two of the new predicted channels were shown to be functional K+ channels.
  5. Abstract Ultrafast folding proteins have become an important paradigm in the study of protein folding dynamics. Due to their low energetic barriers and fast kinetics, they are amenable for study by both experiment and simulation. However, single molecule force spectroscopy experiments on these systems are challenging as these proteins do not provide the mechanical fingerprints characteristic of more mechanically stable proteins, which makes it difficult to extract information about the folding dynamics of the molecule. Here, we investigate the unfolding of the ultrafast protein Engrailed Homeodomain (EnHD) by single-molecule atomic force microscopy experiments. Constant speed experiments on EnHD result in featureless transitions typical of compliant proteins. However, in the force-ramp mode we recover sigmoidal curves that we interpret as a very compliant protein that folds and unfolds many times over a marginal barrier. This is supported by a simple theoretical model and coarse-grained molecular simulations. Our results show the ability of force to modulate the unfolding dynamics of ultrafast folding proteins. 