Title: EvoEF2: accurate and fast energy function for computational protein design
Abstract

Motivation

The accuracy and success rate of de novo protein design remain limited, mainly due to the parameter over-fitting of current energy functions and their inability to discriminate incorrect designs from correct designs.

Results

We developed an extended energy function, EvoEF2, for efficient de novo protein sequence design, based on a previously proposed physical energy function, EvoEF. EvoEF2 recovered 32.5%, 47.9% and 22.3% of all, core and surface residues, respectively, for 148 test monomers. It was also generally applicable to protein–protein interaction design, recapitulating 30.9%, 42.4%, 31.3% and 21.4% of all, core, interface and surface residues, respectively, for 88 test dimers, and it significantly outperformed EvoEF in native sequence recapitulation. We further used I-TASSER to evaluate the foldability of the 148 designed monomer sequences. All of them were predicted to fold into structures with high fold- and atomic-level similarity to their corresponding native structures, as shown by the fact that 87.8% of the predicted structures had a root-mean-square deviation of less than 2 Å from their native counterparts. The study also demonstrated that the usefulness of physical energy functions is highly correlated with the parameter optimization process, and that EvoEF2, with parameters optimized using sequence recapitulation, is more suitable for computational protein sequence design than EvoEF, which was optimized on thermodynamic mutation data.
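
The recovery percentages above are native sequence recapitulation rates. As a rough illustration only, the sketch below assumes that recovery means the fraction of positions where the designed residue is identical to the native residue, counted overall and per structural region. The function name, the region labels and the toy sequences are hypothetical; the abstract does not state how EvoEF2 classifies core and surface residues, and this is not the EvoEF2 implementation.

```python
from typing import Dict, Sequence


def sequence_recovery(native: str, designed: str,
                      regions: Sequence[str]) -> Dict[str, float]:
    """Fraction of positions whose designed residue matches the native one.

    Returns one value under the key "all" plus one per region label.
    The region labels are supplied by the caller (an assumption of this
    sketch; any burial-based classification could be plugged in).
    """
    if not (len(native) == len(designed) == len(regions)):
        raise ValueError("native, designed and regions must have equal length")
    totals: Dict[str, int] = {}
    matches: Dict[str, int] = {}
    for nat, des, reg in zip(native, designed, regions):
        # Count each position once overall and once in its own region.
        for key in ("all", reg):
            totals[key] = totals.get(key, 0) + 1
            matches[key] = matches.get(key, 0) + int(nat == des)
    return {key: matches[key] / totals[key] for key in totals}


# Toy data, invented for illustration (not taken from the benchmark sets).
native = "MKVLAAGE"
designed = "MRVLSAGD"
regions = ["surface", "surface", "core", "core",
           "surface", "core", "core", "surface"]

recovery = sequence_recovery(native, designed, regions)
for region, rate in recovery.items():
    print(f"{region}: {rate:.1%}")
```

In this toy case the core is recovered far better than the surface, which mirrors the direction of the reported numbers (core recovery higher than surface recovery) without reproducing them.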

Availability and implementation

The source code of EvoEF2 and the benchmark datasets are freely available at https://zhanglab.ccmb.med.umich.edu/EvoEF.

Supplementary information

Supplementary data are available at Bioinformatics online.

 
Award ID(s):
1901191
NSF-PAR ID:
10121804
Publisher / Repository:
Oxford University Press
Journal Name:
Bioinformatics
ISSN:
1367-4803
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The FastDesign protocol in the molecular modeling program Rosetta iterates between sequence optimization and structure refinement to stabilize de novo designed protein structures and complexes. FastDesign has been used previously to design novel protein folds and assemblies with important applications in research and medicine. To promote sampling of alternative conformations and sequences, FastDesign includes stages where the energy landscape is smoothened by reducing repulsive forces. Here, we discover that this process disfavors larger amino acids in the protein core because the protein compresses in the early stages of refinement. By testing alternative ramping strategies for the repulsive weight, we arrive at a scheme that produces lower energy designs with more native‐like sequence composition in the protein core. We further validate the protocol by designing and experimentally characterizing over 4000 proteins and show that the new protocol produces higher stability proteins.

     
  2. The design of completely synthetic proteins from first principles (de novo protein design) is challenging. This is because, despite recent advances in computational protein-structure prediction and design, we do not fully understand the sequence-to-structure relationships for protein folding, assembly, and stabilization. Antiparallel 4-helix bundles are amongst the most studied scaffolds for de novo protein design. We set out to re-examine this target, and to determine clear sequence-to-structure relationships, or design rules, for the structure. Our aim was to determine a common and robust sequence background for designing multiple de novo 4-helix bundles. In turn, this could be used in chemical and synthetic biology to direct protein–protein interactions and as scaffolds for functional protein design. Our approach starts by analyzing known antiparallel 4-helix coiled-coil structures to deduce design rules. In terms of the heptad repeat, abcdefg (i.e., the sequence signature of many helical bundles), the key features that we identify are: a = Leu, d = Ile, e = Ala, g = Gln, and the use of complementary charged residues at b and c. Next, we implement these rules in the rational design of synthetic peptides to form antiparallel homo- and heterotetramers. Finally, we use the sequence of the homotetramer to derive in one step a single-chain 4-helix-bundle protein for recombinant production in E. coli. All of the assembled designs are confirmed in aqueous solution using biophysical methods, and ultimately by determining high-resolution X-ray crystal structures. Our route from peptides to proteins provides an understanding of the role of each residue in each design.
  3. Abstract

    In the prediction of protein structure from amino acid sequence, loops are challenging regions for computational methods. Since loops are often located on the protein surface, they can have significant roles in determining protein functions and binding properties. Loop prediction without the aid of a structural template requires extensive conformational sampling and energy minimization, which are computationally difficult. In this article we present a new de novo loop sampling method, the Parallely filtered Energy Targeted All‐atom Loop Sampler (PETALS), to rapidly locate low energy conformations. PETALS explores both backbone and side‐chain positions of the loop region simultaneously according to the energy function selected by the user, and constructs a nonredundant ensemble of low energy loop conformations using filtering criteria. The method is illustrated with the DFIRE potential and DiSGro energy function for loops, and shown to be highly effective at discovering conformations with near‐native (or better) energy. Using the same energy function as the DiSGro algorithm, PETALS samples conformations with both lower RMSDs and lower energies. PETALS is also useful for assessing the accuracy of different energy functions. PETALS runs rapidly, requiring an average time cost of 10 minutes for a length 12 loop on a single 3.2 GHz processor core, comparable to the fastest existing de novo methods for generating an ensemble of conformations. Proteins 2017; 85:1402–1412. © 2017 Wiley Periodicals, Inc.

     
  4. Abstract

    The continued emergence of new SARS‐CoV‐2 variants has accentuated the growing need for fast and reliable methods for the design of potentially neutralizing antibodies (Abs) to counter immune evasion by the virus. Here, we report on the de novo computational design of high‐affinity Ab variable regions (Fv) through the recombination of VDJ genes targeting the most solvent‐exposed hACE2‐binding residues of the SARS‐CoV‐2 spike receptor binding domain (RBD) protein using the software tool OptMAVEn‐2.0. Subsequently, we carried out computational affinity maturation of the designed variable regions through amino acid substitutions for improved binding with the target epitope. Immunogenicity of designs was restricted by preferring designs that match sequences from a 9‐mer library of “human Abs” based on a human string content score. We generated 106 different antibody designs and reported in detail on the top five that trade off the greatest computational binding affinity for the RBD with human string content scores. We further describe computational evaluation of the top five designs produced by OptMAVEn‐2.0 using a Rosetta‐based approach. We used Rosetta SnugDock for local docking of the designs to evaluate their potential to bind the spike RBD and performed “forward folding” with DeepAb to assess their potential to fold into the designed structures. Ultimately, our results identified one designed Ab variable region, P1.D1, as a particularly promising candidate for experimental testing. This effort puts forth a computational workflow for the de novo design and evaluation of Abs that can quickly be adapted to target spike epitopes of emerging SARS‐CoV‐2 variants or other antigenic targets.

     
  5. Abstract

    Computational membrane protein design is challenging due to the small number of high‐resolution structures available to elucidate the physical basis of membrane protein structure, multiple functionally important conformational states, and a limited number of high‐throughput biophysical assays to monitor function. However, structural determination of membrane proteins has made tremendous progress in the past years. Concurrently the field of soluble computational design has made impressive inroads. These developments allow us to tackle the formidable challenge of designing functional membrane proteins. Herein, Rosetta is benchmarked for membrane protein design. We evaluate strategies to cope with the often reduced quality of experimental membrane protein structures. Further, we test the usage of symmetry in design protocols, which is particularly important as many membrane proteins exist as homo‐oligomers. We compare a soluble scoring function with a scoring function optimized for membrane proteins, RosettaMembrane. Both scoring functions recovered around half of the native sequence when completely redesigning membrane proteins. However, RosettaMembrane recovered the most native‐like amino acid property composition. While leucine was overrepresented in the inner and outer‐hydrophobic regions of RosettaMembrane designs, it resulted in a native‐like surface hydrophobicity indicating that it is currently the best option for designing membrane proteins with Rosetta.

     