skip to main content


Title: AND/OR Branch-and-Bound for Computational Protein Design Optimizing K*
The importance of designing proteins, such as high affinity antibodies, has become ever more apparent. Computational Protein Design can cast such design problems as optimization tasks with the objective of maximizing K*, an approximation of binding affinity. Here we lay out a graphical model framework for K* optimization that enables use of compact AND/OR search algorithms. We designed an AND/OR branch-and-bound algorithm, AOBB-K*, for optimizing K* that is guided by a new K* heuristic and can incorporate specialized performance improvements with theoretical guarantees. As AOBB-K* is inspired by algorithms from the well studied task of Marginal MAP, this work provides a foundation for harnessing advancements in state-of-the-art mixed inference schemes and adapting them to protein design.  more » « less
Award ID(s):
2008516
NSF-PAR ID:
10376092
Author(s) / Creator(s):
; ; ;
Editor(s):
Cussens, James; Zhang, Kun
Date Published:
Journal Name:
Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence, PMLR
Volume:
180
Page Range / eLocation ID:
1602-1612
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Many applications in protein engineering require optimizing multiple protein properties simultaneously, such as binding one target but not others or binding a target while maintaining stability. Such multistate design problems require navigating a high-dimensional space to find proteins with desired characteristics. A model that relates protein sequence to functional attributes can guide design to solutions that would be hard to discover via screening. In this work, we measured thousands of protein–peptide binding affinities with the high-throughput interaction assay amped SORTCERY and used the data to parameterize a model of the alpha-helical peptide-binding landscape for three members of the Bcl-2 family of proteins: Bcl-xL, Mcl-1, and Bfl-1. We applied optimization protocols to explore extremes in this landscape to discover peptides with desired interaction profiles. Computational design generated 36 peptides, all of which bound with high affinity and specificity to just one of Bcl-xL, Mcl-1, or Bfl-1, as intended. We designed additional peptides that bound selectively to two out of three of these proteins. The designed peptides were dissimilar to known Bcl-2–binding peptides, and high-resolution crystal structures confirmed that they engaged their targets as expected. Excellent results on this challenging problem demonstrate the power of a landscape modeling approach, and the designed peptides have potential uses as diagnostic tools or cancer therapeutics.

     
    more » « less
  2. Abstract Motivation

    Multistate protein design addresses real-world challenges, such as multi-specificity design and backbone flexibility, by considering both positive and negative protein states with an ensemble of substates for each. It also presents an enormous challenge to exact algorithms that guarantee the optimal solutions and enable a direct test of mechanistic hypotheses behind models. However, efficient exact algorithms are lacking for multistate protein design.

    Results

    We have developed an efficient exact algorithm called interconnected cost function networks (iCFN) for multistate protein design. Its generic formulation allows for a wide array of applications such as stability, affinity and specificity designs while addressing concerns such as global flexibility of protein backbones. iCFN treats each substate design as a weighted constraint satisfaction problem (WCSP) modeled through a CFN; and it solves the coupled WCSPs using novel bounds and a depth-first branch-and-bound search over a tree structure of sequences, substates, and conformations. When iCFN is applied to specificity design of a T-cell receptor, a problem of unprecedented size to exact methods, it drastically reduces search space and running time to make the problem tractable. Moreover, iCFN generates experimentally-agreeing receptor designs with improved accuracy compared with state-of-the-art methods, highlights the importance of modeling backbone flexibility in protein design, and reveals molecular mechanisms underlying binding specificity.

    Availability and implementation

    https://shen-lab.github.io/software/iCFN

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  3. Three protein targets from SARS-CoV-2, the viral pathogen that causes COVID-19, are studied: the main protease, the 2′-O-RNA methyltransferase, and the nucleocapsid (N) protein. For the main protease, the nucleophilicity of the catalytic cysteine C145 is enabled by coupling to three histidine residues, H163 and H164 and catalytic dyad partner H41. These electrostatic couplings enable significant population of the deprotonated state of C145. For the RNA methyltransferase, the catalytic lysine K6968 that serves as a Brønsted base has significant population of its deprotonated state via strong coupling with K6844 and Y6845. For the main protease, Partial Order Optimum Likelihood (POOL) predicts two clusters of biochemically active residues; one includes the catalytic H41 and C145 and neighboring residues. The other surrounds a second pocket adjacent to the catalytic site and includes S1 residues F140, L141, H163, E166, and H172 and also S2 residue D187. This secondary recognition site could serve as an alternative target for the design of molecular probes. From in silico screening of library compounds, ligands with predicted affinity for the secondary site are reported. For the NSP16-NSP10 complex that comprises the RNA methyltransferase, three different sites are predicted. One is the catalytic core at the conserved K-D-K-E motif that includes catalytic residues D6928, K6968, and E7001 plus K6844. The second site surrounds the catalytic core and consists of Y6845, C6849, I6866, H6867, F6868, V6894, D6895, D6897, I6926, S6927, Y6930, and K6935. The third is located at the heterodimer interface. Ligands predicted to have high affinity for the first or second sites are reported. Three sites are also predicted for the nucleocapsid protein. This work uncovers key interactions that contribute to the function of the three viral proteins and also suggests alternative sites for ligand design. 
    more » « less
  4. Bienstock, Rachelle J. (Ed.)
    Three protein targets from SARS-CoV-2, the viral pathogen that causes COVID-19, are studied: the main protease, the 2’-O-RNA methyltransferase, and the nucleocapsid (N) protein. For the main protease, the nucleophilicity of the catalytic cysteine C145 is enabled by coupling to three histidine residues, H163 and H164 and catalytic dyad partner H41. These electrostatic couplings enable significant population of the deprotonated state of C145. For the RNA methyltransferase, the catalytic lysine K6968 that serves as a Brønsted base has significant population of its deprotonated state via strong coupling with K6844 and Y6845. For the main protease, Partial Order Optimum Likelihood (POOL) predicts two clusters of biochemically active residues; one includes the catalytic H41 and C145 and neighboring residues. The other surrounds a second pocket adjacent to the catalytic site and includes S1 residues F140, L141, H163, E166, and H172 and also S2 residue D187. This secondary recognition site could serve as an alternative target for the design of molecular probes. From in silico screening of library compounds, ligands with predicted affinity for the secondary site are reported. For the NSP16-NSP10 complex that comprises the RNA methyltransferase, three different sites are predicted. One is the catalytic core at the conserved K-D-K-E motif that includes catalytic residues D6928, K6968, and E7001 plus K6844. The second site surrounds the catalytic core and consists of Y6845, C6849, I6866, H6867, F6868, V6894, D6895, D6897, I6926, S6927, Y6930, and K6935. The third is located at the heterodimer interface. Ligands predicted to have high affinity for the first or second sites are reported. Three sites are also predicted for the nucleocapsid protein. This work uncovers key interactions that contribute to the function of the three viral proteins and also suggests alternative sites for ligand design. 
    more » « less
  5. null (Ed.)
    We investigate the approximability of the following optimization problem. The input is an n× n matrix A=(Aij) with real entries and an origin-symmetric convex body K⊂ ℝn that is given by a membership oracle. The task is to compute (or approximate) the maximum of the quadratic form ∑i=1n∑j=1n Aij xixj=⟨ x,Ax⟩ as x ranges over K. This is a rich and expressive family of optimization problems; for different choices of matrices A and convex bodies K it includes a diverse range of optimization problems like max-cut, Grothendieck/non-commutative Grothendieck inequalities, small set expansion and more. While the literature studied these special cases using case-specific reasoning, here we develop a general methodology for treatment of the approximability and inapproximability aspects of these questions. The underlying geometry of K plays a critical role; we show under commonly used complexity assumptions that polytime constant-approximability necessitates that K has type-2 constant that grows slowly with n. However, we show that even when the type-2 constant is bounded, this problem sometimes exhibits strong hardness of approximation. Thus, even within the realm of type-2 bodies, the approximability landscape is nuanced and subtle. However, the link that we establish between optimization and geometry of Banach spaces allows us to devise a generic algorithmic approach to the above problem. We associate to each convex body a new (higher dimensional) auxiliary set that is not convex, but is approximately convex when K has a bounded type-2 constant. If our auxiliary set has an approximate separation oracle, then we design an approximation algorithm for the original quadratic optimization problem, using an approximate version of the ellipsoid method. Even though our hardness result implies that such an oracle does not exist in general, this new question can be solved in specific cases of interest by implementing a range of classical tools from functional analysis, most notably the deep factorization theory of linear operators. Beyond encompassing the scenarios in the literature for which constant-factor approximation algorithms were found, our generic framework implies that that for convex sets with bounded type-2 constant, constant factor approximability is preserved under the following basic operations: (a) Subspaces, (b) Quotients, (c) Minkowski Sums, (d) Complex Interpolation. This yields a rich family of new examples where constant factor approximations are possible, which were beyond the reach of previous methods. We also show (under commonly used complexity assumptions) that for symmetric norms and unitarily invariant matrix norms the type-2 constant nearly characterizes the approximability of quadratic maximization. 
    more » « less