Biophysical interactions between proteins and peptides are key determinants of molecular recognition specificity landscapes. However, an understanding of how molecular structure and residue-level energetics at protein−peptide interfaces shape these landscapes remains elusive. We combine information from yeast-based library screening, next-generation sequencing, and structure-based modeling in a supervised machine learning approach to report the comprehensive sequence−energetics−function mapping of the specificity landscape of the hepatitis C virus (HCV) NS3/4A protease, whose function—site-specific cleavages of the viral polyprotein—is a key determinant of viral fitness. We screened a library of substrates in which five residue positions were randomized and measured cleavability of ∼30,000 substrates (∼1% of the library) using yeast display and fluorescence-activated cell sorting followed by deep sequencing. Structure-based models of a subset of experimentally derived sequences were used in a supervised learning procedure to train a support vector machine to predict the cleavability of 3.2 million substrate variants by the HCV protease. The resulting landscape allows identification of previously unidentified HCV protease substrates, and graph-theoretic analyses reveal extensive clustering of cleavable and uncleavable motifs in sequence space. Specificity landscapes of known drug-resistant variants are similarly clustered. The described approach should enable the elucidation and redesign of specificity landscapes of a wide variety of proteases, including human-origin enzymes. Our results also suggest a possible role for residue-level energetics in shaping plateau-like functional landscapes predicted from viral quasispecies theory.
Prediction and design of protease enzyme specificity using a structure-aware graph convolutional network
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key posttranslational modification involved in physiology and disease. The ability to robustly and rapidly predict protease–substrate specificity would also enable targeted proteolytic cleavage by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pretrained PGCN model to guide the design of protease libraries for cleaving two noncanonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
- Award ID(s): 2226816
- PAR ID: 10490037
- Publisher / Repository: National Academy of Sciences
- Date Published:
- Journal Name: Proceedings of the National Academy of Sciences
- Volume: 120
- Issue: 39
- ISSN: 0027-8424
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Rhomboid proteases are ubiquitous intramembrane serine proteases that can cleave transmembrane substrates within lipid bilayers. They have many diverse functions, including growth factor signaling, immune and inflammatory responses, protein quality control, and parasitic invasion. Human rhomboid protease RHBDL4 has been demonstrated to play a critical role in removing misfolded proteins from the endoplasmic reticulum and is implicated in severe diseases such as various cancers and Alzheimer's disease. Therefore, RHBDL4 is expected to constitute an important therapeutic target for such devastating diseases. Despite its critical role in many biological processes, the enzymatic properties of RHBDL4 remain largely unknown. To enable a comprehensive characterization of RHBDL4's kinetics, catalytic parameters, substrate specificity, and binding modality, we expressed and purified recombinant RHBDL4 and employed it in a Förster resonance energy transfer-based cleavage assay. Until now, kinetic studies have been limited mostly to bacterial rhomboid proteases. Our in vitro platform offers a new method for studying RHBDL4's enzymatic function and substrate preferences. Furthermore, we developed and tested potential inhibitors using our assay and successfully identified peptidyl α-ketoamide inhibitors of RHBDL4 that are highly effective against recombinant RHBDL4. We used ensemble docking and molecular dynamics simulations to explore the binding modality of substrate-derived peptides bound to RHBDL4. Our analysis focused on key interactions and dynamic movements within RHBDL4's active site that contributed to binding stability, offering valuable insights for optimizing the nonprime side of RHBDL4 ketoamide inhibitors.
In summary, our study offers fundamental insights into RHBDL4's catalytic activities and substrate preferences, laying the foundation for downstream applications such as inhibitor screening and structure-function studies, which will enable the identification of lead drug compounds for RHBDL4.
-
CRISPR-Cas9 (clustered regularly interspaced short palindromic repeat and associated Cas9 protein) is a molecular tool with transformative genome editing capabilities. At the molecular level, intricate allosteric signaling is critical for DNA cleavage, but its role in the specificity enhancement of the Cas9 endonuclease is poorly understood. Here, multi-microsecond molecular dynamics is combined with solution NMR and graph theory-derived models to probe the allosteric role of key specificity-enhancing mutations. We show that mutations responsible for increasing the specificity of Cas9 alter the allosteric structure of the catalytic HNH domain, impacting the signal transmission from the DNA recognition region to the catalytic sites for cleavage. Specifically, the K855A mutation strongly disrupts the allosteric connectivity of the HNH domain, exerting the highest perturbation on the signaling transfer, while K810A and K848A result in more moderate effects on the allosteric communication. This differential perturbation of the allosteric signal correlates with the order of specificity enhancement (K855A > K848A ~ K810A) observed in biochemical studies, with the mutation achieving the highest specificity most strongly perturbing the signaling transfer. These findings suggest that alterations of the allosteric communication from DNA recognition to cleavage are critical to increasing the specificity of Cas9 and that allosteric hotspots can be targeted through mutational studies for improving the system’s function.
-
Desmoplakin (DSP) is a large (~260 kDa) protein found in the desmosome, the subcellular structure that links the intermediate filament network of one cell to its neighbor. A mutation “hot-spot” within the NH2-terminus of the DSP protein (residues 299–515) is associated with arrhythmogenic cardiomyopathy. In a subset of DSP variants, disease is linked to calpain hypersensitivity. Previous studies show that calpain hypersensitivity can be corrected in vitro through the addition of a bulky residue neighboring the cleavage site, suggesting that physically blocking calpain accessibility is a viable strategy to restore DSP levels. Here, we aim to find drug-like molecules that also block calpain-dependent degradation of DSP. To do this, we screened ~2500 small molecules to identify compounds that specifically rescue DSP protein levels in the presence of proteases. We find that several molecules, including sodium dodecyl sulfate, palmitoylethanolamide, GW0742, salirasib, eprosartan mesylate, and GSK1838705A, prevent degradation of wildtype and disease-variant-carrying DSP protein in the presence of both trypsin and calpain without altering protease function. Computational screenings did not predict which molecules would protect DSP, likely due to a lack of specific DSP–drug interactions. Molecular dynamics simulations of DSP–drug complexes suggest that some long hydrophobic molecules can bind in a shallow hydrophobic groove that runs alongside the protease cleavage site. Identification of these compounds lays the groundwork for pharmacological treatment for individuals harboring these hypersensitive DSP variants.
-
Reliable prediction of T cell specificity against antigenic signatures is a formidable task, complicated by the immense diversity of T cell receptor and antigen sequence space and the resulting limited availability of training sets for inferential models. Recent modeling efforts have demonstrated the advantage of incorporating structural information to overcome the need for extensive training sequence data, yet disentangling the heterogeneous TCR-antigen interface to accurately predict MHC-allele-restricted TCR-peptide interactions has remained challenging. Here, we present RACER-m, a coarse-grained structural model leveraging key biophysical information from the diversity of publicly available TCR-antigen crystal structures. Explicit inclusion of structural content substantially reduces the required number of training examples and maintains reliable predictions of TCR-recognition specificity and sensitivity across diverse biological contexts. Our model capably identifies biophysically meaningful point-mutant peptides that affect binding affinity, which distinguishes its ability to predict the TCR specificity of point mutants from that of alternative sequence-based methods. The model is broadly applicable to studies involving both closely related and structurally diverse TCR-peptide pairs.