Title: The quaternary question: Determining allostery in spastin through dynamics classification learning and bioinformatics
Spastin, a nanomachine from the ATPases associated with various cellular activities (AAA+) superfamily, severs microtubules during cellular processes. To characterize the functionally important allostery in spastin, we employed methods ranging from evolutionary information, to graph-based networks, to machine learning applied to atomistic molecular dynamics simulations of spastin in its monomeric and functional hexameric forms, in the presence or absence of ligands. Feature selection, using machine learning approaches, for transitions between spastin states recognizes all the regions that have been proposed as allosteric or functional in the literature. The analysis of the composition of the Markov State Model macrostates in the spastin monomer, and the analysis of the direction of change in the top machine learning features for the transitions, indicate that the monomer favors the binding of ATP, which primes the regions involved in the formation of the inter-protomer interfaces for binding to other protomer(s). Allosteric path analysis of graph networks, built from the cross-correlations between residues in simulations, shows that perturbations to a hub specific for the pre-hydrolysis hexamer propagate throughout the structure by passing through two obligatory regions: the ATP binding pocket, and pore loop 3, which connects the substrate binding site to the ATP binding site. Our findings support a model where the changes in the terminal protomers due to the binding of ligands play an active role in the force generation in spastin. The secondary structures in spastin, which are found to be highly degenerative within the network paths, are also critical for feature transitions of the classification models, which can guide the design of allosteric effectors to enhance or block allosteric signaling.
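The path analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a synthetic six-residue trajectory, a hypothetical correlation cutoff of 0.3 for drawing edges, and the commonly used edge weight of -log|C_ij|, and it finds the shortest path between two residues with Dijkstra's algorithm.

```python
import heapq
import numpy as np


def cross_correlation(traj):
    """Normalized cross-correlation matrix from a (frames, residues, 3) array."""
    disp = traj - traj.mean(axis=0)  # displacement of each residue from its mean position
    cov = np.einsum("fik,fjk->ij", disp, disp) / traj.shape[0]
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)


def shortest_path(corr, source, target, cutoff=0.3):
    """Dijkstra over edges with |C_ij| >= cutoff, weighted by -log|C_ij|.

    The cutoff is an illustrative assumption; real analyses often use
    contact-based criteria to decide which residue pairs share an edge.
    """
    n = corr.shape[0]
    strength = np.abs(corr)
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == target:
            break
        if d > dist.get(i, np.inf):
            continue  # stale heap entry
        for j in range(n):
            if j == i or strength[i, j] < cutoff:
                continue
            nd = d - np.log(min(strength[i, j], 1.0))  # strong coupling = short edge
            if nd < dist.get(j, np.inf):
                dist[j] = nd
                prev[j] = i
                heapq.heappush(heap, (nd, j))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Synthetic trajectory (assumption): each residue follows its predecessor plus noise,
# so neighboring residues are strongly correlated and distant ones weakly.
rng = np.random.default_rng(0)
frames, residues = 500, 6
traj = np.zeros((frames, residues, 3))
traj[:, 0] = rng.normal(size=(frames, 3))
for i in range(1, residues):
    traj[:, i] = 0.8 * traj[:, i - 1] + 0.6 * rng.normal(size=(frames, 3))

corr = cross_correlation(traj)
path = shortest_path(corr, 0, residues - 1)
print(path)
```

Residues that appear on many such shortest paths between functional sites are the kind of obligatory regions the abstract refers to; counting path membership over many source and target pairs would be the natural next step.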
Award ID(s): 1817948
NSF-PAR ID: 10428202
Author(s) / Creator(s):
Date Published:
Journal Name: The Journal of Chemical Physics
Volume: 158
Issue: 12
ISSN: 0021-9606
Page Range / eLocation ID: 125102
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Proteolysis is essential for the control of metabolic pathways and the cell cycle. Bacterial caseinolytic proteases (Clp) use peptidase components, such as ClpP, to degrade defective substrate proteins and to regulate cellular levels of stress-response proteins. To ensure selective degradation, access to the proteolytic chamber of the double–ring ClpP tetradecamer is controlled by a critical gating mechanism of the two axial pores. The binding of conserved loops of the Clp ATPase component of the protease or small molecules, such as acyldepsipeptide (ADEP), at peripheral ClpP ring sites, triggers axial pore opening through dramatic conformational transitions of flexible N-terminal loops between disordered conformations in the “closed” pore state and ordered hairpins in the “open” pore state. In this study, we probe the allosteric communication underlying these conformational changes by comparing residue–residue couplings in molecular dynamics simulations of each configuration. Both principal component and normal mode analyses highlight large-scale conformational changes in the N-terminal loop regions and smaller amplitude motions of the peptidase core. Community network analysis reveals a switch between intra- and inter-protomer coupling in the open–closed pore transition. Allosteric pathways that connect the ADEP binding sites to N-terminal loops are rewired in this transition, with shorter network paths in the open pore configuration supporting stronger intra- and inter-ring coupling. Structural perturbations, either through the removal of ADEP molecules or point mutations, alter the allosteric network to weaken the coupling. 
  2. Complex mechanisms regulate the cellular distribution of cholesterol, a critical component of eukaryote membranes involved in regulation of membrane protein functions directly and through the physicochemical properties of membranes. StarD4, a member of the steroidogenic acute regulator-related lipid-transfer (StART) domain (StARD)-containing protein family, is a highly efficient sterol-specific transfer protein involved in cholesterol homeostasis. Its mechanism of cargo loading and release remains unknown despite recent insights into the key role of phosphatidylinositol phosphates in modulating its interactions with target membranes. We have used large-scale atomistic molecular dynamics (MD) simulations to study how the dynamics of cholesterol bound to the StarD4 protein can affect interaction with target membranes, and cargo delivery. We identify the two major cholesterol (CHL) binding modes in the hydrophobic pocket of StarD4, one near S136 and S147 (the Ser-mode), and another closer to the putative release gate located near W171, R92, and Y117 (the Trp-mode). We show that conformational changes of StarD4 associated directly with the transition between these binding modes facilitate the opening of the gate. To understand the dynamics of this connection we apply a machine-learning algorithm for the detection of rare events in MD trajectories (RED), which reveals the structural motifs involved in the opening of a front gate and a back corridor in the StarD4 structure occurring together with the spontaneous transition of CHL from the Ser-mode of binding to the Trp-mode. Further analysis of MD trajectory data with the information-theory based NbIT method reveals the allosteric network connecting the CHL binding site to the functionally important structural components of the gate and corridor. Mutations of residues in the allosteric network are shown to affect the performance of the allosteric connection.
These findings outline an allosteric mechanism which prepares the CHL-bound StarD4 to release and deliver the cargo when it is bound to the target membrane.

     
  3. Abstract

    Like many Gram‐negative pathogens, Shigella rely on a type three secretion system (T3SS) for injection of effector proteins directly into eukaryotic host cells to initiate and sustain infection. Protein secretion through the needle‐like type three secretion apparatus (T3SA) requires ATP hydrolysis by the T3SS ATPase Spa47, making it a likely target for in vivo regulation of T3SS activity and an attractive target for small molecule therapeutics against shigellosis. Here, we developed a model of an activated Spa47 homo‐hexamer, identifying two distinct regions at each protomer interface that we hypothesized to provide intermolecular interactions supporting Spa47 oligomerization and enzymatic activation. Mutational analysis and a series of high‐resolution crystal structures confirm the importance of these residues, as many of the engineered mutants are unable to form oligomers and efficiently hydrolyze ATP in vitro. Furthermore, in vivo evaluation of Shigella virulence phenotype uncovered a strong correlation between T3SS effector protein secretion, host cell membrane disruption, and cellular invasion by the tested mutant strains, suggesting that perturbation of the identified interfacial residues/interactions influences Spa47 activity through preventing oligomer formation, which in turn regulates Shigella virulence. The most impactful mutations are observed within the conserved Site 2 interface where the native residues support oligomerization and likely contribute to a complex hydrogen bonding network that organizes the active site and supports catalysis. The critical reliance on these conserved residues suggests that aspects of T3SS regulation may also be conserved, providing promise for the development of a cross‐species therapeutic that broadly targets T3SS ATPase oligomerization and activation.

     
  4. Abstract

    Binding sites in proteins can be either specifically functional binding sites (active sites) that bind specific substrates with high affinity or regulatory binding sites (allosteric sites) that modulate the activity of functional binding sites through effector molecules. Owing to their significance in determining protein function, the identification of protein functional and regulatory binding sites is widely acknowledged as an important biological problem. In this work, we present a novel binding site prediction method, Active and Regulatory site Prediction (AR‐Pred), which supplements protein geometry, evolutionary, and physicochemical features with information about protein dynamics to predict putative active and allosteric site residues. As the intrinsic dynamics of globular proteins plays an essential role in controlling binding events, we find it to be an important feature for the identification of protein binding sites. We train and validate our predictive models on multiple balanced training and validation sets with random forest machine learning and obtain an ensemble of discrete models for each prediction type. Our models for active site prediction yield a median area under the curve (AUC) of 91% and Matthews correlation coefficient (MCC) of 0.68, whereas the less well‐defined allosteric sites are predicted at a lower level with a median AUC of 80% and MCC of 0.48. When tested on an independent set of proteins, our models for active site prediction show comparable performance to two existing methods and gains compared to two others, while the allosteric site models show gains when tested against three existing prediction methods. AR‐Pred is available as a free downloadable package at https://github.com/sambitmishra0628/AR-PRED_source.

     
  5. Skolnick, Jeffrey (Ed.)
    Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain “dark”—i.e., their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when the dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy to reduce the impact of inaccuracy of predicted structures on function predictions by recognizing the sequence-structure-function paradigm; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step, using different gene families in the test data from those in the training and development data sets to facilitate model deployment in a real-world scenario. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art techniques of machine learning and protein-ligand docking when applied to dark gene families, and demonstrated its generalization power for target identifications and compound screenings under out-of-distribution (OOD) scenarios. Furthermore, in an external validation for the multi-target compound screening, the performance of PortalCG surpassed the rational design from medicinal chemists. 
Our results also suggest that a differentiable sequence-structure-function deep learning framework, where protein structural information serves as an intermediate layer, could be superior to conventional methodology where predicted protein structures were used for the compound screening. We applied PortalCG to two case studies to exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome for target diseases that do not yet have effective and safe therapeutics. Our results suggested that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space. 
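For readers unfamiliar with the Matthews correlation coefficient (MCC) quoted in item 4, the following sketch shows how it is computed from a binary confusion matrix. The counts are hypothetical and are not taken from any of the studies listed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a balanced test set of 100 residues.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(round(mcc, 2))
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect), and it uses all four cells of the confusion matrix, which is why it is often reported alongside AUC for classifiers of this kind.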