Title: CaXML: Chemistry‐informed machine learning explains mutual changes between protein conformations and calcium ions in calcium‐binding proteins using structural and topological features
Abstract: Proteins' flexibility is a feature in communicating changes in cell signaling instigated by binding with secondary messengers, such as calcium ions, associated with the coordination of muscle contraction, neurotransmitter release, and gene expression. When binding with the disordered parts of a protein, calcium ions must balance their charge states with the shape of calcium‐binding proteins and their versatile pool of partners depending on the circumstances they transmit. Accurately determining the ionic charges of those ions is essential for understanding their role in such processes. However, it is unclear whether the limited experimental data available can be effectively used to train models to accurately predict the charges of calcium‐binding protein variants. Here, we developed a chemistry‐informed, machine‐learning algorithm that implements a game theoretic approach to explain the output of a machine‐learning model without the prerequisite of an excessively large database for high‐performance prediction of atomic charges. We used the ab initio electronic structure data representing calcium ions and the structures of the disordered segments of calcium‐binding peptides with surrounding water molecules to train several explainable models. Network theory was used to extract the topological features of atomic interactions in the structurally complex data dictated by the coordination chemistry of a calcium ion, a potent indicator of its charge state in protein. Our design created a computational tool of CaXML, which provided a framework of explainable machine learning model to annotate ionic charges of calcium ions in calcium‐binding proteins in response to the chemical changes in an environment. Our framework will provide new insights into protein design for engineering functionality based on the limited size of scientific data in a genome space.
Award ID(s):
2221824
PAR ID:
10568059
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Protein Science
Volume:
34
Issue:
2
ISSN:
0961-8368
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Calmodulin (CaM) is a calcium-binding protein that transduces signals to downstream proteins through target binding upon calcium binding in a time-dependent manner. Understanding the target binding process that tunes CaM’s affinity for the calcium ions (Ca2+), or vice versa, may provide insight into how Ca2+-CaM selects its target binding proteins. However, modeling of Ca2+-CaM in molecular simulations is challenging because of the gross structural changes in its central linker regions while the two lobes are relatively rigid due to tight binding of the Ca2+ to the calcium-binding loops where the loop forms a pentagonal bipyramidal coordination geometry with Ca2+. This feature that underlies the reciprocal relation between Ca2+ binding and target binding of CaM, however, has yet to be considered in the structural modeling. Here, we presented a coarse-grained model based on the Associative memory, Water mediated, Structure, and Energy Model (AWSEM) protein force field, to investigate the salient features of CaM. Particularly, we optimized the force field of CaM and that of Ca2+ ions by using its coordination chemistry in the calcium-binding loops to match with experimental observations. We presented a “community model” of CaM that is capable of sampling various conformations of CaM, incorporating various calcium-binding states, and carrying the memory of binding with various targets, which sets the foundation of the reciprocal relation of target binding and Ca2+ binding in future studies.
  2. One of the mechanisms by which toxic metal ions interfere with cellular functions is ionic mimicry, where they bind to protein sites in lieu of native metals Ca2+ and Zn2+. The influence of crowded intracellular environments on these interactions is not well understood. Here, we demonstrate the application of in-cell and lysate NMR spectroscopy to obtain atomic-level information on how a potent environmental toxin cadmium interacts with its protein targets. The experiments, conducted in intact E. coli cells and their lysates, revealed that Cd2+ can profoundly affect the quinary interactions of its protein partners, and can replace Zn2+ in both labile and non-labile protein structural sites without significant perturbation of the membrane binding function. Surprisingly, in crowded molecular environments Cd2+ can effectively target not only all-sulfur and mixed sulfur/nitrogen but also all-oxygen coordination sites. The sulfur-rich coordination environments show significant promise for bioremedial applications, as demonstrated by the ability of the designed protein scaffold α3DIV to sequester intracellular cadmium. Our data suggests that in-cell NMR spectroscopy is a powerful tool for probing interactions of toxic metal ions with their potential protein targets, and for the assessment of potency of sequestering agents. 
  3. Conformations and dynamics of an intrinsically disordered protein (IDP) depend on its composition of charged and uncharged amino acids, and their specific placement in the protein sequence. In general, the charge (positive or negative) on an amino acid residue in the protein is not a fixed quantity. Each of the ionizable groups can exist in an equilibrated distribution of fully ionized state (monopole) and an ion-pair (dipole) state formed between the ionizing group and its counterion from the background electrolyte solution. The dipole formation (counterion condensation) depends on the protein conformation, which in turn depends on the distribution of charges and dipoles on the molecule. Consequently, effective charges of ionizable groups in the IDP backbone may differ from their chemical charges in isolation, a phenomenon termed charge-regulation. Accounting for the inevitable dipolar interactions, that have so far been ignored, and using a self-consistent procedure, we present a theory of charge-regulation as a function of sequence, temperature, and ionic strength. The theory quantitatively agrees with both charge reduction and salt-dependent conformation data of Prothymosin-alpha and makes several testable predictions. We predict charged groups are less ionized in sequences where opposite charges are well mixed compared to sequences where they are strongly segregated. Emergence of dipolar interactions from charge-regulation allows spontaneous coexistence of two phases having different conformations and charge states, sensitively depending on the charge patterning. These findings highlight sequence dependent charge-regulation and its potential exploitation by biological regulators such as phosphorylation and mutations in controlling protein conformation and function.
  4. S100A12 or Calgranulin C is a homodimeric antimicrobial protein of the S100 family of EF-hand calcium-modulated proteins. S100A12 is involved in many diseases like inflammation, tumor invasion, cancer and neurological disorders like Alzheimer’s disease. The binding of transition metal ions to the protein is important as the sequestering of the metal ion induces conformational changes in the protein, inhibiting the growth of various pathogenic microorganisms. In this work, we probe the Cu(II) binding properties of Calgranulin C. We demonstrate that the two Cu(II) binding sites in Calgranulin C show different coordination environments in solution. Electron spin resonance (ESR) spectra of Cu(II)-bound protein clearly show two distinct components at higher Cu(II):protein ratios, which is indicative of the two different binding environments for the Cu(II) ions. The g|| and A|| values are also different for the two components, indicating that the number of directly coordinated nitrogens in each site differs. Furthermore, we perform Continuous Wave (CW)-titrations to obtain the binding affinity of the Ca(II)-loaded protein to Cu2+ ions. We observe a positive cooperativity in binding of the two Cu(II) ions. In order to further probe the Cu2+ coordination, we also perform Electron Spin Echo Envelope Modulation (ESEEM) experiment. We perform ESEEM at two different fields where one Cu(II) binding site dominates over the other. At both sites we see distinct signatures of Cu(II)-histidine coordination. However, we clearly see that the ESEEM spectra corresponding to the two Cu2+ binding sites are significantly different. There is clear change in the intensity of the double quantum (DQ) peak with respect to the nuclear quadrupole interaction (NQI) peak at the two different fields. Furthermore, ESEEM along with Hyperfine Sublevel Correlation (HYSCORE) show that only one of the two Cu(II) binding sites has backbone coordination, confirming our previous observation. 
Finally, we perform Double Electron Electron Resonance (DEER) spectroscopy to probe if the difference in binding environment is due to the Cu(II) binding to different sites in the protein. We obtain a distance distribution with a sharp peak at ~ 3 nm and a broad peak at ~ 4 nm. The shorter distance agrees with the Cu(II)-Cu(II) distance expected for a dimer from the crystal structure. The longer distance is consistent with the Cu(II)-Cu(II) distance when oligomerization occurs. 
  5. Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from its sequence or structure remains a major challenge. Here, we introduce holographic convolutional neural network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and binding of protein complexes. Our interpretable computational model for protein structure–function maps could guide design of novel proteins with desired function. 