Recently developed protein language models have enabled a variety of applications through the contextual embeddings they produce. Per-protein representations (in which each protein is represented as a vector of fixed dimension) can be derived by averaging the embeddings of individual residues, or by applying matrix transformation techniques such as the discrete cosine transform (DCT) to the matrix of residue embeddings. Such protein-level embeddings have enabled fast searches for similar proteins, but they have limitations: for example, PROST is good at detecting global homologs but not local homologs, and knnProtT5 excels for single-domain proteins but not multidomain proteins. Here, we propose a novel approach that first segments proteins into domains (or subdomains) and then applies the DCT to the vectorized embeddings of the residues in each domain to infer domain-level contextual vectors. Our approach, called DCTdomain, uses contact maps predicted by ESM-2 for domain segmentation, which is formulated as a domain segmentation problem and can be solved using a recursive cut algorithm (RecCut for short) in time quadratic in the protein length; for comparison, an existing approach for domain segmentation uses a cubic-time algorithm. We show that these domain-level contextual vectors (termed DCT fingerprints) enable fast and accurate detection of similarity both between proteins that share global similarity but have undefined extended regions between shared domains, and between proteins that share only local similarity. In addition, tests on a database search benchmark show that DCTdomain is able to detect distant homologs by leveraging the structural information in the contextual embeddings.
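The DCT step described above can be illustrated with a short sketch. It is not the DCTdomain implementation, and it rests on several assumptions:

- NumPy is available.
- Residue embeddings arrive as an (L, d) array, for example from ESM-2.
- Domain boundaries are already known as contiguous half-open (start, end) ranges. They are not computed by RecCut here.
- The helper names and the default coefficient block size (n_rows=3, n_cols=8) are hypothetical choices for illustration only.

The sketch builds an orthonormal DCT-II basis and transforms each domain's embedding matrix along both the residue and feature axes. It then keeps only the lowest-frequency coefficients and flattens them into a fixed-length vector. Mean pooling is included for contrast.

```python
import numpy as np
from typing import List, Sequence, Tuple


def dct_ii_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] *= 1.0 / np.sqrt(2.0)  # scale the DC row so the basis is orthonormal
    return basis * np.sqrt(2.0 / n)


def dct_2d(matrix: np.ndarray) -> np.ndarray:
    """2-D DCT-II of an (L, d) matrix: along residues (rows), then features (columns)."""
    rows, cols = matrix.shape
    return dct_ii_matrix(rows) @ matrix @ dct_ii_matrix(cols).T


def mean_pooled(embeddings: np.ndarray) -> np.ndarray:
    """Per-protein vector obtained by averaging residue embeddings."""
    return embeddings.mean(axis=0)


def dct_fingerprint(embeddings: np.ndarray, n_rows: int = 3, n_cols: int = 8) -> np.ndarray:
    """Keep the lowest-frequency n_rows x n_cols DCT coefficients, flattened.

    Segments shorter than n_rows (or narrower than n_cols) are zero-padded in the
    coefficient block, so every fingerprint has length n_rows * n_cols.
    """
    coeffs = dct_2d(embeddings)
    block = np.zeros((n_rows, n_cols))
    r = min(n_rows, coeffs.shape[0])
    c = min(n_cols, coeffs.shape[1])
    block[:r, :c] = coeffs[:r, :c]
    return block.ravel()


def domain_fingerprints(embeddings: np.ndarray,
                        segments: Sequence[Tuple[int, int]],
                        n_rows: int = 3, n_cols: int = 8) -> List[np.ndarray]:
    """One fingerprint per hypothetical contiguous domain given as (start, end)."""
    fingerprints = []
    for start, end in segments:
        if end <= start:
            raise ValueError(f"empty segment ({start}, {end})")
        fingerprints.append(dct_fingerprint(embeddings[start:end], n_rows, n_cols))
    return fingerprints


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(120, 16))  # random stand-in for residue embeddings
    fps = domain_fingerprints(emb, [(0, 60), (60, 120)])
    print(len(fps), fps[0].shape)  # 2 (24,)
    print(mean_pooled(emb).shape)  # (16,)
```

Each domain is compressed to the same n_rows × n_cols block regardless of its length. As a result, domains of different sizes yield comparable fixed-length vectors. The orthonormal basis used here should match `scipy.fft.dct(..., type=2, norm='ortho')` applied along each axis.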
Accurate protein structure prediction with hydroxyl radical protein footprinting data
Abstract
Hydroxyl radical protein footprinting (HRPF) combined with mass spectrometry reveals the relative solvent exposure of labeled residues within a protein, thereby providing insight into protein tertiary structure. HRPF labels nineteen residues with varying degrees of reliability and reactivity. Here, we present a dynamics-driven, HRPF-guided algorithm for protein structure prediction. In a benchmark test of our algorithm, using the dynamics data in a score term notably improved the root-mean-square deviations of the lowest-scoring ab initio models and improved the funnel-likeness metric P_near for all benchmark proteins. We identified models with accurate atomic detail for three of the four benchmark proteins. This work suggests that HRPF data, together with side-chain dynamics sampled by a Rosetta mover ensemble, can be used to accurately predict protein structure.
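The funnel-likeness metric P_near mentioned above is commonly computed in Rosetta-based work as a Boltzmann-weighted fraction of models lying close to a reference structure:

P_near = Σ_i exp(−r_i²/λ²) exp(−E_i/k_BT) / Σ_j exp(−E_j/k_BT)

Here r_i is model i's RMSD to the reference and E_i is its score. The sketch below assumes that definition. The values λ = 1.5 Å and k_BT = 0.62 are assumed defaults, not parameters taken from this paper. Values near 1 indicate a deep funnel toward the reference; values near 0 indicate that the low-scoring models are far from it.

```python
import math
from typing import Sequence


def p_near(rmsds: Sequence[float], energies: Sequence[float],
           lam: float = 1.5, kbt: float = 0.62) -> float:
    """Boltzmann-weighted fraction of models near the reference structure.

    rmsds:    RMSD of each model to the reference (Angstrom)
    energies: score of each model (Rosetta energy units)
    lam, kbt: assumed defaults for the nativeness width and the temperature factor
    """
    if len(rmsds) != len(energies) or len(rmsds) == 0:
        raise ValueError("rmsds and energies must be non-empty and of equal length")
    # Shifting by the minimum energy avoids overflow; the shift cancels in the ratio.
    e_min = min(energies)
    boltzmann = [math.exp(-(e - e_min) / kbt) for e in energies]
    nativeness = [math.exp(-(r * r) / (lam * lam)) for r in rmsds]
    numerator = sum(b * n for b, n in zip(boltzmann, nativeness))
    return numerator / sum(boltzmann)


if __name__ == "__main__":
    # Lowest-scoring model lies close to the reference: a funnel-like landscape.
    print(round(p_near([0.5, 5.0, 8.0], [-300.0, -250.0, -240.0]), 3))  # 0.895
    # Lowest-scoring model lies far from the reference: no funnel.
    print(p_near([8.0, 0.5], [-300.0, -250.0]) < 1e-6)  # True
```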
- Award ID(s): 1750666
- PAR ID: 10209560
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: Nature Communications
- Volume: 12
- Issue: 1
- ISSN: 2041-1723
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract
Regulator of G protein signaling (RGS) proteins play a pivotal role in regulating G protein‐coupled receptor (GPCR) signaling and are therefore becoming increasingly important therapeutic targets. Recently discovered thiadiazolidinone (TDZD) compounds that target cysteine residues show different specificities and potencies for the RGS4 protein, suggesting intrinsic differences in the dynamics of this protein upon binding of these compounds. In this work, we used atomistic molecular dynamics (MD) simulations to investigate how the binding of several small‐molecule inhibitors perturbs RGS4 and alters its dynamical motions. Specifically, we studied two conformational models of RGS4 in which a buried cysteine residue becomes solvent‐exposed, either through side‐chain motions or through flexibility in neighboring helices. We found that TDZD compounds with aromatic functional groups perturb the RGS4 structure more than compounds with aliphatic functional groups. Moreover, small molecules with aromatic functional groups but lacking sulfur atoms reside only transiently within the protein and spontaneously dissociate into the solvent. We further measured the inhibitory effects of TDZD compounds using a protein–protein interaction assay on a single‐cysteine RGS4 protein, and the resulting potency trends were consistent with our simulation studies. Thermodynamic analyses of RGS4 conformations in the apo state and upon binding to TDZD compounds revealed links between the two conformational models of RGS4. Exposure of the cysteine side chains appears to facilitate initial binding of TDZD compounds, followed by migration of the compound into a bundle of four helices, thereby causing allosteric perturbations in the RGS/Gα protein–protein interface.
-
Abstract
H2S is a gaseous signaling molecule that modifies cysteine residues in proteins to form persulfides (P‐SSH). One family of proteins modified by H2S is the zinc finger (ZF) proteins, which contain multiple zinc‐coordinating cysteine residues. Herein, we report the reactivity of H2S with a ZF protein called tristetraprolin (TTP). Rapid H2S‐mediated persulfidation of TTP, leading to complete thiol oxidation, was observed by low‐temperature ESI‐MS and fluorescence spectroscopy. Persulfidation of TTP required O2, which reacts with H2S to form superoxide, as detected by ESI‐MS, a hydroethidine fluorescence assay, and EPR spin trapping. H2S was observed to inhibit TTP function (binding to TNFα mRNA) in an in vitro fluorescence anisotropy assay and to modulate TNFα in vivo. H2S was unreactive towards TTP when the protein was bound to RNA, suggesting a protective effect of RNA.
-
Abstract
Protein functional constraints are manifest as superfamily and functional-subgroup conserved residues, and as pairwise correlations. Deep Analysis of Residue Constraints (DARC) aids the visualization of these constraints, characterizes how they correlate with each other and with structure, and estimates statistical significance. This can identify determinants of protein functional specificity, as we illustrate for bacterial DNA clamp loader ATPases. These ATPases load ring-shaped sliding clamps onto DNA to keep the polymerase attached during replication. They contain one δ, three γ, and one δ’ AAA+ subunits, arranged semicircularly in the order δ-γ1-γ2-γ3-δ’. Only γ is active, though both γ and δ’ functionally influence an adjacent γ subunit. DARC identifies the following functionally congruent features that allosterically link the ATP, DNA, and clamp binding sites: residues distinctive of γ and of γ/δ’ that mutually interact in trans, centered on the catalytic base; several γ/δ’ residues and six γ/δ’-covariant residue pairs within the DNA-binding N-termini of helices α2 and α3; and γ/δ’ residues associated with the α2 C-terminus and the clamp-binding loop. Most notable is a trans-acting γ/δ’ hydroxyl group that 99% of other AAA+ proteins lack. Mutating this hydroxyl to a methyl group impedes clamp binding and opening, DNA binding, and ATP hydrolysis, implying a remarkably clamp-loader-specific function.
-
Abstract
To become bioactive, proteins must be translated and protected from aggregation during biosynthesis. The ribosome and molecular chaperones play a key role in this process. Ribosome-bound nascent chains (RNCs) of intrinsically disordered proteins, as well as RNCs bearing a signal/arrest sequence, are known to interact with ribosomal proteins. However, little is known about these interactions for RNCs bearing foldable protein sequences. Here, using a combination of chemical crosslinking and time-resolved fluorescence anisotropy, we find that nascent chains of the foldable globin apoHmp1–140 interact with ribosomal protein L23. These nascent chains also have a freely tumbling, non-interacting N-terminal compact region comprising 63–94 residues. Longer RNCs (apoHmp1–189) also interact with an additional, as yet unidentified ribosomal protein, as well as with chaperones. Surprisingly, the apparent strength of RNC/r-protein interactions does not depend on nascent-chain sequence. Overall, foldable nascent chains establish and expand interactions with selected ribosomal proteins and chaperones as they grow longer. These data are significant because they reveal the interplay between independent conformational sampling and nascent-protein interactions with the ribosomal surface.