Residue-residue distance information is useful for predicting the tertiary structures of protein monomers and the quaternary structures of protein complexes. Many deep learning methods have been developed to accurately predict intra-chain residue-residue distances in monomers, but few methods can accurately predict inter-chain residue-residue distances in complexes. To address this gap, we developed CDPred (Complex Distance Prediction), a deep learning method based on a 2D attention-powered residual network. Tested on two homodimer datasets, CDPred achieves precisions of 60.94% and 42.93%, respectively, for the top L/5 inter-chain contact predictions (L: length of the monomer in the homodimer), substantially higher than DeepHomo’s 37.40% and 23.08% and GLINTER’s 48.09% and 36.74%. Tested on two heterodimer datasets, CDPred’s top Ls/5 inter-chain contact prediction precisions (Ls: length of the shorter monomer in the heterodimer) are 47.59% and 22.87%, respectively, surpassing GLINTER’s 23.24% and 13.49%. Moreover, CDPred’s predictions are complementary to those of AlphaFold2-multimer.
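The top-L/5 metric works like this: every inter-chain residue pair receives a predicted contact score, the L/5 highest-scoring pairs (Ls/5 for heterodimers) are kept, and precision is the fraction of those kept pairs that are true contacts. The Python sketch below assumes predictions and ground truth are supplied as L1 × L2 NumPy arrays. The function names are hypothetical, and details such as deduplicating symmetric homodimer pairs or rounding L/5 may differ from CDPred's actual evaluation protocol.

```python
import numpy as np

def top_k_precision(pred: np.ndarray, true: np.ndarray, k: int) -> float:
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    pred: (L1, L2) predicted inter-chain contact probabilities.
    true: (L1, L2) binary matrix, 1 where the pair is a true inter-chain contact.
    """
    scores = pred.ravel()
    labels = true.ravel()
    # Indices of the k highest scores; a stable sort keeps ties in index order.
    top = np.argsort(-scores, kind="stable")[:k]
    return float(labels[top].mean())

def top_l5_precision(pred: np.ndarray, true: np.ndarray, length: int) -> float:
    # length = monomer length L for homodimers, or the shorter chain Ls for
    # heterodimers; integer division is an assumed rounding convention.
    k = max(1, length // 5)
    return top_k_precision(pred, true, k)
```

For a 10-residue homodimer, k = 2, so only the two highest-scoring pairs are checked against the true contact map.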
Predicting protein flexibility with AlphaFold
AlphaFold2 has revolutionized protein structure prediction from amino‐acid sequence. In addition to protein structures, high‐resolution information about the dynamics of various protein regions is important for understanding protein function. Although AlphaFold2 has been neither designed nor trained to predict protein dynamics, it is shown here how the information returned by AlphaFold2 can be used to predict dynamic protein regions at the individual‐residue level. The approach, termed cdsAF2, uses the 3D protein structure returned by AlphaFold2 to predict backbone NMR N–H S² order parameters using a local contact model that takes into account the contacts made by each peptide plane along the backbone with its environment. By combining, for each residue, AlphaFold2's pLDDT confidence score for the structure prediction accuracy with the S² value predicted by the local contact model, an estimator is obtained that semi‐quantitatively captures many of the dynamics features observed in experimental backbone NMR N–H S² order parameter profiles. The method is demonstrated for a set of nine proteins of different sizes with variable amounts of dynamics and disorder.
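To illustrate the kind of local contact model described above, the Python sketch below sums exponentially damped contacts between the environment's heavy atoms and two backbone atoms of a peptide plane (the carbonyl O of residue i−1 and the amide H of residue i), then maps the sum through tanh. The functional form follows the general style of published contact models, but the constants `a`, `w_h`, and `b`, the atom selection, and especially the `combine_with_plddt` rule are assumptions for illustration only; this is not the cdsAF2 formula.

```python
import numpy as np

def contact_s2(o_prev: np.ndarray, h_curr: np.ndarray, env: np.ndarray,
               a: float = 0.8, w_h: float = 0.8, b: float = -0.1) -> float:
    """Contact-model estimate of S^2 for the backbone N-H of residue i.

    o_prev: (3,) coordinates of the carbonyl O of residue i-1 (Angstrom).
    h_curr: (3,) coordinates of the amide H of residue i (Angstrom).
    env:    (n, 3) heavy-atom coordinates of the environment, assumed to
            already exclude residues i-1 and i.
    a, w_h, b: assumed model constants, not taken from the cdsAF2 paper.
    """
    r_o = np.linalg.norm(env - o_prev, axis=1)
    r_h = np.linalg.norm(env - h_curr, axis=1)
    # Closer atoms contribute more; distances are in units of 1 Angstrom.
    contact_sum = np.sum(np.exp(-r_o) + w_h * np.exp(-r_h))
    return float(np.tanh(a * contact_sum) + b)

def combine_with_plddt(s2: float, plddt: float) -> float:
    # Hypothetical combination: scale by pLDDT/100 so residues AlphaFold2
    # models with low confidence (often disordered) get a lower estimate.
    return s2 * plddt / 100.0
```

A residue with no environmental contacts yields the floor value `b`, while densely packed residues approach `tanh(...) + b` close to 0.9, mirroring rigid versus flexible regions.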
- Award ID(s): 2103637
- PAR ID: 10419740
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Proteins: Structure, Function, and Bioinformatics
- Volume: 91
- Issue: 6
- ISSN: 0887-3585
- Page Range / eLocation ID: p. 847-855
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More like this
- Approaches to in silico prediction of protein structures have been revolutionized by AlphaFold2, while those to predict interfaces between proteins remain relatively underdeveloped, owing to the highly complex yet relatively limited data on protein–protein complexes. In short, proteins are 1D sequences of amino acids that fold into 3D structures and interact to form functional assemblies. We believe that such intricate scenarios are better modeled with additional indicative information that reflects their multi-modal nature and multi-scale functionality. To improve binary prediction of inter-protein residue–residue contacts, we propose to augment input features with multi-modal representations and to synergize the objective with auxiliary predictive tasks. (i) We first progressively add three protein modalities to the models: protein sequences, sequences with evolutionary information, and structure-aware intra-protein residue contact maps. We observe that using all data modalities delivers the best prediction precision. Analysis reveals that evolutionary and structural information benefit predictions on difficult and rigid protein complexes, respectively, as assessed by resemblance to native residue contacts in bound complex structures. (ii) We next introduce three auxiliary tasks via self-supervised pre-training (binary prediction of protein–protein interaction (PPI)) and multi-task learning (prediction of inter-protein residue–residue distances and angles). Although PPI prediction has been reported to benefit from predicting inter-protein contacts (as causal interpretations), we did not find the reverse in our study. Similarly, the finer-grained distance and angle predictions did not uniformly improve contact prediction either. This again reflects the high complexity of protein–protein complex data, for which designing and incorporating synergistic auxiliary tasks remains challenging.
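The multi-task setup above can be pictured as one training objective: a primary binary cross-entropy on inter-protein contacts plus weighted auxiliary cross-entropy terms on binned distances and angles. The Python sketch below is a generic illustration under assumed inputs (per-pair contact probabilities, per-pair bin probability rows) and hypothetical weights `w_dist` and `w_angle`; the study's actual losses, binning, and weighting may differ.

```python
import numpy as np

def bce(p: np.ndarray, y: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy for contact probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def cross_entropy(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-7) -> float:
    """Mean categorical cross-entropy; probs is (n, n_bins), labels is (n,) bin indices."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))

def multitask_loss(contact_p, contact_y, dist_probs, dist_bins,
                   angle_probs, angle_bins, w_dist=0.5, w_angle=0.5) -> float:
    # Primary contact objective plus weighted auxiliary distance/angle objectives.
    return (bce(contact_p, contact_y)
            + w_dist * cross_entropy(dist_probs, dist_bins)
            + w_angle * cross_entropy(angle_probs, angle_bins))
```

Setting both auxiliary weights to zero recovers the single-task contact loss, which is the baseline such an ablation compares against.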
- Structures at serine‐proline sites in proteins were analyzed using a combination of peptide synthesis with structural methods and bioinformatics analysis of the PDB. Dipeptides were synthesized with the proline derivative (2S,4S)‐(4‐iodophenyl)hydroxyproline [hyp(4‐I‐Ph)]. The crystal structure of Boc‐Ser‐hyp(4‐I‐Ph)‐OMe had two molecules in the unit cell. One molecule exhibited cis‐proline and a type VIa2 β‐turn (BcisD). The cis‐proline conformation was stabilized by a C–H/O interaction between Pro C–Hα and the Ser side‐chain oxygen. NMR data were consistent with stabilization of cis‐proline by a C–H/O interaction in solution. The other crystallographically observed molecule had trans‐Pro, with both residues in the PPII conformation. Two conformations were observed in the crystal structure of Ac‐Ser‐hyp(4‐I‐Ph)‐OMe, with Ser adopting PPII in one and the β conformation in the other, each with Pro in the δ conformation and trans‐Pro. Structures at Ser‐Pro sequences were further examined via bioinformatics analysis of the PDB and via DFT calculations. Ser‐Pro and Ala‐Pro sequences were compared to identify the bases for Ser stabilization of local structures. C–H/O interactions between the Ser side‐chain Oγ and Pro C–Hα were observed in 45% of Ser‐cis‐Pro structures in the PDB, with nearly all Ser‐cis‐Pro structures adopting a type VI β‐turn. Of Ser‐trans‐Pro sequences, 53% exhibited main‐chain CO(i)•••HN(i+3) or CO(i)•••HN(i+4) hydrogen bonds, with Ser as the i residue and Pro as the i + 1 residue. These structures were overwhelmingly either type I β‐turns or N‐terminal capping motifs on α‐helices or 3₁₀‐helices. These results indicate that Ser‐Pro sequences are particularly potent in favoring these structures. In each, Ser is in either the PPII or β conformation, with the Ser Oγ capable of engaging in a hydrogen bond with the amide N–H of the i + 2 (type I β‐turn or 3₁₀‐helix; Ser χ1 t) or i + 3 (α‐helix; Ser χ1 g+) residue. Non‐proline cis amide bonds can also be stabilized by C–H/O interactions.
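Sorting PDB Ser‐Pro sites into Ser‐cis‐Pro and Ser‐trans‐Pro depends on the ω torsion of the Ser–Pro amide bond, defined by the atoms CA(i), C(i), N(i+1), CA(i+1). The Python sketch below computes a signed dihedral from four coordinates and classifies the bond. The 30°/150° cutoffs are an assumed, commonly used convention, and the function names are hypothetical rather than taken from the study.

```python
import numpy as np

def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def peptide_bond_state(ca_i, c_i, n_next, ca_next,
                       cis_cut: float = 30.0, trans_cut: float = 150.0) -> str:
    """Classify the i -> i+1 peptide bond from omega = CA(i)-C(i)-N(i+1)-CA(i+1)."""
    omega = abs(dihedral(ca_i, c_i, n_next, ca_next))
    if omega < cis_cut:
        return "cis"
    if omega > trans_cut:
        return "trans"
    return "distorted"
```

With Ser as residue i and Pro as residue i + 1, a "cis" result corresponds to a Ser‐cis‐Pro site and a "trans" result to a Ser‐trans‐Pro site.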
- Chromatin, a dynamic protein-DNA complex that regulates eukaryotic genome accessibility and essential functions, is composed of nucleosomes connected by linker DNA, with each nucleosome consisting of DNA wrapped around an octamer of histones H2A, H2B, H3 and H4. Magic angle spinning solid-state nuclear magnetic resonance (NMR) spectroscopy can yield unique insights into histone structure and dynamics in condensed nucleosomes and nucleosome arrays representative of chromatin at physiological concentrations. Recently we used J-coupling-based solid-state NMR methods to investigate, with residue-specific resolution, the conformational dynamics of histone H3 N-terminal tails in 16-mer nucleosome arrays containing 15, 30 or 60 bp DNA linkers. Here, we probe the H3 core domain in the 16-mer arrays as a function of DNA linker length via dipolar coupling-based ¹H-detected solid-state NMR techniques. Specifically, we established nearly complete backbone chemical shift assignments for H3 core residues in arrays with 15–60 bp DNA linkers reconstituted with ²H,¹³C,¹⁵N-labeled H3. Overall, these chemical shifts were similar irrespective of DNA linker length, indicating no major changes in H3 core conformation. Notably, however, multiple residues at the H3-nucleosomal DNA interface in arrays with 15 bp DNA linkers exhibited relatively pronounced differences in chemical shifts and line broadening compared to arrays with 30 and 60 bp linkers. These findings are consistent with increased heterogeneity in nucleosome packing and structural strain within arrays containing short DNA linkers, which likely cause the side chains of these interfacial residues to experience alternate conformations or shifts in their rotamer populations relative to arrays with longer DNA linkers.
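The comparison above rests on per-residue chemical shift differences between arrays, but the abstract does not say how those differences were quantified. One widely used way to summarize amide ¹H/¹⁵N shift differences is a weighted combined value, Δδ = sqrt(0.5·(Δδ_H² + (α·Δδ_N)²)), often with α ≈ 0.14. The Python sketch below assumes that convention, a hypothetical 0.1 ppm cutoff, and hypothetical integer residue keys; none of these come from the study.

```python
import math
from typing import Dict, List, Tuple

Shifts = Dict[int, Tuple[float, float]]  # residue number -> (1H ppm, 15N ppm)

def combined_shift_difference(h_a: float, n_a: float, h_b: float, n_b: float,
                              alpha: float = 0.14) -> float:
    """Weighted combined 1H/15N chemical shift difference (ppm) between two samples."""
    dh = h_a - h_b
    dn = n_a - n_b
    return math.sqrt(0.5 * (dh ** 2 + (alpha * dn) ** 2))

def flag_residues(shifts_a: Shifts, shifts_b: Shifts,
                  cutoff: float = 0.1, alpha: float = 0.14) -> List[int]:
    """Residues assigned in both samples whose combined difference exceeds cutoff."""
    flagged = []
    for res in sorted(shifts_a.keys() & shifts_b.keys()):
        (h_a, n_a), (h_b, n_b) = shifts_a[res], shifts_b[res]
        if combined_shift_difference(h_a, n_a, h_b, n_b, alpha) > cutoff:
            flagged.append(res)
    return flagged
```

Comparing, for example, a 15 bp linker array against a 60 bp linker array this way would list the residues whose backbone environment differs most, which could then be checked against the H3-DNA interface.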
- We report the recombinant preparation, in Escherichia coli cells, of samples of two closely related, small, secreted cysteine‐rich plant peptides: rapid alkalinization factor 1 (RALF1) and rapid alkalinization factor 8 (RALF8). Purified samples of native‐sequence RALF8 exhibited well‐resolved nuclear magnetic resonance (NMR) spectra as well as biological activity, shown by interaction with a plant receptor kinase, cytoplasmic calcium mobilization, and in vivo root growth suppression. By contrast, RALF1 could only be isolated from inclusion bodies as a construct containing an N‐terminal His‐tag; its poorly resolved NMR spectrum was indicative of aggregation. We prepared samples of the RALF8 peptide labeled with ¹⁵N and ¹³C for NMR analysis and obtained near‐complete ¹H, ¹³C, and ¹⁵N NMR assignments; determined the disulfide pairing of its four cysteine residues; and examined its solution structure. RALF8 is mostly disordered except for the two loops spanned by each of its two disulfide bridges.
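Determining the disulfide pairing of four cysteines means choosing among a small, fixed set of candidates: for 2n cysteines there are (2n − 1)!! possible pairings, so four cysteines allow exactly three. The Python sketch below enumerates them recursively. The cysteine positions used in the example are placeholders, not RALF8's actual residue numbers.

```python
from typing import List, Tuple

Pairing = List[Tuple[int, int]]

def disulfide_pairings(cys: List[int]) -> List[Pairing]:
    """All ways to pair an even number of cysteines into disulfide bonds.

    Returns an empty list for an odd count, since one cysteine would be left unpaired.
    """
    if not cys:
        return [[]]
    first, rest = cys[0], cys[1:]
    out: List[Pairing] = []
    for j, partner in enumerate(rest):
        # Bond the first cysteine to each possible partner, then pair the rest.
        remaining = rest[:j] + rest[j + 1:]
        for sub in disulfide_pairings(remaining):
            out.append([(first, partner)] + sub)
    return out
```

Calling `disulfide_pairings([1, 2, 3, 4])` yields the three candidates [(1, 2), (3, 4)], [(1, 3), (2, 4)], and [(1, 4), (2, 3)], which experimental data must then distinguish.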
