Intrinsically disordered proteins (IDPs) are highly prevalent and play important roles in biology and human diseases. It is now also recognized that many IDPs remain dynamic even in specific complexes and functional assemblies. Computer simulations are essential for deriving a molecular description of the disordered protein ensembles and dynamic interactions for a mechanistic understanding of IDPs in biology, diseases, and therapeutics. Here, we provide an in-depth review of recent advances in the multi-scale simulation of disordered protein states, with a particular emphasis on the development and application of advanced sampling techniques for studying IDPs. These techniques are critical for adequate sampling of the manifold functionally relevant conformational spaces of IDPs. Together with dramatically improved protein force fields, these advanced simulation approaches have achieved substantial success and demonstrated significant promise towards the quantitative and predictive modeling of IDPs and their dynamic interactions. We will also discuss important challenges remaining in the atomistic simulation of larger systems and how various coarse-grained approaches may help to bridge the remaining gaps in the accessible time- and length-scales of IDP simulations.
more »
« less
Sampling Conformational Ensembles of Highly Dynamic Proteins via Generative Deep Learning
Abstract Proteins are inherently dynamic, and their conformational ensembles are functionally important in biology. Large-scale motions may govern protein structure–function relationship, and numerous transient but stable conformations of intrinsically disordered proteins (IDPs) can play a crucial role in biological function. Investigating conformational ensembles to understand regulations and disease-related aggregations of IDPs is challenging both experimentally and computationally. In this paper we first introduced an unsupervised deep learning-based model, termed Internal Coordinate Net (ICoN), which learns the physical principles of conformational changes from molecular dynamics (MD) simulation data. Second, we selected interpolating data points in the learned latent space that rapidly identify novel synthetic conformations with sophisticated and large-scale sidechains and backbone arrangements. Third, with the highly dynamic amyloid-β1-42(Aβ42) monomer, our deep learning model provided a comprehensive sampling of Aβ42’s conformational landscape. Analysis of these synthetic conformations revealed conformational clusters that can be used to rationalize experimental findings. Additionally, the method can identify novel conformations with important interactions in atomistic details that are not included in the training data. New synthetic conformations showed distinct sidechain rearrangements that are probed by our EPR and amino acid substitution studies. This approach is highly transferable and can be used for any available data for training. The work also demonstrated the ability for deep learning to utilize learned natural atomistic motions in protein conformation sampling.
more »
« less
- Award ID(s):
- 1932984
- PAR ID:
- 10528862
- Publisher / Repository:
- bioRxiv
- Date Published:
- Format(s):
- Medium: X
- Institution:
- bioRxiv
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Proteins perform their biological functions through motion. Although high throughput prediction of the three-dimensional static structures of proteins has proved feasible using deep-learning-based methods, predicting the conformational motions remains a challenge. Purely data-driven machine learning methods encounter difficulty for addressing such motions because available laboratory data on conformational motions are still limited. In this work, we develop a method for generating protein allosteric motions by integrating physical energy landscape information into deep-learning-based methods. We show that local energetic frustration, which represents a quantification of the local features of the energy landscape governing protein allosteric dynamics, can be utilized to empower AlphaFold2 (AF2) to predict protein conformational motions. Starting from ground state static structures, this integrative method generates alternative structures as well as pathways of protein conformational motions, using a progressive enhancement of the energetic frustration features in the input multiple sequence alignment sequences. For a model protein adenylate kinase, we show that the generated conformational motions are consistent with available experimental and molecular dynamics simulation data. Applying the method to another two proteins KaiB and ribose-binding protein, which involve large-amplitude conformational changes, can also successfully generate the alternative conformations. We also show how to extract overall features of the AF2 energy landscape topography, which has been considered by many to be black box. Incorporating physical knowledge into deep-learning-based structure prediction algorithms provides a useful strategy to address the challenges of dynamic structure prediction of allosteric proteins.more » « less
-
de Groot, Bert L. (Ed.)Intrinsically disordered proteins (IDPs) are highly dynamic systems that play an important role in cell signaling processes and their misfunction often causes human disease. Proper understanding of IDP function not only requires the realistic characterization of their three-dimensional conformational ensembles at atomic-level resolution but also of the time scales of interconversion between their conformational substates. Large sets of experimental data are often used in combination with molecular modeling to restrain or bias models to improve agreement with experiment. It is shown here for the N-terminal transactivation domain of p53 (p53TAD) and Pup, which are two IDPs that fold upon binding to their targets, how the latest advancements in molecular dynamics (MD) simulations methodology produces native conformational ensembles by combining replica exchange with series of microsecond MD simulations. They closely reproduce experimental data at the global conformational ensemble level, in terms of the distribution properties of the radius of gyration tensor, and at the local level, in terms of NMR properties including 15 N spin relaxation, without the need for reweighting. Further inspection revealed that 10–20% of the individual MD trajectories display the formation of secondary structures not observed in the experimental NMR data. The IDP ensembles were analyzed by graph theory to identify dominant inter-residue contact clusters and characteristic amino-acid contact propensities. These findings indicate that modern MD force fields with residue-specific backbone potentials can produce highly realistic IDP ensembles sampling a hierarchy of nano- and picosecond time scales providing new insights into their biological function.more » « less
-
Abstract Deep learning approaches like AlphaFold 2 (AF2) have revolutionized structural biology by accurately predicting the ground state structures of proteins. Recently, clustering and subsampling techniques that manipulate multiple sequence alignment (MSA) inputs into AlphaFold to generate conformational ensembles of proteins have also been proposed. Although many of these techniques have been made open source, they often require integrating multiple packages and can be challenging for researchers who have a limited programming background to employ. This is especially true when researchers are interested in subsampling to produce predictions of protein conformational ensembles, which require multiple computational steps. This manuscript introduces FastConformation, a Python-based application that integrates MSA generation, structure prediction via AF2, and interactive analysis of protein conformations and their distributions, all in one place. FastConformation is accessible through a user-friendly GUI suitable for non-programmers, allowing users to iteratively refine subsampling parameters based on their analyses to achieve diverse conformational ensembles. Starting from an amino acid sequence, users can make protein conformation predictions and analyze results in just a few hours on their local machines, which is significantly faster than traditional molecular dynamics (MD) simulations. Uniquely, by leveraging the subsampling of MSAs, our tool enables the generation of alternative protein conformations. We demonstrate the utility of FastConformation on proteins including the Abl1 kinase, LAT1 transporter, and CCR5 receptor, showcasing its ability to predict and analyze the protein conformational ensembles and effects of mutations on a variety of proteins. This tool enables a wide range of high-throughput applications in protein biochemistry, drug discovery, and protein engineering.more » « less
-
Short-range interactions and long-range contacts drive the 3D folding of structured proteins. The proteins’ structure has a direct impact on their biological function. However, nearly 40% of the eukaryotes proteome is composed of intrinsically disordered proteins (IDPs) and protein regions that fluctuate between ensembles of numerous conformations. Therefore, to understand their biological function, it is critical to depict how the structural ensemble statistics correlate to the IDPs’ amino acid sequence. Here, using small-angle X-ray scattering and time-resolved Förster resonance energy transfer (trFRET), we study the intramolecular structural heterogeneity of the neurofilament low intrinsically disordered tail domain (NFLt). Using theoretical results of polymer physics, we find that the Flory scaling exponent of NFLt subsegments correlates linearly with their net charge, ranging from statistics of ideal to self-avoiding chains. Surprisingly, measuring the same segments in the context of the whole NFLt protein, we find that regardless of the peptide sequence, the segments’ structural statistics are more expanded than when measured independently. Our findings show that while polymer physics can, to some level, relate the IDP’s sequence to its ensemble conformations, long-range contacts between distant amino acids play a crucial role in determining intramolecular structures. This emphasizes the necessity of advanced polymer theories to fully describe IDPs ensembles with the hope that it will allow us to model their biological function.more » « less