Title: Unraveling dynamic protein structures by two-dimensional infrared spectra with a pretrained machine learning model
Dynamic protein structures are crucial for deciphering their diverse biological functions. Two-dimensional infrared (2DIR) spectroscopy stands as an ideal tool for tracing rapid conformational evolutions in proteins. However, linking spectral characteristics to dynamic structures poses a formidable challenge. Here, we present a pretrained machine learning model based on 2DIR spectra analysis. This model has learned signal features from approximately 204,300 spectra to establish a “spectrum-structure” correlation, thereby tracing the dynamic conformations of proteins. It excels in accurately predicting the dynamic content changes of various secondary structures and demonstrates universal transferability on real folding trajectories spanning timescales from microseconds to milliseconds. Beyond exceptional predictive performance, the model offers attention-based spectral explanations of dynamic conformational changes. Our 2DIR-based pretrained model is anticipated to provide unique insights into the dynamic structural information of proteins in their native environments.
Award ID(s):
2246379
PAR ID:
10614366
Publisher / Repository:
Proceedings of the National Academy of Sciences
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
121
Issue:
27
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Ultrafast two-dimensional infrared (2DIR) spectroscopy is a relatively new methodology that is now widely used to study molecular structure and the dynamics of molecular processes occurring in solution. Typically, in 2DIR spectroscopy, the dynamics of a system are inferred from the evolution of 2DIR spectral features over waiting times. One of the most important metrics derived from 2DIR spectra is the frequency–frequency correlation function (FFCF), which can be extracted using different methods, including center line slope and nodal line slope. However, these methods struggle to correctly describe the dynamics in 2DIR spectra with multiple and overlapping transitions. Here, a new approach, utilizing pseudo-Zernike moments, is introduced to retrieve the FFCF dynamics of each spectral component from complex 2DIR spectra. The results show that this new method not only produces results equivalent to those of more established methodologies in simple spectra but also successfully extracts the FFCF dynamics of individual components from very congested and unresolved 2DIR spectra. In addition, this new methodology can be used to locate the individual frequency components in those complex spectra. Overall, a new methodology for analyzing 2D spectra is presented here, which allows us to retrieve previously unattainable spectral features from 2DIR spectra.
  2. Proteins perform their biological functions through motion. Although high-throughput prediction of the three-dimensional static structures of proteins has proved feasible using deep-learning-based methods, predicting conformational motions remains a challenge. Purely data-driven machine learning methods have difficulty addressing such motions because available laboratory data on conformational motions are still limited. In this work, we develop a method for generating protein allosteric motions by integrating physical energy landscape information into deep-learning-based methods. We show that local energetic frustration, which quantifies the local features of the energy landscape governing protein allosteric dynamics, can be utilized to empower AlphaFold2 (AF2) to predict protein conformational motions. Starting from ground state static structures, this integrative method generates alternative structures as well as pathways of protein conformational motions, using a progressive enhancement of the energetic frustration features in the input multiple sequence alignment sequences. For a model protein, adenylate kinase, we show that the generated conformational motions are consistent with available experimental and molecular dynamics simulation data. Applied to two other proteins, KaiB and ribose-binding protein, which involve large-amplitude conformational changes, the method also successfully generates the alternative conformations. We also show how to extract overall features of the AF2 energy landscape topography, which has been considered by many to be a black box. Incorporating physical knowledge into deep-learning-based structure prediction algorithms provides a useful strategy to address the challenges of dynamic structure prediction of allosteric proteins.
  3. Two-dimensional infrared (2DIR) spectroscopy has become an established method for generating vibrational spectra in condensed phase samples composed of mixtures that yield heavily congested infrared and Raman spectra. These condensed phase 2DIR spectrometers can provide very high temporal resolution (<1 ps), but the spectral resolution is generally insufficient for resolving rotational peaks in gas phase spectra. Conventional (1D) rovibrational spectra of gas phase molecules are often plagued by severe spectral congestion, even when the sample is not a mixture. Spectral congestion can obscure the patterns in rovibrational spectra that are needed to assign peaks in the spectra. A method for generating high resolution 2DIR spectra of gas phase molecules has now been developed and tested using methane as the sample. The 2D rovibrational patterns that are recorded resemble an asterisk with a center position that provides the frequencies of both of the two coupled vibrational levels. The ability to generate easily recognizable 2D rovibrational patterns, regardless of temperature, should make the technique useful for a wide range of applications that are otherwise difficult or impossible when using conventional 1D rovibrational spectroscopy. 
  4. Chemically identical chlorophyll (Chl) molecules undergo conformational changes when they are embedded in a protein matrix. The conformational changes modulate their absorption spectra to meet the need for programmed excitation energy transfer or electron transfer. Interpreting spectroscopic data using knowledge of pigment–protein interactions requires a single pigment embedded in one polypeptide matrix. Unfortunately, most of the known photosynthetic systems contain a set of multiple pigments in each protein subunit. This complicates the interpretation of spectroscopic data using structural data because the spectra of two or more pigments may overlap. Chl–protein interactions have not been systematically studied to answer three fundamental questions: (i) What are the structural characteristics and commonly shared substructures of different types of Chl molecules (e.g., Chl a, b, c, d, and f)? (ii) How many structural groups can Chl molecules be divided into, and how are different structural groups influenced by their surrounding environments? (iii) What are the structural characteristics of the environments surrounding the pigments? The lack of clear answers to these questions is probably due to a lack of computational methods for quantifying conformational changes in individual Chls and individual surrounding amino acids. The first version of the Triangular Spatial Relationship (TSR)-based method was developed for comparing protein 3D structures. The input data for the TSR-based method are experimentally determined 3D structures from the Protein Data Bank (PDB). In this study, we take advantage of the 3D structures of Chl-binding proteins deposited in the PDB and the TSR-based method to systematically investigate the 3D structures of various types of Chls and their protein environments. The key contributions of this study can be summarized as follows: (i) Specific structural characteristics of Chl d and f were identified and are defined using the TSR keys. (ii) Two clusters were found for various types of Chls and three clusters for Chl a, and the signature structures distinguishing these clusters were identified. (iii) Histidine residues were used as an example for revealing structural characteristics of Chl-binding sites. This study provides evidence bearing on the three unresolved questions and, by quantifying Chl conformations and the structures of their surrounding protein environments, builds a structural foundation for future mechanistic understanding of the relationships between Chl–protein interactions and their corresponding spectroscopic data.
  5. Proteins are inherently dynamic, and their conformational ensembles are functionally important in biology. Large-scale motions may govern protein structure–function relationships, and numerous transient but stable conformations of intrinsically disordered proteins (IDPs) can play a crucial role in biological function. Investigating conformational ensembles to understand the regulation and disease-related aggregation of IDPs is challenging both experimentally and computationally. In this paper, we first introduce an unsupervised deep learning-based model, termed Internal Coordinate Net (ICoN), which learns the physical principles of conformational changes from molecular dynamics (MD) simulation data. Second, we select interpolating data points in the learned latent space that rapidly identify novel synthetic conformations with sophisticated and large-scale sidechain and backbone arrangements. Third, for the highly dynamic amyloid-β1-42 (Aβ42) monomer, our deep learning model provides a comprehensive sampling of Aβ42’s conformational landscape. Analysis of these synthetic conformations revealed conformational clusters that can be used to rationalize experimental findings. Additionally, the method can identify novel conformations, with important interactions in atomistic detail, that are not included in the training data. New synthetic conformations showed distinct sidechain rearrangements that are probed by our EPR and amino acid substitution studies. This approach is highly transferable and can be trained on any available data. The work also demonstrates the ability of deep learning to utilize learned natural atomistic motions in protein conformation sampling.