

Title: Exploring the landscape of model representations

The success of any physical model critically depends upon adopting an appropriate representation for the phenomenon of interest. Unfortunately, it remains generally challenging to identify the essential degrees of freedom or, equivalently, the proper order parameters for describing complex phenomena. Here we develop a statistical physics framework for exploring and quantitatively characterizing the space of order parameters for representing physical systems. Specifically, we examine the space of low-resolution representations that correspond to particle-based coarse-grained (CG) models for a simple microscopic model of protein fluctuations. We employ Monte Carlo (MC) methods to sample this space and determine the density of states for CG representations as a function of their ability to preserve the configurational information, I, and large-scale fluctuations, Q, of the microscopic model. These two metrics are uncorrelated in high-resolution representations but become anticorrelated at lower resolutions. Moreover, our MC simulations suggest an emergent length scale for coarse-graining proteins, as well as a qualitative distinction between good and bad representations of proteins. Finally, we relate our work to recent approaches for clustering graphs and detecting communities in networks.
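The sampling idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes a Gaussian network model (GNM) as the microscopic model, a toy straight-chain geometry, a plain Metropolis walk over residue-to-site assignments, and a hypothetical stand-in score (`retained_fluctuation`) that is neither the information metric I nor the quality metric Q of the paper. All function names, the cutoff, and the value of `beta` are illustrative assumptions.

```python
import numpy as np

def kirchhoff(coords, cutoff=7.0):
    """GNM connectivity (Kirchhoff) matrix: -1 for residue pairs within cutoff."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (d < cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = -contact.astype(float)
    np.fill_diagonal(gamma, contact.sum(axis=1))
    return gamma

def site_covariance(cov, labels, n_sites):
    """Covariance of CG sites, each placed at the mean of its assigned residues."""
    n = len(labels)
    M = np.zeros((n_sites, n))
    M[labels, np.arange(n)] = 1.0
    M /= M.sum(axis=1, keepdims=True)  # assumes every site has >= 1 residue
    return M @ cov @ M.T

def retained_fluctuation(cov, labels, n_sites):
    """Hypothetical score: CG fluctuation amplitude relative to the full model."""
    return np.trace(site_covariance(cov, labels, n_sites)) / np.trace(cov)

def sample_mappings(cov, n_sites, n_steps=2000, beta=50.0, seed=0):
    """Metropolis walk over mappings; each move reassigns one residue."""
    rng = np.random.default_rng(seed)
    n = cov.shape[0]
    labels = np.arange(n) * n_sites // n  # start from contiguous blocks
    current = retained_fluctuation(cov, labels, n_sites)
    history = [current]
    for _ in range(n_steps):
        i = rng.integers(n)
        new_site = rng.integers(n_sites)
        # Skip null moves and moves that would leave a site empty.
        if new_site == labels[i] or np.sum(labels == labels[i]) == 1:
            history.append(current)
            continue
        trial = labels.copy()
        trial[i] = new_site
        s = retained_fluctuation(cov, trial, n_sites)
        # Accept with probability min(1, exp(beta * (s - current))).
        if rng.random() < np.exp(min(0.0, beta * (s - current))):
            labels, current = trial, s
        history.append(current)
    return labels, np.array(history)

# Toy geometry: 20 residues on a line, 3.8 apart, so only neighbors are in contact.
coords = np.column_stack([3.8 * np.arange(20), np.zeros(20), np.zeros(20)])
gamma = kirchhoff(coords)
cov = np.linalg.pinv(gamma)  # GNM covariance up to a constant prefactor
labels, history = sample_mappings(cov, n_sites=4, n_steps=500)
```

A histogram of `history` gives a rough picture of how the chosen score is distributed over sampled mappings. Estimating a density of states as described in the abstract would need a more careful sampling scheme than this plain Metropolis walk.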

 
Award ID(s):
1856337 1053970 1800344
NSF-PAR ID:
10191926
Publisher / Repository:
Proceedings of the National Academy of Sciences
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
117
Issue:
39
ISSN:
0027-8424
Page Range / eLocation ID:
p. 24061-24068
Sponsoring Org:
National Science Foundation
More Like this
  1. Simulations of soft materials often adopt low-resolution coarse-grained (CG) models. However, the CG representation is not unique and its impact upon simulated properties is poorly understood. In this work, we investigate the space of CG representations for ubiquitin, which is a typical globular protein with 72 amino acids. We employ Monte Carlo methods to ergodically sample this space and to characterize its landscape. By adopting the Gaussian network model as an analytically tractable atomistic model for equilibrium fluctuations, we exactly assess the intrinsic quality of each CG representation without introducing any approximations in sampling configurations or in modeling interactions. We focus on two metrics, the spectral quality and the information content, that quantify the extent to which the CG representation preserves low-frequency, large-amplitude motions and configurational information, respectively. The spectral quality and information content are weakly correlated among high-resolution representations but become strongly anticorrelated among low-resolution representations. Representations with maximal spectral quality appear consistent with physical intuition, while low-resolution representations with maximal information content do not. Interestingly, quenching studies indicate that the energy landscape of mapping space is very smooth and highly connected. Moreover, our study suggests a critical resolution below which a “phase transition” qualitatively distinguishes good and bad representations.

     
  2. Fluctuations of protein three-dimensional structures and large-scale conformational transitions are crucial for the biological function of proteins and their complexes. Experimental studies of such phenomena remain very challenging, and therefore molecular modeling can be a good alternative or a valuable supporting tool for the investigation of large molecular systems and long-time events. In this minireview, we present two alternative approaches to the coarse-grained (CG) modeling of dynamic properties of protein systems. We discuss two CG representations of polypeptide chains used for Monte Carlo dynamics simulations of protein local dynamics and conformational transitions, and highly simplified structure-based elastic network models of protein flexibility. In contrast to classical all-atom molecular dynamics, the modeling strategies discussed here allow reasonably accurate modeling of much larger systems and longer-time dynamic phenomena. We briefly describe the main features of these models and outline some of their applications, including modeling of near-native structure fluctuations, sampling of large regions of the protein conformational space, and possible support for the structure prediction of large proteins and their complexes.
  3. Recent advances in high-resolution imaging techniques and particle-based simulation methods have enabled the precise microscopic characterization of collective dynamics in various biological and engineered active matter systems. In parallel, data-driven algorithms for learning interpretable continuum models have shown promising potential for the recovery of underlying partial differential equations (PDEs) from continuum simulation data. By contrast, learning macroscopic hydrodynamic equations for active matter directly from experiments or particle simulations remains a major challenge, especially when continuum models are not known a priori or analytic coarse graining fails, as often is the case for nondilute and heterogeneous systems. Here, we present a framework that leverages spectral basis representations and sparse regression algorithms to discover PDE models from microscopic simulation and experimental data, while incorporating the relevant physical symmetries. We illustrate the practical potential through a range of applications, from a chiral active particle model mimicking nonidentical swimming cells to recent microroller experiments and schooling fish. In all these cases, our scheme learns hydrodynamic equations that reproduce the self-organized collective dynamics observed in the simulations and experiments. This inference framework makes it possible to measure a large number of hydrodynamic parameters in parallel and directly from video data.

     
  4. Transmembrane helix folding and self-association play important roles in biological signaling and transportation pathways across biomembranes. Molecular simulation studies of the structural biochemistry of this process have been limited to individual stages: either helix formation or dimerization. At atomistic resolution, accessing long spatiotemporal scales can be prohibitive, while at the coarse-grained (CG) level, current methods either employ additional constraints to prevent spontaneous unfolding or have a low resolution on sidechain beads that restricts the study of dimer disruption caused by mutations. To address these research gaps, in this work, we apply our recently developed in-house CG model (ProMPT) to study the folding and dimerization of Glycophorin A (GpA) and its mutants in the presence of dodecylphosphocholine (DPC) micelles. Our results first validate the two-stage model, in which folding and dimerization are independent events for transmembrane helices, and show a positive correlation between helix folding and DPC-peptide contacts. The wild type (WT) GpA is observed to be a right-handed dimer with specific GxxxG contacts, which agrees with experimental findings. Specific point mutations reveal several features responsible for the structural stability of GpA. While the T87L mutant forms anti-parallel dimers due to an absence of T87 interhelical hydrogen bonds, a slight loss in helicity and a hinge-like feature at the GxxxG region develop for the G79L mutant. We note that the local changes in the hydrophobic environment, affected by the point mutation, contribute to the development of this helical bend. This work presents a holistic overview of the structural stability of GpA in a micellar environment, while taking secondary structural fluctuations into account. Moreover, it presents opportunities for applications of computationally efficient CG models to study conformational alterations of transmembrane proteins that have physiological relevance.
  5. We present a bottom-up coarse-graining (CG) method to establish implicit-solvent CG modeling for polymers in solution, which conserves the dynamic properties of the reference microscopic system. In particular, tens to hundreds of bonded polymer atoms (or Lennard-Jones beads) are coarse-grained as one CG particle, and the solvent degrees of freedom are eliminated. The dynamics of the CG system is governed by the generalized Langevin equation (GLE) derived via the Mori-Zwanzig formalism, by which the CG variables can be directly and rigorously linked to the microscopic dynamics generated by molecular dynamics (MD) simulations. The solvent-mediated dynamics of polymers is modeled by the non-Markovian stochastic dynamics in GLE, where the memory kernel can be computed from the MD trajectories. To circumvent the difficulty in direct evaluation of the memory term and generation of colored noise, we exploit the equivalence between the non-Markovian dynamics and Markovian dynamics in an extended space. To this end, the CG system is supplemented with auxiliary variables that are coupled linearly to the momentum and among themselves, subject to uncorrelated Gaussian white noise. A high-order time-integration scheme is used to solve the extended dynamics to further accelerate the CG simulations. To assess, validate, and demonstrate the established implicit-solvent CG modeling, we have applied it to study four different types of polymers in solution. The dynamic properties of polymers characterized by the velocity autocorrelation function, diffusion coefficient, and mean square displacement as functions of time are evaluated in both CG and MD simulations. Results show that the extended dynamics with auxiliary variables can construct arbitrarily high-order CG models to reproduce dynamic properties of the reference microscopic system and to characterize long-time dynamics of polymers in solution. 
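The extended-space idea described in item 5 can be sketched for the simplest possible case. The example below is an illustration under stated assumptions, not the method of that paper: it treats a single free particle, assumes a single-exponential memory kernel K(t) = (gamma / tau) * exp(-t / tau), uses one auxiliary variable `z`, and integrates with first-order Euler-Maruyama rather than a high-order scheme. The function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_extended_gle(gamma=1.0, tau=0.5, kT=1.0, mass=1.0,
                          dt=0.01, n_steps=1000, v0=1.0, seed=0):
    """Markovian embedding of dv/dt = -int K(t-s) v(s) ds + R(t).

    With K(t) = (gamma / tau) * exp(-t / tau), the non-Markovian equation
    is equivalent to the extended Markovian system
        dv = z dt
        dz = (-z / tau - (gamma / tau) * v) dt + sigma dW,
    where sigma = sqrt(2 * gamma * kT / mass) / tau makes the noise satisfy
    the fluctuation-dissipation relation <R(t) R(0)> = (kT / mass) * K(t).
    """
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(2.0 * gamma * kT / mass) / tau
    v = np.empty(n_steps + 1)
    v[0] = v0
    z = 0.0  # auxiliary variable carrying the memory and the colored noise
    for n in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))  # white-noise increment
        v_next = v[n] + z * dt
        z = z + (-z / tau - (gamma / tau) * v[n]) * dt + sigma * dW
        v[n + 1] = v_next
    return v

# Noise-free check: with kT = 0 the velocity relaxes under memory friction alone.
v_relax = simulate_extended_gle(kT=0.0)
```

Only white noise is ever generated here. The colored noise and the memory integral both arise from the linear coupling between `v` and `z`, which is the point of working in the extended space.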