Title: OpenAWSEM with Open3SPN2: A fast, flexible, and accessible framework for large-scale coarse-grained biomolecular simulations
We present OpenAWSEM and Open3SPN2, new cross-compatible implementations of coarse-grained models for protein (AWSEM) and DNA (3SPN2) molecular dynamics simulations within the OpenMM framework. These new implementations retain the chemical accuracy and intrinsic efficiency of the original models while adding GPU acceleration and the ease of force-field modification provided by OpenMM’s Custom Forces software framework. By utilizing GPUs, we achieve around a 30-fold speedup in protein and protein-DNA simulations over the existing LAMMPS-based implementations running on a single CPU core. We showcase the benefits of OpenMM’s Custom Forces framework by devising and implementing two new potentials that allow us to address important aspects of protein folding and structure prediction and by testing the ability of the combined OpenAWSEM and Open3SPN2 to model protein-DNA binding. The first potential is used to describe the changes in effective interactions that occur as a protein becomes partially buried in a membrane. We also introduce an interaction to describe proteins with multiple disulfide bonds. Using simple pairwise disulfide bonding terms results in unphysical clustering of cysteine residues, posing a problem when simulating the folding of proteins with many cysteines. We can now computationally reproduce Anfinsen’s early Nobel Prize-winning experiments by using OpenMM’s Custom Forces framework to introduce a multi-body disulfide bonding term that prevents unphysical clustering. Our protein-DNA simulations show that the binding landscape is funneled towards structures that are quite similar to those found using experiments. In summary, this paper provides a simulation tool for the molecular biophysics community that is both easy to use and sufficiently efficient to simulate large proteins and large protein-DNA systems that are central to many cellular processes. 
These codes should facilitate the interplay between molecular simulations and cellular studies, which have been hampered by the large mismatch between the time and length scales accessible to molecular simulations and those relevant to cell biology.
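The clustering problem described in the abstract can be illustrated with a toy calculation. The sketch below is not the OpenAWSEM functional form and does not use OpenMM; the function names, the contact cutoff, and the `eps` and `penalty` values are all hypothetical choices made for illustration. It compares a purely pairwise contact reward with a variant that adds a many-body penalty whenever a cysteine has more than one partner.

```python
import math
from itertools import combinations


def pairwise_energy(coords, cutoff=1.0, eps=1.0):
    """Toy pairwise term: every cysteine pair closer than `cutoff` earns -eps."""
    contacts = sum(1 for a, b in combinations(coords, 2) if math.dist(a, b) < cutoff)
    return -eps * contacts


def valence_limited_energy(coords, cutoff=1.0, eps=1.0, penalty=2.0):
    """Toy many-body term: the pairwise reward plus a penalty for each
    contact a residue makes beyond its first partner (hypothetical form)."""
    partners = [0] * len(coords)
    contacts = 0
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) < cutoff:
            contacts += 1
            partners[i] += 1
            partners[j] += 1
    excess = sum(max(0, k - 1) for k in partners)
    return -eps * contacts + penalty * excess


# Four cysteines packed into one cluster (all six pairs in contact).
cluster = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.5, 0.5, 0.0)]
# The same four cysteines arranged as two well-separated bonded pairs.
paired = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (10.0, 0.0, 0.0), (10.5, 0.0, 0.0)]

print("pairwise:        cluster", pairwise_energy(cluster), "paired", pairwise_energy(paired))
print("valence-limited: cluster", valence_limited_energy(cluster), "paired", valence_limited_energy(paired))
```

With these illustrative parameters the pairwise term scores the cluster lower in energy than the two separate pairs, while the valence-limited term reverses that ordering, which is the qualitative behavior a multi-body disulfide term is meant to provide.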
Award ID(s):
2019745
NSF-PAR ID:
10233534
Editor(s):
Schneidman-Duhovny, Dina
Journal Name:
PLOS Computational Biology
Volume:
17
Issue:
2
ISSN:
1553-7358
Page Range / eLocation ID:
e1008308
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Babitzke, Paul (Ed.)
    ABSTRACT Oxidative stress causes cellular damage, including DNA mutations, protein dysfunction, and loss of membrane integrity. Here, we discovered that a TrmB (transcription regulator of mal operon) family protein (Pfam PF01978) composed of a single winged-helix DNA binding domain (InterPro IPR002831) can function as a thiol-based transcriptional regulator of oxidative stress response. Using the archaeon Haloferax volcanii as a model system, we demonstrate that the TrmB-like OxsR is important for recovery of cells from hypochlorite stress. OxsR is shown to bind specific regions of genomic DNA, particularly during hypochlorite stress. OxsR-bound intergenic regions were found proximal to oxidative stress operons, including genes associated with thiol relay and low molecular weight thiol biosynthesis. Further analysis of a subset of these sites revealed OxsR to function during hypochlorite stress as a transcriptional activator and repressor. OxsR was shown to require a conserved cysteine (C24) for function and to use a CG-rich motif upstream of conserved BRE/TATA box promoter elements for transcriptional activation. Protein modeling suggested that C24 is located at a homodimer interface formed by antiparallel α helices, and that oxidation of this cysteine would result in the formation of an intersubunit disulfide bond. This covalent linkage may promote stabilization of an OxsR homodimer with the enhanced DNA binding properties observed in the presence of hypochlorite stress. The phylogenetic distribution of TrmB family proteins, like OxsR, that have a single winged-helix DNA binding domain and conserved cysteine residue suggests this type of redox signaling mechanism is widespread in Archaea.

    IMPORTANCE TrmB-like proteins, while not yet associated with redox stress, are found in bacteria and widespread in archaea. Here, we expand annotation of a large group of TrmB-like single winged-helix DNA binding domain proteins from diverse archaea to function as thiol-based transcriptional regulators of oxidative stress response. Using Haloferax volcanii as a model, we reveal that the TrmB-like OxsR functions during hypochlorite stress as a transcriptional activator and repressor of an extensive gene coexpression network associated with thiol relay and other related activities. A conserved cysteine residue of OxsR serves as the thiol-based sensor for this function and likely forms an intersubunit disulfide bond during hypochlorite stress that stabilizes a homodimeric configuration with enhanced DNA binding properties. A CG-rich DNA motif in the promoter region of a subset of sites identified to be OxsR-bound is required for regulation; however, not all sites have this motif, suggesting added complexity to the regulatory network.
  2. This data set for the manuscript entitled "Design of Peptides that Fold and Self-Assemble on Graphite" includes all files needed to run and analyze the simulations described in this manuscript in the molecular dynamics software NAMD, as well as the output of the simulations. The files are organized into directories corresponding to the figures of the main text and supporting information. They include molecular model structure files (NAMD psf or Amber prmtop format), force field parameter files (in CHARMM format), initial atomic coordinates (pdb format), NAMD configuration files, Colvars configuration files, NAMD log files, and NAMD output including restart files (in binary NAMD format) and trajectories in dcd format (downsampled to 10 ns per frame). Analysis is controlled by shell scripts (Bash-compatible) that call VMD Tcl scripts or Python scripts. These scripts and their output are also included.

    Version: 2.0

    Changes versus version 1.0 are the addition of the free energy of folding, adsorption, and pairing calculations (Sim_Figure-7) and shifting of the figure numbers to accommodate this addition.


    Conventions Used in These Files
    ===============================

    Structure Files
    ----------------
    - graph_*.psf or sol_*.psf (original NAMD (XPLOR?) format psf file including atom details (type, charge, mass), as well as definitions of bonds, angles, dihedrals, and impropers for each dipeptide.)

    - graph_*.pdb or sol_*.pdb (initial coordinates before equilibration)
    - repart_*.psf (same as the above psf files, but the masses of non-water hydrogen atoms have been repartitioned by VMD script repartitionMass.tcl)
    - freeTop_*.pdb (same as the above pdb files, but the carbons of the lower graphene layer have been placed at a single z value and marked for restraints in NAMD)
    - amber_*.prmtop (combined topology and parameter files for Amber force field simulations)
    - repart_amber_*.prmtop (same as the above prmtop files, but the masses of non-water hydrogen atoms have been repartitioned by ParmEd)

    Force Field Parameters
    ----------------------
    CHARMM format parameter files:
    - par_all36m_prot.prm (CHARMM36m FF for proteins)
    - par_all36_cgenff_no_nbfix.prm (CGenFF v4.4 for graphene) The NBFIX parameters are commented out since they are only needed for aromatic halogens and we use only the CG2R61 type for graphene.
    - toppar_water_ions_prot_cgenff.str (CHARMM water and ions with NBFIX parameters needed for protein and CGenFF included and others commented out)

    Template NAMD Configuration Files
    ---------------------------------
    These contain the most commonly used simulation parameters. They are called by the other NAMD configuration files (which are in the namd/ subdirectory):
    - template_min.namd (minimization)
    - template_eq.namd (NPT equilibration with lower graphene fixed)
    - template_abf.namd (for adaptive biasing force)

    Minimization
    -------------
    - namd/min_*.0.namd

    Equilibration
    -------------
    - namd/eq_*.0.namd

    Adaptive biasing force calculations
    -----------------------------------
    - namd/eabfZRest7_graph_chp1404.0.namd
    - namd/eabfZRest7_graph_chp1404.1.namd (continuation of eabfZRest7_graph_chp1404.0.namd)

    Log Files
    ---------
    For each NAMD configuration file given in the last two sections, there is a log file with the same prefix, which gives the text output of NAMD. For instance, the output of namd/eabfZRest7_graph_chp1404.0.namd is eabfZRest7_graph_chp1404.0.log.

    Simulation Output
    -----------------
    The simulation output files (which match the names of the NAMD configuration files) are in the output/ directory. Files with the extensions .coor, .vel, and .xsc are coordinates in NAMD binary format, velocities in NAMD binary format, and extended system information (including cell size) in text format, respectively. Files with the extension .dcd give the trajectory of the atomic coordinates over time (and also include system cell information). Due to storage limitations, large DCD files have been omitted or replaced with new DCD files having the prefix stride50_ that include only every 50th frame. The time between frames in these files is 50 * 50000 steps/frame * 4 fs/step = 10 ns. The system cell trajectory is also included for the NPT runs as output/eq_*.xst.
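The frame-spacing arithmetic above can be checked, and reused to label frames during analysis, with a short helper. This is a minimal sketch: the function and variable names are hypothetical and are not part of the data set's own scripts. Only the three numbers (stride 50, 50000 steps per frame, 4 fs per step) come from the text.

```python
FS_PER_NS = 1_000_000  # 1 ns = 10^6 fs


def ns_per_frame(stride: int, steps_per_frame: int, fs_per_step: float) -> float:
    """Simulated time between consecutive frames of a strided DCD file, in ns."""
    return stride * steps_per_frame * fs_per_step / FS_PER_NS


# Values stated for the stride50_ DCD files.
frame_dt_ns = ns_per_frame(stride=50, steps_per_frame=50_000, fs_per_step=4)


def frame_time_ns(frame_index: int) -> float:
    """Elapsed time of a zero-based frame index, measured from frame 0."""
    return frame_index * frame_dt_ns


print(f"{frame_dt_ns} ns between frames; frame 3 is at {frame_time_ns(3)} ns")
```

The same helper applies to a file with a different stride or timestep by changing the arguments, provided the output frequency of the original run is known.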

    Scripts
    -------
    Files with the .sh extension can be found throughout. These usually provide the highest level control for submission of simulations and analysis. Look to these as a guide to what is happening. If there are scripts with step1_*.sh and step2_*.sh, they are intended to be run in order, with step1_*.sh first.


    CONTENTS
    ========

    The directory contents are as follows. The directories Sim_Figure-1 and Sim_Figure-8 include README.txt files that describe the files and naming conventions used throughout this data set.

    Sim_Figure-1: Simulations of N-acetylated C-amidated amino acids (Ac-X-NHMe) at the graphite–water interface.

    Sim_Figure-2: Simulations of different peptide designs (including acyclic, disulfide cyclized, and N-to-C cyclized) at the graphite–water interface.

    Sim_Figure-3: MM-GBSA calculations of different peptide sequences for a folded conformation and 5 misfolded/unfolded conformations.

    Sim_Figure-4: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 370 K.

    Sim_Figure-5: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 295 K.

    Sim_Figure-5_replica: Temperature replica exchange molecular dynamics simulations for the peptide cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) with 20 replicas for temperatures from 295 to 454 K.

    Sim_Figure-6: Simulation of the peptide molecule cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) in free solution (no graphite).

    Sim_Figure-7: Free energy calculations for folding, adsorption, and pairing for the peptide CHP1404 (sequence: cyc(GTGSGTG-GPGG-GCGTGTG-SGPG)). For folding, we calculate the PMF as a function of RMSD by replica-exchange umbrella sampling (in the subdirectory Folding_CHP1404_Graphene/). We make the same calculation in solution, which required 3 separate replica-exchange umbrella sampling calculations (in the subdirectory Folding_CHP1404_Solution/). Both PMF of RMSD calculations for the scrambled peptide are in Folding_scram1404/. For adsorption, calculation of the PMF for the orientational restraints and the calculation of the PMF along z (the distance between the graphene sheet and the center of mass of the peptide) are in Adsorption_CHP1404/ and Adsorption_scram1404/. The actual calculation of the free energy is done by a shell script ("doRestraintEnergyError.sh") in the 1_free_energy/ subsubdirectory. Processing of the PMFs must be done first in the 0_pmf/ subsubdirectory. Finally, files for free energy calculations of pair formation for CHP1404 are found in the Pair/ subdirectory.

    Sim_Figure-8: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) where the peptides are far above the graphene–water interface in the initial configuration.

    Sim_Figure-9: Two replicates of a simulation of nine peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 370 K.

    Sim_Figure-9_scrambled: Two replicates of a simulation of nine peptide molecules with the control sequence cyc(GGTPTTGGGGGGSGGPSGTGGC) at the graphite–water interface at 370 K.

    Sim_Figure-10: Adaptive biasing force calculation of the free energy of the folded peptide as a function of the angle between its long axis and the zigzag directions of the underlying graphene sheet.

     

    This material is based upon work supported by the US National Science Foundation under grant no. DMR-1945589. A majority of the computing for this project was performed on the Beocat Research Cluster at Kansas State University, which is funded in part by NSF grants CHE-1726332, CNS-1006860, EPS-1006860, and EPS-0919443. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562, through allocation BIO200030. 
  3. Colloidal particles with mobile binding molecules constitute a powerful platform for probing the physics of self-assembly. Binding molecules are free to diffuse and rearrange on the surface, giving rise to spontaneous control over the number of droplet–droplet bonds, i.e., valence, as a function of the concentration of binders. This type of valence control has been realized experimentally by tuning the interaction strength between DNA-coated emulsion droplets. Optimizing for valence two yields droplet polymer chains, termed ‘colloidomers’, which have recently been used to probe the physics of folding. To understand the underlying self-assembly mechanisms, here we present a coarse-grained molecular dynamics (CGMD) model to study the self-assembly of this class of systems using explicit representations of mobile binding sites. We explore how valence of assembled structures can be tuned through kinetic control in the strong binding limit. More specifically, we optimize experimental control parameters to obtain the highest yield of long linear colloidomer chains. Subsequently tuning the dynamics of binding and unbinding via a temperature-dependent model allows us to observe a heptamer chain collapse into all possible rigid structures, in good agreement with recent folding experiments. Our CGMD platform and dynamic bonding model (implemented as an open-source custom plugin to HOOMD-Blue) reveal the molecular features governing the binding patch size and valence control, and open the study of pathways in colloidomer folding. This model can therefore guide programmable design in experiments.
  4. ABSTRACT

    The 70 kDa heat shock proteins (Hsp70) are a family of molecular chaperones involved in protein folding, aggregate prevention, and protein disaggregation. They consist of the substrate‐binding domain (SBD) that binds client substrates, and the nucleotide‐binding domain (NBD), whose cycles of nucleotide hydrolysis and exchange underpin the activity of the chaperone. To characterize the structure–function relationships that link the binding state of the NBD to its conformational behavior, we analyzed the dynamics of the NBD of the Hsp70 chaperone from Bos taurus (PDB 3C7N:B) by all‐atom canonical molecular dynamics simulations. It was found that essential motions within the NBD fall into three major classes: the mutual class, reflecting tendencies common to all binding states, and the ADP‐ and ATP‐unique classes, which reflect conformational trends that are unique to either the ADP‐ or ATP‐bound states, respectively. “Mutual” class motions generally describe “in‐plane” and/or “out‐of‐plane” (scissor‐like) rotation of the subdomains within the NBD. This result is consistent with experimental nuclear magnetic resonance data on the NBD. The “unique” class motions target specific regions on the NBD, usually surface loops or sites involved in nucleotide binding and are, therefore, expected to be involved in allostery and signal transmission. For all classes, and especially for those of the “unique” type, regions of enhanced mobility can be identified; these are termed “hot spots,” and their locations generally parallel those found by NMR spectroscopy. The presence of magnesium and potassium cations in the nucleotide‐binding pocket was also found to influence the dynamics of the NBD significantly. Proteins 2015; 83:282–299. © 2014 Wiley Periodicals, Inc.

     
  5. Binding-induced mechanical stabilization plays key roles in proteins involved in muscle contraction, cellular mechanotransduction, or bacterial adhesion. Because of the vector nature of force, single-molecule force spectroscopy techniques are ideal for measuring the mechanical unfolding of proteins. However, current approaches are still prone to calibration errors between experiments and geometrical variations between individual tethers. Here, we introduce a single-molecule assay based on magnetic tweezers and heterocovalent attachment, which can measure the binding of the substrate–ligand using the same protein molecule. We demonstrate this approach with protein L, a model bacterial protein which has two binding interfaces for the same region of kappa-light chain antibody ligands. Engineered molecules with eight identical domains of protein L between a HaloTag and a SpyTag were exposed to repeated unfolding–refolding cycles at forces up to 100 pN for several hours at a time. The unfolding behavior of the same protein was measured in solution buffers with different concentrations of antibody ligands. With increasing antibody concentration, an increasing number of protein L domains became more stable, indicative of ligand binding and mechanical reinforcement. Interestingly, the dissociation constant of the mechanically reinforced states coincides with that measured for the low-avidity binding interface of protein L, suggesting a physiological role for the second binding interface. The molecular approach presented here opens the road to a new type of binding experiment, where the same molecule can be exposed to different solvents or ligands.