Title: Pipeline for Automating Compliance-based Elimination and Extension (PACE2): A Systematic Framework for High-throughput Biomolecular Material Simulation Workflows
The formation of biomolecular materials via dynamical interfacial processes such as self-assembly and fusion, for diverse compositions and external conditions, can be efficiently probed using ensemble Molecular Dynamics. However, this approach requires a large number of simulations when investigating a large composition phase space. In addition, there is difficulty in predicting whether each simulation is yielding biomolecular materials with the desired properties or outcomes and how long each simulation will run for. These difficulties can be overcome by rules-based management systems which include intermittent inspection, variable sampling, premature termination and extension of the individual Molecular Dynamics simulations. The automation of such a management system can significantly reduce the overhead of managing large ensembles of Molecular Dynamics simulations. To this end, a high-throughput workflows-based computational framework, Pipeline for Automating Compliance-based Elimination and Extension (PACE2), for biomolecular materials simulations is proposed. The PACE2 framework encompasses Simulation-Analysis Pipelines. Each Pipeline includes temporally separated simulation and analysis tasks. When a Molecular Dynamics simulation completes, an analysis task is triggered which evaluates the Molecular Dynamics trajectory for compliance. Compliant Molecular Dynamics simulations are extended to the next Molecular Dynamics phase with a suitable sample rate to allow additional, detailed analysis. Non-compliant Molecular Dynamics simulations are eliminated, and their computational resources are either reallocated or released. The framework is designed to run on local desktop computers and high performance computing resources. In the future, the framework will be extended to address generalized workflows and investigate other classes of materials.
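The simulate, analyze, then extend-or-eliminate loop described in the abstract can be illustrated with a minimal Python sketch. This is not the PACE2 implementation or its API: the `Candidate` class, the `run_ensemble` function, the toy `simulate` and `is_compliant` callables, and the per-phase `sample_every` values are all hypothetical names and numbers chosen for illustration, and a real workflow would launch Molecular Dynamics jobs and trajectory analyses where the toy functions are called.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Candidate:
    """One simulation-analysis pipeline in the ensemble (hypothetical model)."""
    name: str
    rate: float                      # toy parameter standing in for a composition
    phase: int = 0                   # number of phases passed so far
    status: str = "running"          # "running", "completed", or "eliminated"
    history: List[float] = field(default_factory=list)


def run_ensemble(
    candidates: List[Candidate],
    simulate: Callable[[Candidate, int, int], float],
    is_compliant: Callable[[float, int], bool],
    sample_every: List[int],
) -> List[Candidate]:
    """Run every candidate phase by phase; one phase per entry of sample_every.

    After each simulation phase an analysis result (a single float here) is
    checked for compliance. Compliant candidates advance to the next phase,
    non-compliant ones are eliminated and never scheduled again.
    """
    for phase, interval in enumerate(sample_every):
        for cand in candidates:
            if cand.status != "running":
                continue  # resources of eliminated candidates are not reused here
            metric = simulate(cand, phase, interval)  # stand-in for MD + analysis
            cand.history.append(metric)
            if is_compliant(metric, phase):
                cand.phase = phase + 1
            else:
                cand.status = "eliminated"
    for cand in candidates:
        if cand.status == "running":
            cand.status = "completed"
    return candidates


# Toy stand-ins (assumptions, not real MD): the metric grows linearly with phase.
def toy_simulate(cand: Candidate, phase: int, interval: int) -> float:
    return cand.rate * (phase + 1)


THRESHOLDS = [0.5, 1.5, 3.0]  # hypothetical per-phase compliance thresholds


def toy_is_compliant(metric: float, phase: int) -> bool:
    return metric >= THRESHOLDS[phase]


if __name__ == "__main__":
    ensemble = [Candidate("a", 1.2), Candidate("b", 0.8), Candidate("c", 0.3)]
    # Later phases sample more often (smaller interval) to allow detailed analysis.
    for cand in run_ensemble(ensemble, toy_simulate, toy_is_compliant, [100, 50, 10]):
        print(cand.name, cand.status, cand.phase)
```

In this sketch the sampling interval is only passed through to the simulation stand-in. A scheduler that reallocates the resources freed by eliminated candidates would replace the simple nested loop.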
Award ID(s):
1654325
PAR ID:
10407975
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
arXiv.org
ISSN:
2331-8422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The formation of biomolecular materials via dynamical interfacial processes, such as self-assembly and fusion, for diverse compositions and external conditions can be efficiently probed using ensemble Molecular Dynamics (MD). However, this approach requires many simulations when investigating a large composition phase space. In addition, there is difficulty in predicting whether each simulation will yield biomolecular materials with the desired properties or outcomes and how long each simulation will run. These difficulties can be overcome by rules-based management systems, including intermittent inspection, variable sampling, and premature termination or extension of the individual MD simulations. Automating such a management system can significantly improve runtime efficiency and reduce the burden of organizing large ensembles of MD simulations. To this end, a computational framework, the Pipelines for Automating Compliance-based Elimination and Extension (PACE2), is proposed for high-throughput ensemble biomolecular materials simulations. The PACE2 framework encompasses Candidate pipelines, where each pipeline includes temporally separated simulation and analysis tasks. When an MD simulation is completed, an analysis task is triggered, which evaluates the MD trajectory for compliance. Compliant simulations are extended to the next MD phase with a suitable sample rate to allow additional, detailed analysis. Non-compliant simulations are eliminated, and their computational resources are reallocated or released. The framework is designed to run on local desktop computers and high-performance computing resources. Preliminary scientific results enabled by the use of the PACE2 framework are presented, which demonstrate its potential and validate its function. In the future, the framework will be extended to address generalized workflows and investigate composition-structure-property relations for other classes of materials.
  2. Scientific breakthroughs in biomolecular methods and improvements in hardware technology have shifted the focus from a single long-running simulation to a large set of shorter simulations running simultaneously, called an ensemble. In an ensemble, each independent simulation is usually coupled with several analyses that apply identical or distinct algorithms on data produced by the corresponding simulation. Today, in situ methods are used to analyze large volumes of data generated by scientific simulations at runtime. This work studies the execution of ensemble-based simulations paired with in situ analyses using in-memory staging methods. Because simulations and analyses forming an ensemble typically run concurrently, deploying an ensemble requires efficient co-location-aware strategies that keep the data flow between the simulations and analyses forming an in situ workflow efficient. Using an ensemble of molecular dynamics in situ workflows with multiple simulations and analyses, we first show that collecting traditional metrics such as makespan, instructions per cycle, memory usage, or cache miss ratio is not sufficient to characterize the complex behaviors of ensembles. Thus, we propose a method to evaluate the performance of ensembles of workflows that captures resource usage (efficiency), resource allocation, and component placement. Experimental results demonstrate that our proposed method can effectively capture the performance of different component placements in an ensemble. By evaluating different co-location scenarios, our performance indicator demonstrates improvements of up to four orders of magnitude when co-locating simulation and coupled analyses within a single computational host.
  3. Workflow management systems (WMS) are widely used to describe and execute large computational or data intensive applications. However, when a large ensemble of workflows is run on a cluster, new resource management problems occur. Each WMS itself consumes otherwise unmanaged resources, such as the shared head node where the WMS coordinator runs, the shared filesystem where intermediate data is stored, and the shared batch queue itself. We introduce Mufasa, a meta-workflow management system, which is designed to control the concurrency of multiple workflows in an ensemble, by observing and controlling the resources required by each WMS. We show some initial results demonstrating that Mufasa correctly handles the overcommitment of different resource types by starting, pausing, and cancelling workflows with unexpected behavior. 
  4. Through the NSF Future Manufacturing undergraduate research program at Pasadena City College (PCC), students utilize the tools of synthetic biology to build sustainable, DNA-based materials. The manipulation of DNA enables the construction of microscopic biochemical reactors through the formation of liquid-liquid phase-separated droplets, or DNA condensates. This research investigates the potential of DNA nanostars fused with G-tetraplexes, which can bind hemin, an iron-containing porphyrin co-factor, to form a DNAzyme capable of catalyzing peroxidation reactions within single condensate layers. The in vitro component of this research was enhanced by in silico coarse-grained molecular dynamics simulations, which generated 3D models of the DNA nanostars that allowed student researchers to visualize the behavior of the structures created in the laboratory. Leveraging this computational technique, student researchers developed educational resources and modular lessons to introduce these molecular simulations to a broad student audience at PCC. The simulation programs used, oxDNA and oxView, were instrumental in making this research accessible and engaging for diverse student groups. DNA nanostar simulations were integrated into the General, Organic, and Biochemistry curriculum at PCC, as well as during outreach events such as Girls Science Day, offering students insights into DNA nanostar dynamics and potential applications of DNA-based inventions. This paper details the use of simulation programs to recreate nucleic acid-based nanostructures, advancing the field of DNA nanotechnology. Molecular simulations helped the PCC research students develop experiments that demonstrate how enzymatic activity within DNA droplets can be achieved through G4 complexing. Simulating DNA nanostars with G4s was a profound educational exercise for students, as it taught them about the powerful synergy between in silico and in vitro experimentation. 
Students also learned about the limitations of modeling biomolecules using computational software, and our G4 simulation results may even inspire the integration of guanine-guanine interactions into the oxDNA program. These findings underscore the significant implications of in silico modeling and structural analysis in biochemical manufacturing and industrial applications, paving the way for further innovations in programmable biomolecular systems. By developing YouTube tutorials that teach students how to carry out nucleic acid simulations on any standard computer, the exploration of DNA dynamics and molecular programming is now widely accessible to both students and educators. 
  5. To directly simulate rare events using atomistic molecular dynamics is a significant challenge in computational biophysics. Well-established enhanced-sampling techniques do exist to obtain the thermodynamic functions for such systems. However, methods for obtaining the kinetics of long-timescale processes from simulation at atomic detail are comparatively less developed. Milestoning and the weighted ensemble (WE) method are two different stratification strategies; both have shown promise for computing long timescales of complex biomolecular processes. Nevertheless, both require a significant investment of computational resources. We have combined WE and milestoning to calculate observables in orders-of-magnitude less central processing unit and wall-clock time. Our weighted ensemble milestoning method (WEM) uses WE simulation to converge the transition probability and first passage times between milestones, followed by the utilization of the theoretical framework of milestoning to extract thermodynamic and kinetic properties of the entire process. We tested our method on a simple one-dimensional double-well potential, on an eleven-dimensional potential energy surface with an energy barrier, and on the biomolecular model system alanine dipeptide. We were able to recover the free energy profiles, time correlation functions, and mean first passage times for barrier crossing events at a significantly reduced computational cost. WEM promises to extend the applicability of molecular dynamics simulation to slow dynamics of large systems that are well beyond the scope of present-day brute-force computations.