Abstract Molecular Dynamics (MD) simulation of biomolecules provides important insights into conformational changes and dynamic behavior, revealing critical information about folding and interactions with other molecules. The collection of simulations stored in computers across the world holds immense potential to serve as training data for future Machine Learning models that will transform the prediction of structure, dynamics, drug interactions, and more. Ideally, there should exist an open access repository that enables scientists to submit and store their MD simulations of proteins and protein-drug interactions, and to find, retrieve, analyze, and visualize simulations produced by others. However, despite the ubiquity of MD simulation in structural biology, no such repository exists; as a result, simulations are instead stored in scattered locations without uniform metadata or access protocols. Here, we introduce MDRepo, a robust infrastructure that provides a relatively simple process for standardized community contribution of simulations, activates common downstream analyses on stored data, and enables search, retrieval, and visualization of contributed data. MDRepo is built on top of the open-source CyVerse research cyber-infrastructure, and is capable of storing petabytes of simulations, while providing high bandwidth upload and download capabilities and laying a foundation for cloud-based access to its stored data. 
                        more » 
                        « less   
                    
                            
                            MDRepo – an open environment for data warehousing and knowledge discovery from molecular dynamics simulations
                        
                    
    
            BackgroundMolecular Dynamics (MD) simulation of biomolecules provides important insights into conformational changes and dynamic behavior, revealing critical information about folding and interactions with other molecules. This enables advances in drug discovery and the design of therapeutic interventions. The collection of simulations stored in computers across the world holds immense potential to serve as training data for future Machine Learning models that will transform the prediction of structure, dynamics, drug interactions, and more. A needIdeally, there should exist an open access repository that enables scientists to submit and store their MD simulations of proteins and protein-drug interactions, and to find, retrieve, analyze, and visualize simulations produced by others. However, despite the ubiquity of MD simulation in structural biology, no such repository exists; as a result, simulations are instead stored in scattered locations without uniform metadata or access protocols. A solutionHere, we introduce MDRepo, a robust infrastructure that supports a relatively simple process for standardized community contribution of simulations, activates common downstream analyses on stored data, and enables search, retrieval, and visualization of contributed data. MDRepo is built on top of the open-source CyVerse research cyberinfrastructure, and is capable of storing petabytes of simulations, while providing high bandwidth upload and download capabilities and laying a foundation for cloud-based access to its stored data. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 2005506
- PAR ID:
- 10546273
- Publisher / Repository:
- bioRxiv
- Date Published:
- Format(s):
- Medium: X
- Institution:
- bioRxiv
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Active learning (AL) is a powerful sequential optimization approach that has shown great promise in the discovery of new materials. However, a major challenge remains the acquisition of the initial data and the development of workflows to generate new data at each iteration. In this study, we demonstrate a significant speedup in an optimization task by reusing a published simulation workflow available for online simulations and its associated data repository, where the results of each workflow run are automatically stored. Both the workflow and its data follow FAIR (findable, accessible, interoperable, and reusable) principles using nanoHUB’s infrastructure. The workflow employs molecular dynamics to calculate the melting temperature of multi-principal component alloys. We leveraged all prior data not only to develop an accurate machine learning model to start the sequential optimization but also to optimize the simulation parameters and accelerate convergence. Prior work showed that finding the alloy composition with the highest melting temperature required testing several alloy compositions, and establishing the melting temperature for each composition took, on average, multiple simulations. By developing a workflow that utilizes the FAIR data in the nanoHUB database, we reduced the number of simulations per composition to one and found the alloy with the lowest melting temperature testing only three compositions. This second optimization, therefore, shows a speedup of 10x as compared to models that do not access the FAIR databasesmore » « less
- 
            Density functional theory (DFT) has been applied to modeling molecular interactions in water for over three decades. The ubiquity of water in chemical and biological processes demands a unified understanding of its physics, from the single molecule to the thermodynamic limit and everything in between. Recent advances in the development of data-driven and machine-learning potentials have accelerated simulation of water and aqueous systems with DFT accuracy. However, anomalous properties of water in the condensed phase, where a rigorous treatment of both local and non-local many-body (MB) interactions is in order, are often unsatisfactory or partially missing in DFT models of water. In this review, we discuss the modeling of water and aqueous systems based on DFT and provide a comprehensive description of a general theoretical/computational framework for the development of data-driven many-body potentials from DFT reference data. This framework, coined MB-DFT, readily enables efficient many-body molecular dynamics (MD) simulations of small molecules, in both gas and condensed phases, while preserving the accuracy of the underlying DFT model. Theoretical considerations are emphasized, including the role that the delocalization error plays in MB-DFT potentials of water and the possibility to elevate DFT and MB-DFT to near-chemical-accuracy through a density-corrected formalism. The development of the MB-DFT framework is described in detail, along with its application in MB-MD simulations and recent extension to the modeling of reactive processes in solution within a quantum mechanics/MB molecular mechanics (QM/MB-MM) scheme, using water as a prototypical solvent. Finally, we identify open challenges and discuss future directions for MB-DFT and QM/MB-MM simulations in condensed phases.more » « less
- 
            KCNE3 is a potassium channel accessory transmembrane protein that regulates the function of various voltage-gated potassium channels such as KCNQ1. KCNE3 plays an important role in the recycling of potassium ion by binding with KCNQ1. KCNE3 can be found in the small intestine, colon, and in the human heart. Despite its biological significance, there is little information on the structural dynamics of KCNE3 in native-like membrane environments. Molecular dynamics (MD) simulations are a widely used as a tool to study the conformational dynamics and interactions of proteins with lipid membranes. In this study, we have utilized all-atom molecular dynamics simulations to characterize the molecular motions and the interactions of KCNE3 in a bilayer composed of: a mixture of POPC and POPG lipids (3:1), POPC alone, and DMPC alone. Our MD simulation results suggested that the transmembrane domain (TMD) of KCNE3 is less flexible and more stable when compared to the N- and C-termini of KCNE3 in all three membrane environments. The conformational flexibility of N- and C-termini varies across these three lipid environments. The MD simulation results further suggested that the TMD of KCNE3 spans the membrane width, having residue A69 close to the center of the lipid bilayers and residues S57 and S82 close to the lipid bilayer membrane surfaces. These results are consistent with previous biophysical studies of KCNE3. The outcomes of these MD simulations will help design biophysical experiments and complement the experimental data obtained on KCNE3 to obtain a more detailed understanding of its structural dynamics in the native membrane environment.more » « less
- 
            Simulating the dynamics of ions near polarizable nanoparticles (NPs) using coarse-grained models is extremely challenging due to the need to solve the Poisson equation at every simulation timestep. Recently, a molecular dynamics (MD) method based on a dynamical optimization framework bypassed this obstacle by representing the polarization charge density as virtual dynamic variables and evolving them in parallel with the physical dynamics of ions. We highlight the computational gains accessible with the integration of machine learning (ML) methods for parameter prediction in MD simulations by demonstrating how they were realized in MD simulations of ions near polarizable NPs. An artificial neural network–based regression model was integrated with MD simulation and predicted the optimal simulation timestep and optimization parameters characterizing the virtual system with 94.3% success. The ML-enabled auto-tuning of parameters generated accurate dynamics of ions for ≈ 10 million steps while improving the stability of the simulation by over an order of magnitude. The integration of ML-enhanced framework with hybrid Open Multi-Processing / Message Passing Interface (OpenMP/MPI) parallelization techniques reduced the computational time of simulating systems with thousands of ions and induced charges from thousands of hours to tens of hours, yielding a maximum speedup of ≈ 3 from ML-only acceleration and a maximum speedup of ≈ 600 from the combination of ML and parallel computing methods. Extraction of ionic structure in concentrated electrolytes near oil–water emulsions demonstrates the success of the method. The approach can be generalized to select optimal parameters in other MD applications and energy minimization problems.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    