Background: Molecular Dynamics (MD) simulation of biomolecules provides important insights into conformational changes and dynamic behavior, revealing critical information about folding and interactions with other molecules. This enables advances in drug discovery and the design of therapeutic interventions. The collection of simulations stored in computers across the world holds immense potential to serve as training data for future Machine Learning models that will transform the prediction of structure, dynamics, drug interactions, and more.
A need: Ideally, there should exist an open access repository that enables scientists to submit and store their MD simulations of proteins and protein-drug interactions, and to find, retrieve, analyze, and visualize simulations produced by others. However, despite the ubiquity of MD simulation in structural biology, no such repository exists; as a result, simulations are instead stored in scattered locations without uniform metadata or access protocols.
A solution: Here, we introduce MDRepo, a robust infrastructure that supports a relatively simple process for standardized community contribution of simulations, activates common downstream analyses on stored data, and enables search, retrieval, and visualization of contributed data. MDRepo is built on top of the open-source CyVerse research cyberinfrastructure, and is capable of storing petabytes of simulations, while providing high bandwidth upload and download capabilities and laying a foundation for cloud-based access to its stored data.
MDRepo—an open data warehouse for community-contributed molecular dynamics simulations of proteins
Abstract Molecular Dynamics (MD) simulation of biomolecules provides important insights into conformational changes and dynamic behavior, revealing critical information about folding and interactions with other molecules. The collection of simulations stored in computers across the world holds immense potential to serve as training data for future Machine Learning models that will transform the prediction of structure, dynamics, drug interactions, and more. Ideally, there should exist an open access repository that enables scientists to submit and store their MD simulations of proteins and protein-drug interactions, and to find, retrieve, analyze, and visualize simulations produced by others. However, despite the ubiquity of MD simulation in structural biology, no such repository exists; as a result, simulations are instead stored in scattered locations without uniform metadata or access protocols. Here, we introduce MDRepo, a robust infrastructure that provides a relatively simple process for standardized community contribution of simulations, activates common downstream analyses on stored data, and enables search, retrieval, and visualization of contributed data. MDRepo is built on top of the open-source CyVerse research cyber-infrastructure, and is capable of storing petabytes of simulations, while providing high bandwidth upload and download capabilities and laying a foundation for cloud-based access to its stored data.
- Award ID(s): 1953405
- PAR ID: 10576871
- Publisher / Repository: Oxford Academic
- Date Published:
- Journal Name: Nucleic Acids Research
- Volume: 53
- Issue: D1
- ISSN: 0305-1048
- Page Range / eLocation ID: D477 to D486
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Active learning (AL) is a powerful sequential optimization approach that has shown great promise in the discovery of new materials. However, a major challenge remains the acquisition of the initial data and the development of workflows to generate new data at each iteration. In this study, we demonstrate a significant speedup in an optimization task by reusing a published simulation workflow available for online simulations and its associated data repository, where the results of each workflow run are automatically stored. Both the workflow and its data follow FAIR (findable, accessible, interoperable, and reusable) principles using nanoHUB’s infrastructure. The workflow employs molecular dynamics to calculate the melting temperature of multi-principal component alloys. We leveraged all prior data not only to develop an accurate machine learning model to start the sequential optimization but also to optimize the simulation parameters and accelerate convergence. Prior work showed that finding the alloy composition with the highest melting temperature required testing several alloy compositions, and establishing the melting temperature for each composition took, on average, multiple simulations. By developing a workflow that utilizes the FAIR data in the nanoHUB database, we reduced the number of simulations per composition to one and found the alloy with the lowest melting temperature testing only three compositions. This second optimization, therefore, shows a speedup of 10x as compared to models that do not access the FAIR databases.
- Abstract Molecular dynamics (MD) simulations are immensely valuable for studying protein structure, function and dynamics. Their ability to capture atomic-level behavior of molecules and describe their evolution over time makes them a powerful synergistic tool for biochemistry, structural biology and other life sciences. To advance research and knowledge on reasonable timescales, researchers must optimize the amount of useful information extracted from simulation data while often frugally managing computational resources. Often, this involves balancing the length of MD trajectories with the number of replicas of a given system, with the aim of maximizing sampling of the conformational landscape. However, identifying this balance is not always intuitive, and the lack of standards among researchers can produce large variability in results and predictions from MD measurements. Here, we investigate the variability in MD results when simulation length and replica numbers are varied. Using a 231-amino acid domain, we compare measurements from independent trajectories to a benchmark trajectory of 3, 1000-ns replicates. We perform these simulations on 27 protein-ligand complexes, allowing us to compare ligand-specific rankings of complexes across independent replicas. Our results reveal that some MD measurements are accurately ranked by single trajectories, while others are not. We uncover similar variability in the effects of trajectory lengths on measurements. Our findings suggest that a one-size-fits-all approach to MD simulations is not necessarily the best approach, and depending on the intended measurements and research question, it may sometimes be advantageous to prioritize longer trajectories over multiple replicas. This work provides important considerations for researchers designing simulation studies.
- Cell membranes are incredibly complex environments containing hundreds of components. Despite substantial advances in the past decade, fundamental questions related to lipid-lipid interactions and heterogeneity persist. This review explores the complexity of lipid membranes, showcasing recent advances in vibrational spectroscopy to characterize the structure, dynamics, and interactions at the membrane interface. We include an overview of modern techniques such as surface-enhanced infrared spectroscopy as a steady-state technique with single-bilayer sensitivity, two-dimensional sum-frequency generation spectroscopy, and two-dimensional infrared spectroscopy to measure time-evolving structures and dynamics with femtosecond time resolution. Furthermore, we discuss the potential of multiscale molecular dynamics (MD) simulations, focusing on recently developed simulation algorithms, which have emerged as a powerful approach to interpret complex spectra. We highlight the ongoing challenges in studying heterogeneous environments in multicomponent membranes via current vibrational spectroscopic techniques and MD simulations. Overall, this review provides an up-to-date comprehensive overview of the powerful combination of vibrational spectroscopy and simulations, which has great potential to illuminate lipid-lipid, lipid-protein, and lipid-water interactions in the intricate conformational landscape of cell membranes.
- Overview of ICARUS: A Curated, Open Access, Online Repository for Atmospheric Simulation Chamber Data. V. Faye McNeill (Ed.). Atmospheric simulation chambers continue to be indispensable tools for research in the atmospheric sciences. Insights from chamber studies are integrated into atmospheric chemical transport models, which are used for science-informed policy decisions. However, a centralized data management and access infrastructure for their scientific products had not been available in the United States and many parts of the world. ICARUS (Integrated Chamber Atmospheric data Repository for Unified Science) is an open access, searchable, web-based infrastructure for storing, sharing, discovering, and utilizing atmospheric chamber data [https://icarus.ucdavis.edu]. ICARUS has two parts: a data intake portal and a search and discovery portal. Data in ICARUS are curated, uniform, interactive, indexed on popular search engines, mirrored by other repositories, version-tracked, vocabulary-controlled, and citable. ICARUS hosts both legacy data and new data in compliance with open access data mandates. Targeted data discovery is available based on key experimental parameters, including organic reactants and mixtures that are managed using the PubChem chemical database, oxidant information, nitrogen oxide (NOx) content, alkylperoxy radical (RO2) fate, seed particle information, environmental conditions, and reaction categories. A discipline-specific repository such as ICARUS with high amounts of metadata works to support the evaluation and revision of atmospheric model mechanisms, intercomparison of data and models, and the development of new model frameworks that can have more predictive power in the current and future atmosphere. The open accessibility and interactive nature of ICARUS data may also be useful for teaching, data mining, and training machine learning models.