skip to main content


Title: CSDMS: a community platform for numerical modeling of Earth surface processes
Abstract. Computational modeling occupies a unique niche in Earth and environmental sciences. Models serve not just as scientific technology and infrastructure but also as digital containers of the scientific community's understanding of the natural world. As this understanding improves, so too must the associated software. This dual nature – models as both infrastructure and hypotheses – means that modeling software must be designed to evolve continually as geoscientific knowledge itself evolves. Here we describe design principles, protocols, and tools developed by the Community Surface Dynamics Modeling System (CSDMS) to promote a flexible, interoperable, and ever-improving research software ecosystem. These include a community repository for model sharing and metadata, interface and ontology standards for model interoperability, language-bridging tools, a modular programming library for model construction, modular software components for data access, and a Python-based execution and model-coupling framework. Methods of community support and engagement that help create a community-centered software ecosystem are also discussed.  more » « less
Award ID(s):
2026951 1831623 2104102
NSF-PAR ID:
10327948
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ;
Date Published:
Journal Name:
Geoscientific Model Development
Volume:
15
Issue:
4
ISSN:
1991-9603
Page Range / eLocation ID:
1413 to 1439
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A ubiquitous problem in aggregating data across different experimental and observational data sources is a lack of software infrastructure that enables flexible and extensible standardization of data and metadata. To address this challenge, we developed HDMF, a hierarchical data modeling framework for modern science data standards. With HDMF, we separate the process of data standardization into three main components: (1) data modeling and specification, (2) data I/O and storage, and (3) data interaction and data APIs. To enable standards to support the complex requirements and varying use cases throughout the data life cycle, HDMF provides object mapping infrastructure to insulate and integrate these various components. This approach supports the flexible development of data standards and extensions, optimized storage backends, and data APIs, while allowing the other components of the data standards ecosystem to remain stable. To meet the demands of modern, large-scale science data, HDMF provides advanced data I/O functionality for iterative data write, lazy data load, and parallel I/O. It also supports optimization of data storage via support for chunking, compression, linking, and modular data storage. We demonstrate the application of HDMF in practice to design NWB 2.0, a modern data standard for collaborative science across the neurophysiology community. 
    more » « less
  2. Abstract Increasingly sophisticated experiments, coupled with large-scale computational models, have the potential to systematically test biological hypotheses to drive our understanding of multicellular systems. In this short review, we explore key challenges that must be overcome to achieve robust, repeatable data-driven multicellular systems biology. If these challenges can be solved, we can grow beyond the current state of isolated tools and datasets to a community-driven ecosystem of interoperable data, software utilities, and computational modeling platforms. Progress is within our grasp, but it will take community (and financial) commitment. 
    more » « less
  3. Abstract

    Meeting the United Nation’ Sustainable Development Goals (SDGs) calls for an integrative scientific approach, combining expertise, data, models and tools across many disciplines towards addressing sustainability challenges at various spatial and temporal scales. This holistic approach, while necessary, exacerbates the big data and computational challenges already faced by researchers. Many challenges in sustainability research can be tackled by harnessing the power of advanced cyberinfrastructure (CI). The objective of this paper is to highlight the key components and technologies of CI necessary for meeting the data and computational needs of the SDG research community. An overview of the CI ecosystem in the United States is provided with a specific focus on the investments made by academic institutions, government agencies and industry at national, regional, and local levels. Despite these investments, this paper identifies barriers to the adoption of CI in sustainability research that include, but are not limited to access to support structures; recruitment, retention and nurturing of an agile workforce; and lack of local infrastructure. Relevant CI components such as data, software, computational resources, and human-centered advances are discussed to explore how to resolve the barriers. The paper highlights multiple challenges in pursuing SDGs based on the outcomes of several expert meetings. These include multi-scale integration of data and domain-specific models, availability and usability of data, uncertainty quantification, mismatch between spatiotemporal scales at which decisions are made and the information generated from scientific analysis, and scientific reproducibility. We discuss ongoing and future research for bridging CI and SDGs to address these challenges.

     
    more » « less
  4. Proteins and nucleic acids participate in essentially every biochemical process in living organisms, and the elucidation of their structure and motions is essential for our understanding how these molecular machines perform their function. Nuclear Magnetic Resonance (NMR) spectroscopy is a powerful versatile technique that provides critical information on the molecular structure and dynamics. Spin-relaxation data are used to determine the overall rotational diffusion and local motions of biological macromolecules, while residual dipolar couplings (RDCs) reveal local and long-range structural architecture of these molecules and their complexes. This information allows researchers to refine structures of proteins and nucleic acids and provides restraints for molecular docking. Several software packages have been developed by NMR researchers in order to tackle the complicated experimental data analysis and structure modeling. However, many of them are offline packages or command-line applications that require users to set up the run time environment and also to possess certain programming skills, which inevitably limits accessibility of this software to a broad scientific community. Here we present new science gateways designed for NMR/structural biology community that address these current limitations in NMR data analysis. Using the GenApp technology for scientific gateways (https://genapp.rocks), we successfully transformed ROTDIF and ALTENS, two offline packages for bio-NMR data analysis, into science gateways that provide advanced computational functionalities, cloud-based data management, and interactive 2D and 3D plotting and visualizations. Furthermore, these gateways are integrated with molecular structure visualization tools (Jmol) and with gateways/engines (SASSIE-web) capable of generating huge computer-simulated structural ensembles of proteins and nucleic acids. This enables researchers to seamlessly incorporate conformational ensembles into the analysis in order to adequately take into account structural heterogeneity and dynamic nature of biological macromolecules. ROTDIF-web offers a versatile set of integrated modules/tools for determining and predicting molecular rotational diffusion tensors and model-free characterization of bond dynamics in biomacromolecules and for docking of molecular complexes driven by the information extracted from NMR relaxation data. ALTENS allows characterization of the molecular alignment under anisotropic conditions, which enables researchers to obtain accurate local and long-range bond-vector restraints for refining 3-D structures of macromolecules and their complexes. We will describe our experience bringing our programs into GenApp and illustrate the use of these gateways for specific examples of protein systems of high biological significance. We expect these gateways to be useful to structural biologists and biophysicists as well as NMR community and to stimulate other researchers to share their scientific software in a similar way. 
    more » « less
  5. There is strong agreement across the sciences that replicable workflows are needed for computational modeling. Open and replicable workflows not only strengthen public confidence in the sciences, but also result in more efficient community science. However, the massive size and complexity of geoscience simulation outputs, as well as the large cost to produce and preserve these outputs, present problems related to data storage, preservation, duplication, and replication. The simulation workflows themselves present additional challenges related to usability, understandability, documentation, and citation. These challenges make it difficult for researchers to meet the bewildering variety of data management requirements and recommendations across research funders and scientific journals. This paper introduces initial outcomes and emerging themes from the EarthCube Research Coordination Network project titled “What About Model Data? - Best Practices for Preservation and Replicability,” which is working to develop tools to assist researchers in determining what elements of geoscience modeling research should be preserved and shared to meet evolving community open science expectations. Specifically, the paper offers approaches to address the following key questions: • How should preservation of model software and outputs differ for projects that are oriented toward knowledge production vs. projects oriented toward data production? • What components of dynamical geoscience modeling research should be preserved and shared? • What curation support is needed to enable sharing and preservation for geoscience simulation models and their output? • What cultural barriers impede geoscience modelers from making progress on these topics? 
    more » « less