Title: CSDMS Data Components: data–model integration tools for Earth surface processes modeling
Abstract. Progress in better understanding and modeling Earth surface systems requires an ongoing integration of data and numerical models. Advances are currently hampered by technical barriers that inhibit finding, accessing, and executing modeling software with related datasets. We propose a design framework for Data Components, which are software packages that provide access to particular research datasets or types of data. Because they use a standard interface based on the Basic Model Interface (BMI), Data Components can function as plug-and-play components within modeling frameworks to facilitate seamless data–model integration. To illustrate the design and potential applications of Data Components and their advantages, we present several case studies in Earth surface processes analysis and modeling. The results demonstrate that the Data Component design provides a consistent and efficient way to access heterogeneous datasets from multiple sources and to seamlessly integrate them with various models. This design supports the creation of open data–model integration workflows that can be discovered, accessed, and reproduced through online data sharing platforms, which promotes data reuse and improves research transparency and reproducibility.
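The plug-and-play idea in the abstract can be illustrated with a minimal sketch. It is not the CSDMS implementation: the class `ToyDataComponent`, the function `run_bucket_model`, and the variable name `precipitation` are hypothetical. Only the method names (`initialize`, `update`, `finalize`, `get_value`, `get_output_var_names`, `get_current_time`, `get_end_time`) follow the BMI, and two signatures are simplified so the example stays self-contained: `initialize` takes a dictionary rather than a configuration file path, and `get_value` returns a scalar rather than filling a destination array.

```python
from typing import Dict, List, Tuple


class ToyDataComponent:
    """Hypothetical data component: serves in-memory time series via BMI-style methods."""

    def __init__(self) -> None:
        self._records: Dict[str, List[float]] = {}
        self._step = 0
        self._n_steps = 0

    def initialize(self, config: Dict[str, List[float]]) -> None:
        # Assumption: a dict stands in for the config file path that BMI expects.
        self._records = {name: list(values) for name, values in config.items()}
        self._n_steps = min(len(values) for values in self._records.values())
        self._step = 0

    def get_output_var_names(self) -> Tuple[str, ...]:
        return tuple(self._records)

    def get_current_time(self) -> float:
        return float(self._step)

    def get_end_time(self) -> float:
        return float(self._n_steps - 1)

    def get_value(self, name: str) -> float:
        # Simplified: BMI's get_value copies into a caller-supplied array.
        return self._records[name][self._step]

    def update(self) -> None:
        # Advance one time step, stopping at the last available record.
        if self._step < self._n_steps - 1:
            self._step += 1

    def finalize(self) -> None:
        self._records = {}


def run_bucket_model(data: ToyDataComponent, var_name: str, loss_fraction: float) -> List[float]:
    """Toy model: storage gains the input each step, then loses a fixed fraction."""
    storage = 0.0
    history: List[float] = []
    while True:
        storage = (storage + data.get_value(var_name)) * (1.0 - loss_fraction)
        history.append(storage)
        if data.get_current_time() >= data.get_end_time():
            break
        data.update()
    return history


if __name__ == "__main__":
    component = ToyDataComponent()
    component.initialize({"precipitation": [2.0, 0.0, 4.0]})
    print(run_bucket_model(component, "precipitation", loss_fraction=0.5))
    component.finalize()
```

Because the toy model only calls the shared method names, any other component exposing the same methods could replace `ToyDataComponent` without changes to `run_bucket_model`, which is the kind of interchangeability the abstract attributes to a standard interface.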
Award ID(s):
1844181 2148762 2104102
PAR ID:
10495580
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ;
Editor(s):
Wickert, A.
Publisher / Repository:
GMD
Date Published:
Journal Name:
Geoscientific Model Development
Volume:
17
Issue:
5
ISSN:
1991-9603
Page Range / eLocation ID:
2165 to 2185
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Computational modeling occupies a unique niche in Earth and environmental sciences. Models serve not just as scientific technology and infrastructure but also as digital containers of the scientific community's understanding of the natural world. As this understanding improves, so too must the associated software. This dual nature – models as both infrastructure and hypotheses – means that modeling software must be designed to evolve continually as geoscientific knowledge itself evolves. Here we describe design principles, protocols, and tools developed by the Community Surface Dynamics Modeling System (CSDMS) to promote a flexible, interoperable, and ever-improving research software ecosystem. These include a community repository for model sharing and metadata, interface and ontology standards for model interoperability, language-bridging tools, a modular programming library for model construction, modular software components for data access, and a Python-based execution and model-coupling framework. Methods of community support and engagement that help create a community-centered software ecosystem are also discussed. 
  2. Abstract. Global change research demands a convergence among academic disciplines to understand complex changes in Earth system function. Limitations related to data usability and computing infrastructure, however, present barriers to effective use of the research tools needed for this cross-disciplinary collaboration. To address these barriers, we created a computational platform that pairs meteorological data and site-level ecosystem characterizations from the National Ecological Observatory Network (NEON) with the Community Terrestrial System Model (CTSM) that is developed with university partners at the National Center for Atmospheric Research (NCAR). This NCAR–NEON system features a simplified user interface that facilitates access to and use of NEON observations and NCAR models. We present preliminary results that compare observed NEON fluxes with CTSM simulations and describe how the collaboration between NCAR and NEON, which the global change research community can use, improves both the data and the model. Beyond datasets and computing, the NCAR–NEON system includes tutorials and visualization tools that facilitate interaction with observational and model datasets and further enable opportunities for teaching and research. By expanding access to data, models, and computing, cyberinfrastructure tools like the NCAR–NEON system will accelerate integration across ecology and climate science disciplines to advance understanding in Earth system science and global change.
  3. Motivated by a wide range of applications, research on agent-based models of contagion propagation over networks has attracted a lot of attention in the literature. Many of the available software systems for simulating such agent-based models require users to download software, build the executable and set up execution environments. Further, running the resulting executable may require access to high performance computing clusters. Our work describes an open access software system (NetSimS) that works under the “Modeling and Simulation as a Service” (MSaaS) paradigm. It allows users to run simulations by selecting agent-based models and parameters, initial conditions, and networks through a web interface. The system supports a variety of models and networks with millions of nodes and edges. In addition to the simulator, the system includes components that allow users to choose initial conditions for simulations in a variety of ways, to analyze the data generated through simulations, and to produce plots from the data. We describe the components of NetSimS and carry out a performance evaluation of the system. We also discuss two case studies carried out on large networks using the system. NetSimS is a major component within net.science, a cyberinfrastructure for network science. Index Terms: Agent-Based Simulation, Contagion, Networks, Modeling and Simulation as a Service, Cyberinfrastructure
  4. Motivated by a wide range of applications, research on agent-based models of contagion propagation over networks has attracted a lot of attention in the literature. Many of the available software systems for simulating such agent-based models require users to download software, build the executable, and set up execution environments. Further, running the resulting executable may require access to high performance computing clusters. Our work describes an open access software system (NetSimS) that works under the “Modeling and Simulation as a Service” (MSaaS) paradigm. It enables users to run simulations by selecting models and parameter values, initial conditions, and networks through a web interface. The system supports a variety of models and networks with millions of nodes and edges. In addition to the simulator, the system includes components that enable users to choose initial conditions for simulations in a variety of ways, to analyze the data generated through simulations, and to produce plots from the data. We describe the components of NetSimS and carry out a performance evaluation of the system. We also discuss two case studies carried out on large networks using the system. NetSimS is a major component within net.science, a cyberinfrastructure for network science. 
  5. Recent rapid advances in deep pre-trained language models and the introduction of large datasets have powered research in embedding-based neural retrieval. While many excellent research papers have emerged, most of them come with their own implementations, which are typically optimized for some particular research goals instead of efficiency or code organization. In this paper, we introduce Tevatron, a neural retrieval toolkit that is optimized for efficiency, flexibility, and code simplicity. Tevatron enables model training and evaluation for a variety of ranking components such as dense retrievers, sparse retrievers, and rerankers. It also provides a standardized pipeline that includes text processing, model training, corpus/query encoding, and search. In addition, Tevatron incorporates well-studied methods for improving retriever effectiveness such as hard negative mining and knowledge distillation. We provide an overview of Tevatron in this paper, demonstrating its effectiveness and efficiency on multiple IR and QA datasets. We highlight Tevatron’s flexible design, which enables easy generalization across datasets, model architectures, and accelerator platforms (GPUs and TPUs). Overall, we believe that Tevatron can serve as a solid software foundation for research on neural retrieval systems, including their design, modeling, and optimization. 