skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Open Science Expectations for Simulation-Based Research
There is strong agreement across the sciences that replicable workflows are needed for computational modeling. Open and replicable workflows not only strengthen public confidence in the sciences, but also result in more efficient community science. However, the massive size and complexity of geoscience simulation outputs, as well as the large cost to produce and preserve these outputs, present problems related to data storage, preservation, duplication, and replication. The simulation workflows themselves present additional challenges related to usability, understandability, documentation, and citation. These challenges make it difficult for researchers to meet the bewildering variety of data management requirements and recommendations across research funders and scientific journals. This paper introduces initial outcomes and emerging themes from the EarthCube Research Coordination Network project titled “What About Model Data? - Best Practices for Preservation and Replicability,” which is working to develop tools to assist researchers in determining what elements of geoscience modeling research should be preserved and shared to meet evolving community open science expectations. Specifically, the paper offers approaches to address the following key questions: • How should preservation of model software and outputs differ for projects that are oriented toward knowledge production vs. projects oriented toward data production? • What components of dynamical geoscience modeling research should be preserved and shared? • What curation support is needed to enable sharing and preservation for geoscience simulation models and their output? • What cultural barriers impede geoscience modelers from making progress on these topics?  more » « less
Award ID(s):
1929757
PAR ID:
10356857
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Frontiers in Climate
Volume:
3
ISSN:
2624-9553
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract It has become common for researchers to make their data publicly available to meet the data management and accessibility requirements of funding agencies and scientific publishers. However, many researchers face the challenge of determining what data to preserve and share and where to preserve and share those data. This can be especially challenging for those who run dynamical models, which can produce complex, voluminous data outputs, and have not considered what outputs may need to be preserved and shared as part of the project design. This manuscript presents findings from the NSF EarthCube Research Coordination Network project titled “What About Model Data? Best Practices for Preservation and Replicability” (https://modeldatarcn.github.io/). These findings suggest that if the primary goal of sharing data are to communicate knowledge, most simulation-based research projects only need to preserve and share selected model outputs along with the full simulation experiment workflow. One major result of this project has been the development of a rubric, designed to provide guidance for making decisions on what simulation output needs to be preserved and shared in trusted community repositories to achieve the goal of knowledge communication. This rubric, along with use cases for selected projects, provide scientists with guidance on data accessibility requirements in the planning process of research, allowing for more thoughtful development of data management plans and funding requests. Additionally, this rubric can be referred to by publishers for what is expected in terms of data accessibility for publication. 
    more » « less
  2. This paper introduces the citizen science platform, LanguageARC, developed within the NIEUW (Novel Incentives and Workflows) project supported by the National Science Foundation under Grant No. 1730377. LanguageARC is a community- oriented online platform bringing together researchers and “citizen linguists” with the shared goal of contributing to linguistic research and language technology development. Like other Citizen Science platforms and projects, LanguageARC harnesses the power and efforts of volunteers who are motivated by the incentives of contributing to science, learning and discovery, and belonging to a community dedicated to social improvement. Citizen linguists contribute language data and judgments by participating in research tasks such as classifying regional accents from audio clips, recording audio of picture descriptions and answering personality questionnaires to create baseline data for NLP research into autism and neurodegenerative conditions. Researchers can create projects on Language ARC without any coding or HTML required using our Project Builder Toolkit. 
    more » « less
  3. This is a story about the challenges and opportunities that surfaced while answering a deceptively complex question - where's the data? As faculty and researchers publish articles, datasets, and other research outputs to meet promotion and tenure requirements, address federal funding policies, and institutional open access and data sharing policies, many online locations for publishing these materials have developed over time. How can we capture where all of the research generated on an academic campus is shared and preserved? This presentation will discuss how our multi-institution collaboration, the Reality of Academic Data Sharing (RADS) Initiative, sought to answer this question. We programmatically pulled DOIs from DataCite and CrossRef, making the naive assumption that these platforms, the two predominant DOI registration agencies for US data, would present us with a neutral and unbiased view of where data from our affiliated researchers were shared. However, as we dug into the data, we found inconsistencies in the use and completeness of the necessary metadata fields for our questions, as well as differences in how DOIs were assigned across repositories. Additionally, we recognized the systematic and privileged bias introduced by our choice of data sources. Specifically, while DataCite and CrossRef provide easy discovery of research outputs because they aggregate DOIs, they are also costly commercial services. Many repositories that cannot afford such services or lack local staffing and knowledge required to use these services are left out of the technology that has recently been labeled “global research infrastructure”. Our presentation will identify the challenges we encountered in conducting this research specifically around finding the data, and cleaning and interpreting the data. We will further engage the audience in a discussion around increasing representation in the global research infrastructure to discover and account for more research outputs. 
    more » « less
  4. Scientific simulation workflows executing on very large scale computing systems are essential modalities for scientific investigation. The increasing scales and resolution of these simulations provide new opportunities for accurately modeling complex natural and engineered phenomena. However, the increasing complexity necessitates managing, transporting, and processing unprecedented amounts of data, and as a result, researchers are increasingly exploring data-staging and in-situ workflows to reduce data movement and data-related overheads. However, as these workflows become more dynamic in their structures and behaviors, data staging and in-situ solutions must evolve to support new requirements. In this paper, we explore how the service-oriented concept can be applied to extreme-scale in-situ workflows. Specifically, we explore persistent data staging as a service and present the design and implementation of DataSpaces as a Service, a service-oriented data staging framework. We use a dynamically coupled fusion simulation workflow to illustrate the capabilities of this framework and evaluate its performance and scalability. 
    more » « less
  5. To reshape energy systems towards renewable energy resources, decision makers need to decide today on how to make the transition. Energy scenarios are widely used to guide decision making in this context. While considerable effort has been put into developing energy scenarios, researchers have pointed out three requirements for energy scenarios that are not fulfilled satisfactorily yet: The development and evaluation of energy scenarios should (1) incorporate the concept of sustainability, (2) provide decision support in a transparent way and (3) be replicable for other researchers. To meet these requirements, we combine different methodological approaches: story-and-simulation (SAS) scenarios, multi-criteria decision-making (MCDM), information modeling and co-simulation. We show in this paper how the combination of these methods can lead to an integrated approach for sustainability evaluation of energy scenarios with automated information exchange. Our approach consists of a sustainability evaluation process (SEP) and an information model for modeling dependencies. The objectives are to guide decisions towards sustainable development of the energy sector and to make the scenario and decision support processes more transparent for both decision makers and researchers. 
    more » « less