Title: The eWaterCycle platform for open and FAIR hydrological collaboration
Abstract. Hutton et al. (2016) argued that computational hydrology can only be a proper science if the hydrological community makes sure that hydrological model studies are executed and presented in a reproducible manner. Hut, Drost and van de Giesen replied that, to achieve this, hydrologists should not “re-invent the water wheel” but rather use existing technology from other fields (such as containers and ESMValTool) and open interfaces (such as the Basic Model Interface, BMI) to do their computational science (Hut et al., 2017). With this paper and the associated release of the eWaterCycle platform and software package (available on Zenodo: https://doi.org/10.5281/zenodo.5119389, Verhoeven et al., 2022), we are putting our money where our mouth is and providing the hydrological community with a “FAIR by design” (FAIR meaning findable, accessible, interoperable, and reproducible) platform to do science. The eWaterCycle platform separates the experiments done on the model from the model code. In eWaterCycle, hydrological models are accessed through a common interface (BMI) in Python and run inside of software containers. In this way, all models are accessed in a similar manner, facilitating easy switching of models, model comparison and model coupling. Currently the following models and model suites are available through eWaterCycle: PCR-GLOBWB 2.0, wflow, Hype, LISFLOOD, MARRMoT, and WALRUS. While these models are written in different programming languages, they can all be run and interacted with from the Jupyter notebook environment within eWaterCycle. Furthermore, the pre-processing of input data for these models has been streamlined by making use of ESMValTool. Forcing for the models available in eWaterCycle from well-known datasets such as ERA5 can be generated with a single line of code.
To illustrate the type of research that eWaterCycle facilitates, this paper includes five case studies: from a simple “hello world” where only a hydrograph is generated to a complex coupling of models in different languages. In this paper we stipulate the design choices made in building eWaterCycle and provide all the technical details to understand and work with the platform. For system administrators who want to install eWaterCycle on their infrastructure we offer a separate installation guide. For computational hydrologists that want to work with eWaterCycle we also provide a video explaining the platform from a user point of view (https://youtu.be/eE75dtIJ1lk, last access: 28 June 2022). With the eWaterCycle platform we are providing the hydrological community with a platform to conduct their research that is fully compatible with the principles of both Open Science and FAIR science.
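The idea of separating the experiment from the model code through a common interface can be illustrated with a minimal Python sketch. This is not the eWaterCycle API: the two toy models (`LinearReservoir`, `Bucket`), the `run_hydrograph` driver, and the configuration keys are hypothetical, and the method names only loosely follow BMI (the real BMI `get_value`, for instance, fills a destination array rather than returning a float). The sketch only shows why a shared interface makes switching models a one-line change.

```python
from typing import Protocol


class BmiLike(Protocol):
    """Simplified, BMI-inspired subset of methods (an assumption, not the full BMI)."""

    def initialize(self, config: dict) -> None: ...
    def update(self) -> None: ...
    def get_value(self, name: str) -> float: ...
    def get_current_time(self) -> float: ...
    def get_end_time(self) -> float: ...
    def finalize(self) -> None: ...


class _ToyBase:
    """Shared bookkeeping for the hypothetical toy models below."""

    def initialize(self, config: dict) -> None:
        self.config = config
        self.rain = list(config["rain"])
        self.storage = float(config["initial_storage"])
        self.discharge = 0.0
        self.time = 0.0

    def get_value(self, name: str) -> float:
        if name != "discharge":
            raise KeyError(name)
        return self.discharge

    def get_current_time(self) -> float:
        return self.time

    def get_end_time(self) -> float:
        return float(len(self.rain))

    def finalize(self) -> None:
        self.rain = []


class LinearReservoir(_ToyBase):
    """Toy model: each step releases a fixed fraction k of the storage."""

    def update(self) -> None:
        self.storage += self.rain[int(self.time)]
        self.discharge = self.config["k"] * self.storage
        self.storage -= self.discharge
        self.time += 1.0


class Bucket(_ToyBase):
    """Toy model: water above a fixed capacity spills as discharge."""

    def update(self) -> None:
        self.storage += self.rain[int(self.time)]
        self.discharge = max(0.0, self.storage - self.config["capacity"])
        self.storage -= self.discharge
        self.time += 1.0


def run_hydrograph(model: BmiLike, config: dict) -> list[float]:
    """The 'experiment': it only uses the shared interface, never model internals."""
    model.initialize(config)
    hydrograph = []
    while model.get_current_time() < model.get_end_time():
        model.update()
        hydrograph.append(model.get_value("discharge"))
    model.finalize()
    return hydrograph


# Hypothetical configuration shared by both toy models.
CONFIG = {"initial_storage": 0.0, "k": 0.5, "capacity": 1.0, "rain": [2.0, 0.0, 0.0]}

# Switching models means swapping the object passed to the same experiment.
for toy_model in (LinearReservoir(), Bucket()):
    print(type(toy_model).__name__, run_hydrograph(toy_model, CONFIG))
```

Because `run_hydrograph` depends only on the interface, comparing the two models requires no change to the experiment itself, which is the property the abstract attributes to accessing models through BMI.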
Award ID(s):
1831623
PAR ID:
10342846
Journal Name:
Geoscientific Model Development
Volume:
15
Issue:
13
ISSN:
1991-9603
Page Range / eLocation ID:
5371 to 5390
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Open access to scientific data is increasingly recognized as critical to fostering scientific progress, trustworthy and reproducible science, global information equity, and evidence-based policymaking. It requires scientists to not only share their data, but to share in such a way that the data have high utility for later users. The FAIR data principles define a set of characteristics for making data “findable, accessible, interoperable, and reusable” (Wilkinson et al., 2016). Training scientists, particularly early-career scientists, on these principles can improve the volume and quality of open science data. 
  2. Credibility building activities in computational research include verification and validation, reproducibility and replication, and uncertainty quantification. Though orthogonal to each other, they are related. This paper presents validation and replication studies in electromagnetic excitations on nanoscale structures, where the quantity of interest is the wavelength at which resonance peaks occur. The study uses the open-source software PyGBe: a boundary element solver with treecode acceleration and GPU capability. We replicate a result by Rockstuhl et al. (2005, doi:10/dsxw9d) with a two-dimensional boundary element method on silicon carbide (SiC) particles, despite differences in our method. The second replication case from Ellis et al. (2016, doi:10/f83zcb) looks at aspect ratio effects on high-order modes of localized surface phonon-polariton nanostructures. The results partially replicate: the wavenumber position of some modes match, but for other modes they differ. With virtually no information about the original simulations, explaining the discrepancies is not possible. A comparison with experiments that measured polarized reflectance of SiC nano pillars provides a validation case. The wavenumber of the dominant mode and two more do match, but differences remain in other minor modes. Results in this paper were produced with strict reproducibility practices, and we share reproducibility packages for all, including input files, execution scripts, secondary data, post-processing code and plotting scripts, and the figures (deposited in Zenodo). In view of the many challenges faced, we propose that reproducible practices make replication and validation more feasible. This article is part of the theme issue ‘Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico’.
  3. Unsupervised PCFG inducers hypothesize sets of compact context-free rules as explanations for sentences. PCFG induction not only provides tools for low-resource languages, but also plays an important role in modeling language acquisition (Bannard et al., 2009; Abend et al., 2017). However, current PCFG induction models, using word tokens as input, are unable to incorporate semantics and morphology into induction, and may encounter issues of sparse vocabulary when facing morphologically rich languages. This paper describes a neural PCFG inducer which employs context embeddings (Peters et al., 2018) in a normalizing flow model (Dinh et al., 2015) to extend PCFG induction to use semantic and morphological information. Linguistically motivated sparsity and categorical distance constraints are imposed on the inducer as regularization. Experiments show that the PCFG induction model with normalizing flow produces grammars with state-of-the-art accuracy on a variety of different languages. Ablation further shows a positive effect of normalizing flow, context embeddings and proposed regularizers.
  4. Abstract. The mesoscale meteorology of lake breezes along Lake Michigan impacts local observations of high-ozone events. Previous manned aircraft and UAS observations have demonstrated non-uniform ozone concentrations within and above the marine layer over water and within shoreline environments. During the 2021 Wisconsin's Dynamic Influence of Shoreline Circulations on Ozone (WiscoDISCO-21) campaign, two UAS platforms, a fixed-wing (University of Colorado RAAVEN) and a multirotor (Purdue University DJI M210), were used simultaneously to capture lake breeze during forecasted high-ozone events at Chiwaukee Prairie State Natural Area in southeastern Wisconsin from 21–26 May 2021. The RAAVEN platform (data DOI: https://doi.org/10.5281/zenodo.5142491, de Boer et al., 2021) measured temperature, humidity, and 3-D winds during 2 h flights following two separate flight patterns up to three times per day at altitudes reaching 500 m above ground level (a.g.l.). The M210 platform (data DOI: https://doi.org/10.5281/zenodo.5160346, Cleary et al., 2021a) measured vertical profiles of temperature, humidity, and ozone during 15 min flights up to six times per day at altitudes reaching 120 m a.g.l. near a Wisconsin DNR ground monitoring station (AIRS ID: 55-059-0019). This campaign was conducted in conjunction with the Enhanced Ozone Monitoring plan from the Wisconsin DNR that included Doppler lidar wind profiler observations at the site (data DOI: https://doi.org/10.5281/zenodo.5213039, Cleary et al., 2021b).
  5. Sentiment Analysis is a popular text classification task in natural language processing. It involves developing algorithms or machine learning models to determine the sentiment or opinion expressed in a piece of text. The results of this task can be used by business owners and product developers to understand their consumers’ perceptions of their products. Aside from customer feedback and product/service analysis, this task can be useful for social media monitoring (Martin et al., 2021). One of the popular applications of sentiment analysis is for classifying and detecting the positive and negative sentiments on movie reviews. Movie reviews enable movie producers to monitor the performances of their movies (Abhishek et al., 2020) and enhance the decision of movie viewers to know whether a movie is good enough and worth investing time to watch (Lakshmi Devi et al., 2020). However, the task has been under-explored for African languages compared to their western counterparts, “high resource languages”, that are privileged to have received enormous attention due to the large amount of available textual data. African languages fall under the category of the low resource languages which are on the disadvantaged end because of the limited availability of data that gives them a poor representation (Nasim & Ghani, 2020). Recently, sentiment analysis has received attention on African languages in the Twitter domain for Nigerian (Muhammad et al., 2022) and Amharic (Yimam et al., 2020) languages. However, there is no available corpus in the movie domain. We decided to tackle the problem of unavailability of Yorùbá data for movie sentiment analysis by creating the first Yorùbá sentiment corpus for Nollywood movie reviews. Also, we develop sentiment classification models using state-of-the-art pre-trained language models like mBERT (Devlin et al., 2019) and AfriBERTa (Ogueji et al., 2021).