Granting agencies invest millions of dollars in the generation and analysis of data, making these products extremely valuable. However, without sufficient annotation of the methods used to collect and analyze the data, the ability to reproduce and reuse those products suffers. This lack of assurance of the quality and credibility of the data at the different stages of the research process essentially wastes much of the investment of time and funding, and fails to advance research as far as would be possible if everything were effectively annotated and disseminated to the wider research community. To address this issue for the Hawai‘i Established Program to Stimulate Competitive Research (EPSCoR) project, a water science gateway called the ‘Ike Wai Gateway was developed at the University of Hawai‘i (UH). In Hawaiian, ‘Ike means knowledge and Wai means water. The gateway supports research in hydrology and water management by providing tools to address questions of water sustainability in Hawai‘i. It provides a framework for data acquisition, analysis, model integration, and display of data products. The gateway is intended to complement and integrate with the capabilities of the Consortium of Universities for the Advancement of Hydrologic Science's (CUAHSI) Hydroshare by providing sound data and metadata management capabilities for multi-domain field observations, analytical lab actions, and modeling outputs. Functionality provided by the gateway is supported by a subset of CUAHSI’s Observations Data Model (ODM), delivered as centralized web-based user interfaces and APIs supporting multi-domain data management, computation, analysis, and visualization tools to support reproducible science, modeling, data discovery, and decision support for the Hawai‘i EPSCoR ‘Ike Wai research team and the wider Hawai‘i hydrology community.
By leveraging the Tapis platform, UH has constructed a gateway that ties data and advanced computing resources together to support diverse research domains, including microbiology, geochemistry, geophysics, economics, and humanities. These are coupled with computational and modeling workflows delivered in a user-friendly web interface, with workflows for effectively annotating the project data and products. Disseminating results for the ‘Ike Wai project through the ‘Ike Wai data gateway and Hydroshare makes the research products accessible and reusable.
Enabling Data Streaming-based Science Gateways through Federated Cyberinfrastructure
Large scientific facilities are unique and complex infrastructures that have become fundamental instruments for enabling high-quality, world-leading research to tackle scientific problems at unprecedented scales. Cyberinfrastructure (CI) is an essential component of these facilities, providing the user community with access to data, data products, and services with the potential to transform data into knowledge. However, the timely evolution of the CI available at large facilities is challenging and can result in science communities' requirements not being fully satisfied. Furthermore, integrating CI across multiple facilities as part of a scientific workflow is hard, resulting in data silos. In this paper, we explore how science gateways can provide improved user experiences and services that may not be offered at large facility datacenters. Using a science gateway supported by the Science Gateway Community Institute, which provides subscription-based delivery of streamed data and data products from the NSF Ocean Observatories Initiative (OOI), we propose a system that enables streaming-based capabilities and workflows using data from large facilities, such as the OOI, in a scalable manner. We leverage data infrastructure building blocks, such as the Virtual Data Collaboratory, which provides data and computing capabilities in the continuum to efficiently and collaboratively integrate multiple data-centric CIs, build data-driven workflows, and connect large facilities' data sources with NSF-funded CI, such as XSEDE. We also introduce architectural solutions for running these workflows using dynamically provisioned federated CI.
- PAR ID: 10187419
- Date Published:
- Journal Name: Gateways 2019
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The field of oceanography is transitioning from data-poor to data-rich, thanks in part to increased deployment of in-situ platforms and sensors, such as those that instrument the US-funded Ocean Observatories Initiative (OOI). However, generating science-ready data products from these sensors, particularly those making biogeochemical measurements, often requires extensive end-user calibration and validation procedures, which can present a significant barrier. Openly available community-developed and -vetted Best Practices contribute to overcoming such barriers, but collaboratively developing user-friendly Best Practices can be challenging. Here we describe the process undertaken by the NSF-funded OOI Biogeochemical Sensor Data Working Group to develop Best Practices for creating science-ready biogeochemical data products from OOI data, culminating in the publication of the GOOS-endorsed OOI Biogeochemical Sensor Data Best Practices and User Guide. For Best Practices related to ocean observatories, engaging observatory staff is crucial, but having a “user-defined” process ensures the final product addresses user needs. Our process prioritized bringing together a diverse team and creating an inclusive environment where all participants could effectively contribute. Incorporating the perspectives of a wide range of experts and prospective end users through an iterative review process that included “Beta Testers” enabled us to produce a final product that combines technical information with a user-friendly structure that illustrates data analysis pipelines via flowcharts and worked examples accompanied by pseudo-code. Our process and its impact on improving the accessibility and utility of the end product provide a roadmap for other groups undertaking similar community-driven activities to develop and disseminate new Ocean Best Practices.
-
Large scientific facilities provide researchers with instrumentation, data, and data products that can accelerate scientific discovery. However, increasing data volumes coupled with limited local computational power prevent researchers from taking full advantage of what these facilities can offer. Many researchers have looked into using commercial and academic cyberinfrastructure (CI) to process these data. Nevertheless, there remains a disconnect between large facilities and CI that requires researchers to be actively part of the data processing cycle. The increasing complexity of CI and data scale necessitates new data delivery models that can autonomously integrate large-scale scientific facilities and CI to deliver real-time data and insights. In this paper, we present our initial efforts using the Ocean Observatories Initiative project as a use case. In particular, we present a subscription-based data streaming service for data delivery that leverages the Apache Kafka data streaming platform. We also show how our solution can automatically integrate large-scale facilities with CI services for automated data processing.
-
Large-scale observatories are shared-use resources that provide open access to data from geographically distributed sensors and instruments. This data has the potential to accelerate scientific discovery. However, seamlessly integrating the data into scientific workflows remains a challenge. In this paper, we summarize our ongoing work in supporting data-driven and data-intensive workflows and outline our vision for how these observatories can improve large-scale science. Specifically, we present programming abstractions and runtime management services to enable the automatic integration of data in scientific workflows. Further, we show how approximation techniques can be used to address network and processing variations by studying constraint limitations and their associated latencies. We use the Ocean Observatories Initiative (OOI) as a driving use case for this work.
-
This article describes experiences and lessons learned from the Trusted CI project, funded by the US National Science Foundation (NSF) to serve the community as the NSF Cybersecurity Center of Excellence (CCoE). Trusted CI is an effort to address cybersecurity for the open science community through a single organization that provides leadership, training, consulting, and knowledge to that community. The article describes the experiences and lessons learned of Trusted CI regarding both cybersecurity for open science and managing the process of providing centralized services to a broad and diverse community.