Abstract: Large-scale processing and dissemination of distributed acoustic sensing (DAS) data are among the greatest computational challenges and opportunities of seismological research today. Current data formats and computing infrastructure are neither well adapted to nor user-friendly for large-scale processing. We propose an innovative, cloud-native solution for DAS seismology using the MinIO open-source object storage framework. We develop data schemas for two cloud-optimized data formats, Zarr and TileDB, which we deploy on a local object storage service compatible with the Amazon Web Services (AWS) storage system. We benchmark reading and writing performance for various data schemas using canonical use cases in seismology. We test our framework on a local server and on AWS. We find much-improved performance in compute time and memory throughout when using TileDB and Zarr compared to the conventional HDF5 data format. We demonstrate the platform with a compute-heavy use case in seismology: ambient noise seismology of DAS data. We process one month of data, pairing all 2089 channels within 24 hr using AWS Batch autoscaling.
Intake / Pangeo Catalog: Making It Easier To Consume Earth’s Climate and Weather Data
Computer simulations of the Earth’s climate and weather generate huge amounts of data. These data are often persisted on HPC systems or in the cloud across multiple data assets in a variety of formats (netCDF, Zarr, etc.). Finding, investigating, and loading these data assets into compute-ready data containers costs time and effort. Before loading and analyzing a specific data set, the data user needs to know which data sets are available and which attributes describe each one. In this notebook, we demonstrate the integration of data discovery tools such as intake and intake-esm (an intake plugin) with data stored in cloud-optimized formats (Zarr). We highlight (1) how these tools provide transparent access to local and remote catalogs and data, and (2) the API for exploring arbitrary metadata associated with data and for loading data sets into data array containers. We also showcase the Pangeo catalog, an open-source project to enumerate and organize cloud-optimized climate data stored across a variety of providers, and a place where several intake-esm collections are now publicly available. We use one of these public collections as an example to show how an end user would explore and interact with the data, and conclude with a short overview of the catalog's online presence.
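The catalog workflow described above (see what is available, filter on metadata attributes, then pick the assets to load) can be illustrated with a minimal, standard-library-only sketch. This is not the intake or intake-esm API: the `CatalogEntry` and `Catalog` classes, the attribute names, and the store paths are all hypothetical and exist only to show the idea of a metadata-driven search.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset plus the metadata attributes that describe it."""
    path: str
    attrs: dict = field(default_factory=dict)


class Catalog:
    """A tiny in-memory catalog that can be filtered by metadata (hypothetical)."""

    def __init__(self, entries):
        self.entries = list(entries)

    def unique(self, key):
        """List the distinct values of one attribute across the catalog."""
        return sorted({e.attrs[key] for e in self.entries if key in e.attrs})

    def search(self, **query):
        """Return a sub-catalog whose entries match every key=value pair."""
        hits = [
            e for e in self.entries
            if all(e.attrs.get(k) == v for k, v in query.items())
        ]
        return Catalog(hits)


# Hypothetical entries; the paths and attribute values are made up.
catalog = Catalog([
    CatalogEntry("store/a.zarr", {"variable": "tas", "experiment": "historical"}),
    CatalogEntry("store/b.zarr", {"variable": "pr", "experiment": "historical"}),
    CatalogEntry("store/c.zarr", {"variable": "tas", "experiment": "ssp585"}),
])

# Step 1: discover which values an attribute takes.
print(catalog.unique("variable"))

# Step 2: narrow the catalog to the assets of interest.
subset = catalog.search(variable="tas", experiment="historical")
paths = [e.path for e in subset.entries]
print(paths)
```

In a real workflow the final step would hand the selected paths to a loader that returns data array containers; that step is omitted here because it depends on the storage backend and the libraries installed.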
- Award ID(s):
- 1937136
- PAR ID:
- 10284955
- Date Published:
- Journal Name:
- 2020 EarthCube Annual Meeting
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We present gg, a framework and a set of command-line tools that help people execute everyday applications (e.g., software compilation, unit tests, video encoding, or object recognition) using thousands of parallel threads on a cloud-functions service to achieve near-interactive completion time. In the future, instead of running these tasks on a laptop, or keeping a warm cluster running in the cloud, users might push a button that spawns 10,000 parallel cloud functions to execute a large job in a few seconds from start. gg is designed to make this practical and easy. With gg, applications express a job as a composition of lightweight OS containers that are individually transient (lifetimes of 1–60 seconds) and functional (each container is hermetically sealed and deterministic). gg takes care of instantiating these containers on cloud functions, loading dependencies, minimizing data movement, moving data between containers, and dealing with failure and stragglers. We ported several latency-sensitive applications to run on gg and evaluated its performance. In the best case, a distributed compiler built on gg outperformed a conventional tool (icecc) by 2–5×, without requiring a warm cluster running continuously. In the worst case, gg was within 20% of the hand-tuned performance of an existing tool for video encoding (ExCamera).
-
Abstract. Paleoclimate data assimilation (DA) is a tool for reconstructing past climates that directly integrates proxy records with climate model output. Despite the potential for DA to expand the scope of quantitative paleoclimatology, these methods remain difficult to implement in practice due to the multi-faceted requirements and data handling necessary for DA reconstructions, the diversity of DA methods, and the need for computationally efficient algorithms. Here, we present DASH, a MATLAB toolbox designed to facilitate paleoclimate DA analyses. DASH provides command line and scripting tools that implement common tasks in DA workflows. The toolbox is highly modular and is not built around any specific analysis, and thus DASH supports paleoclimate DA for a wide variety of time periods, spatial regions, proxy networks, and algorithms. DASH includes tools for integrating and cataloguing data stored in disparate formats, building state vector ensembles, and running proxy (system) forward models. The toolbox also provides optimized algorithms for implementing ensemble Kalman filters, particle filters, and optimal sensor analyses with variable and modular parameters. This paper reviews the key components of the DASH toolbox and presents examples illustrating DASH's use for paleoclimate DA applications.
-
Rich metadata is required to find and understand the recorded measurements from modern experiments with their immense and complex data stores. Systems to store and manage these metadata have improved over time, but in most cases they are ad hoc collections of data relationships, often represented in domain- or site-specific application code. We are developing a general set of tools to store, manage, and retrieve data-relationship metadata. These tools will be agnostic to the underlying data storage mechanisms, and to the data stored in them, making the system applicable across a wide range of science domains.
-
An Exploration of the Effects of Enterprise Social Media Use for Classroom Teams
Leticia Cherchiglia (Michigan State University, leticia@msu.edu), Wietske Van Osch (HEC Montreal & Michigan State University, wietske.van-osch@hec.ca), Yuyang Liang (Michigan State University, liangyuy@msu.edu), Elisavet Averkiadi (Michigan State University, averkiad@msu.edu)
Proceedings of the Nineteenth Annual Pre-ICIS Workshop on HCI Research in MIS, Virtual Conference, December 12, 2020
ABSTRACT: This paper explores the adoption of Microsoft Teams, a group-based Enterprise Social Media (ESM) tool, in the context of a hybrid Information Technology Management undergraduate course at a large midwestern university. With the primary goal of providing insights into the use and design of tools for group-based educational settings, we constructed a model to reflect our expectations that core ESM affordances would enhance students’ perceptions of Microsoft Teams’ functionality and efficiency, which in turn would increase both students’ perceptions of group productivity and students’ actual usage of Microsoft Teams for communication purposes. In our model we used three core ESM affordances from Treem and Leonardi (2013), namely editability (i.e., information can be created and/or edited after creation, usually in a collaborative fashion), persistence (i.e., information is stored permanently), and visibility (i.e., information is visible to other users). Analysis of quantitative (surveys, server-side; N=62) and qualitative (interviews; N=7) data led to intriguing results. It seems that although students considered the editability, persistence, and visibility affordances within Microsoft Teams to be convenient functions of this ESM, problems when working collaboratively (such as connectivity, formatting, and searching glitches) might have prevented them from considering this ESM fast and user-friendly (i.e., efficient).
Moreover, although perceived functionality and efficiency were positively connected to group productivity, hidden or non-intuitive communication features within this ESM might help explain the surprising negative connection between efficiency and usage of this ESM for the purpose of group communication. Another explanation is that, given the plethora of competing tools specifically designed to afford seamless and optimal team communication, students preferred to use tools that were more familiar, or that they perceived as more efficient for group communication, than Microsoft Teams, a finding consistent with findings in organizational settings (Van Osch, Steinfield, and Balogh, 2015). Beyond theoretical contributions related to the impact that ESM affordances have on users’ interaction perceptions, and the impact of users’ interaction perceptions on team and system outcomes, from a strategic and practical point of view, our findings revealed several challenges for the use of Microsoft Teams (and perhaps ESM at large) in educational settings: 1) As the demand for online education grows, collaborative tools such as Microsoft Teams should strive to provide seamless experiences for multiple-user access to files and messages; 2) Microsoft Teams should improve its visual design in order to increase ease of use, user familiarity, and intuitiveness; 3) Microsoft Teams appears to have a steep learning curve, partially related to the fact that some features are hidden or take extra steps or clicks to be accessed, thus undermining their use; 4) Team communication is a complex topic that should be further studied because, given the choice, students will fall back on familiar tools, thereby undermining the full potential for team collaboration through the ESM.
We expect that this paper can provide insights for educators faced with choosing the ESM tool best suited for group-based classroom settings, as well as for designers interested in adapting ESMs to educational contexts, which is a promising avenue for market expansion.

