Deep Learning based Approach for Fast, Effective Visualization of Voluminous Gridded Spatial Observations

Gridded spatial datasets arise naturally in environmental, climatic, meteorological, and ecological settings. Each grid point encapsulates a vector of variables representing different measures of interest. Gridded datasets tend to be voluminous since they encapsulate observations over long timescales. Visualizing such datasets poses significant challenges stemming from the need to preserve interactivity, manage I/O overheads, and cope with data volumes. Here we present our methodology to significantly alleviate I/O requirements by leveraging deep neural network-based models and a distributed, in-memory cache to facilitate interactive visualizations. Our benchmarks demonstrate that deploying our lightweight models coupled with back-end caching and prefetching schemes can reduce the client's query response time by 92.3% while maintaining high perceptual quality, with a PSNR (peak signal-to-noise ratio) of 38.7 dB.
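The abstract does not spell out the cache design or the quality metric, but its two ingredients are familiar: a tile cache with prefetching on the serving path, and PSNR for perceptual quality. Below is a minimal Python sketch of both; the `TileCache` class, its `fetch` interface, and the 4-neighborhood prefetch policy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import OrderedDict

def psnr(ref, recon, peak):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(recon, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

class TileCache:
    """Hypothetical LRU tile cache that prefetches spatial neighbors on a miss."""
    def __init__(self, fetch, capacity=1024):
        self.fetch = fetch           # callable (x, y) -> tile, e.g. model inference
        self.capacity = capacity
        self.tiles = OrderedDict()   # insertion order doubles as LRU order

    def get(self, x, y):
        key = (x, y)
        if key in self.tiles:
            self.tiles.move_to_end(key)   # refresh LRU position on a hit
            return self.tiles[key]
        tile = self._admit(key)
        # Prefetch the 4-neighborhood: pan/zoom queries likely need it next.
        for nk in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if nk not in self.tiles:
                self._admit(nk)
        return tile

    def _admit(self, key):
        tile = self.tiles[key] = self.fetch(*key)
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)  # evict least recently used
        return tile
```

Note the arithmetic the reported figure implies: a PSNR of 38.7 dB corresponds to an RMSE of roughly peak / 10**(38.7 / 20), i.e. about 1.2% of the data range.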
- Award ID(s): 1931363
- PAR ID: 10448768
- Journal Name: Ph.D. Forum. 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW)
- Page Range / eLocation ID: 316 to 318
- Sponsoring Org: National Science Foundation
More Like this

Gridded datasets occur in several domains. These datasets comprise (un)structured grid points, where each grid point is characterized by XY(Z) coordinates in a spatial referencing system. The data available at individual grid points are high-dimensional, encapsulating multiple variables of interest. This study has two thrusts. The first targets effective management of voluminous gridded datasets while reconciling challenges relating to colocation and dispersion. The second is to support sliding (temporal) window queries over the gridded dataset. Such queries involve sliding a temporal window over the data to identify spatial locations and chronological time points where a specified predicate evaluates to true. Our methodology includes a space-efficient data structure for organizing information within the data, query decomposition based on dyadic intervals, temporal anchoring, query transformations, and effective evaluation of query predicates. Our empirical benchmarks are conducted on representative voluminous high-dimensional datasets such as gridMET (historical meteorological data) and MACA (future climate datasets based on the RCP 8.5 greenhouse gas trajectory). In our benchmarks, our system can handle throughputs of over 3000 multi-predicate sliding window queries per second.
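The abstract names dyadic intervals as the basis for query decomposition but does not give the procedure. A minimal sketch of the standard decomposition, assuming half-open integer time indices, splits an arbitrary window into maximal power-of-two-aligned blocks that a dyadic index can answer directly:

```python
def dyadic_decompose(lo: int, hi: int):
    """Split [lo, hi) into maximal dyadic intervals [k*2^j, (k+1)*2^j)."""
    pieces = []
    while lo < hi:
        # Largest power of two that lo is aligned to (any size fits when lo == 0)...
        align = lo & -lo if lo else 1 << ((hi - lo).bit_length() - 1)
        # ...capped by the largest power of two that still fits in the gap.
        fit = 1 << ((hi - lo).bit_length() - 1)
        size = min(align, fit)
        pieces.append((lo, lo + size))
        lo += size
    return pieces

# A sliding window [3, 11) decomposes into four canonical blocks:
assert dyadic_decompose(3, 11) == [(3, 4), (4, 8), (8, 10), (10, 11)]
```

Any window of length n decomposes into O(log n) such blocks, which is what makes per-window predicate evaluation tractable at high query throughputs.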
Monthly and daily gridded precipitation datasets are among the most demanded products in climatology and hydrology. These datasets describe the high spatial and temporal variability of precipitation as a continuous surface over defined periods. However, due to the complex characteristics of precipitation, accurate estimates are difficult to obtain, so creating a gridded dataset from observations requires the comprehensive and precise application of quality control, reconstruction, and gridding procedures. Yet, despite multiple advances, the gridded datasets created and published from the mid-1990s to the present use a wide variety of techniques, methods, and outputs, which can completely change the final representativity of the data. It is therefore critical to provide general guidelines for developing future, more robust gridded datasets based on data characteristics, geographical factors, and advanced statistical techniques. We identified gaps and challenges for near-future perspectives and provide guidelines for implementing improved approaches based on the performance of 48 products. Finally, we concluded that, despite better spatial and temporal resolutions, data access, and data processing capabilities, observational coverage remains a challenge. Moreover, scientists should adopt tailored strategies to improve the representativity and uncertainty of the estimates.
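As one concrete instance of the gridding procedures the review discusses, here is a minimal inverse-distance-weighting sketch: a classical interpolation technique chosen for illustration, not one endorsed by the survey, and with all names illustrative.

```python
import numpy as np

def idw_grid(station_xy, station_precip, grid_x, grid_y, power=2.0):
    """Inverse-distance-weighted interpolation of station precipitation
    onto a regular grid; one classical gridding scheme among many."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    cells = np.column_stack([gx.ravel(), gy.ravel()])                 # (M, 2)
    d = np.linalg.norm(cells[:, None, :] - station_xy[None], axis=2)  # (M, N)
    w = 1.0 / np.maximum(d, 1e-9) ** power    # guard against zero distance
    est = (w @ station_precip) / w.sum(axis=1)
    return est.reshape(gy.shape)
```

The review's point stands regardless of scheme: the same station data run through different quality control, reconstruction, and interpolation choices can yield gridded products with very different representativity.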
Scientific data analysis pipelines face scalability bottlenecks when processing massive datasets that consist of millions of small files. Such datasets commonly arise in domains as diverse as detecting supernovae and post-processing computational fluid dynamics simulations. Furthermore, applications often use inference frameworks such as TensorFlow and PyTorch whose naive I/O methods exacerbate I/O bottlenecks. One solution is to use scientific file formats, such as HDF5 and FITS, to organize small arrays in one big file. However, storing everything in one file does not fully leverage the heterogeneous data storage capabilities of modern clusters. This paper presents Henosis, a system that intercepts data accesses inside the HDF5 library and transparently redirects I/O to the in-memory Redis object store or the disk-based TileDB array store. During this process, Henosis consolidates small arrays into bigger chunks and intelligently places them in data stores. A critical research aspect of Henosis is that it formulates object consolidation and data placement as a single optimization problem. Henosis carefully constructs a graph to capture the I/O activity of a workload and produces an initial solution to the optimization problem using graph partitioning. Henosis then refines the solution using a hill-climbing algorithm which migrates arrays between data stores to minimize I/O cost. The evaluation on two real scientific data analysis pipelines shows that consolidation with Henosis makes I/O 300× faster than directly reading small arrays from TileDB and 3.5× faster than workload-oblivious consolidation methods. Moreover, jointly optimizing consolidation and placement in Henosis makes I/O 1.7× faster than strategies that perform consolidation and placement independently.
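The hill-climbing refinement is only outlined above; a toy sketch of that migrate-if-it-helps loop follows, with the cost model and store names left as caller-supplied assumptions rather than Henosis's actual formulation.

```python
def hill_climb_placement(initial, stores, cost):
    """Greedy refinement: repeatedly apply the single array migration with the
    largest I/O-cost reduction until no migration helps.

    initial: {array_name: store} starting placement (e.g. from graph partitioning)
    stores:  candidate store names, e.g. ("redis", "tiledb")
    cost:    callable(placement) -> estimated total I/O cost (caller-defined)
    """
    placement = dict(initial)
    improved = True
    while improved:
        improved = False
        base = cost(placement)
        best_gain, best_move = 0.0, None
        for name, current in placement.items():
            for target in stores:
                if target == current:
                    continue
                placement[name] = target          # tentatively migrate
                gain = base - cost(placement)
                placement[name] = current         # roll back
                if gain > best_gain:
                    best_gain, best_move = gain, (name, target)
        if best_move is not None:
            placement[best_move[0]] = best_move[1]
            improved = True
    return placement
```

Seeding the climb from a graph-partitioning solution, as the paper describes, matters because this greedy loop only finds a local optimum near its starting point.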
Cities need climate information to develop resilient infrastructure and to support adaptation decisions. The information desired is at scales orders of magnitude finer than what is typically available from climate analyses and future projections. Urban downscaling refers to developing such climate information at city (order of 1–10 km) and neighborhood (order of 0.1–1 km) resolutions from coarser climate products. Developing these higher-resolution (finer grid spacing) data needed for assessments, typically covering multiyear climatology of past data and future projections, is complex and computationally expensive for traditional physics-based dynamical models. In this study, we develop and adopt a novel approach for urban downscaling by generating a general-purpose operator using deep learning. This 'DownScaleBench' tool can aid the process of downscaling to any location and has been generalized for both in situ (ground-based) and satellite or reanalysis gridded data. The algorithm employs an iterative super-resolution convolutional neural network (Iterative SRCNN) over the city. We apply this to develop a high-resolution gridded precipitation product (300 m) from a relatively coarse (10 km) satellite-based product (JAXA GSMaP). The high-resolution gridded precipitation dataset is compared against in situ observations for past heavy rain events over Austin, Texas, and shows marked improvement over the coarser dataset and over cubic interpolation as a baseline. The creation of DownScaleBench has implications for generating high-resolution gridded urban meteorological datasets and aiding the planning process for climate-ready cities.
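The abstract names an iterative SRCNN as the downscaling operator. Below is a minimal PyTorch sketch of that upsample-then-refine pattern; the layer widths follow the original SRCNN architecture, and the step count and scale factor are illustrative assumptions rather than DownScaleBench's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRCNN(nn.Module):
    """Classic 3-layer SRCNN: feature extraction, non-linear mapping, reconstruction."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),
        )

    def forward(self, x):
        return self.net(x)

def iterative_downscale(model: SRCNN, coarse: torch.Tensor,
                        steps: int = 3, factor: int = 2) -> torch.Tensor:
    """Apply upsample-then-refine repeatedly: each step interpolates the field
    onto a finer grid, then lets the CNN correct interpolation artifacts."""
    x = coarse
    for _ in range(steps):
        x = F.interpolate(x, scale_factor=factor, mode="bicubic", align_corners=False)
        x = model(x)
    return x

# e.g. a (1, 1, 32, 32) coarse precipitation tile becomes (1, 1, 256, 256)
# after three doublings; a 10 km -> 300 m product needs a larger total factor.
```

Iterating modest upsampling steps, rather than jumping the full resolution gap at once, is what lets a small network bridge the large scale ratio between the satellite product and the target grid.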