Title: Enabling Fast, Effective Visualization of Voluminous Gridded Spatial Datasets
Gridded spatial datasets arise naturally in environmental, climatic, meteorological, and ecological settings. Each grid point encapsulates a vector of variables representing different measures of interest. Gridded datasets tend to be voluminous since they encapsulate observations for long timescales. Visualizing such datasets poses significant challenges stemming from the need to preserve interactivity, manage I/O overheads, and cope with data volumes. Here we present our methodology to significantly alleviate I/O requirements by leveraging deep neural network-based models and a distributed, in-memory cache to facilitate interactive visualizations. Our benchmarks demonstrate that deploying our lightweight models coupled with back-end caching and prefetching schemes can reduce the client's query response time by 92.3% while maintaining a high perceptual quality with a PSNR (peak signal-to-noise ratio) of 38.7 dB.
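For context, the reported 38.7 dB figure is the standard peak signal-to-noise ratio between the original gridded field and the model-reconstructed one. A minimal sketch of that computation in Python (illustrative only, not the authors' code; data_range is the dynamic range of the variable being visualized):

    import numpy as np

    def psnr(reference: np.ndarray, reconstruction: np.ndarray, data_range: float) -> float:
        # Mean squared error between the original gridded field and the
        # field reconstructed by the lightweight model.
        mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
        if mse == 0.0:
            return float("inf")  # identical fields
        # Higher PSNR (in dB) means the reconstruction is closer to the original.
        return 10.0 * np.log10((data_range ** 2) / mse)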
Award ID(s):
1931363 2312319
PAR ID:
10448759
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)
Page Range / eLocation ID:
592 to 604
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gridded spatial datasets arise naturally in environmental, climatic, meteorological, and ecological settings. Each grid point encapsulates a vector of variables representing different measures of interest. Gridded datasets tend to be voluminous since they encapsulate observations for long timescales. Visualizing such datasets poses significant challenges stemming from the need to preserve interactivity, manage I/O overheads, and cope with data volumes. Here we present our methodology to significantly alleviate I/O requirements by leveraging deep neural network-based models. 
  2. Gridded datasets occur in several domains. These datasets comprise (un)structured grid points, where each grid point is characterized by XY(Z) coordinates in a spatial referencing system. The data available at individual grid points are high-dimensional, encapsulating multiple variables of interest. This study has two thrusts. The first targets effective management of voluminous gridded datasets while reconciling challenges relating to colocation and dispersion. The second is to support sliding (temporal) window queries over the gridded dataset. Such queries involve sliding a temporal window over the data to identify spatial locations and chronological time points where a specified predicate evaluates to true. Our methodology includes a space-efficient data structure for organizing information within the data, query decomposition based on dyadic intervals, support for temporal anchoring, query transformations, and effective evaluation of query predicates. Our empirical benchmarks are conducted on representative voluminous, high-dimensional datasets such as gridMET (historical meteorological data) and MACA (future climate projections based on the RCP 8.5 greenhouse gas trajectory). In our benchmarks, our system sustains throughputs of over 3,000 multi-predicate sliding window queries per second.
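    A minimal Python sketch (not the paper's implementation) of the dyadic-interval decomposition such a query engine relies on, assuming integer time indices: a query window [start, end) is split into the minimal set of power-of-two-aligned intervals, so predicates can be evaluated against precomputed per-interval summaries instead of individual time points.

        def dyadic_decomposition(start: int, end: int) -> list[tuple[int, int]]:
            # Split [start, end) into maximal dyadic intervals [k*2^j, (k+1)*2^j).
            pieces = []
            while start < end:
                # Largest power of two that `start` is aligned to; if start is 0,
                # any alignment works, so cap by the remaining window length.
                size = start & -start if start > 0 else 1 << (end - start).bit_length()
                while size > end - start:
                    size //= 2
                pieces.append((start, start + size))
                start += size
            return pieces

        # Example: a window covering time steps 3..8 decomposes into
        # [3,4), [4,8), [8,9) -- three precomputed summaries instead of six points.
        assert dyadic_decomposition(3, 9) == [(3, 4), (4, 8), (8, 9)]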
  3. Many applications are increasingly becoming I/O-bound. To improve scalability, analytical models of parallel I/O performance are often consulted to determine possible I/O optimizations. However, I/O performance modeling has predominantly focused on applications that issue I/O requests directly to a parallel file system or a local storage device. These I/O models are not directly usable by applications that access data through standardized I/O libraries, such as HDF5, FITS, and NetCDF, because a single I/O request to an object can trigger a cascade of I/O operations to different storage blocks. The I/O performance characteristics of applications that rely on these libraries are a complex function of the underlying data storage model, user-configurable parameters, and object-level access patterns. As a consequence, I/O optimization is predominantly an ad hoc process performed by application developers, who are often domain scientists with limited desire to delve into the nuances of the storage hierarchy of modern computers. This paper presents an analytical cost model to predict the end-to-end execution time of applications that perform I/O through established array management libraries. The paper focuses on the HDF5 and Zarr array libraries as examples of I/O libraries with radically different storage models: HDF5 stores every object in one file, while Zarr creates multiple files to store different objects. We find that accessing array objects via these I/O libraries introduces new overheads and optimizations. Specifically, in addition to I/O time, it is crucial to model the cost of transforming data to a particular storage layout (memory-copy cost), as well as the benefit of accessing a software cache. We evaluate the model on real applications that process observations (neuroscience) and simulation results (plasma physics). The evaluation on three HPC clusters reveals that I/O accounts for as little as 10% of the execution time in some cases; hence, models that focus only on I/O performance cannot accurately capture the performance of applications that use standard array storage libraries. In parallel experiments, our model correctly predicts the faster storage library between HDF5 and Zarr 94% of the time, in contrast with 70% of the time for a cutting-edge I/O model.
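    To make the shape of such a model concrete, here is a toy end-to-end read-cost estimate in the same spirit (hypothetical parameter names; the paper's actual model treats the libraries' storage layouts in far more detail): time is charged for the blocks that miss the software cache, plus the memory-copy cost of rearranging data into the requested in-memory layout.

        def end_to_end_read_cost(bytes_requested: float,
                                 blocks_touched: int,
                                 per_block_latency_s: float,
                                 storage_bw_bytes_s: float,
                                 memcopy_bw_bytes_s: float,
                                 cache_hit_ratio: float) -> float:
            # I/O is paid only for cache misses: per-block latency plus transfer time.
            io_time = (1.0 - cache_hit_ratio) * (
                blocks_touched * per_block_latency_s
                + bytes_requested / storage_bw_bytes_s)
            # Layout transformation (e.g., chunked-on-disk to contiguous-in-memory)
            # costs a memory copy even on a cache hit.
            copy_time = bytes_requested / memcopy_bw_bytes_s
            return io_time + copy_time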
  4. Abstract. Cities need climate information to develop resilient infrastructure and to make adaptation decisions. The desired information is at scales orders of magnitude finer than what is typically available from climate analyses and future projections. Urban downscaling refers to developing such climate information at the city (order of 1-10 km) and neighborhood (order of 0.1-1 km) resolutions from coarser climate products. Developing the higher-resolution (finer grid spacing) data needed for assessments, which typically cover multiyear climatologies of past data and future projections, is complex and computationally expensive for traditional physics-based dynamical models. In this study, we develop and adopt a novel approach for urban downscaling by generating a general-purpose operator using deep learning. This 'DownScaleBench' tool can aid the process of downscaling to any location. The DownScaleBench has been generalized for both in situ (ground-based) and satellite or reanalysis gridded data. The algorithm employs an iterative super-resolution convolutional neural network (Iterative SRCNN) over the city. We apply it to develop a high-resolution gridded precipitation product (300 m) from a relatively coarse (10 km) satellite-based product (JAXA GSMaP). The high-resolution gridded precipitation dataset is compared against in situ observations for past heavy-rain events over Austin, Texas, and shows marked improvement over the coarser dataset and over cubic interpolation as a baseline. The creation of the DownScaleBench has implications for generating high-resolution gridded urban meteorological datasets and aiding the planning process for climate-ready cities.
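    As a rough illustration of the iterative SRCNN idea (a stand-in sketch in PyTorch, not the DownScaleBench network; layer sizes follow the classic SRCNN of Dong et al.): each iteration doubles the resolution with bicubic interpolation and then lets a small convolutional network sharpen the field, so roughly five doublings take a ~10 km product toward ~300 m.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class SRCNN(nn.Module):
            # 3-layer SRCNN: feature extraction -> nonlinear mapping -> reconstruction.
            def __init__(self, channels: int = 1):
                super().__init__()
                self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
                self.mapping = nn.Conv2d(64, 32, kernel_size=1)
                self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)

            def forward(self, x):
                x = F.relu(self.extract(x))
                x = F.relu(self.mapping(x))
                return self.reconstruct(x)

        def iterative_downscale(coarse: torch.Tensor, model: SRCNN, doublings: int = 5) -> torch.Tensor:
            # coarse: (batch, channels, height, width) gridded field, e.g., precipitation.
            field = coarse
            for _ in range(doublings):
                field = F.interpolate(field, scale_factor=2, mode="bicubic", align_corners=False)
                field = model(field)  # refine the interpolated field
            return field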
  5. Abstract. Topography is a fundamental input to hydrologic models, critical for generating realistic streamflow networks as well as infiltration and groundwater flow. Although several national topographic datasets exist for the United States, they may not be compatible with gridded models that require hydrologically consistent digital elevation models (DEMs). Here, we present a national topographic dataset developed to support gridded hydrologic simulations at 1 km and 250 m spatial resolution over the contiguous United States. The workflow is described step by step in two parts: (a) DEM processing using a Priority Flood algorithm to ensure hydrologically consistent drainage networks and (b) slope calculation and smoothing to improve drainage performance. The accuracy of the derived stream network is evaluated by comparing the derived drainage areas to the drainage areas reported by the national stream gage network. The slope smoothing steps are evaluated using runoff simulations with an integrated hydrologic model. Our DEM product started from the National Water Model DEM to ensure our final datasets are as consistent as possible with that existing national framework. Our analysis shows that the additional processing we provide improves the consistency of simulated drainage areas and of runoff simulations that simulate gridded overland flow (as opposed to a network routing scheme). The workflow uses an open-source R package, and all output datasets and processing scripts are available and fully documented. The output datasets and processing scripts are published through CyVerse at 250 m and 1 km resolution. The DOI link for the dataset is https://doi.org/10.25739/e1ps-qy48 (Zhang and Condon, 2020).
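    For readers unfamiliar with the Priority Flood step, a compact Python sketch of the standard algorithm (after Barnes et al., 2014; illustrative only, not the R package used for the dataset): the DEM is flooded inward from its edges using a priority queue, so every cell ends up at least as high as the lowest path to the boundary, which removes internal pits while preserving drainage.

        import heapq
        import numpy as np

        def priority_flood_fill(dem: np.ndarray) -> np.ndarray:
            filled = dem.astype(float).copy()
            rows, cols = filled.shape
            visited = np.zeros((rows, cols), dtype=bool)
            heap = []
            # Seed the priority queue with every boundary cell.
            for r in range(rows):
                for c in range(cols):
                    if r in (0, rows - 1) or c in (0, cols - 1):
                        heapq.heappush(heap, (filled[r, c], r, c))
                        visited[r, c] = True
            # Always expand from the lowest frontier cell; raise any lower
            # neighbor up to the frontier elevation (this fills depressions).
            while heap:
                elev, r, c = heapq.heappop(heap)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and not visited[nr, nc]:
                        visited[nr, nc] = True
                        filled[nr, nc] = max(filled[nr, nc], elev)
                        heapq.heappush(heap, (filled[nr, nc], nr, nc))
            return filled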