Title: Viper: Interactive Exploration of Large Satellite Data
The significant increase in high-resolution satellite data requires more productive analysis methods to benefit data scientists. Interactive exploration is essential to productivity because it keeps the user engaged by providing quick responses. This paper addresses the progressive zonal statistics problem: given big satellite data, an aggregate function, and a set of query polygons, zonal statistics computes the aggregate function for each query polygon over the raster data. Efficiently querying complex polygons, reading high-resolution pixels, and processing multiple polygons simultaneously are the three main challenges. This work introduces Viper, an interactive exploration pipeline that overcomes these challenges and meets these requirements. Viper uses a raster-vector index to bootstrap the answer with an accurate result in a short time. Then, it progressively refines the answer using a priority processing algorithm to produce the final answer. Experiments on large-scale real data show that Viper can reach 90% accuracy or higher up to two orders of magnitude faster than baseline algorithms.
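To make the zonal statistics definition concrete, the following is a minimal sketch of a naive, non-progressive baseline in Python. It is not Viper's algorithm: it uses no raster-vector index and no priority-based refinement. The function names, the in-memory list-of-lists raster, and the convention that a pixel belongs to a polygon when its center `(col + 0.5, row + 0.5)` falls inside are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def point_in_polygon(x: float, y: float, polygon: Polygon) -> bool:
    """Ray-casting test: toggle on each edge crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_statistics(
    raster: List[List[float]],
    polygons: Dict[str, Polygon],
    aggregate: Callable[[List[float]], float],
) -> Dict[str, Optional[float]]:
    """Apply `aggregate` to the pixels whose centers fall inside each polygon.

    Assumption (hypothetical convention): pixel (row, col) covers the unit
    square with center (col + 0.5, row + 0.5) in polygon coordinates.
    """
    n_rows, n_cols = len(raster), len(raster[0])
    results: Dict[str, Optional[float]] = {}
    for name, polygon in polygons.items():
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        # Clip the scan to the polygon's bounding box to skip irrelevant pixels.
        row_lo = max(0, math.floor(min(ys)))
        row_hi = min(n_rows, math.floor(max(ys)) + 1)
        col_lo = max(0, math.floor(min(xs)))
        col_hi = min(n_cols, math.floor(max(xs)) + 1)
        values = [
            raster[r][c]
            for r in range(row_lo, row_hi)
            for c in range(col_lo, col_hi)
            if point_in_polygon(c + 0.5, r + 0.5, polygon)
        ]
        # A polygon that covers no pixel center has no defined aggregate.
        results[name] = aggregate(values) if values else None
    return results
```

Passing `sum`, `max`, or `min` as `aggregate` gives the corresponding zonal statistic. The cost of this baseline grows with the number of pixels in each bounding box, which is the kind of full scan that an interactive pipeline aims to avoid.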
Award ID(s):
2046236
NSF-PAR ID:
10438960
Author(s) / Creator(s):
;
Date Published:
Journal Name:
the 18th International Symposium on Spatial and Temporal Data, SSTD 2023
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The recent explosion in the number and size of spatio-temporal data sets from urban environments and social sensors creates new opportunities for data-driven approaches to understand and improve cities. Visual analytics systems like Urbane aim to empower domain experts to explore multiple data sets, at different time and space resolutions. Since these systems rely on computationally-intensive spatial aggregation queries that slice and summarize the data over different regions, an important challenge is how to attain interactivity. While traditional pre-aggregation approaches support interactive exploration, they are unsuitable in this setting because they do not support ad-hoc query constraints or polygons of arbitrary shapes. To address this limitation, we have recently proposed Raster Join, an approach that converts a spatial aggregation query into a set of drawing operations on a canvas and leverages the rendering pipeline of the graphics hardware (GPU). By doing so, Raster Join evaluates queries on the fly at interactive speeds on commodity laptops and desktops. In this demonstration, we showcase the efficiency of Raster Join by integrating it with Urbane and enabling interactivity. Demo visitors will interact with Urbane to filter and visualize several urban data sets over multiple resolutions. 
  2. Employing Differential Privacy (DP), the state-of-the-art privacy standard, to answer aggregate database queries poses new challenges for users to understand the trends and anomalies observed in the query results: Is the unexpected answer due to the data itself, or is it due to the extra noise that must be added to preserve DP? We propose to demonstrate DPXPlain, the first system for explaining group-by aggregate query answers with DP. DPXPlain allows users to compare values of two groups and receive a validity check, and further provides an explanation table with an interactive visualization, containing the approximately 'top-k' explanation predicates along with their relative influences and ranks in the form of confidence intervals, while guaranteeing DP in all steps.

     
  3. This paper studies the spatial group-by query over complex polygons. Given a set of spatial points and a set of polygons, the spatial group-by query returns the number of points that lie within the boundaries of each polygon. Groups are selected from a set of non-overlapping complex polygons, typically in the order of thousands, while the input is a large-scale dataset that contains hundreds of millions or even billions of spatial points. This problem is challenging because real polygons (like counties, cities, postal codes, voting regions, etc.) are described by very complex boundaries. We propose a highly-parallelized query processing framework to efficiently compute the spatial group-by query on highly skewed spatial data. We also propose an effective query optimizer that adaptively assigns the appropriate processing scheme based on the query polygons. Our experimental evaluation with real data and queries has shown significant superiority over all existing techniques. 
  4. Abstract. Processing Earth observation data modelled as a time series in raster format is critical to solving some of the most complex problems in geospatial science, ranging from climate change to public health. Researchers are increasingly working with these large raster datasets, which are often terabytes in size. At this scale, traditional GIS methods may fail to handle the processing, and new approaches are needed to analyse these datasets. The objective of this work is to develop methods to interactively analyse big raster datasets, with the goal of most efficiently extracting vector data over specific time periods from any set of raster data. In this paper, we describe RINX (Raster INformation eXtraction), an end-to-end solution for automatic extraction of information from large raster datasets. RINX heavily utilises open source geospatial techniques for information extraction. It also complements traditional approaches with state-of-the-art high-performance computing techniques. This paper discusses details of achieving big temporal data extraction with RINX, implemented on the use case of air quality and climate data extraction for long-term health studies, including the methods used, code developed, processing time statistics, project conclusions, and next steps.
  5. Data are available for download at http://arcticdata.io/data/10.18739/A2KW57K57. Permafrost can be indirectly detected via remote sensing techniques through the presence of ice-wedge polygons, which are a ubiquitous ground surface feature in tundra regions. Ice-wedge polygons form through repeated annual cracking of the ground during cold winter days. In spring, the cracks fill with snowmelt water, creating ice wedges, which are connected across the landscape in an underground network and can grow to several meters in depth and width. The growing ice wedges push the soil upwards, forming ridges that bound low-centered ice-wedge polygons. If the top of the ice wedge melts, the ground subsides, the ridges become troughs, and the ice-wedge polygons become high-centered. Here, a Convolutional Neural Network is used to map the boundaries of individual ice-wedge polygons based on high-resolution commercial satellite imagery obtained from the Polar Geospatial Center. The satellite imagery used to detect ice-wedge polygons spans the years 2001 to 2021, so this dataset represents ice-wedge polygons mapped from different years. This dataset does not include a time series (i.e. the same area mapped more than once). The shapefiles are masked, reprojected, and processed into GeoPackages with calculated attributes for each ice-wedge polygon, such as circumference and width. The GeoPackages are then rasterized with new calculated attributes for ice-wedge polygon coverage, such as coverage density. This release represents the region classified as “high ice” by Brown et al. 1997. The dataset is available to explore on the Permafrost Discovery Gateway (PDG), an online platform that aims to make big geospatial permafrost data accessible to enable knowledge generation by researchers and the public. The PDG project creates various pan-Arctic data products down to sub-meter and monthly resolution.
Access the PDG Imagery Viewer here: https://arcticdata.io/catalog/portals/permafrost Data limitations in use: This data is part of an initial release of the pan-Arctic data product for ice-wedge polygons, and it is expected that there are constraints on its accuracy and completeness. Users are encouraged to provide feedback regarding how they use this data and issues they encounter during post-processing. Please reach out to the dataset contact or a member of the PDG team via support@arcticdata.io. 
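As an illustration of the spatial group-by query described in item 3 above (counting the points that lie within each polygon), here is a minimal single-threaded sketch in Python. It is not the parallel framework or query optimizer that abstract proposes. The function names and the bounding-box pre-filter are assumptions chosen for illustration, and the early exit relies on the polygons being non-overlapping, as that abstract states.

```python
from typing import Dict, Iterable, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]


def bounding_box(polygon: Polygon) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) of a polygon's vertices."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def contains(polygon: Polygon, x: float, y: float) -> bool:
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_group_by_count(
    points: Iterable[Point], polygons: Dict[str, Polygon]
) -> Dict[str, int]:
    """Count the points inside each polygon; polygons are assumed non-overlapping."""
    boxes = {name: bounding_box(poly) for name, poly in polygons.items()}
    counts = {name: 0 for name in polygons}
    for x, y in points:
        for name, poly in polygons.items():
            min_x, min_y, max_x, max_y = boxes[name]
            # Cheap rectangle check first; the exact test runs only for candidates.
            if min_x <= x <= max_x and min_y <= y <= max_y and contains(poly, x, y):
                counts[name] += 1
                break  # non-overlapping polygons: a point matches at most one
    return counts
```

The nested loop costs on the order of (number of points) × (number of polygons) rectangle checks, which is impractical for billions of points and thousands of complex polygons. That gap is what motivates indexing and parallel processing in practice.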