
Title: Viper: Interactive Exploration of Large Satellite Data
The significant increase in high-resolution satellite data calls for more productive analysis methods for data scientists. Interactive exploration is essential to productivity because it keeps the user engaged by providing quick responses. This paper addresses the progressive zonal statistics problem: given big satellite data, an aggregate function, and a set of query polygons, zonal statistics computes the aggregate function for each query polygon over the raster data. Efficiently querying complex polygons, reading high-resolution pixels, and processing multiple polygons simultaneously are the three main challenges. This work introduces Viper, an interactive exploration pipeline that overcomes these challenges and meets these requirements. Viper uses a raster-vector index to bootstrap the answer with an accurate result in a short time. Then, it progressively refines the answer using a priority processing algorithm to produce the final answer. Experiments on large-scale real data show that Viper can reach 90% accuracy or higher up to two orders of magnitude faster than baseline algorithms.
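As a rough illustration of the zonal statistics problem defined above, the following sketch computes an aggregate per query polygon by testing every pixel center against the polygon. It is a naive baseline under stated assumptions, not Viper's raster-vector index or its priority processing algorithm: the raster is a small in-memory grid, pixel (row, col) is represented by its center (col + 0.5, row + 0.5), and the names `point_in_polygon` and `zonal_statistics` are hypothetical.

```python
from typing import Callable, Optional, Sequence

Point = tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: True if (x, y) lies strictly inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_statistics(
    raster: Sequence[Sequence[float]],
    polygons: Sequence[Sequence[Point]],
    aggregate: Callable[[list], float],
) -> list[Optional[float]]:
    """Apply `aggregate` to the pixel values covered by each polygon.

    Assumption: a pixel belongs to a polygon when its center is inside it.
    Polygons that cover no pixel center yield None.
    """
    results = []
    for polygon in polygons:
        values = [
            value
            for row, line in enumerate(raster)
            for col, value in enumerate(line)
            if point_in_polygon(col + 0.5, row + 0.5, polygon)
        ]
        results.append(aggregate(values) if values else None)
    return results


raster = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
square = [(0, 0), (2, 0), (2, 2), (0, 2)]   # covers the top-left 2x2 pixels
triangle = [(0, 0), (3.5, 0), (0, 3.5)]     # covers six pixel centers
sums = zonal_statistics(raster, [square, triangle], sum)
```

This scan visits every pixel for every polygon, which is the kind of cost that becomes prohibitive on large, high-resolution rasters with many complex polygons.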
Journal Name:
the 18th International Symposium on Spatial and Temporal Data, SSTD 2023
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The historical settlement data compilation for Spain (HISDAC-ES) is a geospatial dataset consisting of over 240 gridded surfaces measuring the physical, functional, age-related, and evolutionary characteristics of the Spanish building stock. We scraped, harmonized, and aggregated cadastral building footprint data for Spain, covering over 12,000,000 building footprints including construction year attributes, to create a multi-faceted series of gridded surfaces (GeoTIFF format) describing the evolution of human settlements in Spain from 1900 to 2020, at 100 m spatial and 5-year temporal resolution. The dataset also contains aggregated characteristics and completeness statistics at the municipality level, in CSV and GeoPackage format.

    !!! UPDATE 08-2023 !!!: We provide a new, improved version of HISDAC-ES. Specifically, we fixed two bugs in the production code that caused an incorrect rasterization of the multitemporal BUFA layers and of the PHYS layers (BUFA, BIA, DWEL, BUNITS sum and mean). Moreover, we added decadal raster datasets measuring residential building footprint and building indoor area (1900-2020), and provide a country-wide, harmonized building footprint centroid dataset in GeoPackage vector data format.

    File descriptions:

    Raster datasets are available in three spatial reference systems (items 1 to 3), alongside municipality-level and building centroid files (items 4 and 5):

    1. Raster data in Lambert Azimuthal Equal Area (LAEA) covering all Spanish territory.
    2. Raster data in UTM Zone 30N covering all of the Iberian Peninsula + Ceuta and Melilla.
    3. Raster data in REGCAN-95, covering the Canary Islands only.
    4. Municipality-level aggregates and completeness statistics (CSV, GeoPackage), in LAEA projection.
    5. ES_building_centroids_merged_spatjoin.gpkg: 7,000,000+ building footprint centroids in GeoPackage format, harmonized from the different cadastral systems, representing the input data for HISDAC-ES. These data can be used for sanity checks or for the creation of further, user-defined gridded surfaces.

    Source data:

    HISDAC-ES is derived from cadastral building footprint data, available from different authorities in Spain:

    • Araba province:
    • Bizkaia province:
    • Gipuzkoa province:
    • Navarra region:
    • Other regions:
    • Data source of municipality polygons: Centro Nacional de Información Geográfica

    Technical notes:

    Gridded data

    File nomenclature:



    • all: complete territory of Spain
    • can: Canary Islands only
    • ibe: Iberian Peninsula + Ceuta + Melilla


    • laea: Lambert azimuthal equal area (EPSG:3035)
    • regcan: REGCAN95 / UTM zone 28N (EPSG:4083)
    • utm: ETRS89 / UTM zone 30N (EPSG:25830)


    • evolution / evol: multi-temporal physical measurements
    • landuse: multi-temporal building counts per land use (i.e., building function) class
    • physical / phys: physical building characteristics in 2020
    • temporal / temp: temporal characteristics (construction year statistics)

    Variables: evolution

    • budens: building density (count per grid cell area)
    • bufa: building footprint area
    • deva: developed area (any grid cell containing at least one building)
    • resbufa: residential building footprint area
    • resbia: residential building indoor area

    Variables: physical

    • bia: building indoor area
    • bufa: building footprint area
    • bunits: number of building units
    • dwel: number of dwellings

    Variables: temporal

    • mincoy: minimum construction year per grid cell
    • maxcoy: maximum construction year per grid cell
    • meancoy: mean construction year per grid cell
    • medcoy: median construction year per grid cell
    • modecoy: mode (most frequent) construction year per grid cell
    • varcoy: variety of construction years per grid cell

    Variable: landuse

    Counts of buildings per grid cell and land use type.

    Municipality-level data

    • hisdac_es_municipality_stats_multitemporal_longform_v1.csv: This CSV file contains the zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in long form. Note that a value of 0 for the year attribute denotes the statistics for records without construction year information.
    • hisdac_es_municipality_stats_multitemporal_wideform_v1.csv: This CSV file contains the zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in wide form. Note that a value of 0 for the year suffix denotes the statistics for records without construction year information.
    • hisdac_es_municipality_stats_completeness_v1.csv: This CSV file contains the missingness rates (in %) of the building attribute per municipality, ranging from 0.0 (attribute exists for all buildings) to 100.0 (attribute exists for none of the buildings) in a given municipality.
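The relationship between the long-form and wide-form tables described above can be sketched as a pivot. The column names (`NATCODE`, `year`, `num_buildings`) and the sample rows below are assumptions made for illustration only; the actual headers of the HISDAC-ES CSV files may differ. The sketch uses only the Python standard library.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-form sample: one row per municipality and year.
# A year of 0 stands for records without construction year information.
LONG_FORM = """NATCODE,year,num_buildings
34010404001,0,12
34010404001,1900,30
34010404001,1905,45
34010404002,1900,7
"""


def long_to_wide(long_csv: str, key: str = "NATCODE",
                 year: str = "year", value: str = "num_buildings") -> dict:
    """Pivot long-form rows into one record per municipality.

    Each output column is named '<value>_<year>', so the year becomes a suffix.
    """
    wide = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(long_csv)):
        wide[row[key]][f"{value}_{row[year]}"] = float(row[value])
    return dict(wide)


wide = long_to_wide(LONG_FORM)
# The '_0' suffix holds the statistic for buildings with unknown construction year.
unknown_year = wide["34010404001"]["num_buildings_0"]
```

In this sketch, a municipality with no row for a given year simply lacks that column, so a consumer would need to decide whether to treat the gap as zero or as missing.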

    Column names for the completeness statistics tables:

    • NATCODE: National municipality identifier*
    • num_total: number of buildings per municipality
    • perc_bymiss: Percentage of buildings with missing built year (construction year)
    • perc_lumiss: Percentage of buildings with missing landuse attribute
    • perc_luother: Percentage of buildings with landuse type "other"
    • perc_num_floors_miss: Percentage of buildings without valid number of floors attribute
    • perc_num_dwel_miss: Percentage of buildings without valid number of dwellings attribute
    • perc_num_bunits_miss: Percentage of buildings without valid number of building units attribute
    • perc_offi_area_miss: Percentage of buildings without valid official area (building indoor area, BIA) attribute
    • perc_num_dwel_and_num_bunits_miss: Percentage of buildings missing both number of dwellings and number of building units attribute

    The same statistics are available as a GeoPackage file that includes municipality polygons in Lambert azimuthal equal area (EPSG:3035).

    *From the NATCODE, other regional identifiers can be derived as follows:

    • NATCODE: 34 01 04 04001
    • Country: 34
    • Comunidad autónoma (CA_CODE): 01
    • Province (PROV_CODE): 04
    • LAU code: 04001 (province + municipality code)
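The NATCODE breakdown above can be sketched as a fixed-width split. This assumes every NATCODE has the same 11-digit layout as the single example given (2 digits for country, 2 for Comunidad autónoma, 2 for province, 5 for the LAU code); the names `NatcodeParts` and `split_natcode` are hypothetical.

```python
from typing import NamedTuple


class NatcodeParts(NamedTuple):
    country: str
    ca_code: str
    prov_code: str
    lau_code: str


def split_natcode(natcode: str) -> NatcodeParts:
    """Split a NATCODE into its regional identifiers.

    Codes are kept as strings so that leading zeros are preserved.
    """
    digits = natcode.replace(" ", "")
    if len(digits) != 11 or not digits.isdigit():
        raise ValueError(f"expected an 11-digit NATCODE, got {natcode!r}")
    return NatcodeParts(
        country=digits[0:2],
        ca_code=digits[2:4],
        prov_code=digits[4:6],
        lau_code=digits[6:11],
    )


parts = split_natcode("34 01 04 04001")
# parts.country == "34", parts.ca_code == "01",
# parts.prov_code == "04", parts.lau_code == "04001"
```

Keeping the parts as strings matters when the CSV is loaded with a tool that would otherwise parse `04001` as the integer 4001.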
  2. The recent explosion in the number and size of spatio-temporal data sets from urban environments and social sensors creates new opportunities for data-driven approaches to understand and improve cities. Visual analytics systems like Urbane aim to empower domain experts to explore multiple data sets, at different time and space resolutions. Since these systems rely on computationally-intensive spatial aggregation queries that slice and summarize the data over different regions, an important challenge is how to attain interactivity. While traditional pre-aggregation approaches support interactive exploration, they are unsuitable in this setting because they do not support ad-hoc query constraints or polygons of arbitrary shapes. To address this limitation, we have recently proposed Raster Join, an approach that converts a spatial aggregation query into a set of drawing operations on a canvas and leverages the rendering pipeline of the graphics hardware (GPU). By doing so, Raster Join evaluates queries on the fly at interactive speeds on commodity laptops and desktops. In this demonstration, we showcase the efficiency of Raster Join by integrating it with Urbane and enabling interactivity. Demo visitors will interact with Urbane to filter and visualize several urban data sets over multiple resolutions. 
  3. Employing Differential Privacy (DP), the state-of-the-art privacy standard, to answer aggregate database queries poses new challenges for users to understand the trends and anomalies observed in the query results: Is the unexpected answer due to the data itself, or is it due to the extra noise that must be added to preserve DP? We propose to demonstrate DPXPlain, the first system for explaining group-by aggregate query answers with DP. DPXPlain allows users to compare values of two groups and receive a validity check, and further provides an explanation table with an interactive visualization, containing the approximately 'top-k' explanation predicates along with their relative influences and ranks in the form of confidence intervals, while guaranteeing DP in all steps.

  4. Abstract. Flood-protection levees have been built along rivers and coastlines globally. Current datasets, however, are generally confined to territorial boundaries (national datasets) and are not always easily accessible, posing limitations for hydrologic models and assessments of flood hazard. Here, we bridge this knowledge gap by collecting and standardizing global flood-protection levee data for river deltas into the open-source global river delta levee data environment, openDELvE. In openDELvE, we aggregate levee data from national databases, reports, maps, and satellite imagery. The database identifies the river delta land areas that the levees have been designed to protect. Where data are available, we record the extent and design specifications of the levees themselves (e.g., levee height, crest width, construction material) in a harmonized format. The 1657 polygons of openDELvE contain 19 248 km of levees and 44 733.505 km² of leveed area. For the 153 deltas included in openDELvE, 17 % of the land area is confined by flood-protection levees. Around 26 % of delta population lives within the 17 % of delta area that is protected, making leveed areas densely populated. openDELvE data can help improve flood exposure assessments, many of which currently do not account for flood-protection levees. We find that current flood hazard assessments that do not include levees may exaggerate the delta flood exposure by 33 % on average, but up to 100 % for some deltas. The openDELvE is made public on an interactive platform (, 1 October 2022), which includes a community-driven revision tool to encourage inclusion of new levee data and continuous improvement and refinement of open-source levee data.
  5. Challenges in interactive visualizations over satellite data collections stem primarily from their inherent data volumes. Enabling interactive visualizations of such data results in both processing and I/O (network and disk) on the server side. These are further exacerbated by multiple, concurrent requests issued by different clients. Hotspots may also arise when multiple users are interested in a particular geographical extent. We propose a novel methodology to support interactive visualizations over voluminous satellite imagery. Our system, codenamed Glance, generates models that once installed on the client side, substantially alleviate resource requirements on the server side. Our system dynamically generates imagery during zoom-in operations. Glance also supports image refinements using partial high-resolution information when available. Glance is based broadly on a deep Generative Adversarial Network, and our model is space-efficient to facilitate memory-residency at the clients. We supplement Glance with a module to estimate rendering errors when using the model to generate imagery as opposed to a resource-intensive query-and-retrieve operation to the server. Benchmarks to profile our methodology show substantive improvements in interactivity with up to 23x reduction in time lags without utilizing GPU and 297x-6627x reduction while harnessing GPU. Further, the perceptual quality of the images from our generative model is robust with PSNR values ranging from 32.2-40.5, depending on the scenario and upscale factor. 