Title: A Demonstration of Interactive Exploration of Big Geospatial Data on UCR-Star
The ever-rising volume of geospatial data is undeniable, and so is the need to explore and analyze these datasets. However, these datasets vary widely in their size, coverage, and accuracy, so users need to assess these aspects of the data to choose the right dataset for their analysis. Unfortunately, all the publicly available repositories for geospatial datasets provide only a list of datasets with some information about them, with no way to explore the datasets beforehand. Through this demonstration, we propose UCR-Star, a repository capable of hosting hundreds of thousands of geospatial datasets that a user can explore visually to judge their quality before even downloading them. This demo provides a deeper dive into the core engine behind UCR-Star. It provides a web interface geared towards database researchers to help them understand how the index works internally. It also provides a comparison interface where attendees can see side by side how two versions of the system work, with the ability to customize each of them separately. Finally, the interface reports the response time of the indexes for a quantitative comparison.
Award ID(s):
1924694 1925610 1954644
PAR ID:
10293607
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
SIGSPATIAL '20: Proceedings of the 28th International Conference on Advances in Geographic Information Systems
Page Range / eLocation ID:
151 to 154
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. With the rise of data science, there has been a sharp increase in data-driven techniques that rely on both real and synthetic data. At the same time, there is a growing interest from the scientific community in the reproducibility of results. Some conferences include this explicitly in their review forms or give special badges to reproducible papers. This tutorial describes two systems that facilitate the design of reproducible experiments on both real and synthetic data. UCR-Star is an interactive repository that hosts terabytes of open geospatial data. In addition to the ability to explore and visualize this data, UCR-Star makes it easy to share all or parts of these datasets in many standard formats, ensuring that other researchers can get the same exact data mentioned in the paper. Spider is a spatial data generator that generates standardized spatial datasets with full control over the data characteristics, which further promotes the reproducibility of results. This tutorial will be organized into two parts. The first part will exhibit the key features of UCR-Star and Spider where participants can get hands-on experience in interacting with real spatial datasets, generating synthetic data with varying distributions, and downloading them to a local machine or a remote server. The second part will explore the integration of both UCR-Star and Spider into existing systems such as QGIS and Apache AsterixDB.
  2. Modern visual data exploration systems are designed as client-server applications where the front-end interface generates a large number of queries to the back-end, which are handled by a database server. Because data exploration is a trial-and-error process, a significant number of these queries return an empty result, which does not change the state of the visualization. These requests still add a significant overhead on network communication, request handling, and data processing. Moreover, given the virtually unlimited query space, it is impractical to enumerate and send all empty (or all non-empty) queries to the client to filter them. This paper introduces HQ-Filter, a hierarchy-aware filter for empty-result queries, which utilizes the hierarchical nature of the data to construct a configurable and probabilistic filter. HQ-Filter can filter out empty-result queries at the client side with a minimal size and processing overhead. HQ-Filter is applied to two existing data exploration systems for geospatial data, UCR-Star and Cloudberry. In both cases, it can successfully eliminate hundreds of queries per user, which results in an increase of up to 66% in server capacity by providing up to a 15x speedup in average response time and up to a 90% decrease in server workload.
  3. This thesis explores geospatial vector data, including geometric shapes such as points, lines, and polygons. This data is crucial in navigation, urban planning, and many more applications. Geospatial computing is a multidisciplinary field that focuses on creating techniques and tools to handle large geospatial datasets. Given the reliance on data lakes to store large data sets in their raw formats, it is critical to have full support for geospatial datasets to enable scalable processing. To address this, we make two contributions in this area. First, we propose a column-oriented binary format called Spatial Parquet, which integrates geospatial vector data into Apache Parquet that enables significant data compression and efficient querying. Second, to improve support for semi-structured data, we introduce a distributed JSON processor for scalable SQL queries on large JSON datasets, including GeoJSON. It processes complex datasets like Open Street Map with features such as projection and filter push-down. Advances in Deep Learning (DL), including foundation models and Large Language Models (LLMs), offer opportunities for geospatial data analysis. We make three main contributions in this area. First, we study how to design DL models that can express a wide range of geospatial functions. We explore three representations: an image-based representation using geo-referenced histograms (GeoImg), a graph-based point-set representation (GeoGraph), and a vector-based representation using a Fourier encoder (GeoVec). We formalize these representations and design corresponding models: ResNet and UNet for the first, PointNet++ for the second, and Poly2Vec with Transformers for the third. We evaluate all approaches on four spatial problems, showing the accuracy and effectiveness of the three approaches. Second, we create a benchmark called GS-QA for evaluating spatial question-answering with LLMs. A semi-automated process generates diverse question-answer pairs that cover various spatial objects, predicates, and complexities. An evaluation methodology is suggested with some experiments. Finally, a prototype for generating geospatial vector data from text prompts, called GeoGen I, is proposed. It has potential for applications such as spatial interpolation, data augmentation, and change analysis. We adapt diffusion models, traditionally used for generating realistic images, as geospatial data generators. We also explore their use for similarity search through geospatial data embeddings, highlighting the potential of vector databases in this domain. This thesis advances geospatial data processing, storage, analysis, and generation, opening new research pathways in geospatial computing.
  4. With recent advancements, large language models (LLMs) such as ChatGPT and Bard have shown the potential to disrupt many industries, from customer service to healthcare. Traditionally, humans interact with geospatial data through software (e.g., ArcGIS 10.3) and programming languages (e.g., Python). As a pioneer study, we explore the possibility of using an LLM as an interface to interact with geospatial datasets through natural language. To achieve this, we also propose a framework to (1) train an LLM to understand the datasets, (2) generate geospatial SQL queries based on a natural language question, (3) send the SQL query to the backend database, (4) parse the database response back to human language. As a proof of concept, a case study was conducted on real-world data to evaluate its performance on various queries. The results show that LLMs can be accurate in generating SQL code for most cases, including spatial joins, although there is still room for improvement. As all geospatial data can be stored in a spatial database, we hope that this framework can serve as a proxy to improve the efficiency of spatial data analyses and unlock the possibility of automated geospatial analytics. 
  5. Geospatial technologies and geographic methods are foundational skills in modern water resources monitoring, research, management, and policy-making. Understanding and sustaining healthy water resources depends on spatial awareness of watersheds, land use, hydrologic networks, and the communities that depend on these resources. Water professionals across disciplines are expected to have familiarity with hydrologic geospatial data. Proficiency in spatial thinking and competency in reading hydrologic maps are essential skills. In addition, climate change and non-stationary ecological conditions require water specialists to utilize dynamic, time-enabled spatiotemporal datasets to examine shifting patterns and changing environments. Future water specialists will likely require even more advanced geospatial knowledge with the implementation of distributed internet-of-things sensor networks and the collection of mobility data. To support the success of future water professionals and increase hydrologic awareness in our broader communities, teachers in higher education must consider how their curriculum provides students with these vital geospatial skills. This paper considers pedagogical perspectives from educators with expertise in remote sensing, geomorphology, human geography, environmental science, ecology, and private industry. These individuals share a wealth of experience teaching geographic techniques such as GIS, remote sensing, and field methods to explore water resources. The reflections of these educators provide a snapshot of current approaches to teaching water and geospatial techniques. This commentary captures faculty experiences, ambitions, and suggestions for teaching at this moment in time.