Title: Geospatial Knowledge Hypercube
Today a tremendous amount of geospatial knowledge is hidden in massive volumes of text data. To facilitate flexible and powerful geospatial analysis and applications, we introduce a new architecture: the geospatial knowledge hypercube, a multi-scale, multidimensional knowledge structure that integrates information from geospatial dimensions, thematic themes, and diverse application semantics, extracted and computed from spatial-related text data. To construct such a knowledge hypercube, weakly supervised language models are leveraged for automatic, dynamic, and incremental extraction of heterogeneous geospatial data, thematic themes, latent connections and relationships, and application semantics by combining a variety of information from unstructured text, structured tables, and maps. The hypercube lays a foundation for knowledge discovery, in-depth spatial analysis, and other advanced applications. We have deployed a prototype web application of the proposed geospatial knowledge hypercube for public access at https://hcwebapp.cigi.illinois.edu/.
Award ID(s):
2118329
PAR ID:
10543140
Publisher / Repository:
ACM
ISBN:
9798400701689
Page Range / eLocation ID:
1 to 4
Subject(s) / Keyword(s):
Knowledge Hypercube Geographic Information Retrieval Weakly-Supervised Text Classification
Format(s):
Medium: X
Location:
Hamburg Germany
Sponsoring Org:
National Science Foundation
More Like this
  1. Humans subconsciously engage in geospatial reasoning when reading articles. We recognize place names and their spatial relations in text and mentally associate them with their physical locations on Earth. Although pretrained language models can mimic this cognitive process using linguistic context, they do not utilize valuable geospatial information in large, widely available geographical databases, e.g., OpenStreetMap. This paper introduces GeoLM, a geospatially grounded language model that enhances the understanding of geo-entities in natural language. GeoLM leverages geo-entity mentions as anchors to connect linguistic information in text corpora with geospatial information extracted from geographical databases. GeoLM connects the two types of context through contrastive learning and masked language modeling. It also incorporates a spatial coordinate embedding mechanism to encode distance and direction relations to capture geospatial context. In the experiment, we demonstrate that GeoLM exhibits promising capabilities in supporting toponym recognition, toponym linking, relation extraction, and geo-entity typing, which bridge the gap between natural language processing and geospatial sciences. The code is publicly available at https://github.com/knowledge-computing/geolm. 
  2. Data containing geospatial semantics, such as geotagged tweets, travel blogs, and crime reports, associates natural language texts with geographical locations. This paper presents a lens‐based visual interaction technique, GTMapLens, to flexibly browse geo‐text data on a map. It allows users to perform dynamic focus+context exploration by using movable lenses to browse geographical regions, find locations of interest, and perform comparative and drill‐down studies. Geo‐text data is visualized so that users can easily perceive the underlying geospatial semantics as the lens moves. Based on a requirement analysis with a cohort of multidisciplinary domain experts, a set of lens interaction techniques is developed, including keywords control, path management, context visualization, and snapshot anchors. They allow users to achieve a guided and controllable exploration of geo‐text data. A hierarchical data model enables the interactive lens operations through accelerated data retrieval from a geo‐text database. An evaluation with real‐world datasets shows the usability and effectiveness of GTMapLens.
  3. This thesis explores geospatial vector data, including geometric shapes such as points, lines, and polygons. This data is crucial in navigation, urban planning, and many more applications. Geospatial computing is a multidisciplinary field that focuses on creating techniques and tools to handle large geospatial datasets. Given the reliance on data lakes to store large data sets in their raw formats, it is critical to have full support for geospatial datasets to enable scalable processing. To address this, we make two contributions in this area. First, we propose a column-oriented binary format called Spatial Parquet, which integrates geospatial vector data into Apache Parquet that enables significant data compression and efficient querying. Second, to improve support for semi-structured data, we introduce a distributed JSON processor for scalable SQL queries on large JSON datasets, including GeoJSON. It processes complex datasets like Open Street Map with features such as projection and filter push-down. Advances in Deep Learning (DL), including foundation models and Large Language Models (LLMs), offer opportunities for geospatial data analysis. We make three main contributions in this area. First, we study how to design DL models that can express a wide range of geospatial functions. We explore three representations: an image-based representation using geo-referenced histograms (GeoImg), a graph-based point-set representation (GeoGraph), and a vector-based representation using a Fourier encoder (GeoVec). We formalize these representations and design corresponding models: ResNet and UNet for the first, PointNet++ for the second, and Poly2Vec with Transformers for the third. We evaluate all approaches on four spatial problems, showing the accuracy and effectiveness of the three approaches. Second, we create a benchmark called GS-QA for evaluating spatial question-answering with LLMs. A semi-automated process generates diverse question-answer pairs that cover various spatial objects, predicates, and complexities. An evaluation methodology is suggested with some experiments. Finally, a prototype for generating geospatial vector data from text prompts, called GeoGen I, is proposed. It has potential for applications such as spatial interpolation, data augmentation, and change analysis. We adapt diffusion models, traditionally used for generating realistic images, as geospatial data generators. We also explore their use for similarity search through geospatial data embeddings, highlighting the potential of vector databases in this domain. This thesis advances geospatial data processing, storage, analysis, and generation, opening new research pathways in geospatial computing.
  4. Sila-Nowicka, Katarzyna; Moore, Antoni; O'Sullivan, David; Adams, Benjamin; Gahegan, Mark (Ed.)
    Geospatial Knowledge Graphs (GeoKGs) represent a significant advancement in the integration of AI-driven geographic information, facilitating interoperable and semantically rich geospatial analytics across various domains. This paper explores the use of topologically enriched GeoKGs, built on an explicit representation of S2 Geometry alongside precomputed topological relations, for constructing efficient geospatial analysis workflows within and across knowledge graphs (KGs). Using the SAWGraph knowledge graph as a case study focused on environmental contamination by PFAS, we demonstrate how this framework supports fundamental GIS operations (such as spatial filtering, proximity analysis, overlay operations, and network analysis) in a GeoKG setting while allowing for the easy linking of these operations with one another and with semantic filters. This enables the efficient execution of complex geospatial analyses as semantically explicit queries and enhances the usability of geospatial data across graphs. Additionally, the framework eliminates the need for explicit support for GeoSPARQL’s topological operations in the utilized graph databases and better integrates spatial knowledge into the overall semantic inference process supported by RDFS and OWL ontologies.
  5. Encoding geospatial objects is fundamental for geospatial artificial intelligence (GeoAI) applications, which leverage machine learning (ML) models to analyze spatial information. Common approaches transform each object into known formats, like image and text, for compatibility with ML models. However, this process often discards crucial spatial information, such as the object’s position relative to the entire space, reducing downstream task effectiveness. Alternative encoding methods that preserve some spatial properties are often devised for specific data objects (e.g., point encoders), making them unsuitable for tasks that involve different data types (i.e., points, polylines, and polygons). To this end, we propose POLY2VEC, a polymorphic Fourier-based encoding approach that unifies the representation of geospatial objects while preserving the essential spatial properties. POLY2VEC incorporates a learned fusion module that adaptively integrates the magnitude and phase of the Fourier transform for different tasks and geometries. We evaluate POLY2VEC on five diverse tasks, organized into two categories. The first empirically demonstrates that POLY2VEC consistently outperforms object-specific baselines in preserving three key spatial relationships: topology, direction, and distance. The second shows that integrating POLY2VEC into a state-of-the-art GeoAI workflow improves the performance in two popular tasks: population prediction and land use inference.