This content will become publicly available on October 8, 2025

Title: HierGP: Hierarchical Grid Partitioning for Scalable Geospatial Data Analytics
Application domains such as environmental health science, climate science, and geosciences, where the relationship between humans and the environment is studied, are constantly evolving and require innovative approaches in geospatial data analysis. Recent technological advancements have led to the proliferation of high-granularity geospatial data, enabling such domains but posing major challenges in managing vast datasets that have high spatiotemporal similarities. We introduce the Hierarchical Grid Partitioning (HierGP) framework to address this issue. Unlike conventional discrete global grid systems, HierGP dynamically adapts to the data’s inherent characteristics. At the core of our framework is the Map Point Reduction (MPR) algorithm, designed to aggregate and then collapse data points based on user-defined similarity criteria. This effectively reduces data volume while preserving essential information. The reduction process is particularly effective in handling environmental data from extensive geographical regions. We structure the data into a multilevel hierarchy from which a reduced representative dataset can be extracted. We compare the performance of HierGP against several state-of-the-art geospatial indexing algorithms and demonstrate that HierGP outperforms the existing approaches in terms of runtime, memory footprint, and scalability. We illustrate the benefits of the HierGP approach using two representative applications: analysis of over 289 million location samples from a registry of participants and efficient extraction of environmental data from large polygons. While the application demonstration in this work has focused on environmental health, the methodology of the HierGP framework can be extended to explore diverse geospatial analytics domains.
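The aggregate-then-collapse idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' MPR algorithm, whose details are not given here. It assumes a quadtree-style split of a bounding box, a single numeric attribute per point, and a similarity criterion defined as "the values in a cell span at most `tolerance`". The names `Sample`, `reduce_points`, `tolerance`, and `max_depth` are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    lat: float
    lon: float
    value: float  # hypothetical attribute, e.g. a pollutant concentration


def reduce_points(samples, bounds, tolerance, max_depth, depth=0):
    """Return (level, representative, count) tuples for one grid cell.

    Assumed similarity criterion: values in the cell span at most `tolerance`.
    Similar cells collapse to one representative point; dissimilar cells are
    split into four quadrants until `max_depth`, where points are kept as-is.
    """
    if not samples:
        return []
    values = [s.value for s in samples]
    if max(values) - min(values) <= tolerance:
        # Collapse the whole cell into its centroid and mean value.
        rep = Sample(mean(s.lat for s in samples),
                     mean(s.lon for s in samples),
                     mean(values))
        return [(depth, rep, len(samples))]
    if depth == max_depth:
        # Finest level reached: keep dissimilar points individually.
        return [(depth, s, 1) for s in samples]

    south, west, north, east = bounds
    mid_lat, mid_lon = (south + north) / 2, (west + east) / 2
    quadrants = {}
    for s in samples:
        key = (s.lat >= mid_lat, s.lon >= mid_lon)
        quadrants.setdefault(key, []).append(s)

    reduced = []
    for (is_north, is_east), members in quadrants.items():
        child = (mid_lat if is_north else south,
                 mid_lon if is_east else west,
                 north if is_north else mid_lat,
                 east if is_east else mid_lon)
        reduced.extend(
            reduce_points(members, child, tolerance, max_depth, depth + 1))
    return reduced


# Illustrative data only: two tight clusters with different values.
demo = [Sample(10.1, 20.1, 5.0), Sample(10.2, 20.2, 5.1),
        Sample(10.8, 20.9, 9.0), Sample(10.9, 20.8, 9.2)]
reduced = reduce_points(demo, bounds=(10.0, 20.0, 11.0, 21.0),
                        tolerance=0.5, max_depth=4)
print(len(demo), "points reduced to", len(reduced))
```

Each returned tuple records the hierarchy level at which a cell collapsed and how many original points it stands for, so the counts can be used as weights in later aggregation. A real implementation would likely differ in its grid geometry and similarity criteria.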
Award ID(s):
2126449
PAR ID:
10548507
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
ACM Transactions on Spatial Algorithms and Systems
ISSN:
2374-0353
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The interdisciplinary field of cyberGIS (geographic information science and systems (GIS) based on advanced cyberinfrastructure) has a major focus on data- and computation-intensive geospatial analytics. The rapidly growing needs across many application and science domains for such analytics based on disparate geospatial big data pose significant challenges to conventional GIS approaches. This paper describes CyberGIS-Jupyter, an innovative cyberGIS framework for achieving data-intensive, reproducible, and scalable geospatial analytics using Jupyter Notebook based on ROGER, the first cyberGIS supercomputer. The framework adapts the Notebook with built-in cyberGIS capabilities to accelerate gateway application development and sharing while associated data, analytics, and workflow runtime environments are encapsulated into application packages that can be elastically reproduced through cloud-computing approaches. As a desirable outcome, data-intensive and scalable geospatial analytics can be efficiently developed and improved and seamlessly reproduced among multidisciplinary users in a novel cyberGIS science gateway environment.
  2. A robust multi-functional framework for widespread planning of nature-based solutions (NBS) must incorporate components of social equity and hydro-environmental performance in a cost-effective manner. NBS systems address stormwater mitigation by increasing on-site infiltration and evaporation through enhanced greenspace while also improving various components of societal well-being, such as physical health (e.g., heart disease, diabetes), mental health (e.g., post-traumatic stress disorder, depression), and social cohesion. However, current optimization tools for NBS systems rely on stormwater quantity abatement and, to a lesser extent, economic costs and environmental pollutant mitigation. Therefore, the objective of this study is to explore how NBS planning may be improved to maximize hydrological, environmental, and social co-benefits in an unequivocal and equitable manner. Here, a novel equity-based indexing framework is proposed to better understand how we might optimize social and physical functionalities of NBS systems as a function of transdisciplinary characteristics. Specifically, this study explores the spatial tradeoffs associated with NBS allocation by first optimizing a local watershed-scale model according to traditional metrics of stormwater efficacy (e.g., cost efficiency, hydrological runoff reduction, and pollutant load reduction) using SWMM modeling. The statistical dispersion of social health is then identified using the Area Deprivation Index (ADI), which is a high-resolution spatial account of socioeconomic disadvantages that have been linked to adverse health outcomes, according to United States census properties. As NBSs have been shown to mitigate various adverse health conditions through increased urban greening, this improved understanding of geospatial health characteristics may be leveraged to inform an explicit representation of social wellness within NBS planning frameworks. 
This study presents and demonstrates a novel framework for integrating hydro-environmental modeling, economic efficiency, and social health deprivation using a dimensionless Gini coefficient, which is intended to spur the positive connection of social and physical influences within robust NBS planning. Hydro-environmental risk (according to hydro-dynamic modeling) and social disparity (according to ADI distribution) are combined within a common measurement unit to capture variation across spatial domains and to optimize fair distribution across the study area. A comparison between traditional SWMM-based optimization and the proposed Gini-based framework reveals how the spatial allocation of NBSs within the watershed may be structured to address significantly more areas of social health deprivation while achieving similar hydro-environmental performance and cost-efficiency. The results of a case study for NBS planning in the White Oak Bayou watershed in Houston, Texas, USA revealed runoff volume reductions of 3.45% and 3.38%, pollutant load reductions of 11.15% and 11.28%, and ADI mitigation metrics of 16.84% and 35.32% for the SWMM-based and the Gini-based approaches, respectively, according to similar cost expenditures. As such, the proposed framework enables an analytical approach for balancing the spatial tradeoffs of overlapping human-water goals in NBS planning while maintaining hydro-environmental robustness and economic efficiency. 
  3. Metagenomics has revolutionized our understanding of microbial communities, offering unprecedented insights into their genetic and functional diversity across Earth’s diverse ecosystems. Beyond their roles as environmental constituents, microbiomes act as symbionts, profoundly influencing the health and function of their host organisms. Given the inherent complexity of these communities and the diverse environments where they reside, the components of a metagenomics study must be carefully tailored to yield accurate results that are representative of the populations of interest. This Primer examines the methodological advancements and current practices that have shaped the field, from initial stages of sample collection and DNA extraction to the advanced bioinformatics tools employed for data analysis, with a particular focus on the profound impact of next-generation sequencing on the scale and accuracy of metagenomics studies. We critically assess the challenges and limitations inherent in metagenomics experimentation, available technologies and computational analysis methods. Beyond technical methodologies, we explore the application of metagenomics across various domains, including human health, agriculture and environmental monitoring. Looking ahead, we advocate for the development of more robust computational frameworks and enhanced interdisciplinary collaborations. This Primer serves as a comprehensive guide for advancing the precision and applicability of metagenomic studies, positioning them to address the complexities of microbial ecology and their broader implications for human health and environmental sustainability. 
  4. The COVID-19 viral disease surfaced at the end of 2019 and quickly spread across the globe. To rapidly respond to this pandemic and offer data support for various communities (e.g., decision-makers in health departments and governments, researchers in academia, public citizens), the National Science Foundation (NSF) spatiotemporal innovation center constructed a spatiotemporal platform with various task forces including international researchers and implementation strategies. Compared to similar platforms that only offer viral and health data, this platform views virus-related environmental data collection (EDC) as an important component for the geospatial analysis of the pandemic. The EDC contains environmental factors that are either proven or have the potential to influence the spread and virulence of COVID-19, or to influence the impact of the pandemic on human health (e.g., temperature, humidity, precipitation, air quality index and pollutants, nighttime light (NTL)). In this platform/framework, environmental data are processed and organized across multiple spatiotemporal scales for a variety of applications (e.g., global mapping of daily temperature, humidity, precipitation, correlation of the pandemic to the mean values of climate and weather factors by city). This paper introduces the raw input data, construction and metadata of reprocessed data, and data storage, as well as the sharing and quality control methodologies of the COVID-19 related environmental data collection.
  5. Today a tremendous amount of geospatial knowledge is hidden in massive volumes of text data. To facilitate flexible and powerful geospatial analysis and applications, we introduce a new architecture: the geospatial knowledge hypercube, a multi-scale, multidimensional knowledge structure that integrates information from geospatial dimensions, thematic themes and diverse application semantics, extracted and computed from spatial-related text data. To construct such a knowledge hypercube, weakly supervised language models are leveraged for automatic, dynamic and incremental extraction of heterogeneous geospatial data, thematic themes, latent connections and relationships, and application semantics, by combining a variety of information from unstructured text, structured tables, and maps. The hypercube lays a foundation for knowledge discovery, in-depth spatial analysis, and other advanced applications. We have deployed a prototype web application of the proposed geospatial knowledge hypercube for public access at: https://hcwebapp.cigi.illinois.edu/.