Title: HierGP: Hierarchical Grid Partitioning for Scalable Geospatial Data Analytics
Application domains such as environmental health science, climate science, and geosciences—where the relationship between humans and the environment is studied—are constantly evolving and require innovative approaches in geospatial data analysis. Recent technological advancements have led to the proliferation of high-granularity geospatial data, enabling such domains but posing major challenges in managing vast datasets that have high spatiotemporal similarities. We introduce the Hierarchical Grid Partitioning (HierGP) framework to address this issue. Unlike conventional discrete global grid systems, HierGP dynamically adapts to the data’s inherent characteristics. At the core of our framework is the Map Point Reduction (MPR) algorithm, designed to aggregate and then collapse data points based on user-defined similarity criteria. This effectively reduces data volume while preserving essential information. The reduction process is particularly effective in handling environmental data from extensive geographical regions. We structure the data into a multilevel hierarchy from which a reduced representative dataset can be extracted. We compare the performance of HierGP against several state-of-the-art geospatial indexing algorithms and demonstrate that HierGP outperforms the existing approaches in terms of runtime, memory footprint, and scalability. We illustrate the benefits of the HierGP approach using two representative applications: analysis of over 289 million location samples from a registry of participants and efficient extraction of environmental data from large polygons. While the application demonstration in this work has focused on environmental health, the methodology of the HierGP framework can be extended to explore diverse geospatial analytics domains.
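The abstract describes aggregating points and collapsing them under a user-defined similarity criterion within a multilevel hierarchy. The Python sketch below illustrates that general idea only and is not the published MPR algorithm. It assumes a quadtree-style split of a bounding box and a similarity test of "value range within a tolerance". The names `Representative`, `collapse`, `reduce_points`, `tol`, and `max_level` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]            # (x, y, value)
Bounds = Tuple[float, float, float, float]    # (x_min, y_min, x_max, y_max)


@dataclass
class Representative:
    """One collapsed cell: centroid, mean value, and how many points it stands for."""
    x: float
    y: float
    value: float
    count: int
    level: int  # depth in the hierarchy at which the cell was collapsed


def collapse(points: List[Point], level: int) -> Representative:
    """Replace a group of points by a single representative (assumed: plain means)."""
    n = len(points)
    return Representative(
        x=sum(p[0] for p in points) / n,
        y=sum(p[1] for p in points) / n,
        value=sum(p[2] for p in points) / n,
        count=n,
        level=level,
    )


def reduce_points(points: List[Point], bounds: Bounds, tol: float,
                  max_level: int, level: int = 0) -> List[Representative]:
    """Recursively split a cell into quadrants until its values are similar.

    Similarity here is an assumption: max(value) - min(value) <= tol.
    """
    if not points:
        return []
    values = [p[2] for p in points]
    if max(values) - min(values) <= tol or level >= max_level:
        return [collapse(points, level)]

    x0, y0, x1, y1 = bounds
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2

    # Group points by quadrant; the key is (is_right_half, is_upper_half).
    quadrants: Dict[Tuple[bool, bool], List[Point]] = {}
    for p in points:
        quadrants.setdefault((p[0] >= xm, p[1] >= ym), []).append(p)

    reduced: List[Representative] = []
    for (right, upper), pts in quadrants.items():
        sub_bounds = (xm if right else x0, ym if upper else y0,
                      x1 if right else xm, y1 if upper else ym)
        reduced.extend(reduce_points(pts, sub_bounds, tol, max_level, level + 1))
    return reduced
```

Calling `reduce_points(points, bounds, tol, max_level)` returns one representative per collapsed cell. Each representative keeps a `count`, so the number of original points is still recoverable from the reduced set.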
Award ID(s):
2126449
PAR ID:
10548507
Publisher / Repository:
ACM
Journal Name:
ACM Transactions on Spatial Algorithms and Systems
ISSN:
2374-0353
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary: The interdisciplinary field of cyberGIS (geographic information science and systems (GIS) based on advanced cyberinfrastructure) has a major focus on data‐ and computation‐intensive geospatial analytics. The rapidly growing needs across many application and science domains for such analytics based on disparate geospatial big data pose significant challenges to conventional GIS approaches. This paper describes CyberGIS‐Jupyter, an innovative cyberGIS framework for achieving data‐intensive, reproducible, and scalable geospatial analytics using Jupyter Notebook based on ROGER, the first cyberGIS supercomputer. The framework adapts the Notebook with built‐in cyberGIS capabilities to accelerate gateway application development and sharing while associated data, analytics, and workflow runtime environments are encapsulated into application packages that can be elastically reproduced through cloud‐computing approaches. As a desirable outcome, data‐intensive and scalable geospatial analytics can be efficiently developed and improved and seamlessly reproduced among multidisciplinary users in a novel cyberGIS science gateway environment. 
  2. A robust multi-functional framework for widespread planning of nature-based solutions (NBS) must incorporate components of social equity and hydro-environmental performance in a cost-effective manner. NBS systems address stormwater mitigation by increasing on-site infiltration and evaporation through enhanced greenspace while also improving various components of societal well-being, such as physical health (e.g., heart disease, diabetes), mental health (e.g., post-traumatic stress disorder, depression), and social cohesion. However, current optimization tools for NBS systems rely on stormwater quantity abatement and, to a lesser extent, economic costs and environmental pollutant mitigation. Therefore, the objective of this study is to explore how NBS planning may be improved to maximize hydrological, environmental, and social co-benefits in an unequivocal and equitable manner. Here, a novel equity-based indexing framework is proposed to better understand how we might optimize social and physical functionalities of NBS systems as a function of transdisciplinary characteristics. Specifically, this study explores the spatial tradeoffs associated with NBS allocation by first optimizing a local watershed-scale model according to traditional metrics of stormwater efficacy (e.g., cost efficiency, hydrological runoff reduction, and pollutant load reduction) using SWMM modeling. The statistical dispersion of social health is then identified using the Area Deprivation Index (ADI), which is a high-resolution spatial account of socioeconomic disadvantages that have been linked to adverse health outcomes, according to United States census properties. As NBSs have been shown to mitigate various adverse health conditions through increased urban greening, this improved understanding of geospatial health characteristics may be leveraged to inform an explicit representation of social wellness within NBS planning frameworks. 
This study presents and demonstrates a novel framework for integrating hydro-environmental modeling, economic efficiency, and social health deprivation using a dimensionless Gini coefficient, which is intended to spur the positive connection of social and physical influences within robust NBS planning. Hydro-environmental risk (according to hydro-dynamic modeling) and social disparity (according to ADI distribution) are combined within a common measurement unit to capture variation across spatial domains and to optimize fair distribution across the study area. A comparison between traditional SWMM-based optimization and the proposed Gini-based framework reveals how the spatial allocation of NBSs within the watershed may be structured to address significantly more areas of social health deprivation while achieving similar hydro-environmental performance and cost-efficiency. The results of a case study for NBS planning in the White Oak Bayou watershed in Houston, Texas, USA revealed runoff volume reductions of 3.45% and 3.38%, pollutant load reductions of 11.15% and 11.28%, and ADI mitigation metrics of 16.84% and 35.32% for the SWMM-based and the Gini-based approaches, respectively, according to similar cost expenditures. As such, the proposed framework enables an analytical approach for balancing the spatial tradeoffs of overlapping human-water goals in NBS planning while maintaining hydro-environmental robustness and economic efficiency. 
  3. Internet-of-Things (IoT) approaches are continually introducing new sensors into the fields of agriculture and animal welfare. The application of multi-sensor data fusion to these domains remains a complex and open-ended challenge that defies straightforward optimization, often requiring iterative testing and refinement. To respond to this need, we have created a new open-source framework as well as a corresponding Python tool which we call the “Data Fusion Explorer (DFE)”. We demonstrated and evaluated the effectiveness of our proposed framework using four early-stage datasets from diverse disciplines, including animal/environmental tracking, agrarian monitoring, and food quality assessment. This included data across multiple common formats including single, array, and image data, as well as classification or regression and temporal or spatial distributions. We compared various pipeline schemes, such as low-level against mid-level fusion, or the placement of dimensional reduction. Based on their space and time complexities, we then highlighted how these pipelines may be used for different purposes depending on the given problem. As an example, we observed that early feature extraction reduced time and space complexity in agrarian data. Additionally, independent component analysis outperformed principal component analysis slightly in a sweet potato imaging dataset. Lastly, we benchmarked the DFE tool with respect to the Vanilla Python3 packages using our four datasets’ pipelines and observed a significant reduction, usually more than 50%, in coding requirements for users in almost every dataset, suggesting the usefulness of this package for interdisciplinary researchers in the field. 
  4. Sila-Nowicka, Katarzyna; Moore, Antoni; O'Sullivan, David; Adams, Benjamin; Gahegan, Mark (Ed.)
    Geospatial Knowledge Graphs (GeoKGs) represent a significant advancement in the integration of AI-driven geographic information, facilitating interoperable and semantically rich geospatial analytics across various domains. This paper explores the use of topologically enriched GeoKGs, built on an explicit representation of S2 Geometry alongside precomputed topological relations, for constructing efficient geospatial analysis workflows within and across knowledge graphs (KGs). Using the SAWGraph knowledge graph as a case study focused on environmental contamination by PFAS, we demonstrate how this framework supports fundamental GIS operations - such as spatial filtering, proximity analysis, overlay operations and network analysis - in a GeoKG setting while allowing for the easy linking of these operations with one another and with semantic filters. This enables the efficient execution of complex geospatial analyses as semantically-explicit queries and enhances the usability of geospatial data across graphs. Additionally, the framework eliminates the need for explicit support for GeoSPARQL’s topological operations in the utilized graph databases and better integrates spatial knowledge into the overall semantic inference process supported by RDFS and OWL ontologies. 
  5. Metagenomics has revolutionized our understanding of microbial communities, offering unprecedented insights into their genetic and functional diversity across Earth’s diverse ecosystems. Beyond their roles as environmental constituents, microbiomes act as symbionts, profoundly influencing the health and function of their host organisms. Given the inherent complexity of these communities and the diverse environments where they reside, the components of a metagenomics study must be carefully tailored to yield accurate results that are representative of the populations of interest. This Primer examines the methodological advancements and current practices that have shaped the field, from initial stages of sample collection and DNA extraction to the advanced bioinformatics tools employed for data analysis, with a particular focus on the profound impact of next-generation sequencing on the scale and accuracy of metagenomics studies. We critically assess the challenges and limitations inherent in metagenomics experimentation, available technologies and computational analysis methods. Beyond technical methodologies, we explore the application of metagenomics across various domains, including human health, agriculture and environmental monitoring. Looking ahead, we advocate for the development of more robust computational frameworks and enhanced interdisciplinary collaborations. This Primer serves as a comprehensive guide for advancing the precision and applicability of metagenomic studies, positioning them to address the complexities of microbial ecology and their broader implications for human health and environmental sustainability. 