NSF PAR Search | NSF Public Access Repository


Statistically-Robust Clustering Techniques for Mapping Spatial Hotspots: A Survey

https://doi.org/10.1145/3487893

Xie, Yiqun; Shekhar, Shashi; Li, Yan (March 2023, ACM Computing Surveys)

Mapping of spatial hotspots, i.e., regions with significantly higher rates of generating cases of certain events (e.g., disease or crime cases), is an important task in diverse societal domains, including public health, public safety, transportation, agriculture, environmental science, and so on. Clustering techniques required by these domains differ from traditional clustering methods due to the high economic and social costs of spurious results (e.g., false alarms of crime clusters). As a result, statistical rigor is needed explicitly to control the rate of spurious detections. To address this challenge, techniques for statistically-robust clustering (e.g., scan statistics) have been extensively studied by the data mining and statistics communities. In this survey, we present an up-to-date and detailed review of the models and algorithms developed by this field. We first present a general taxonomy for statistically-robust clustering, covering key steps of data and statistical modeling, region enumeration and maximization, and significance testing. We further discuss different paradigms and methods within each of the key steps. Finally, we highlight research gaps and potential future directions, which may serve as a stepping stone in generating new ideas and thoughts in this growing field and beyond.
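The three steps named in the abstract (enumerating candidate regions, maximizing a statistic over them, and testing significance) can be illustrated with a small Monte Carlo sketch. This is not taken from the survey. It assumes points in the unit square, uses the cells of a regular grid as the candidate regions, uses the largest cell count as the test statistic, and assumes a uniform null hypothesis. The function names and the grid choice are hypothetical.

```python
import random


def max_cell_count(points, grid=4):
    """Enumerate grid cells as candidate regions and return the largest count.

    Assumption: points lie in the unit square [0, 1] x [0, 1].
    """
    counts = {}
    for x, y in points:
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        cell = (min(int(x * grid), grid - 1), min(int(y * grid), grid - 1))
        counts[cell] = counts.get(cell, 0) + 1
    return max(counts.values()) if counts else 0


def monte_carlo_p_value(points, trials=199, grid=4, seed=0):
    """Estimate how often uniform random data yields a statistic at least
    as large as the observed one (assumed null: complete spatial randomness)."""
    rng = random.Random(seed)
    observed = max_cell_count(points, grid)
    n = len(points)
    at_least_as_extreme = 0
    for _ in range(trials):
        simulated = [(rng.random(), rng.random()) for _ in range(n)]
        if max_cell_count(simulated, grid) >= observed:
            at_least_as_extreme += 1
    # Rank-based p-value: the observed data counts as one of the trials.
    return (at_least_as_extreme + 1) / (trials + 1)
```

Because the maximum is recomputed on every simulated dataset, the p-value accounts for the fact that many candidate regions were examined, which is what keeps the rate of spurious detections under control.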
Full Text Available
NTEP‐DB 1.0: A relational database for the national turfgrass evaluation program

https://doi.org/10.1002/its2.76

Xie, Yiqun; Farhadloo, Majid; Guo, Ning; Shekhar, Shashi; Watkins, Eric; Kne, Len; Bao, Han; Patton, Aaron J.; Morris, Kevin (June 2022, International Turfgrass Society Research Journal)

Full Text Available
Spatial Variability Aware Deep Neural Networks (SVANN): A General Approach

https://doi.org/10.1145/3466688

Gupta, Jayant; Molnar, Carl; Xie, Yiqun; Knight, Joe; Shekhar, Shashi (December 2021, ACM Transactions on Intelligent Systems and Technology)

Spatial variability is a prominent feature of various geographic phenomena such as climatic zones, USDA plant hardiness zones, and terrestrial habitat types (e.g., forest, grasslands, wetlands, and deserts). However, current deep learning methods follow a spatial-one-size-fits-all (OSFA) approach to train single deep neural network models that do not account for spatial variability. Quantification of spatial variability can be challenging due to the influence of many geophysical factors. In preliminary work, we proposed a spatial variability aware neural network (SVANN-I, formerly called SVANN) approach where weights are a function of location but the neural network architecture is location independent. In this work, we explore a more flexible SVANN-E approach where neural network architecture varies across geographic locations. In addition, we provide a taxonomy of SVANN types and a physics-inspired interpretation model. Experiments with aerial-imagery-based wetland mapping show that SVANN-I outperforms OSFA and that SVANN-E performs the best of all.
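The idea of weights that depend on location while the architecture stays fixed can be pictured with a toy example. The sketch below is not the authors' model. It stands in a single linear unit for the neural network, and the zoning rule, zone names, and weight values are all hypothetical.

```python
def zone_of(lat, lon):
    """Hypothetical zoning rule: split the study area at latitude 45."""
    return "north" if lat >= 45.0 else "south"


# Same architecture in every zone (one linear unit with two inputs);
# only the weights and bias differ by location. Values are made up.
ZONE_WEIGHTS = {
    "north": ([0.8, -0.2], 0.1),
    "south": ([0.3, 0.5], -0.4),
}


def predict(lat, lon, features):
    """Score a sample with the weights of the zone that contains it."""
    weights, bias = ZONE_WEIGHTS[zone_of(lat, lon)]
    return sum(w * x for w, x in zip(weights, features)) + bias
```

A one-size-fits-all model would correspond to a single entry in the table used everywhere. Letting the architecture itself vary by zone, as the abstract describes for SVANN-E, would mean the entries could differ in shape and not only in values.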
Full Text Available
Significant DBSCAN+: Statistically Robust Density-based Clustering

https://doi.org/10.1145/3474842

Xie, Yiqun; Jia, Xiaowei; Shekhar, Shashi; Bao, Han; Zhou, Xun (October 2021, ACM Transactions on Intelligent Systems and Technology)

Cluster detection is important and widely used in a variety of applications, including public health, public safety, transportation, and so on. Given a collection of data points, we aim to detect density-connected spatial clusters with varying geometric shapes and densities, under the constraint that the clusters are statistically significant. The problem is challenging, because many societal applications and domain science studies have low tolerance for spurious results, and clusters may have arbitrary shapes and varying densities. Because clustering is a classical topic in data mining and learning, a myriad of techniques have been developed to detect clusters with both varying shapes and densities (e.g., density-based, hierarchical, spectral, or deep clustering methods). However, the vast majority of these techniques do not consider statistical rigor and are susceptible to detecting spurious clusters formed as a result of natural randomness. On the other hand, scan statistic approaches explicitly control the rate of spurious results, but they typically assume a single “hotspot” of over-density and many rely on further assumptions such as a tessellated input space. To unite the strengths of both lines of work, we propose a statistically robust formulation of a multi-scale DBSCAN, namely Significant DBSCAN+, to identify significant clusters that are density connected. As we will show, incorporation of statistical rigor is a powerful mechanism that allows the new Significant DBSCAN+ to outperform state-of-the-art clustering techniques in various scenarios. We also propose computational enhancements to speed up the proposed approach. Experimental results show that Significant DBSCAN+ can simultaneously improve the success rate of true cluster detection (e.g., 10–20% increases in absolute F1 scores) and substantially reduce the rate of spurious results (e.g., from thousands/hundreds of spurious detections to none or just a few across 100 datasets), and the acceleration methods can improve the efficiency for both clustered and non-clustered data.
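For readers unfamiliar with density connectivity, the following is a minimal sketch of plain, textbook DBSCAN, the building block that the abstract extends. It is not Significant DBSCAN+: the multi-scale formulation and the significance-testing layer are not shown. The function name and the brute-force neighbor search are choices made for illustration only.

```python
from collections import deque


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on 2-D points.

    Returns one label per point: -1 for noise, 0, 1, ... for cluster ids.
    Neighbor search is brute force (quadratic), for clarity only.
    """
    eps_sq = eps * eps

    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps_sq]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point, keep expanding
    return labels
```

On random data this procedure will still report clusters whenever enough points happen to fall close together, which is the kind of spurious detection the abstract says a significance test is meant to filter out.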
Full Text Available
Discovering regions of anomalous spatial co-locations

https://doi.org/10.1080/13658816.2020.1830998

Cai, Jiannan; Deng, Min; Guo, Yiwen; Xie, Yiqun; Shekhar, Shashi (May 2021, International Journal of Geographical Information Science)
Full Text Available
Significant spatial co-distribution pattern discovery

https://doi.org/10.1016/j.compenvurbsys.2020.101543

Cai, Jiannan; Xie, Yiqun; Deng, Min; Tang, Xun; Li, Yan; Shekhar, Shashi (November 2020, Computers, Environment and Urban Systems)
Full Text Available
Physics-guided Energy-efficient Path Selection Using On-board Diagnostics Data

https://doi.org/10.1145/3406596

Li, Yan; Kotwal, Pratik; Wang, Pengyue; Xie, Yiqun; Shekhar, Shashi; Northrop, William (October 2020, ACM/IMS Transactions on Data Science)
Given a spatial graph, an origin and a destination, and on-board diagnostics (OBD) data, the energy-efficient path selection problem aims to find the path with the least expected energy consumption (EEC). Two main objectives of smart cities are sustainability and prosperity, both of which benefit from reducing the energy consumption of transportation. The challenges of the problem include the dependence of EEC on the physical parameters of vehicles, the autocorrelation of the EEC on segments of paths, the high computational cost of EEC estimation, and potential negative EEC. However, the current cost estimation models for the path selection problem do not consider vehicles’ physical parameters. Moreover, the current path selection algorithms follow the “path + edge” pattern when exploring candidate paths, resulting in redundant computation. Our preliminary work introduced a physics-guided energy consumption model and proposed a maximal-frequented-path-graph shortest-path algorithm using the model. In this work, we propose an informed algorithm using an admissible heuristic and propose an algorithm to handle negative EEC. We analyze the proposed algorithms theoretically and evaluate the proposed algorithms via experiments with real-world and synthetic data. We also conduct two case studies using real-world data and a road test to validate the proposed method.
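Negative energy costs (for example, from energy recovered on a downhill segment) rule out algorithms that assume non-negative edge weights. The abstract does not say which algorithm the authors propose for this case. As a stand-in, the sketch below uses the textbook Bellman-Ford algorithm, which tolerates negative edge weights as long as no cycle has a negative total. The edge-list representation and the function name are assumptions made for illustration.

```python
import math


def bellman_ford(num_nodes, edges, source):
    """Least-cost distances from source when edge costs may be negative.

    edges is a list of (from_node, to_node, cost) tuples, with nodes
    numbered 0 .. num_nodes - 1. Raises ValueError if a negative-cost
    cycle is reachable from the source.
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0
    # A least-cost path uses at most num_nodes - 1 edges.
    for _ in range(num_nodes - 1):
        updated = False
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                updated = True
        if not updated:
            break
    # Any further improvement means a negative cycle is reachable.
    for u, v, cost in edges:
        if dist[u] + cost < dist[v]:
            raise ValueError("negative-cost cycle reachable from source")
    return dist
```

For example, with edges (0, 1, 4.0), (0, 2, 5.0), (2, 1, -2.0), and (1, 3, 1.0), the cheapest route to node 1 goes through node 2 at a cost of 3.0 instead of the direct edge at 4.0.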
Full Text Available
Technical perspective: Progress in spatial computing for flood prediction

https://doi.org/10.1145/3410410

Shekhar, Shashi (August 2020, Communications of the ACM)
Full Text Available
A Unified Framework for Robust and Efficient Hotspot Detection in Smart Cities

https://doi.org/10.1145/3379562

Xie, Yiqun; Shekhar, Shashi (January 2020, ACM/IMS Transactions on Data Science)

Given N geo-located point instances (e.g., crime or disease cases) in a spatial domain, we aim to detect sub-regions (i.e., hotspots) that have a higher probability density of generating such instances than the others. Hotspot detection has been widely used in a variety of important urban applications, including public safety, public health, urban planning, equity, etc. The problem is challenging because its societal applications often have low tolerance for false positives, and require significance testing, which is computationally intensive. In related work, the spatial scan statistic introduced a likelihood ratio based framework for hotspot evaluation and significance testing. However, it fails to consider the effect of spatial nondeterminism, causing many missed detections. Our previous work introduced a nondeterministic normalization based scan statistic to mitigate this issue. However, its robustness against false positives is not stably controlled. To address these limitations, we propose a unified framework which can improve the completeness of results without incurring more false positives. We also propose a reduction algorithm to improve the computational efficiency. Experimental results confirm that the unified framework can greatly improve the recall of hotspot detection without increasing the number of false positives, and the reduction algorithm can greatly reduce execution time.
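The likelihood ratio mentioned in the abstract is not spelled out there. The sketch below assumes the commonly cited Poisson form of the spatial scan statistic: for a candidate region with c observed points, b points expected under the null hypothesis, and n points in total, the log likelihood ratio is c*ln(c/b) + (n-c)*ln((n-c)/(n-b)) when c exceeds b, and 0 otherwise. The function name is hypothetical, and the unified framework proposed in the paper is not reproduced.

```python
import math


def log_likelihood_ratio(c, b, n):
    """Poisson-model log likelihood ratio for one candidate region.

    c: observed count inside the region
    b: expected count inside the region under the null (0 < b < n)
    n: total number of points in the study area
    """
    if not (0 <= c <= n and 0 < b < n):
        raise ValueError("require 0 <= c <= n and 0 < b < n")
    if c <= b:
        return 0.0  # no excess inside the region, so not a hotspot candidate
    inside = c * math.log(c / b)
    # When c == n the outside term has the form 0 * log(0), taken as 0.
    outside = (n - c) * math.log((n - c) / (n - b)) if c < n else 0.0
    return inside + outside
```

Under a uniform null, b would be n times the fraction of the study area covered by the region. The scan statistic is the maximum of this score over all candidate regions, and its significance is usually assessed by recomputing that maximum on randomly generated datasets, which is the computationally intensive step the abstract refers to.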
Full Text Available
Spatial Ensemble Learning for Heterogeneous Geographic Data with Class Ambiguity

https://doi.org/10.1145/3337798

Jiang, Zhe; Sainju, Arpan Man; Li, Yan; Shekhar, Shashi; Knight, Joseph (August 2019, ACM Transactions on Intelligent Systems and Technology)

Full Text Available
