skip to main content

Search for: All records

Creators/Authors contains: "Bao, Han"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available March 8, 2024
  2. Free, publicly-accessible full text available November 1, 2023
  3. Free, publicly-accessible full text available November 1, 2023
  4. Spatial data are ubiquitous and have transformed decision-making in many critical domains, including public health, agriculture, transportation, etc. While recent advances in machine learning offer promising ways to harness massive spatial datasets (e.g., satellite imagery), spatial heterogeneity -- a fundamental property of spatial data -- poses a major challenge as data distributions or generative processes often vary over space. Recent studies targeting this difficult problem either require a known space-partitioning as the input, or can only support limited special cases (e.g., binary classification). Moreover, heterogeneity-pattern learned by these methods are locked to the locations of the training samples, and cannot be applied to new locations. We propose a statistically-guided framework to adaptively partition data in space during training using distribution-driven optimization and transform a deep learning model (of user's choice) into a heterogeneity-aware architecture. We also propose a spatial moderator to generalize learned patterns to new test regions. Experiment results on real-world datasets show that the framework can effectively capture footprints of heterogeneity and substantially improve prediction performances.

    Free, publicly-accessible full text available July 1, 2023
  5. Estimating human mobility responses to the large-scale spreading of the COVID-19 pandemic is crucial, since its significance guides policymakers to give Non-pharmaceutical Interventions, such as closure or reopening of businesses. It is challenging to model due to complex social contexts and limited training data. Recently, we proposed a conditional generative adversarial network (COVID-GAN) to estimate human mobility response under a set of social and policy conditions integrated from multiple data sources. Although COVID-GAN achieves a good average estimation accuracy under real-world conditions, it produces higher errors in certain regions due to the presence of spatial heterogeneity and outliers. To address these issues, in this article, we extend our prior work by introducing a new spatio-temporal deep generative model, namely, COVID-GAN+. COVID-GAN+ deals with the spatial heterogeneity issue by introducing a new spatial feature layer that utilizes the local Moran statistic to model the spatial heterogeneity strength in the data. In addition, we redesign the training objective to learn the estimated mobility changes from historical average levels to mitigate the effects of spatial outliers. We perform comprehensive evaluations using urban mobility data derived from cell phone records and census data. Results show that COVID-GAN+ can better approximate real-world human mobility responsesmore »than prior methods, including COVID-GAN.« less
  6. Free, publicly-accessible full text available June 1, 2023
  7. Free, publicly-accessible full text available March 24, 2024
  8. Cluster detection is important and widely used in a variety of applications, including public health, public safety, transportation, and so on. Given a collection of data points, we aim to detect density-connected spatial clusters with varying geometric shapes and densities, under the constraint that the clusters are statistically significant. The problem is challenging, because many societal applications and domain science studies have low tolerance for spurious results, and clusters may have arbitrary shapes and varying densities. As a classical topic in data mining and learning, a myriad of techniques have been developed to detect clusters with both varying shapes and densities (e.g., density-based, hierarchical, spectral, or deep clustering methods). However, the vast majority of these techniques do not consider statistical rigor and are susceptible to detecting spurious clusters formed as a result of natural randomness. On the other hand, scan statistic approaches explicitly control the rate of spurious results, but they typically assume a single “hotspot” of over-density and many rely on further assumptions such as a tessellated input space. To unite the strengths of both lines of work, we propose a statistically robust formulation of a multi-scale DBSCAN, namely Significant DBSCAN+, to identify significant clusters that are density connected.more »As we will show, incorporation of statistical rigor is a powerful mechanism that allows the new Significant DBSCAN+ to outperform state-of-the-art clustering techniques in various scenarios. We also propose computational enhancements to speed-up the proposed approach. Experiment results show that Significant DBSCAN+ can simultaneously improve the success rate of true cluster detection (e.g., 10–20% increases in absolute F1 scores) and substantially reduce the rate of spurious results (e.g., from thousands/hundreds of spurious detections to none or just a few across 100 datasets), and the acceleration methods can improve the efficiency for both clustered and non-clustered data.« less