skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A Neural Approach to Spatio-Temporal Data Release with User-Level Differential Privacy
Several "data-for-good" projects [1, 5, 12] initiated by major companies (e.g., Meta, Google) release to the public spatio-temporal datasets to benefit COVID-19 spread modeling [17, 47, 64] and understand human mobility [14, 24]. Most often, spatio-temporal data are provided in the form of snapshot high resolution population density information, where the released statistics capture population counts in small areas for short time periods. Since high resolution is required for utility (e.g., in modeling COVID hotspots) privacy risks are elevated. To prevent malicious actors from using the data to infer sensitive details about individuals, the released datasets must be first sanitized. Typically, [1, 5, 7, 12], differential privacy (DP) is employed as protection model, due to its formal protection guarantees that prevent an adversary to learn whether a particular individual's data has been included in the release or not.  more » « less
Award ID(s):
2128661 2125530 1910950 1909806
PAR ID:
10431849
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the ACM on Management of Data
Volume:
1
Issue:
1
ISSN:
2836-6573
Page Range / eLocation ID:
1 to 25
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present an interpretable high-resolution spatio-temporal model to estimate COVID-19 deaths together with confirmed cases 1 week ahead of the current time, at the county level and weekly aggregated, in the United States. A notable feature of our spatio-temporal model is that it considers the (1) temporal auto- and pairwise correlation of the two local time series (confirmed cases and deaths from the COVID-19), (2) correlation between locations (propagation between counties), and (3) covariates such as local within-community mobility and social demographic factors. The within-community mobility and demographic factors, such as total population and the proportion of the elderly, are included as important predictors since they are hypothesized to be important in determining the dynamics of COVID-19. To reduce the model’s high dimensionality, we impose sparsity structures as constraints and emphasize the impact of the top 10 metropolitan areas in the nation, which we refer to (and treat within our models) as hubs in spreading the disease. Our retrospective out-of-sample county-level predictions were able to forecast the subsequently observed COVID-19 activity accurately. The proposed multivariate predictive models were designed to be highly interpretable, with clear identification and quantification of the most important factors that determine the dynamics of COVID-19. Ongoing work involves incorporating more covariates, such as education and income, to improve prediction accuracy and model interpretability. 
    more » « less
  2. Intelligently responding to a pandemic like Covid-19 requires sophisticated models over accurate real-time data, which is typically lacking at the start, e.g., due to deficient population testing. In such times, crowdsensing of spatially tagged disease-related symptoms provides an alternative way of acquiring real-time insights about the pandemic. Existing crowdsensing systems aggregate and release data for pre-fixed regions, e.g., counties. However, the insights obtained from such aggregates do not provide useful information about smaller regions e.g., neighborhoods where outbreaks typically occur and the aggregate-and-release method is vulnerable to privacy attacks. Therefore, we propose a novel differentially private method to obtain accurate insights from crowdsensed data for any number of regions specified by the users (e.g., researchers and a policy makers) without compromising privacy of the data contributors. Our approach, which has been implemented and deployed, informs the development of the future privacy-preserving intelligent systems for longitudinal and spatial data analytics. 
    more » « less
  3. As a decisive part in the success of Mobility-as-a-Service (MaaS), spatio-temporal predictive modeling for crowd movements is a challenging task particularly considering scenarios where societal events drive mobility behavior deviated from the normality. While tremendous progress has been made to model high-level spatio-temporal regularities with deep learning, most, if not all of the existing methods are neither aware of the dynamic interactions among multiple transport modes nor adaptive to unprecedented volatility brought by potential societal events. In this paper, we are therefore motivated to improve the canonical spatio-temporal network (ST-Net) from two perspectives: (1) design a heterogeneous mobility information network (HMIN) to explicitly represent intermodality in multimodal mobility; (2) propose a memory-augmented dynamic filter generator (MDFG) to generate sequence-specific parameters in an on-the-fly fashion for various scenarios. The enhanced event-aware spatio-temporal network, namely EAST-Net, is evaluated on several real-world datasets with a wide variety and coverage of societal events. Both quantitative and qualitative experimental results verify the superiority of our approach compared with the state-of-the-art baselines. Code and data are published on https://github.com/underdoc-wang/EAST-Net. 
    more » « less
  4. null (Ed.)
    The COVID-19 pandemic has posed grand challenges to policy makers, raising major social conflicts between public health and economic resilience. Policies such as closure or reopen of businesses are made based on scientific projections of infection risks obtained from infection dynamics models. While most parameters in infection dynamics models can be set using domain knowledge of COVID-19, a key parameter - human mobility - is often challenging to estimate due to complex social contexts and limited training data under escalating COVID-19 conditions. To address these challenges, we formulate the problem as a spatio-temporal data generation problem and propose COVID-GAN, a spatio-temporal Conditional Generative Adversarial Network, to estimate mobility (e.g., changes in POI visits) under various real-world conditions (e.g., COVID-19 severity, local policy interventions) integrated from multiple data sources. We also introduce a domain-constraint correction layer in the generator of COVID-GAN to reduce the difficulty of learning. Experiments using urban mobility data derived from cell phone records and census data show that COVID-GAN can well approximate real-world human mobility responses, and that the proposed domain-constraint based correction can greatly improve solution quality. 
    more » « less
  5. Data sets and statistics about groups of individuals are increasingly collected and released, feeding many optimization and learning algorithms. In many cases, the released data contain sensitive information whose privacy is strictly regulated. For example, in the U.S., the census data is regulated under Title 13, which requires that no individual be identified from any data released by the Census Bureau. In Europe, data release is regulated according to the General Data Protection Regulation, which addresses the control and transfer of personal data. Differential privacy has emerged as the de-facto standard to protect data privacy. In a nutshell, differentially private algorithms protect an individual’s data by injecting random noise into the output of a computation that involves such data. While this process ensures privacy, it also impacts the quality of data analysis, and, when private data sets are used as inputs to complex machine learning or optimization tasks, they may produce results that are fundamentally different from those obtained on the original data and even rise unintended bias and fairness concerns. In this talk, I will first focus on the challenge of releasing privacy-preserving data sets for complex data analysis tasks. I will introduce the notion of Constrained-based Differential Privacy (C-DP), which allows casting the data release problem to an optimization problem whose goal is to preserve the salient features of the original data. I will review several applications of C-DP in the context of very large hierarchical census data, data streams, energy systems, and in the design of federated data-sharing protocols. Next, I will discuss how errors induced by differential privacy algorithms may propagate within a decision problem causing biases and fairness issues. This is particularly important as privacy-preserving data is often used for critical decision processes, including the allocation of funds and benefits to states and jurisdictions, which ideally should be fair and unbiased. Finally, I will conclude with a roadmap to future work and some open questions. 
    more » « less