Title: Census TopDown: The Impacts of Differential Privacy on Redistricting
The 2020 Decennial Census will be released with a new disclosure avoidance system in place, putting differential privacy in the spotlight for a wide range of data users. We consider several key applications of Census data in redistricting, developing tools and demonstrations for practitioners who are concerned about the impacts of this new noising algorithm called TopDown. Based on a close look at reconstructed Texas data, we find reassuring evidence that TopDown will not threaten the ability to produce districts with tolerable population balance or to detect signals of racial polarization for Voting Rights Act enforcement.
Ligett, Katrina; Gupta, Swati
Journal Name:
Symposium on Foundations of Responsible Computing
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: The 2020 US Census will use a novel approach to disclosure avoidance to protect respondents’ data, called TopDown. This TopDown algorithm was applied to the 2018 end-to-end (E2E) test of the decennial census. The computer code used for this test, as well as accompanying exposition, has recently been released publicly by the Census Bureau. Methods: We used the available code and data to better understand the error introduced by the E2E disclosure avoidance system when the Census Bureau applied it to 1940 census data, and we developed an empirical measure of privacy loss to compare the error and privacy of the new approach to that of a (non-differentially private) simple-random-sampling approach to protecting privacy. Results: We found that the empirical privacy loss of TopDown is substantially smaller than the theoretical guarantee for all privacy loss budgets we examined. When run on the 1940 census data, TopDown with a privacy budget of 1.0 was similar in error and privacy loss to a simple random sample of 50% of the US population. When run with a privacy budget of 4.0, it was similar in error and privacy loss to a 90% sample. Conclusions: This work fits into the beginning of a discussion on how to best balance privacy and accuracy in decennial census data collection, and there is a need for continued discussion.
  2. The deployment of vaccines across the US provides significant defense against serious illness and death from COVID-19. Over 70% of vaccine-eligible Americans are at least partially vaccinated, but there are pockets of the population that are under-vaccinated, such as in rural areas and some demographic groups (e.g., age, race, ethnicity). These unvaccinated pockets are extremely susceptible to the Delta variant, exacerbating the healthcare crisis and increasing the risk of new variants. In this paper, we describe a data-driven model that provides real-time support to Virginia public health officials by recommending mobile vaccination site placement in order to target under-vaccinated populations. Our strategy uses fine-grained mobility data, along with US Census and vaccination uptake data, to identify locations that are most likely to be visited by unvaccinated individuals. We further extend our model to choose locations that maximize vaccine uptake among hesitant groups. We show that the top recommended sites vary substantially across some demographics, demonstrating the value of developing customized recommendation models that integrate fine-grained, heterogeneous data sources. In addition, we used a statistically equivalent Synthetic Population to study the effect of combined demographics (e.g., people of a particular race and age), which is not possible using US Census data alone. We validate our recommendations by analyzing the success rates of deployed vaccine sites, and show that sites placed closer to our recommended areas administered higher numbers of doses. Our model is the first of its kind to consider evolving mobility patterns in real-time for suggesting placement strategies customized for different targeted demographic groups. Our results will be presented at IAAI-22, but given the critical nature of the pandemic, we offer this extended version of that paper for more timely consideration of our approach and to cover additional findings.
  3. Geocomputation is increasingly integrated with spatial data infrastructure to develop and deliver massive datasets and attendant analysis and visualization capacity to a wide range of users. IPUMS Terra is spatial data infrastructure that develops and uses geocomputational approaches to provide one of the largest collections of integrated population and environment data in the world. In this paper, we describe new efforts to fundamentally change the landscape of population-environment data by integrating, preserving, and disseminating vast amounts of aggregate census and agricultural census data. We are developing data manipulation tools and workflow management approaches to transform and standardize data as well as capture metadata. These developments in turn facilitate the processing, documenting, and intake of tens of thousands of data tables into IPUMS Terra, which then are shared with the scientific community and the broader public to advance understanding of the population and agricultural systems that are central to many complex human-environment systems.
  4. Estimating human mobility responses to the large-scale spreading of the COVID-19 pandemic is crucial, since it guides policymakers in deciding on non-pharmaceutical interventions, such as the closure or reopening of businesses. It is challenging to model due to complex social contexts and limited training data. Recently, we proposed a conditional generative adversarial network (COVID-GAN) to estimate human mobility response under a set of social and policy conditions integrated from multiple data sources. Although COVID-GAN achieves a good average estimation accuracy under real-world conditions, it produces higher errors in certain regions due to the presence of spatial heterogeneity and outliers. To address these issues, in this article, we extend our prior work by introducing a new spatio-temporal deep generative model, namely, COVID-GAN+. COVID-GAN+ deals with the spatial heterogeneity issue by introducing a new spatial feature layer that utilizes the local Moran statistic to model the spatial heterogeneity strength in the data. In addition, we redesign the training objective to learn the estimated mobility changes from historical average levels to mitigate the effects of spatial outliers. We perform comprehensive evaluations using urban mobility data derived from cell phone records and census data. Results show that COVID-GAN+ can better approximate real-world human mobility responses than prior methods, including COVID-GAN.
  5. This work quantifies mobility changes observed during the different phases of the pandemic worldwide at multiple resolutions (county, state, country) using an anonymized aggregate mobility map that captures population flows between geographic cells of size 5 km². As we overlay the global mobility map with epidemic incidence curves and dates of government interventions, we observe that as case counts rose, mobility fell and has since then seen a slow but steady increase in flows. Further, in order to understand mixing within a region, we propose a new metric to quantify the effect of social distancing on the basis of mobility. Taking two very different countries sampled from the global spectrum, we analyze in detail the mobility patterns of the United States (US) and India. We then carry out a counterfactual analysis of delaying the lockdown and show that a one week delay would have doubled the reported number of cases in the US and India. Finally, we quantify the effect of college students returning back to school for the fall semester on COVID-19 dynamics in the surrounding community. We employ the data from a recent university outbreak (reported on August 16, 2020) to infer possible Re values and mobility flows combined with daily prevalence data and census data to obtain an estimate of new cases that might arrive on a college campus. We find that maintaining social distancing at existing levels would be effective in mitigating the extra seeding of cases. However, potential behavioral change and increased social interaction amongst students (30% increase in Re) along with extra seeding can increase the number of cases by 20% over a period of one month in the encompassing county. To our knowledge, this work is the first to model in near real-time, the interplay of human mobility, epidemic dynamics and public policies across multiple spatial resolutions and at a global scale.