Title: Census TopDown: The Impacts of Differential Privacy on Redistricting
The 2020 Decennial Census will be released with a new disclosure avoidance system in place, putting differential privacy in the spotlight for a wide range of data users. We consider several key applications of Census data in redistricting, developing tools and demonstrations for practitioners who are concerned about the impacts of this new noising algorithm called TopDown. Based on a close look at reconstructed Texas data, we find reassuring evidence that TopDown will not threaten the ability to produce districts with tolerable population balance or to detect signals of racial polarization for Voting Rights Act enforcement.
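As a rough illustration of the population-balance question above, the sketch below adds Laplace noise to synthetic block counts and compares the district population deviation computed on true versus noised counts. This is a minimal toy, not the Bureau's TopDown algorithm (which is hierarchical and produces non-negative integer counts); the block populations, district assignment, and privacy-loss values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic block populations and a fixed assignment of blocks to 4 districts.
# Illustrative only -- not Census data and not the TopDown algorithm.
n_blocks, n_districts = 400, 4
true_pop = rng.integers(0, 300, size=n_blocks).astype(float)
district = rng.integers(0, n_districts, size=n_blocks)

def noised(counts, epsilon):
    """Add Laplace(1/epsilon) noise to each block count (a count query has sensitivity 1)."""
    return counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)

def max_deviation(block_counts):
    """Largest relative deviation of any district total from the ideal (equal) population."""
    totals = np.array([block_counts[district == d].sum() for d in range(n_districts)])
    ideal = totals.mean()
    return np.abs(totals - ideal).max() / ideal

for eps in (0.25, 1.0, 4.0):
    print(f"eps={eps}: deviation on true counts {max_deviation(true_pop):.4f}, "
          f"on noised counts {max_deviation(noised(true_pop, eps)):.4f}")
```

Because each district aggregates many blocks, the independent noise terms largely cancel in district totals, which is consistent with the reassurance reported in the abstract.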
Award ID(s):
1915763
PAR ID:
10273456
Author(s) / Creator(s):
Editor(s):
Ligett, Katrina; Gupta, Swati
Date Published:
Journal Name:
Symposium on Foundations of Responsible Computing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: The 2020 US Census will use a novel approach to disclosure avoidance, called TopDown, to protect respondents' data. The TopDown algorithm was applied to the 2018 end-to-end (E2E) test of the decennial census, and the computer code used for this test, along with accompanying exposition, has recently been released publicly by the Census Bureau. Methods: We used the available code and data to better understand the error introduced by the E2E disclosure avoidance system when the Census Bureau applied it to 1940 census data, and we developed an empirical measure of privacy loss to compare the error and privacy of the new approach to those of a (non-differentially private) simple-random-sampling approach to protecting privacy. Results: We found that the empirical privacy loss of TopDown is substantially smaller than the theoretical guarantee for all privacy-loss budgets we examined. When run on the 1940 census data, TopDown with a privacy budget of 1.0 was similar in error and privacy loss to a simple random sample of 50% of the US population; with a privacy budget of 4.0, it was similar in error and privacy loss to a 90% sample. Conclusions: This work is an early contribution to the discussion of how best to balance privacy and accuracy in decennial census data collection, and that discussion needs to continue.
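The error-versus-sampling comparison described in this abstract can be sketched on toy data: noise per-block counts with a Laplace mechanism at different privacy-loss budgets and compare the resulting root-mean-square error with that of estimating the same counts from simple random samples. The block sizes, attribute rate, and mechanism below are hypothetical and far simpler than the actual TopDown system and the authors' empirical privacy-loss measure; the sketch only shows how such a comparison can be set up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 2,000 small "blocks" of 30 people each, one binary attribute per person.
# Purely illustrative -- not the 1940 census data and not the E2E TopDown code.
n_blocks, block_size = 2_000, 30
people = rng.random((n_blocks, block_size)) < 0.3
true_counts = people.sum(axis=1)

def dp_rmse(epsilon):
    """Per-block RMSE when each count receives Laplace(1/epsilon) noise (sensitivity 1)."""
    noisy = true_counts + rng.laplace(scale=1.0 / epsilon, size=n_blocks)
    return np.sqrt(np.mean((noisy - true_counts) ** 2))

def srs_rmse(rate):
    """Per-block RMSE when each count is estimated from a simple random sample of people."""
    n_sampled = int(block_size * rate)
    estimates = []
    for block in people:
        sample = rng.choice(block, size=n_sampled, replace=False)
        estimates.append(sample.sum() / rate)
    return np.sqrt(np.mean((np.array(estimates) - true_counts) ** 2))

for eps in (1.0, 4.0):
    print(f"eps={eps}:   per-block RMSE ~ {dp_rmse(eps):.2f}")
for rate in (0.5, 0.9):
    print(f"{int(rate * 100)}% sample: per-block RMSE ~ {srs_rmse(rate):.2f}")
```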
  2. This article describes the linkage methods that will be used in the Decennial Census Digitization and Linkage project (DCDL), which is completing the final four decades of a longitudinal census infrastructure covering the past 170 years of United States history. DCDL is digitizing and creating linkages between nearly a billion records across the 1960 through 1990 U.S. censuses, as well as to already-linked records from the censuses of 1940, 2000, 2010, and 2020. Our main goals in this article are to (1) describe the development of the DCDL and the protocol we will follow to build the linkages between the census files, (2) outline the techniques we will use to evaluate the quality of the links, and (3) show how the assignment and evaluation of these linkages leverages the joint use of routinely collected administrative data and non-routine survey data. 
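The article describes DCDL's linkage protocol only at a high level; the sketch below is a generic Fellegi-Sunter-style match score, not the DCDL code. The field names and the m/u agreement probabilities are invented, and a production linkage would add blocking, string comparators, and clerical-review thresholds.

```python
import math
from dataclasses import dataclass

# Illustrative Fellegi-Sunter-style scoring: agreement on a field adds log(m/u),
# disagreement adds log((1-m)/(1-u)).  The m/u probabilities below are invented.
FIELDS = {
    # field: (m = P(agree | true match), u = P(agree | non-match))
    "first_name": (0.95, 0.02),
    "last_name":  (0.96, 0.01),
    "birth_year": (0.90, 0.05),
    "birthplace": (0.85, 0.10),
}

@dataclass
class Record:
    first_name: str
    last_name: str
    birth_year: int
    birthplace: str

def match_weight(a: Record, b: Record) -> float:
    """Sum of log-likelihood-ratio weights over the comparison fields."""
    w = 0.0
    for field, (m, u) in FIELDS.items():
        agree = getattr(a, field) == getattr(b, field)
        w += math.log(m / u) if agree else math.log((1 - m) / (1 - u))
    return w

a = Record("JOHN", "SMITH", 1921, "OHIO")
b = Record("JOHN", "SMITH", 1922, "OHIO")          # birth year off by one
print(f"match weight: {match_weight(a, b):.2f}")   # higher => more likely the same person
```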
  3. Data sets and statistics about groups of individuals are increasingly collected and released, feeding many optimization and learning algorithms. In many cases, the released data contain sensitive information whose privacy is strictly regulated. For example, in the U.S., census data are regulated under Title 13, which requires that no individual be identified from any data released by the Census Bureau. In Europe, data release is regulated according to the General Data Protection Regulation, which addresses the control and transfer of personal data. Differential privacy has emerged as the de-facto standard to protect data privacy. In a nutshell, differentially private algorithms protect an individual's data by injecting random noise into the output of a computation that involves such data. While this process ensures privacy, it also impacts the quality of data analysis, and, when private data sets are used as inputs to complex machine learning or optimization tasks, they may produce results that are fundamentally different from those obtained on the original data and may even raise unintended bias and fairness concerns. In this talk, I will first focus on the challenge of releasing privacy-preserving data sets for complex data analysis tasks. I will introduce the notion of Constrained-based Differential Privacy (C-DP), which allows casting the data release problem as an optimization problem whose goal is to preserve the salient features of the original data. I will review several applications of C-DP in the context of very large hierarchical census data, data streams, energy systems, and the design of federated data-sharing protocols. Next, I will discuss how errors induced by differential privacy algorithms may propagate within a decision problem, causing biases and fairness issues. This is particularly important, as privacy-preserving data is often used for critical decision processes, including the allocation of funds and benefits to states and jurisdictions, which ideally should be fair and unbiased. Finally, I will conclude with a roadmap for future work and some open questions.
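One way to make the constrained-release idea concrete is to post-process noisy counts so that they satisfy known consistency constraints. The sketch below is not the C-DP formulation from the talk; it is a minimal least-squares projection of noisy county counts onto the constraints that they be non-negative and sum to the noisy state total, with all figures invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-level hierarchy: one "state" total and its "county" counts.
true_counties = np.array([1200.0, 340.0, 55.0, 4100.0])
true_state = true_counties.sum()

eps = 1.0
noisy_counties = true_counties + rng.laplace(scale=2.0 / eps, size=4)  # half the budget
noisy_state = true_state + rng.laplace(scale=2.0 / eps)                # other half

# Post-processing (no extra privacy cost): find non-negative county counts that sum to
# the noisy state total and stay close to the noisy county counts.  Equal-weight least
# squares: spread the residual evenly, clip at zero, and re-spread among positive entries.
def project(counties, total):
    x = counties.copy()
    active = np.ones_like(x, dtype=bool)
    for _ in range(len(x)):
        x[active] += (total - x[active].sum()) / active.sum()
        newly_neg = active & (x < 0)
        if not newly_neg.any():
            break
        x[newly_neg] = 0.0
        active &= x > 0
    return x

consistent = project(noisy_counties, noisy_state)
print("noisy counties:", np.round(noisy_counties, 1), " sum:", round(noisy_counties.sum(), 1))
print("projected     :", np.round(consistent, 1), " sum:", round(consistent.sum(), 1))
```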
  4. Research on crime and neighborhood racial composition establishes that Black neighborhoods with high levels of violent crime will experience an increase in Black residents and concentrated disadvantage, due to the constrained housing choices Black people face. Some studies on the relationship between gentrification and crime, however, show that high-crime neighborhoods can experience reinvestment as well as displacement of Black residents. In Washington, DC, we have seen both trends: concentration of poverty and segregation as well as racial turnover and reinvestment. We employ a spatial analysis using a merged data set including crime data, Census data, and American Community Survey (ACS) data to analyze the relationship between crime and neighborhood change at the Census tract level. Our findings demonstrate the importance of distinguishing between periods of neighborhood decline and ascent, between the effects of property and violent crime, and between racial change and socioeconomic change.
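A minimal sketch of the kind of tract-level merge this abstract describes is shown below; the file names, column names, and period definitions are hypothetical, and the study's spatial analysis goes well beyond this join.

```python
import pandas as pd

# Hypothetical inputs -- file and column names are illustrative only.
crime = pd.read_csv("dc_crimes.csv")                 # one row per incident: geoid, year, offense_type
tracts_2000 = pd.read_csv("census_2000_tracts.csv")  # geoid, pct_black_2000, pct_poverty_2000
tracts_acs = pd.read_csv("acs_5yr_tracts.csv")       # geoid, pct_black_acs, pct_poverty_acs

# Aggregate incidents to tract-level violent / property counts for an early period.
early = (crime[crime["year"].between(2000, 2004)]
         .groupby(["geoid", "offense_type"]).size()
         .unstack(fill_value=0)
         .rename(columns={"violent": "violent_00_04", "property": "property_00_04"})
         .reset_index())

# Merge crime counts with Census and ACS tract characteristics, then compute change.
panel = (tracts_2000.merge(tracts_acs, on="geoid")
                    .merge(early, on="geoid", how="left")
                    .fillna({"violent_00_04": 0, "property_00_04": 0}))
panel["black_change"] = panel["pct_black_acs"] - panel["pct_black_2000"]
print(panel.head())
```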
  5. American Community Survey (ACS) data have become the workhorse for the empirical analysis of segregation in the U.S.A. during the past decade. The increased frequency that the ACS offers over the 10-year Census, which is the main reason for its popularity, comes with an increased level of uncertainty in the published estimates due to the reduced sampling ratio of the ACS (1:40 households) relative to the Census (1:6 households). This paper introduces a new approach to integrating ACS data uncertainty into the analysis of segregation. Our method relies on variance replicate estimates for the 5-year ACS and advances over existing approaches by explicitly taking into account the covariance between ACS estimates when developing sampling distributions for segregation indices. We illustrate our approach with a study of comparative segregation dynamics for 29 metropolitan statistical areas in California, using the 2010–2014 and 2015–2019 5-year samples. Our methods yield different results than the simulation technique described by Napierala and Denton (Demography 54(1):285–309, 2017). Taking the covariance between ACS estimates into account yields larger margins of error than the simulated approach when the number of census tracts is large and the minority percentage is low, and the converse is true when the number of census tracts is small and the minority percentage is high.
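Below is a minimal sketch of replicate-based inference for a segregation index, under assumed data: the dissimilarity index is recomputed on each of the 80 ACS variance replicate estimates, and the published successive-difference-replication formula, Var = (4/80) * sum over replicates of (D_r - D_0)^2, turns the spread of those replicate values into a variance. Because each replicate perturbs all tracts jointly, this captures the covariance between tract estimates that the abstract emphasizes. The tract counts and replicate perturbations here are synthetic stand-ins for the published Variance Replicate Estimate tables.

```python
import numpy as np

rng = np.random.default_rng(3)

def dissimilarity(minority, majority):
    """Dissimilarity index D over tracts: 0.5 * sum |m_i / M - w_i / W|."""
    return 0.5 * np.abs(minority / minority.sum() - majority / majority.sum()).sum()

# Synthetic stand-ins for one metro area: tract-level point estimates plus 80
# variance replicate estimates (real inputs: the 5-year ACS Variance Replicate tables).
n_tracts, n_reps = 150, 80
minority_est = rng.poisson(400, n_tracts).astype(float)
majority_est = rng.poisson(1600, n_tracts).astype(float)
minority_reps = np.maximum(minority_est + rng.normal(0, 40, (n_reps, n_tracts)), 0)
majority_reps = np.maximum(majority_est + rng.normal(0, 80, (n_reps, n_tracts)), 0)

d0 = dissimilarity(minority_est, majority_est)

# Recompute the index on every replicate; because each replicate perturbs all tracts
# jointly, the spread of the replicate values reflects the covariance between tracts.
d_reps = np.array([dissimilarity(minority_reps[r], majority_reps[r]) for r in range(n_reps)])

# ACS successive-difference-replication variance: (4/80) * sum_r (D_r - D_0)^2.
var_d = (4.0 / n_reps) * np.sum((d_reps - d0) ** 2)
print(f"D = {d0:.4f}   (SE ~ {np.sqrt(var_d):.4f})")
```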