Title: Census TopDown: The Impacts of Differential Privacy on Redistricting
The 2020 Decennial Census will be released with a new disclosure avoidance system in place, putting differential privacy in the spotlight for a wide range of data users. We consider several key applications of Census data in redistricting, developing tools and demonstrations for practitioners who are concerned about the impacts of this new noising algorithm, called TopDown. Based on a close look at reconstructed Texas data, we find reassuring evidence that TopDown will not threaten the ability to produce districts with tolerable population balance or to detect signals of racial polarization for Voting Rights Act enforcement.
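To make the population-balance question concrete, here is a minimal sketch. It is not the Bureau's actual TopDown implementation, which noises counts across a full geographic hierarchy and post-processes them for consistency; it simply adds discrete-Laplace noise (a standard differentially private mechanism for counts) to hypothetical block populations and compares district population deviation before and after noising. All data and the ε value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def discrete_laplace(scale, size, rng):
    # The difference of two i.i.d. geometric variables is two-sided
    # geometric (discrete Laplace) noise with the given scale.
    p = 1 - np.exp(-1.0 / scale)
    return rng.geometric(p, size) - rng.geometric(p, size)

# Hypothetical inputs: 1,000 census blocks assigned to 4 districts.
blocks = rng.integers(0, 120, size=1000)        # "true" block populations
district_of = rng.integers(0, 4, size=1000)     # block -> district map

epsilon = 1.0                                   # illustrative budget
noisy = blocks + discrete_laplace(1.0 / epsilon, blocks.size, rng)
# (Real TopDown post-processing also forces counts to be non-negative
# and consistent up the geographic hierarchy; skipped here.)

def max_deviation(pops, assignment, k=4):
    # Largest relative deviation of any district from the ideal size.
    totals = np.bincount(assignment, weights=pops, minlength=k)
    ideal = totals.sum() / k
    return np.abs(totals - ideal).max() / ideal

print("true deviation :", max_deviation(blocks, district_of))
print("noisy deviation:", max_deviation(noisy, district_of))
```

Because district populations sum noise over many blocks, the per-block errors largely cancel, which is the intuition behind the abstract's reassuring finding on population balance.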
Award ID(s):
1915763
NSF-PAR ID:
10273456
Editor(s):
Ligett, Katrina; Gupta, Swati
Journal Name:
Symposium on Foundations of Responsible Computing
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: The 2020 US Census will use a novel approach to disclosure avoidance to protect respondents' data, called TopDown. This TopDown algorithm was applied to the 2018 end-to-end (E2E) test of the decennial census. The computer code used for this test, as well as accompanying exposition, has recently been released publicly by the Census Bureau. Methods: We used the available code and data to better understand the error introduced by the E2E disclosure avoidance system when the Census Bureau applied it to 1940 census data, and we developed an empirical measure of privacy loss to compare the error and privacy of the new approach to that of a (non-differentially private) simple-random-sampling approach to protecting privacy. Results: We found that the empirical privacy loss of TopDown is substantially smaller than the theoretical guarantee for all privacy loss budgets we examined. When run on the 1940 census data, TopDown with a privacy budget of 1.0 was similar in error and privacy loss to a simple random sample of 50% of the US population; with a privacy budget of 4.0, it was similar to a 90% sample. Conclusions: This work fits into the beginning of a discussion on how best to balance privacy and accuracy in decennial census data collection, and there is a need for continued discussion.
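As a rough illustration of the error side of this comparison (not the paper's empirical privacy-loss measure, and not the hierarchical TopDown mechanism itself), the sketch below compares the mean absolute error of discrete-Laplace noising at budgets 1.0 and 4.0 against the error of estimating counts from 50% and 90% simple random samples. The counts and all parameters are hypothetical, so the numbers will not match the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
true_counts = rng.integers(0, 500, size=10_000)   # hypothetical tract counts

def dp_error(counts, epsilon, rng):
    # Mean absolute error of a discrete-Laplace mechanism at budget epsilon.
    p = 1 - np.exp(-epsilon)
    noise = rng.geometric(p, counts.size) - rng.geometric(p, counts.size)
    return np.abs(noise).mean()

def srs_error(counts, f, rng):
    # Keep each person with probability f, then reweight by 1/f.
    sampled = rng.binomial(counts, f)
    return np.abs(sampled / f - counts).mean()

print("DP  eps=1.0:", dp_error(true_counts, 1.0, rng))
print("SRS 50%    :", srs_error(true_counts, 0.5, rng))
print("DP  eps=4.0:", dp_error(true_counts, 4.0, rng))
print("SRS 90%    :", srs_error(true_counts, 0.9, rng))
```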
  2. The main challenge of monocular 3D object detection is accurate localization of the 3D center. Motivated by a new and strong observation that this challenge can be remedied by a 3D-space local-grid search scheme in an ideal case, we propose a stage-wise approach that combines the information flow from 2D-to-3D (3D bounding box proposal generation from a single 2D image) and 3D-to-2D (proposal verification by denoising with 3D-to-2D contexts) in a top-down manner. Specifically, we first obtain initial proposals from off-the-shelf backbone monocular 3D detectors. Then, we generate a 3D anchor space by local-grid sampling from the initial proposals. Finally, we perform 3D bounding box denoising at the 3D-to-2D proposal verification stage. To effectively learn discriminative features for denoising highly overlapped proposals, this paper presents a method of using the Perceiver I/O model [20] to fuse the 3D-to-2D geometric information and the 2D appearance information. With the encoded latent representation of a proposal, the verification head is implemented with a self-attention module. Our method, named MonoXiver, is generic and can be easily adapted to any backbone monocular 3D detector. Experimental results on the well-established KITTI dataset and the challenging large-scale Waymo dataset show that MonoXiver consistently achieves improvement with limited computation overhead.
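The local-grid sampling step can be pictured with a small sketch. This covers only the anchor-space generation around one initial proposal center; the step size, extent, and coordinates are hypothetical, not the values used by MonoXiver.

```python
import numpy as np

def local_anchor_grid(center, step=0.5, extent=1):
    # Enumerate 3D anchors on a local grid around an initial proposal
    # center (x, y, z): offsets in {-extent, ..., 0, ..., extent} * step
    # along each axis. Parameters are illustrative only.
    offsets = np.arange(-extent, extent + 1) * step
    dx, dy, dz = np.meshgrid(offsets, offsets, offsets, indexing="ij")
    grid = np.stack([dx, dy, dz], axis=-1).reshape(-1, 3)
    return center + grid                       # (num_anchors, 3)

proposal_center = np.array([2.0, 1.5, 30.0])   # x, y, depth in metres
anchors = local_anchor_grid(proposal_center)
print(anchors.shape)                           # (27, 3) for a 3x3x3 grid
```

Each anchor then becomes a candidate box to be scored and denoised by the verification stage described in the abstract.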
  3. Crowdsourced data have found practical use for enhancing situational awareness during disasters. While recent studies have shown promising results regarding the potential of crowdsourced data (such as user-generated flood reports) for flash flood mapping and situational awareness, little attention has been paid to data imbalance issues that could introduce biases in data and assessment. To address this gap, in this study, we examine biases present in crowdsourced reports to identify data imbalance with a goal of improving disaster situational awareness. Three biases are examined: sample bias, spatial bias, and demographic bias. To examine these biases, we analyzed reported flooding from 3-1-1 reports (a citizen hotline through which the community can report problems such as flooding) and Waze reports (a GPS navigation app that allows drivers to report flooded roads) with respect to FEMA damage data collected in the aftermath of Tropical Storm Imelda in Harris County, Texas, in 2019 and Hurricane Ida in New York City in 2021. First, sample bias is assessed by expanding the flood-related categories in 3-1-1 reports. Integrating other flooding-related topics into the Global Moran's I and Local Indicator of Spatial Association (LISA) analyses revealed more communities that were impacted by floods. To examine spatial bias, we perform the LISA and BI-LISA tests on the data sets—FEMA damage, 3-1-1 reports, and Waze reports—at the census tract level and census block group level. By looking at two geographical aggregations, we found that the larger spatial aggregation, census tracts, shows less data imbalance in the results. Through a regression analysis, we found that 3-1-1 reports and Waze reports have data imbalance limitations in areas where minority populations and single-parent households reside. The findings of this study advance understanding of data imbalance and biases in crowdsourced datasets that are increasingly used for disaster situational awareness. By addressing data imbalance issues, researchers and practitioners can proactively mitigate biases in crowdsourced data and prevent biased and inequitable decisions and actions.
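For readers unfamiliar with the statistic driving the spatial-bias tests, here is a minimal numpy sketch of Global Moran's I on toy data. The study itself applies LISA and BI-LISA with real contiguity weights; the weight matrix and report counts below are made up.

```python
import numpy as np

def morans_i(x: np.ndarray, w: np.ndarray) -> float:
    # Global Moran's I: I = (n / sum(w)) * (z' W z) / (z' z),
    # where z = x - mean(x) and w is a spatial weight matrix.
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

# Toy example: 4 areal units on a line, rook contiguity weights.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
reports = np.array([8.0, 6.0, 2.0, 1.0])   # e.g., flood reports per unit
print(morans_i(reports, w))                # positive => spatial clustering
```

A strongly positive value indicates that similar report volumes cluster in space, which is the kind of signal the LISA tests localize to specific tracts or block groups.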
  4. American Community Survey (ACS) data have become the workhorse for the empirical analysis of segregation in the U.S.A. during the past decade. The increased frequency that the ACS offers over the 10-year Census, which is the main reason for its popularity, comes with an increased level of uncertainty in the published estimates due to the reduced sampling ratio of the ACS (1:40 households) relative to the Census (1:6 households). This paper introduces a new approach to integrating ACS data uncertainty into the analysis of segregation. Our method relies on variance replicate estimates for the 5-year ACS and advances over existing approaches by explicitly taking into account the covariance between ACS estimates when developing sampling distributions for segregation indices. We illustrate our approach with a study of comparative segregation dynamics for 29 metropolitan statistical areas in California, using the 2010–2014 and 2015–2019 5-year samples. Our methods yield different results from the simulation technique described by Napierala and Denton (Demography 54(1):285–309, 2017). Taking the covariance of ACS estimates into account yields larger margins of error than the simulation approach when the number of census tracts is large and the minority percentage is low; the converse is true when the number of census tracts is small and the minority percentage is high.
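A minimal sketch of the replicate-based idea, under stated assumptions: recomputing a segregation index (here the dissimilarity index) on each of the 80 ACS variance replicate columns carries the covariance between tract estimates through to the index, which is exactly what the paper accounts for. The replicate columns below are faked with random perturbations; in practice they come from the ACS variance replicate tables.

```python
import numpy as np

def dissimilarity(minority, majority):
    # Dissimilarity index D = 0.5 * sum|a_i/A - b_i/B| over tracts.
    return 0.5 * np.abs(minority / minority.sum()
                        - majority / majority.sum()).sum()

def replicate_moe(point_est, replicate_ests, z=1.645):
    # ACS successive-differences replication: Var = (4/80) * sum of
    # squared deviations of the 80 replicate estimates; 90% MOE = z * SE.
    var = (4 / 80) * np.sum((replicate_ests - point_est) ** 2)
    return z * np.sqrt(var)

rng = np.random.default_rng(2)
minority = rng.integers(50, 500, size=200).astype(float)   # per tract
majority = rng.integers(500, 5000, size=200).astype(float)

d_hat = dissimilarity(minority, majority)
# Fake the 80 replicate columns with small perturbations for illustration.
d_reps = np.array([dissimilarity(minority + rng.normal(0, 5, 200),
                                 majority + rng.normal(0, 20, 200))
                   for _ in range(80)])
print(f"D = {d_hat:.3f} ± {replicate_moe(d_hat, d_reps):.3f}")
```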
  5. The deployment of vaccines across the US provides significant defense against serious illness and death from COVID-19. Over 70% of vaccine-eligible Americans are at least partially vaccinated, but there are pockets of the population that are under-vaccinated, such as in rural areas and some demographic groups (e.g., age, race, ethnicity). These unvaccinated pockets are extremely susceptible to the Delta variant, exacerbating the healthcare crisis and increasing the risk of new variants. In this paper, we describe a data-driven model that provides real-time support to Virginia public health officials by recommending mobile vaccination site placement in order to target under-vaccinated populations. Our strategy uses fine-grained mobility data, along with US Census and vaccination uptake data, to identify locations that are most likely to be visited by unvaccinated individuals. We further extend our model to choose locations that maximize vaccine uptake among hesitant groups. We show that the top recommended sites vary substantially across some demographics, demonstrating the value of developing customized recommendation models that integrate fine-grained, heterogeneous data sources. In addition, we used a statistically equivalent Synthetic Population to study the effect of combined demographics (e.g., people of a particular race and age), which is not possible using US Census data alone. We validate our recommendations by analyzing the success rates of deployed vaccine sites, and show that sites placed closer to our recommended areas administered higher numbers of doses. Our model is the first of its kind to consider evolving mobility patterns in real time when suggesting placement strategies customized for different targeted demographic groups. Our results will be presented at IAAI-22, but given the critical nature of the pandemic, we offer this extended version of that paper for more timely consideration of our approach and to cover additional findings.
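The flavor of the recommendation step can be sketched as a greedy scoring loop over candidate sites. This is a toy illustration, not the paper's model (which uses real-time mobility data, Census demographics, and uptake data): each site is scored by expected visits from unvaccinated people, and zones already served are discounted so later picks target new areas. All arrays and parameters below are synthetic.

```python
import numpy as np

def pick_sites(visits, unvax_rate, k=3):
    # Greedy placement: score each candidate site by expected visits
    # from unvaccinated people, pick the best, then discount zones
    # that site already serves.
    visits = visits.copy()
    chosen = []
    for _ in range(k):
        scores = unvax_rate @ visits             # (num_sites,)
        best = int(scores.argmax())
        chosen.append(best)
        served = visits[:, best] / visits.sum(axis=1).clip(min=1e-9)
        visits *= (1 - served)[:, None]          # discount served zones
        visits[:, best] = 0                      # each site picked once
    return chosen

rng = np.random.default_rng(3)
visits = rng.poisson(40.0, size=(50, 10)).astype(float)  # zones x sites
unvax_rate = rng.uniform(0.1, 0.6, size=50)              # per-zone rate
print(pick_sites(visits, unvax_rate, k=3))
```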