Title: Using administrative records to support the linkage of census data: protocol for building a longitudinal infrastructure of U.S. census records
This article describes the linkage methods that will be used in the Decennial Census Digitization and Linkage project (DCDL), which is completing the final four decades of a longitudinal census infrastructure covering the past 170 years of United States history. DCDL is digitizing and creating linkages between nearly a billion records across the 1960 through 1990 U.S. censuses, as well as to already-linked records from the censuses of 1940, 2000, 2010, and 2020. Our main goals in this article are to (1) describe the development of the DCDL and the protocol we will follow to build the linkages between the census files, (2) outline the techniques we will use to evaluate the quality of the links, and (3) show how the assignment and evaluation of these linkages leverages the joint use of routinely collected administrative data and non-routine survey data.
Award ID(s):
2023639
PAR ID:
10516013
Publisher / Repository:
Swansea University
Journal Name:
International Journal of Population Data Science
Volume:
7
Issue:
4
ISSN:
2399-4908
Subject(s) / Keyword(s):
linkage longitudinal census
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We report on the successful completion of a project to upgrade the positional accuracy of every response to the 1990, 2000, and 2010 U.S. decennial censuses. The resulting data set, called Optimized Spatial Census Information Linked Across Time (OSCILAT), resides within the restricted-access data warehouse of the Federal Statistical Research Data Center (FSRDC) system where it is available for use with approval from the U.S. Census Bureau. OSCILAT greatly improves the accuracy and completeness of spatial information for older censuses conducted prior to major quality improvements undertaken by the Bureau. Our work enables more precise spatial and longitudinal analysis of census data and supports exact tabulations of census responses for arbitrary spatial units, including tabulating responses from 1990, 2000, and 2010 within 2020 block boundaries for precise measures of change over time for small geographic areas. 
  2. Tropical forests are well known for their high woody plant diversity. Processes occurring at early life stages are thought to play a critical role in maintaining this high diversity and shaping the composition of tropical tree communities. To evaluate hypothesized mechanisms promoting tropical tree species coexistence and influencing composition, we initiated a census of woody seedlings and small saplings in the permanent 50-ha Forest Dynamics Plot (FDP) on Barro Colorado Island (BCI), Panama. Situated in old-growth, lowland tropical moist forest, the BCI FDP was originally established in 1980 to monitor trees and shrubs ≥1 cm diameter at 1.3 m above ground (dbh) at ca. 5-yr intervals. However, critical data on the dynamics occurring at earlier life stages were initially lacking. Therefore, in 2001 we established a 1-m2 seedling plot in the center of every 5 x 5 m section of the BCI FDP. All freestanding woody individuals ≥20 cm tall and <1 cm dbh (hereafter referred to as seedlings) were tagged, mapped, measured, and identified to species in 19,313 1-m2 seedling plots. Because seedling dynamics are rapid, we censused these seedling plots every 1–2 years. Here we present data from the 14 censuses of these seedling plots conducted between the initial census in 2001 to the most recent census, in 2018. This data set includes nearly 1M observations of ~185,000 individuals of >400 tree, shrub, and liana species. These data will permit spatially-explicit analyses of seedling distributions, recruitment, growth, and survival for hundreds of woody plant species. In addition, the data presented here can be linked to openly-available, long-term data on the dynamics of trees and shrubs ≥1cm dbh in the BCI FDP, as well as existing data sets from the site on climate, canopy structure, phylogenetic relatedness, functional traits, soil nutrients, and topography. 
  3. Abstract Tropical forests are well known for their high woody plant diversity. Processes occurring at early life stages are thought to play a critical role in maintaining this high diversity and shaping the composition of tropical tree communities. To evaluate hypothesized mechanisms promoting tropical tree species coexistence and influencing composition, we initiated a census of woody seedlings and small saplings in the permanent 50 ha Forest Dynamics Plot (FDP) on Barro Colorado Island (BCI), Panama. Situated in old‐growth, lowland tropical moist forest, the BCI FDP was originally established in 1980 to monitor trees and shrubs ≥1 cm diameter at 1.3 m above ground (dbh) at ca. 5‐year intervals. However, critical data on the dynamics occurring at earlier life stages were initially lacking. Therefore, in 2001 we established a 1‐m2 seedling plot in the center of every 5 × 5 m section of the BCI FDP. All freestanding woody individuals ≥20 cm tall and <1 cm dbh (hereafter referred to as seedlings) were tagged, mapped, measured, and identified to species in 19,313 1‐m2 seedling plots. Because seedling dynamics are rapid, we censused these seedling plots every 1–2 years. Here, we present data from the 14 censuses of these seedling plots conducted between the initial census in 2001 to the most recent census, in 2018. This data set includes nearly 1 M observations of ~185,000 individuals of >400 tree, shrub, and liana species. These data will permit spatially‐explicit analyses of seedling distributions, recruitment, growth, and survival for hundreds of woody plant species. In addition, the data presented here can be linked to openly‐available, long‐term data on the dynamics of trees and shrubs ≥1 cm dbh in the BCI FDP, as well as existing data sets from the site on climate, canopy structure, phylogenetic relatedness, functional traits, soil nutrients, and topography. 
This data set can be freely used for non‐commercial purposes; we request that users of these data cite this data paper in all publications resulting from the use of this data set. 
  4. Abstract The question of whether to carry out a quinquennial Census is faced by national statistical offices in increasingly many countries, including Canada, Nigeria, Ireland, Australia, and South Africa. We describe uses and limitations of cost-benefit analysis in this decision problem in the case of the 2016 Census of South Africa. The government of South Africa needed to decide whether to conduct a 2016 Census or to rely on increasingly inaccurate postcensal estimates accounting for births, deaths, and migration since the previous (2011) Census. The cost-benefit analysis compared predicted costs of the 2016 Census to the benefits of improved allocation of intergovernmental revenue, which was considered by the government to be a critical use of the 2016 Census, although not the only important benefit. Without the 2016 Census, allocations would be based on population estimates. Accuracy of the postcensal estimates was estimated from the performance of past estimates, and the hypothetical expected reduction in errors in allocation due to the 2016 Census was estimated. A loss function was introduced to quantify the improvement in allocation. With this evidence, the government was able to decide not to conduct the 2016 Census, but instead to improve data and capacity for producing post-censal estimates. 
  5. Background: The 2020 US Census will use a novel approach to disclosure avoidance to protect respondents’ data, called TopDown. This TopDown algorithm was applied to the 2018 end-to-end (E2E) test of the decennial census. The computer code used for this test as well as accompanying exposition has recently been released publicly by the Census Bureau. Methods: We used the available code and data to better understand the error introduced by the E2E disclosure avoidance system when the Census Bureau applied it to 1940 census data and we developed an empirical measure of privacy loss to compare the error and privacy of the new approach to that of a (non-differentially private) simple-random-sampling approach to protecting privacy. Results: We found that the empirical privacy loss of TopDown is substantially smaller than the theoretical guarantee for all privacy loss budgets we examined. When run on the 1940 census data, TopDown with a privacy budget of 1.0 was similar in error and privacy loss to a simple random sample of 50% of the US population. When run with a privacy budget of 4.0, it was similar in error and privacy loss to a 90% sample. Conclusions: This work fits into the beginning of a discussion on how to best balance privacy and accuracy in decennial census data collection, and there is a need for continued discussion. 