Title: Dynamically adjusting case reporting policy to maximize privacy and public health utility in the face of a pandemic
Abstract:
Objective: Supporting public health research and the public’s situational awareness during a pandemic requires continuous dissemination of infectious disease surveillance data. Legislation, such as the Health Insurance Portability and Accountability Act of 1996 and recent state-level regulations, permits sharing deidentified person-level data; however, current deidentification approaches are limited. Namely, they are inefficient, relying on retrospective disclosure risk assessments, and do not flex with changes in infection rates or population demographics over time. In this paper, we introduce a framework to dynamically adapt deidentification for near-real-time sharing of person-level surveillance data.
Materials and Methods: The framework leverages a simulation mechanism, capable of application at any geographic level, to forecast the reidentification risk of sharing the data under a wide range of generalization policies. The estimates inform weekly, prospective policy selection to maintain the proportion of records corresponding to a group size less than 11 (PK11) at or below 0.1. Fixing the policy at the start of each week facilitates timely dataset updates and supports sharing granular date information. We use August 2020 through October 2021 case data from Johns Hopkins University and the Centers for Disease Control and Prevention to demonstrate the framework’s effectiveness in maintaining the PK11 threshold of 0.01.
Results: When sharing COVID-19 county-level case data across all US counties, the framework’s approach meets the threshold for 96.2% of daily data releases, while a policy based on current deidentification techniques meets the threshold for 32.3%.
Conclusion: Periodically adapting the data publication policies preserves privacy while enhancing public health utility through timely updates and sharing epidemiologically critical features.
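The PK11 measure named in the abstract (the proportion of records corresponding to a group size less than 11) can be illustrated with a minimal sketch. The function name `pk`, the choice of quasi-identifier fields, and the sample records below are hypothetical and are not taken from the paper; the sketch only shows how such a proportion could be computed for one release, not the authors' simulation framework.

```python
from collections import Counter


def pk(records, k=11):
    """Return the proportion of records whose group has fewer than k members.

    Each record is a hashable tuple of generalized quasi-identifiers;
    records with identical tuples form one group.
    """
    if not records:
        return 0.0
    group_sizes = Counter(records)
    # Count every record that sits in a group smaller than k.
    small = sum(size for size in group_sizes.values() if size < k)
    return small / len(records)


# Hypothetical generalized records: (county code, age band, sex).
records = [("47037", "30-39", "F")] * 12 + [("47037", "80+", "M")] * 3

# 3 of the 15 records fall in a group smaller than 11.
print(pk(records))
```

Under a policy-selection scheme like the one described, a value such as this would be compared against the chosen threshold, and a coarser generalization (for example, wider age bands) would be considered when the threshold is exceeded.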
Award ID(s):
2029661
PAR ID:
10362769
Journal Name:
Journal of the American Medical Informatics Association
Volume:
29
Issue:
5
ISSN:
1527-974X
Page Range / eLocation ID:
853 to 863
Sponsoring Org:
National Science Foundation
More Like this
  1. The rapidly expanding use of wastewater for public health surveillance requires new strategies to protect privacy rights, while data are collected at increasingly discrete geospatial scales, i.e., city, neighborhood, campus, and building-level. Data collected at high geospatial resolution can inform on labile, short-lived biomarkers, thereby making wastewater-derived data both more actionable and more likely to cause privacy concerns and stigmatization of subpopulations. Additionally, data sharing restrictions among neighboring cities and communities can complicate efforts to balance public health protections with citizens’ privacy. Here, we have created an encrypted framework that facilitates the sharing of sensitive population health data among entities that lack trust for one another (e.g., between adjacent municipalities with different governance of health monitoring and data sharing). We demonstrate the utility of this approach with two real-world cases. Our results show the feasibility of sharing encrypted data between two municipalities and a laboratory, while performing secure private computations for wastewater-based epidemiology (WBE) with high precision, fast speeds, and low data costs. This framework is amenable to other computations used by WBE researchers including population normalized mass loads, fecal indicator normalizations, and quality control measures. The Centers for Disease Control and Prevention’s National Wastewater Surveillance System shows ~8% of the records attributed to collection before the wastewater treatment plant, illustrating an opportunity to further expand currently limited community-level sampling and public health surveillance through security and responsible data-sharing as outlined here.
  2. Traditional health surveillance methods play a critical role in public health safety but are limited by the data collection speed, coverage, and resource requirements. Wastewater‐based epidemiology (WBE) has emerged as a cost‐effective and rapid tool for detecting infectious diseases through sewage analysis of disease biomarkers. Recent advances in big data analytics have enhanced public health monitoring by enabling predictive modeling and early risk detection. This paper explores the application of machine learning (ML) in WBE data analytics, with a focus on infectious disease surveillance and forecasting. We highlight the advantages of ML‐driven WBE prediction models, including their ability to process multimodal data, predict disease trends, and evaluate policy impacts through scenario simulations. We also examine challenges such as data quality, model interpretability, and integration with existing public health infrastructure. The integration of ML WBE data analytics enables rapid health data collection, analysis, and interpretation that are not feasible in current surveillance approaches. By leveraging ML and WBE, decision makers can reduce cognitive biases and enhance data‐driven responses to public health threats. As global health risks evolve, the synergy between WBE, ML, and data‐driven decision‐making holds significant potential for improving public health outcomes.
  3. The COVID-19 pandemic highlights the need for broad dissemination of case surveillance data. Local and global public health agencies have initiated efforts to do so, but the available data remain limited, due in part to concerns over privacy. As a result, current COVID-19 case surveillance data sharing policies are based on strong adversarial assumptions, such as the expectation that an attacker can readily re-identify individuals based on their distinguishability in a dataset. There are various re-identification risk measures to account for adversarial capabilities; however, the current array insufficiently accounts for real-world data challenges, particularly missing records in the resources of identifiable records that adversaries may rely upon to execute attacks (e.g., ten 50-year-old males in the de-identified dataset vs. five 50-year-old males in the identified dataset). In this paper, we introduce several approaches to amend such risk measures and assess re-identification risk in light of how an attacker's capabilities relate to missing records. We demonstrate the potential for these measures through a record linkage attack using COVID-19 case surveillance data and voter registration records in the state of Florida. Our findings demonstrate that adversarial assumptions, as realized in a risk measure, can dramatically affect re-identification risk estimation. Notably, we show that the re-identification risk is likely to be substantially smaller than the typical risk thresholds, which suggests that more detailed data could be shared publicly than is currently the case.
  4. Most current public health surveillance methods used in epidemiological studies to identify disease hotspots assume that regional disease case counts are independently distributed, and they cannot adjust for confounding covariates. This article proposes a new approach that uses a simultaneous autoregressive (SAR) model, a popular spatial regression approach, within the classical space‐time cumulative sum (CUSUM) framework for detecting changes in the spatial distribution of count data while accounting for risk factors and spatial correlation. We develop expressions for the likelihood ratio test monitoring statistics based on a SAR model with covariates, leading to the proposed space‐time CUSUM test statistic. The effectiveness of the proposed monitoring approach in detecting and identifying step shifts is studied by simulation of various shift scenarios in regional counts. A case study for monitoring regional COVID‐19 infection counts while adjusting for social vulnerability, which is often correlated with a community's susceptibility to infection, is presented to illustrate the application of the proposed methodology in public health surveillance.
  5. Meteorological data for public health surveillance. Michael Wimberly, a professor at the University of Oklahoma, walks us through integrating meteorological data for public health surveillance and disease forecasting. Public health surveillance involves the collection, analysis, interpretation, and dissemination of health-related data to plan, implement, and evaluate public health practices. The resulting information supports the detection of emerging health threats, planning interventions, and evaluating policies and programs to protect and improve population health.