

Creators/Authors contains: "Yan, Chao"


  1. Abstract. Objective: Supporting public health research and the public’s situational awareness during a pandemic requires continuous dissemination of infectious disease surveillance data. Legislation, such as the Health Insurance Portability and Accountability Act of 1996 and recent state-level regulations, permits sharing deidentified person-level data; however, current deidentification approaches are limited. Namely, they are inefficient, relying on retrospective disclosure risk assessments, and do not flex with changes in infection rates or population demographics over time. In this paper, we introduce a framework to dynamically adapt deidentification for near-real-time sharing of person-level surveillance data. Materials and Methods: The framework leverages a simulation mechanism, capable of application at any geographic level, to forecast the reidentification risk of sharing the data under a wide range of generalization policies. The estimates inform weekly, prospective policy selection to maintain the proportion of records corresponding to a group size less than 11 (PK11) at or below 0.1. Fixing the policy at the start of each week facilitates timely dataset updates and supports sharing granular date information. We use August 2020 through October 2021 case data from Johns Hopkins University and the Centers for Disease Control and Prevention to demonstrate the framework’s effectiveness in maintaining the PK11 threshold of 0.01. Results: When sharing COVID-19 county-level case data across all US counties, the framework’s approach meets the threshold for 96.2% of daily data releases, while a policy based on current deidentification techniques meets the threshold for 32.3%. Conclusion: Periodically adapting the data publication policies preserves privacy while enhancing public health utility through timely updates and sharing epidemiologically critical features.
    Free, publicly-accessible full text available February 19, 2023
  2. The COVID-19 pandemic highlights the need for broad dissemination of case surveillance data. Local and global public health agencies have initiated efforts to do so, but the available data remain limited, due in part to concerns over privacy. As a result, current COVID-19 case surveillance data sharing policies are based on strong adversarial assumptions, such as the expectation that an attacker can readily re-identify individuals based on their distinguishability in a dataset. There are various re-identification risk measures to account for adversarial capabilities; however, the current array insufficiently accounts for real-world data challenges, particularly missing records in the resources of identifiable records that adversaries may rely upon to execute attacks (e.g., ten 50-year-old males in the de-identified dataset vs. five 50-year-old males in the identified dataset). In this paper, we introduce several approaches to amend such risk measures and assess re-identification risk in light of how an attacker's capabilities relate to missing records. We demonstrate the potential for these measures through a record linkage attack using COVID-19 case surveillance data and voter registration records in the state of Florida. Our findings demonstrate that adversarial assumptions, as realized in a risk measure, can dramatically affect re-identification risk estimation. Notably, we show that the re-identification risk is likely to be substantially smaller than the typical risk thresholds, which suggests that more detailed data could be shared publicly than is currently the case.
  3. Abstract

    Transformation of low-volatility gaseous precursors to new particles affects aerosol number concentration, cloud formation and hence the climate. The clustering of acid and base molecules is a major mechanism driving fast nucleation and initial growth of new particles in the atmosphere. However, the acid–base cluster composition, measured using state-of-the-art mass spectrometers, cannot explain the measured high formation rate of new particles. Here we present strong evidence for the existence of base molecules such as amines in the smallest atmospheric sulfuric acid clusters prior to their detection by mass spectrometers. We demonstrate that forming (H2SO4)1(amine)1 is the rate-limiting step in atmospheric H2SO4-amine nucleation and that the uptake of (H2SO4)1(amine)1 is a major pathway for the initial growth of H2SO4 clusters. The proposed mechanism is consistent with measured new particle formation in urban Beijing, in which dimethylamine is the key base for H2SO4 nucleation while other bases such as ammonia may contribute to the growth of larger clusters. Our findings further underline that strong amines, even at low concentrations and when undetected in the smallest clusters, can be crucial to particle formation in the planetary boundary layer.
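
The first abstract in this list defines PK11 as the proportion of records corresponding to a group size less than 11. A minimal sketch of that calculation follows, assuming each record has already been generalized to a tuple of quasi-identifiers. The function name `pk11`, the tuple layout (age band, sex, county) and the sample records are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def pk11(records, k=11):
    """Share of records whose quasi-identifier group has fewer than k members.

    `records` is a list of hashable tuples (hypothetical layout); records
    with identical tuples are treated as one group.
    """
    if not records:
        return 0.0
    group_sizes = Counter(records)
    # Every record in a group smaller than k counts toward the numerator.
    at_risk = sum(size for size in group_sizes.values() if size < k)
    return at_risk / len(records)


# Hypothetical generalized records: (age band, sex, county)
records = [("50-59", "M", "A")] * 12 + [("20-29", "F", "B")] * 3
print(f"PK11 = {pk11(records):.2f}")  # 3 of the 15 records sit in a group smaller than 11
```

Under this reading, a coarser generalization policy merges small groups into larger ones and lowers the value, which is the quantity the abstract describes checking against a threshold when a policy is selected.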