
Title: Sharing personal ECG time-series data privately
Abstract

Objective

Emerging technologies (eg, wearable devices) have made it possible to collect data directly from individuals (eg, time-series), providing new insights into the health and well-being of individual patients. Broadening access to these data would facilitate integration with existing data sources (eg, clinical and genomic data) and advance medical research. Compared to traditional health data, these data are collected directly from individuals, are highly unique, and provide fine-grained information, posing new privacy challenges. In this work, we study the applicability of a novel privacy model to enable individual-level time-series data sharing while maintaining usability for data analytics.

Materials and Methods

We propose a privacy-protecting method for sharing individual-level electrocardiography (ECG) time-series data, which leverages a dimensionality reduction technique and random sampling to achieve provable privacy protection. We show that our solution provides strong privacy protection against an informed adversarial model while enabling useful aggregate-level analysis.
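The pipeline described above combines dimensionality reduction with random sampling. This abstract does not name the specific technique, so the sketch below assumes PCA (via SVD) followed by uniform subsampling; the `ecg` array, the `sanitize` helper, and all parameter values are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 individual ECG segments of 128 samples each.
ecg = rng.normal(size=(200, 128))

def sanitize(segments, n_components=8, sample_frac=0.5, rng=rng):
    """Reduce each segment to a few principal components, reconstruct a
    coarsened version, and release only a random subsample of the records."""
    mean = segments.mean(axis=0)
    centered = segments - mean
    # Dimensionality reduction via SVD (PCA): keep only the top components,
    # discarding the fine-grained detail that makes a trace highly unique.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                 # (k, 128) principal directions
    coarse = centered @ basis.T @ basis + mean
    # Random sampling: an adversary cannot be certain that any particular
    # individual's record is present in the released set at all.
    n_keep = int(len(segments) * sample_frac)
    idx = rng.choice(len(segments), size=n_keep, replace=False)
    return coarse[idx]

released = sanitize(ecg)
print(released.shape)  # (100, 128)
```

The two stages address complementary risks: the projection removes individually distinctive waveform detail, while the subsampling adds membership uncertainty.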


Results

We conduct our evaluations on 2 real-world ECG datasets. Our empirical results show that the privacy risk is significantly reduced after sanitization while the data usability is retained for a variety of clinical tasks (eg, predictive modeling and clustering).


Discussion

Our study investigates the privacy risk in sharing individual-level ECG time-series data. We demonstrate that individual-level data can be highly unique, requiring new privacy solutions to protect data contributors.


Conclusion

The results suggest our proposed privacy-protection method provides strong privacy protections while preserving the usefulness of the data.

Award ID(s):
2040727 1949217 1951430
Publication Date:
Journal Name:
Journal of the American Medical Informatics Association
Page Range or eLocation-ID:
p. 1152-1160
Publisher:
Oxford University Press
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Transparent and accessible reporting of COVID-19 data is critical for public health efforts. Each Indian state has its own mechanism for reporting COVID-19 data, and the quality of their reporting has not been systematically evaluated. We present a comprehensive assessment of the quality of COVID-19 data reporting done by the Indian state governments between 19 May and 1 June, 2020.


    We designed a semi-quantitative framework with 45 indicators to assess the quality of COVID-19 data reporting. The framework captures four key aspects of public health data reporting – availability, accessibility, granularity, and privacy. We used this framework to calculate a COVID-19 Data Reporting Score (CDRS, ranging from 0–1) for each state.
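The framework above aggregates 45 indicators into a single 0–1 score across four aspects. A minimal sketch of how such a score could be computed, assuming each indicator is scored in [0, 1], each aspect is the mean of its indicators, and CDRS is the unweighted mean of the four aspect scores; the indicator values and the equal-weight aggregation below are assumptions, not the paper's exact rubric.

```python
# Hypothetical indicator values for one state: 1 = fully reported, 0 = absent,
# intermediate values for partial reporting (the real framework uses 45 indicators).
indicators = {
    "availability": [1, 1, 0, 1],
    "accessibility": [1, 0.5, 1],
    "granularity": [0, 0, 1, 0],
    "privacy": [1],
}

def cdrs(indicators):
    """Score each aspect as the mean of its indicators, then average the
    four aspect scores into a single 0-1 CDRS value."""
    aspect_scores = [sum(v) / len(v) for v in indicators.values()]
    return sum(aspect_scores) / len(aspect_scores)

print(round(cdrs(indicators), 3))  # → 0.708
```

Averaging within aspects first keeps an aspect with many indicators (eg, granularity) from dominating one with few (eg, privacy).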


    Our results indicate a large disparity in the quality of COVID-19 data reporting across India. CDRS varies from 0.61 (good) in Karnataka to 0.0 (poor) in Bihar and Uttar Pradesh, with a median value of 0.26. Ten states do not report data stratified by age, gender, comorbidities or districts. Only ten states provide trend graphics for COVID-19 data. In addition, we identify that Punjab and Chandigarh compromised the privacy of individuals under quarantine by publicly releasing their personally identifiable information. The CDRS is positively associated with the state’s sustainable development index for good health and well-being (Pearson correlation: r = 0.630, p = 0.0003).


    Our assessment informs the public health efforts in India and serves as a guideline for pandemic data reporting. The disparity in CDRS highlights three important findings at the national, state, and individual level. At the national level, it shows the lack of a unified framework for reporting COVID-19 data in India, and highlights the need for a central agency to monitor or audit the quality of data reporting done by the states. Without a unified framework, it is difficult to aggregate the data from different states, gain insights, and coordinate an effective nationwide response to the pandemic. Moreover, it reflects the inadequacy in coordination or sharing of resources among the states. The disparate reporting score also reflects inequality in individual access to public health information and privacy protection based on the state of residence.

  2. Abstract Objective

    Electronic medical records (EMRs) can support medical research and discovery, but privacy risks limit the sharing of such data on a wide scale. Various approaches have been developed to mitigate risk, including record simulation via generative adversarial networks (GANs). While showing promise in certain application domains, GANs lack a principled approach for EMR data that induces subpar simulation. In this article, we improve EMR simulation through a novel pipeline that (1) enhances the learning model, (2) incorporates evaluation criteria for data utility that informs learning, and (3) refines the training process.

    Materials and Methods

    We propose a new electronic health record generator using a GAN with a Wasserstein divergence and layer normalization techniques. We designed 2 utility measures to characterize similarity in the structural properties of real and simulated EMRs in the original and latent space, respectively. We applied a filtering strategy to enhance GAN training for low-prevalence clinical concepts. We evaluated the new and existing GANs with utility and privacy measures (membership and disclosure attacks) using billing codes from over 1 million EMRs at Vanderbilt University Medical Center.
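A toy numpy fragment illustrating two of the named ingredients, layer normalization and a Wasserstein-style critic score gap, on stand-in binary billing-code vectors. The one-layer critic and the data are hypothetical; the paper's model (a full GAN trained with a Wasserstein divergence) is far richer than this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def layer_norm(x, eps=1e-5):
    # Normalize each record across its features (stabilizes critic training).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def critic(x, w):
    # One-layer linear critic, for illustration only.
    return layer_norm(x) @ w

# Toy "real" and "simulated" binary billing-code vectors.
real = rng.integers(0, 2, size=(64, 10)).astype(float)
fake = rng.random(size=(64, 10))
w = rng.normal(size=10)

# Wasserstein-style critic objective: the gap between the critic's mean
# score on simulated records and on real records.
loss = critic(fake, w).mean() - critic(real, w).mean()
print(float(loss))
```

Driving this gap toward zero during training pushes the generator's output distribution toward the real-record distribution.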


    The proposed model outperformed the state-of-the-art approaches with significant improvement in retaining the nature of real records, including prediction performance and structural properties, without sacrificing privacy. Additionally, the filtering strategy achieved higher utility when the EMR training dataset was small.


    These findings illustrate that EMR simulation through GANs can be substantially improved through more appropriate training, modeling, and evaluation criteria.

  3. Abstract Background

    Personal privacy is a significant concern in the era of big data. In the field of health geography, personal health data are collected with geographic location information, which may increase disclosure risk and threaten personal geoprivacy. Geomasking is used to protect individuals’ geoprivacy by masking the geographic location information, and spatial k-anonymity is widely used to measure the disclosure risk after geomasking is applied. With the emergence of individual GPS trajectory datasets that contain large volumes of confidential geospatial information, disclosure risk can no longer be comprehensively assessed by the spatial k-anonymity method.

    Methods

    This study proposes and develops daily activity locations (DAL) k-anonymity as a new method for evaluating the disclosure risk of GPS data. Instead of calculating disclosure risk based on only one geographic location (e.g., home) of an individual, the new DAL k-anonymity is a composite evaluation of disclosure risk based on all activity locations of an individual and the time he/she spends at each location, abstracted from GPS datasets. With a simulated individual GPS dataset, we present case studies of applying DAL k-anonymity in various scenarios to investigate its performance. The results of applying DAL k-anonymity are also compared with those obtained with spatial k-anonymity under these scenarios.

    Results

    The results of this study indicate that DAL k-anonymity provides a better estimation of the disclosure risk than does spatial k-anonymity. In various case-study scenarios of individual GPS data, DAL k-anonymity provides a more effective method for evaluating the disclosure risk by considering the probability of re-identifying an individual’s home and all the other daily activity locations.

    Conclusions

    This new method provides a quantitative means for understanding the disclosure risk of sharing or publishing GPS data. It also helps shed new light on the development of new geomasking methods for GPS datasets. Ultimately, the findings of this study will help to protect individual geoprivacy while benefiting the research community by promoting and facilitating geospatial data sharing.
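The composite idea in DAL k-anonymity can be sketched as a time-weighted aggregate of per-location k values: locations where the person spends more time contribute more to the overall disclosure risk. The weighting scheme and the toy numbers below are assumptions for illustration; the paper's exact formulation may differ.

```python
# Hypothetical records: for each activity location of one individual, the
# number of candidate persons it could be re-identified to (spatial k) and
# the hours spent there per day.
locations = [
    {"name": "home", "k": 12, "hours": 10},
    {"name": "work", "k": 40, "hours": 8},
    {"name": "gym",  "k": 150, "hours": 2},
]

def dal_k_anonymity(locations):
    """Time-weighted composite of per-location k values."""
    total_hours = sum(loc["hours"] for loc in locations)
    return sum(loc["k"] * loc["hours"] / total_hours for loc in locations)

print(round(dal_k_anonymity(locations), 1))  # → 37.0
```

Note the contrast with home-only spatial k-anonymity, which for this toy individual would report k = 12 and ignore the work and gym locations entirely.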
  4. Differential privacy concepts have been successfully used to protect the anonymity of individuals in population-scale analyses. Sharing of mobile sensor data, especially physiological data, raises a different privacy challenge: protecting the private behaviors that can be revealed from time series of sensor data. Existing privacy mechanisms rely on noise addition and data perturbation, but the accuracy requirements on inferences drawn from physiological data, together with the well-established limits within which these data values occur, render traditional privacy mechanisms inapplicable. In this work, we define a new behavioral privacy metric based on differential privacy and propose a novel data substitution mechanism to protect behavioral privacy. We evaluate the efficacy of our scheme using 660 hours of ECG, respiration, and activity data collected from 43 participants and demonstrate that it is possible to retain meaningful utility, in terms of inference accuracy (90%), while simultaneously preserving the privacy of sensitive behaviors.
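A minimal sketch of a data substitution mechanism in this spirit: windows flagged as behavior-revealing are swapped for plausible, in-range windows drawn from a population pool, rather than perturbed with out-of-band noise. The detector output, the pool, and all shapes below are hypothetical, not the authors' mechanism.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sensor stream: 20 windows of 16 samples each.
stream = rng.normal(size=(20, 16))
# Hypothetical behavior-detector output: True = window reveals a sensitive behavior.
sensitive = rng.random(20) < 0.3

# Pool of plausible, non-sensitive windows drawn from population data.
pool = rng.normal(size=(100, 16))

def substitute(stream, sensitive, pool, rng=rng):
    """Replace each flagged window with a randomly drawn plausible window,
    keeping released values within physiological range (no additive noise)."""
    out = stream.copy()
    idx = rng.integers(0, len(pool), size=int(sensitive.sum()))
    out[sensitive] = pool[idx]
    return out

sanitized = substitute(stream, sensitive, pool)
print(sanitized.shape)  # (20, 16)
```

Because substitutes come from real population data, aggregate statistics stay physiologically plausible, which is why substitution can preserve inference utility where noise addition cannot.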
  5. Background

    Social networks such as Twitter offer the clinical research community a novel opportunity for engaging potential study participants based on user activity data. However, the availability of public social media data has led to new ethical challenges about respecting user privacy and the appropriateness of monitoring social media for clinical trial recruitment. Researchers have voiced the need for involving users’ perspectives in the development of ethical norms and regulations.

    Objective

    This study examined the attitudes and level of concern among Twitter users and nonusers about researchers monitoring social media users and their conversations on Twitter to recruit potential clinical trial participants.

    Methods

    We used two online methods for recruiting study participants: the open survey was (1) advertised on Twitter between May 23 and June 8, 2017, and (2) deployed on TurkPrime, a crowdsourcing data acquisition platform, between May 23 and June 8, 2017. Eligible participants were adults, 18 years of age or older, who lived in the United States. People with and without Twitter accounts were included in the study.

    Results

    While nearly half the respondents—on Twitter (94/603, 15.6%) and on TurkPrime (509/603, 84.4%)—indicated agreement that social media monitoring constitutes a form of eavesdropping that invades their privacy, over one-third disagreed and nearly 1 in 5 had no opinion. A chi-square test revealed a positive relationship between respondents’ general privacy concern and their average concern about Internet research (P<.005). We found associations between respondents’ Twitter literacy and their concerns about the ability for researchers to monitor their Twitter activity for clinical trial recruitment (P=.001) and whether they consider Twitter monitoring for clinical trial recruitment as eavesdropping (P<.001) and an invasion of privacy (P=.003). As Twitter literacy increased, so did people’s concerns about researchers monitoring Twitter activity. Our data support the previously suggested use of the nonexceptionalist methodology for assessing social media in research, insofar as social media-based recruitment does not need to be considered exceptional and, for most, it is considered preferable to traditional in-person interventions at physical clinics. The expressed attitudes were highly contextual, depending on factors such as the type of disease or health topic (eg, HIV/AIDS vs obesity vs smoking), the entity or person monitoring users on Twitter, and the monitored information.

    Conclusions

    The data and findings from this study contribute to the critical dialogue with the public about the use of social media in clinical research. The findings suggest that most users do not think that monitoring Twitter for clinical trial recruitment constitutes inappropriate surveillance or a violation of privacy. However, researchers should remain mindful that some participants might find social media monitoring problematic when connected with certain conditions or health topics. Further research should isolate factors that influence the level of concern among social media users across platforms and populations and inform the development of clearer and more consistent guidelines.