skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, July 12 until 2:00 AM ET on Saturday, July 13 due to maintenance. We apologize for the inconvenience.

Title: Sharing personal ECG time-series data privately
Abstract Objective

Emerging technologies (eg, wearable devices) have made it possible to collect data directly from individuals (eg, time-series), providing new insights on the health and well-being of individual patients. Broadening the access to these data would facilitate the integration with existing data sources (eg, clinical and genomic data) and advance medical research. Compared to traditional health data, these data are collected directly from individuals, are highly unique and provide fine-grained information, posing new privacy challenges. In this work, we study the applicability of a novel privacy model to enable individual-level time-series data sharing while maintaining the usability for data analytics.

Methods and materials

We propose a privacy-protecting method for sharing individual-level electrocardiography (ECG) time-series data, which leverages dimensional reduction technique and random sampling to achieve provable privacy protection. We show that our solution provides strong privacy protection against an informed adversarial model while enabling useful aggregate-level analysis.


We conduct our evaluations on 2 real-world ECG datasets. Our empirical results show that the privacy risk is significantly reduced after sanitization while the data usability is retained for a variety of clinical tasks (eg, predictive modeling and clustering).


Our study investigates the privacy risk in sharing individual-level ECG time-series data. We demonstrate that individual-level data can be highly unique, requiring new privacy solutions to protect data contributors.


The results suggest our proposed privacy-protection method provides strong privacy protections while preserving the usefulness of the data.

more » « less
Award ID(s):
2040727 1949217 1951430
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the American Medical Informatics Association
Medium: X Size: p. 1152-1160
p. 1152-1160
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Transparent and accessible reporting of COVID-19 data is critical for public health efforts. Each Indian state has its own mechanism for reporting COVID-19 data, and the quality of their reporting has not been systematically evaluated. We present a comprehensive assessment of the quality of COVID-19 data reporting done by the Indian state governments between 19 May and 1 June, 2020.


    We designed a semi-quantitative framework with 45 indicators to assess the quality of COVID-19 data reporting. The framework captures four key aspects of public health data reporting – availability, accessibility, granularity, and privacy. We used this framework to calculate a COVID-19 Data Reporting Score (CDRS, ranging from 0–1) for each state.


    Our results indicate a large disparity in the quality of COVID-19 data reporting across India. CDRS varies from 0.61 (good) in Karnataka to 0.0 (poor) in Bihar and Uttar Pradesh, with a median value of 0.26. Ten states do not report data stratified by age, gender, comorbidities or districts. Only ten states provide trend graphics for COVID-19 data. In addition, we identify that Punjab and Chandigarh compromised the privacy of individuals under quarantine by publicly releasing their personally identifiable information. The CDRS is positively associated with the state’s sustainable development index for good health and well-being (Pearson correlation:r=0.630,p=0.0003).


    Our assessment informs the public health efforts in India and serves as a guideline for pandemic data reporting. The disparity in CDRS highlights three important findings at the national, state, and individual level. At the national level, it shows the lack of a unified framework for reporting COVID-19 data in India, and highlights the need for a central agency to monitor or audit the quality of data reporting done by the states. Without a unified framework, it is difficult to aggregate the data from different states, gain insights, and coordinate an effective nationwide response to the pandemic. Moreover, it reflects the inadequacy in coordination or sharing of resources among the states. The disparate reporting score also reflects inequality in individual access to public health information and privacy protection based on the state of residence.

    more » « less
  2. Differential privacy concepts have been successfully used to protect anonymity of individuals in population-scale analysis. Sharing of mobile sensor data, especially physiological data, raise different privacy challenges, that of protecting private behaviors that can be revealed from time series of sensor data. Existing privacy mechanisms rely on noise addition and data perturbation. But the accuracy requirement on inferences drawn from physiological data, together with well-established limits within which these data values occur, render traditional privacy mechanisms inapplicable. In this work, we define a new behavioral privacy metric based on differential privacy and propose a novel data substitution mechanism to protect behavioral privacy. We evaluate the efficacy of our scheme using 660 hours of ECG, respiration, and activity data collected from 43 participants and demonstrate that it is possible to retain meaningful utility, in terms of inference accuracy (90%), while simultaneously preserving the privacy of sensitive behaviors. 
    more » « less
  3. Abstract Motivation

    Analysis of time series transcriptomics data from clinical trials is challenging. Such studies usually profile very few time points from several individuals with varying response patterns and dynamics. Current methods for these datasets are mainly based on linear, global orderings using visit times which do not account for the varying response rates and subgroups within a patient cohort.


    We developed a new method that utilizes multi-commodity flow algorithms for trajectory inference in large scale clinical studies. Recovered trajectories satisfy individual-based timing restrictions while integrating data from multiple patients. Testing the method on multiple drug datasets demonstrated an improved performance compared to prior approaches suggested for this task, while identifying novel disease subtypes that correspond to heterogeneous patient response patterns.

    Availability and implementation

    The source code and instructions to download the data have been deposited on GitHub at

    more » « less
  4. null (Ed.)
    The increased availability of on-body sensors gives researchers access to rich time-series data, many of which are related to human health conditions. Sharing such data can allow cross-institutional collaborations that create advanced data-driven models to make inferences on human well-being. However, such data are usually considered privacy-sensitive, and publicly sharing this data may incur significant privacy concerns. In this work, we seek to protect clinical time-series data against membership inference attacks, while maximally retaining the data utility. We achieve this by adding an imperceptible noise to the raw data. Known as adversarial perturbations, the noise is specially trained to force a deep learning model to make inference mistakes (in our case, mispredicting user identities). Our preliminary results show that our solution can better protect the data from membership inference attacks than the baselines, while succeeding in all the designed data quality checks. 
    more » « less
  5. A Mavragani (Ed.)

    Posttraumatic stress disorder (PTSD) is a serious public health concern. However, individuals with PTSD often do not have access to adequate treatment. A conversational agent (CA) can help to bridge the treatment gap by providing interactive and timely interventions at scale. Toward this goal, we have developed PTSDialogue—a CA to support the self-management of individuals living with PTSD. PTSDialogue is designed to be highly interactive (eg, brief questions, ability to specify preferences, and quick turn-taking) and supports social presence to promote user engagement and sustain adherence. It includes a range of support features, including psychoeducation, assessment tools, and several symptom management tools.


    This paper focuses on the preliminary evaluation of PTSDialogue from clinical experts. Given that PTSDialogue focuses on a vulnerable population, it is critical to establish its usability and acceptance with clinical experts before deployment. Expert feedback is also important to ensure user safety and effective risk management in CAs aiming to support individuals living with PTSD.


    We conducted remote, one-on-one, semistructured interviews with clinical experts (N=10) to gather insight into the use of CAs. All participants have completed their doctoral degrees and have prior experience in PTSD care. The web-based PTSDialogue prototype was then shared with the participant so that they could interact with different functionalities and features. We encouraged them to “think aloud” as they interacted with the prototype. Participants also shared their screens throughout the interaction session. A semistructured interview script was also used to gather insights and feedback from the participants. The sample size is consistent with that of prior works. We analyzed interview data using a qualitative interpretivist approach resulting in a bottom-up thematic analysis.


    Our data establish the feasibility and acceptance of PTSDialogue, a supportive tool for individuals with PTSD. Most participants agreed that PTSDialogue could be useful for supporting self-management of individuals with PTSD. We have also assessed how features, functionalities, and interactions in PTSDialogue can support different self-management needs and strategies for this population. These data were then used to identify design requirements and guidelines for a CA aiming to support individuals with PTSD. Experts specifically noted the importance of empathetic and tailored CA interactions for effective PTSD self-management. They also suggested steps to ensure safe and engaging interactions with PTSDialogue.


    Based on interviews with experts, we have provided design recommendations for future CAs aiming to support vulnerable populations. The study suggests that well-designed CAs have the potential to reshape effective intervention delivery and help address the treatment gap in mental health.

    more » « less