The rapid improvements in genomic sequencing technology have led to the proliferation of locally collected genomic datasets. Given the sensitivity of genomic data, it is crucial to conduct collaborative studies while preserving the privacy of the individuals. However, before starting any collaborative research effort, the quality of the data needs to be assessed. One of the essential steps of the quality control process is population stratification: identifying the presence of genetic difference in individuals due to subpopulations. One of the common methods used to group genomes of individuals based on ancestry is principal component analysis (PCA). In this article, we propose a privacy-preserving framework which utilizes PCA to assign individuals to populations across multiple collaborators as part of the population stratification step. In our proposed client-server-based scheme, we initially let the server train a global PCA model on a publicly available genomic dataset which contains individuals from multiple populations. The global PCA model is later used to reduce the dimensionality of the local data by each collaborator (client). After adding noise to achieve local differential privacy (LDP), the collaborators send metadata (in the form of their local PCA outputs) about their research datasets to the server, which then alignsmore »
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract -
Symptoms-tracking applications allow crowdsensing of health and location related data from individuals to track the spread and outbreaks of infectious diseases. During the COVID-19 pandemic, for the first time in history, these apps were widely adopted across the world to combat the pandemic. However, due to the sensitive nature of the data collected by these apps, serious privacy concerns were raised and apps were critiqued for their insufficient privacy safeguards. The Covid Nearby project was launched to develop a privacy-focused symptoms-tracking app and to understand the privacy preferences of users in health emergencies. In this work, we draw on the insights from the Covid Nearby users' data, and present an analysis of the significantly varying trends in users' privacy preferences with respect to demographics, attitude towards information sharing, and health concerns, e.g. after being possibly exposed to COVID-19. These results and insights can inform health informatics researchers and policy designers in developing more socially acceptable health apps in the future.Free, publicly-accessible full text available November 7, 2023
-
Open data sets that contain personal information are susceptible to adversarial attacks even when anonymized. By performing low-cost joins on multiple datasets with shared attributes, malicious users of open data portals might get access to information that violates individuals’ privacy. However, open data sets are primarily published using a release-and-forget model, whereby data owners and custodians have little to no cognizance of these privacy risks. We address this critical gap by developing a visual analytic solution that enables data defenders to gain awareness about the disclosure risks in local, joinable data neighborhoods. The solution is derived through a design study with data privacy researchers, where we initially play the role of a red team and engage in an ethical data hacking exercise based on privacy attack scenarios. We use this problem and domain characterization to develop a set of visual analytic interventions as a defense mechanism and realize them in PRIVEE, a visual risk inspection workflow that acts as a proactive monitor for data defenders. PRIVEE uses a combination of risk scores and associated interactive visualizations to let data defenders explore vulnerable joins and interpret risks at multiple levels of data granularity. We demonstrate how PRIVEE can help emulate themore »Free, publicly-accessible full text available October 19, 2023
-
Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. In this work, we propose a framework that verifies the correctness of the aggregate statistics obtained as a result of a genome-wide association study (GWAS) conducted by a researcher while protecting individuals’ privacy in the researcher’s dataset. In GWAS, the goal of the researcher is to identify highly associated point mutations (variants) with a given phenotype. The researcher publishes the workflow of the conducted study, its output, and associated metadata. They keep the research dataset private while providing, as part of the metadata, a partial noisy dataset (that achieves local differential privacy). To check the correctness of the workflow output, a verifier makes use of the workflow, its metadata, and results of another GWAS (conducted using publicly available datasets) to distinguish between correct statistics and incorrect ones. For evaluation, we use real genomic data and show that the correctness of the workflow output can be verified with high accuracy even when the aggregate statistics of a small number of variants are provided. We also quantify the privacy leakage due to the provided workflow and its associated metadata and show that the additional privacy risk due to the providedmore »
-
Privacy Attitudes and COVID Symptom Tracking Apps: Understanding Active Boundary Management by UsersMultiple symptom tracking applications (apps) were created during the early phase of the COVID-19 pandemic. While they provided crowdsourced information about the state of the pandemic in a scalable manner, they also posed significant privacy risks for individuals. The present study investigates the interplay between individual privacy attitudes and the adoption of symptom tracking apps. Using the communication privacy theory as a framework, it studies how users’ privacy attitudes changed during the public health emergency compared to the pre-COVID times. Based on focus-group interviews (N = 21), this paper reports significant changes in users’ privacy attitudes toward such apps. Research participants shared various reasons for both increased acceptability (e.g., disease uncertainty, public good) and decreased acceptability (e.g., reduced utility due to changed lifestyle) during COVID. The results of this study can assist health informatics researchers and policy designers in creating more socially acceptable health apps in the future.
-
Intelligently responding to a pandemic like Covid-19 requires sophisticated models over accurate real-time data, which is typically lacking at the start, e.g., due to deficient population testing. In such times, crowdsensing of spatially tagged disease-related symptoms provides an alternative way of acquiring real-time insights about the pandemic. Existing crowdsensing systems aggregate and release data for pre-fixed regions, e.g., counties. However, the insights obtained from such aggregates do not provide useful information about smaller regions e.g., neighborhoods where outbreaks typically occur and the aggregate-and-release method is vulnerable to privacy attacks. Therefore, we propose a novel differentially private method to obtain accurate insights from crowdsensed data for any number of regions specified by the users (e.g., researchers and a policy makers) without compromising privacy of the data contributors. Our approach, which has been implemented and deployed, informs the development of the future privacy-preserving intelligent systems for longitudinal and spatial data analytics.