skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis
Tensor factorization has been demonstrated as an efficient approach for computational phenotyping, where massive electronic health records (EHRs) are converted to concise and meaningful clinical concepts. While distributing the tensor factorization tasks to local sites can avoid direct data sharing, it still requires the exchange of intermediary results which could reveal sensitive patient information. Therefore, the challenge is how to jointly decompose the tensor under rigorous and principled privacy constraints, while still support the model's interpretability. We propose DPFact, a privacy-preserving collaborative tensor factorization method for computational phenotyping using EHR. It embeds advanced privacy-preserving mechanisms with collaborative learning. Hospitals can keep their EHR database private but also collaboratively learn meaningful clinical concepts by sharing differentially private intermediary results. Moreover, DPFact solves the heterogeneous patient population using a structured sparsity term. In our framework, each hospital decomposes its local tensors and sends the updated intermediary results with output perturbation every several iterations to a semi-trusted server which generates the phenotypes. The evaluation on both real-world and synthetic datasets demonstrated that under strict privacy constraints, our method is more accurate and communication-efficient than state-of-the-art baseline methods.  more » « less
Award ID(s):
1838200
PAR ID:
10168360
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the 28th ACM International Conference on Information and Knowledge Management
Page Range / eLocation ID:
1291 to 1300
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Irfan Awan; Muhammad Younas; Jamal Bentahar; Salima Benbernou (Ed.)
    Multi-site clinical trial systems face security challenges when streamlining information sharing while protecting patient privacy. In addition, patient enrollment, transparency, traceability, data integrity, and reporting in clinical trial systems are all critical aspects of maintaining data compliance. A Blockchain-based clinical trial framework has been proposed by lots of researchers and industrial companies recently, but its limitations of lack of data governance, limited confidentiality, and high communication overhead made data-sharing systems insecure and not efficient. We propose π–²π—ˆπ—π–Ύπ—‹π—‚π–Ί, a privacy-preserving smart contracts framework, to manage, share and analyze clinical trial data on fabric private chaincode (FPC). Compared to public Blockchain, fabric has fewer participants with an efficient consensus protocol. π–²π—ˆπ—π–Ύπ—‹π—‚π–Ί consists of several modules: patient consent and clinical trial approval management chaincode, secure execution for confidential data sharing, API Gateway, and decentralized data governance with adaptive threshold signature (ATS). We implemented two versions of π–²π—ˆπ—π–Ύπ—‹π—‚π–Ί with non-SGX deploys on AWS blockchain and SGX-based on a local data center. We evaluated the response time for all of the access endpoints on AWS Managed Blockchain, and demonstrated the utilization of SGX-based smart contracts for data sharing and analysis. 
    more » « less
  2. Artificial Intelligence (AI) has demonstrated strong potential in automating medical imaging tasks, with potential applications across disease diagnosis, prognosis, treatment planning, and posttreatment surveillance. However, privacy concerns surrounding patient data remain a major barrier to the widespread adoption of AI in clinical practice, as large and diverse training datasets are essential for developing accurate, robust, and generalizable AI models. Federated Learning offers a privacy-preserving solution by enabling collaborative model training across institutions without sharing sensitive data. Instead, model parameters, such as model weights, are exchanged between participating sites. Despite its potential, federated learning is still in its early stages of development and faces several challenges. Notably, sensitive information can still be inferred from the shared model parameters. Additionally, postdeployment data distribution shifts can degrade model performance, making uncertainty quantification essential. In federated learning, this task is particularly challenging due to data heterogeneity across participating sites. This review provides a comprehensive overview of federated learning, privacy-preserving federated learning, and uncertainty quantification in federated learning. Key limitations in current methodologies are identified, and future research directions are proposed to enhance data privacy and trustworthiness in medical imaging applications 
    more » « less
  3. Differential privacy has emerged as a leading theoretical framework for privacy-preserving data gathering and analysis. It allows meaningful statistics to be collected for a population without revealing ``too much'' information about any individual member of the population. For software profiling, this machinery allows profiling data from many users of a deployed software system to be collected and analyzed in a privacy-preserving manner. Such a solution is appealing to many stakeholders, including software users, software developers, infrastructure providers, and government agencies. We propose an approach for differentially-private collection of frequency vectors from software executions. Frequency information is reported with the addition of random noise drawn from the Laplace distribution. A key observation behind the design of our scheme is that event frequencies are closely correlated due to the static code structure. Differential privacy protections must account for such relationships; otherwise, a seemingly-strong privacy guarantee is actually weaker than it appears. Motivated by this observation, we propose a novel and general differentially-private profiling scheme when correlations between frequencies can be expressed through linear inequalities. Using a linear programming formulation, we show how to determine the magnitude of random noise that should be added to achieve meaningful privacy protections under such linear constraints. Next, we develop an efficient instance of this general machinery for an important subclass of constraints. Instead of LP, our solution uses a reachability analysis of a constraint graph. As an exemplar, we employ this approach to implement differentially-private method frequency profiling for Android apps. Any differentially-private scheme has to balance two competing aspects: privacy and accuracy. Through an experimental study to characterize these trade-offs, we (1) show that our proposed randomization achieves much higher accuracy compared to related prior work, (2) demonstrate that high accuracy and high privacy protection can be achieved simultaneously, and (3) highlight the importance of linear constraints in the design of the randomization. These promising results provide evidence that our approach is a good candidate for privacy-preserving frequency profiling of deployed software. 
    more » « less
  4. Federated learning (FL) has been widely studied recently due to its property to collaboratively train data from different devices without sharing the raw data. Nevertheless, recent studies show that an adversary can still be possible to infer private information about devices' data, e.g., sensitive attributes such as income, race, and sexual orientation. To mitigate the attribute inference attacks, various existing privacy-preserving FL methods can be adopted/adapted. However, all these existing methods have key limitations: they need to know the FL task in advance, or have intolerable computational overheads or utility losses, or do not have provable privacy guarantees. We address these issues and design a task-agnostic privacy-preserving presentation learning method for FL (TAPPFL) against attribute inference attacks. TAPPFL is formulated via information theory. Specifically, TAPPFL has two mutual information goals, where one goal learns task-agnostic data representations that contain the least information about the private attribute in each device's data, and the other goal ensures the learnt data representations include as much information as possible about the device data to maintain FL utility. We also derive privacy guarantees of TAPPFL against worst-case attribute inference attacks, as well as the inherent tradeoff between utility preservation and privacy protection. Extensive results on multiple datasets and applications validate the effectiveness of TAPPFL to protect data privacy, maintain the FL utility, and be efficient as well. Experimental results also show that TAPPFL outperforms the existing defenses. 
    more » « less
  5. While embracing various machine learning techniques to make effective decisions in the big data era, preserving the privacy of sensitive data poses significant challenges. In this paper, we develop a privacy-preserving distributed machine learning algorithm to address this issue. Given the assumption that each data provider owns a dataset with different sample size, our goal is to learn a common classifier over the union of all the local datasets in a distributed way without leaking any sensitive information of the data samples. Such an algorithm needs to jointly consider efficient distributed learning and effective privacy preservation. In the proposed algorithm, we extend stochastic alternating direction method of multipliers (ADMM) in a distributed setting to do distributed learning. For preserving privacy during the iterative process, we combine differential privacy and stochastic ADMM together. In particular, we propose a novel stochastic ADMM based privacy-preserving distributed machine learning (PS-ADMM) algorithm by perturbing the updating gradients, that provide differential privacy guarantee and have a low computational cost. We theoretically demonstrate the convergence rate and utility bound of our proposed PS-ADMM under strongly convex objective. Through our experiments performed on real-world datasets, we show that PS-ADMM outperforms other differentially private ADMM algorithms under the same differential privacy guarantee. 
    more » « less