
Title: Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis
Tensor factorization has been demonstrated as an efficient approach for computational phenotyping, where massive electronic health records (EHRs) are converted into concise and meaningful clinical concepts. While distributing the tensor factorization tasks to local sites can avoid direct data sharing, it still requires the exchange of intermediary results, which could reveal sensitive patient information. The challenge is therefore to jointly decompose the tensor under rigorous and principled privacy constraints while still supporting the model's interpretability. We propose DPFact, a privacy-preserving collaborative tensor factorization method for computational phenotyping using EHR data. It embeds advanced privacy-preserving mechanisms into collaborative learning: hospitals keep their EHR databases private yet collaboratively learn meaningful clinical concepts by sharing differentially private intermediary results. Moreover, DPFact handles heterogeneous patient populations through a structured sparsity term. In our framework, each hospital decomposes its local tensor and, every few iterations, sends the updated intermediary results with output perturbation to a semi-trusted server, which generates the phenotypes. Evaluation on both real-world and synthetic datasets demonstrates that, under strict privacy constraints, our method is more accurate and communication-efficient than state-of-the-art baseline methods.
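The output-perturbation step described in the abstract can be sketched with the standard Gaussian mechanism. This is an illustrative sketch, not DPFact's exact update: the function name, the noise calibration via an assumed L2 sensitivity bound, and the toy factor sizes are all assumptions.

```python
import numpy as np

def perturb_factor(factor, epsilon, delta, l2_sensitivity):
    """Add Gaussian noise calibrated to (epsilon, delta)-DP before sharing.

    `l2_sensitivity` is an assumed bound on how much one patient's record
    can change the factor matrix; the noise scale follows the standard
    Gaussian mechanism.
    """
    sigma = l2_sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return factor + np.random.normal(0.0, sigma, size=factor.shape)

# Hypothetical local update: a hospital refines a feature-factor matrix,
# then releases only the perturbed copy to the semi-trusted server.
rng = np.random.default_rng(0)
local_factor = rng.random((50, 10))   # 50 features, rank-10 phenotypes (toy sizes)
shared = perturb_factor(local_factor, epsilon=1.0, delta=1e-5, l2_sensitivity=0.1)
```

Because only `shared` leaves the hospital, the raw factor (and the EHR data behind it) stays local; the server aggregates the perturbed factors across sites.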
Journal Name:
Proceedings of the 28th ACM International Conference on Information and Knowledge Management
Page Range / eLocation ID:
1291 to 1300
Sponsoring Org:
National Science Foundation
More Like this
  1. Irfan Awan ; Muhammad Younas ; Jamal Bentahar ; Salima Benbernou (Ed.)
    Multi-site clinical trial systems face security challenges when streamlining information sharing while protecting patient privacy. In addition, patient enrollment, transparency, traceability, data integrity, and reporting in clinical trial systems are all critical to maintaining data compliance. Blockchain-based clinical trial frameworks have been proposed by many researchers and industrial companies recently, but their lack of data governance, limited confidentiality, and high communication overhead make such data-sharing systems insecure and inefficient. We propose Soteria, a privacy-preserving smart contract framework, to manage, share, and analyze clinical trial data on Fabric Private Chaincode (FPC). Compared to a public blockchain, Fabric has fewer participants and an efficient consensus protocol. Soteria consists of several modules: patient-consent and clinical-trial-approval management chaincode, secure execution for confidential data sharing, an API gateway, and decentralized data governance with adaptive threshold signatures (ATS). We implemented two versions of Soteria: a non-SGX deployment on AWS Managed Blockchain and an SGX-based deployment in a local data center. We evaluated the response time of all access endpoints on AWS Managed Blockchain and demonstrated the use of SGX-based smart contracts for data sharing and analysis.
  2. Abstract

    Purpose

    Most commercially available treatment planning systems (TPSs) approximate the continuous delivery of volumetric modulated arc therapy (VMAT) plans with a series of discretized static beams for treatment planning, which can make VMAT dose computation extremely inefficient. In this study, we developed a polar‐coordinate‐based pencil beam (PB) algorithm for efficient VMAT dose computation with high‐resolution gantry angle sampling that can improve the computational efficiency and reduce the dose discrepancy due to the angular under‐sampling effect.

    Methods and Materials

    6 MV pencil beams were simulated on a uniform cylindrical phantom under an EGSnrc Monte Carlo (MC) environment. The MC‐generated PB kernels were collected in the polar coordinate system for each bixel on a fluence map and subsequently fitted via a series of Gaussians. The fluence was calculated using a detector's eye view with off‐axis and MLC transmission factors corrected. Doses of the VMAT arcs on the phantom were computed by summing the convolution results between the corresponding PB kernels and fluence for each bixel in the polar coordinate system. The convolution was performed using the fast Fourier transform to expedite the computing speed. The calculated doses were converted to the Cartesian coordinate system and compared with the reference dose computed by a collapsed cone convolution (CCC) algorithm of the TPS. A heterogeneous phantom was created to study heterogeneity corrections using the proposed algorithm. Ten VMAT arcs were included to evaluate the algorithm performance. Gamma analysis and computational complexity theory were used to measure the dosimetric accuracy and computational efficiency, respectively.
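The FFT-based kernel/fluence convolution at the core of the method can be sketched in NumPy on a toy Cartesian grid. The polar-coordinate bookkeeping, Gaussian kernel fitting, and transmission corrections are omitted, and all names and sizes below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dose_from_fluence(kernel, fluence):
    """Convolve a pencil-beam dose kernel with a fluence map via FFT.

    Zero-pads both arrays to the full linear-convolution size so that the
    circular wrap-around of FFT convolution does not contaminate the dose.
    """
    shape = [k + f - 1 for k, f in zip(kernel.shape, fluence.shape)]
    spectrum = np.fft.rfft2(kernel, shape) * np.fft.rfft2(fluence, shape)
    return np.fft.irfft2(spectrum, shape)

# Toy check: a Gaussian stand-in kernel convolved with a delta-function
# fluence simply reproduces the kernel at the delta's location.
r = np.arange(-8, 9)
kernel = np.exp(-(r[:, None] ** 2 + r[None, :] ** 2) / 10.0)
fluence = np.zeros((32, 32))
fluence[16, 16] = 1.0   # single open bixel
dose = dose_from_fluence(kernel, fluence)
```

The FFT turns an O(N^2 M^2) direct convolution into O(NM log NM) per bixel, which is what makes fine angular sampling affordable.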


    Results

    The dosimetric comparisons on the homogeneous phantom between the proposed PB algorithm and the CCC algorithm for 10 VMAT arcs demonstrate that the proposed algorithm can achieve a dosimetric accuracy comparable to that of the CCC algorithm, with average gamma passing rates of 96% (2%/2 mm) and 98% (3%/3 mm). In addition, the proposed algorithm provides better computational efficiency for VMAT dose computation using a PC equipped with a 4‐core processor than the CCC algorithm utilizing a dual 10‐core server. Moreover, computational complexity theory reveals that the proposed algorithm has a great advantage in computational efficiency for VMAT dose computation on a homogeneous medium, especially when a fine angular sampling rate is applied. This supports reducing dose errors from the angular under‐sampling effect by using a finer angular sampling rate while still preserving a practical computing speed. For dose calculation on the heterogeneous phantom, the proposed algorithm with heterogeneity corrections still offers reasonable dosimetric accuracy with computational efficiency comparable to that of the CCC algorithm.


    Conclusions

    We proposed a novel polar‐coordinate‐based pencil beam algorithm for VMAT dose computation that achieves better computational efficiency while maintaining clinically acceptable dosimetric accuracy and reducing the dose error caused by the angular under‐sampling effect. It also provides a flexible VMAT dose computation structure that allows adjustable sampling rates and direct dose computation in regions of interest, which makes the algorithm potentially useful for clinical applications such as independent dose verification for VMAT patient‐specific QA.

  3. Differential privacy concepts have been used successfully to protect the anonymity of individuals in population-scale analysis. Sharing of mobile sensor data, especially physiological data, raises a different privacy challenge: protecting the private behaviors that can be revealed from time series of sensor data. Existing privacy mechanisms rely on noise addition and data perturbation, but the accuracy requirements on inferences drawn from physiological data, together with the well-established limits within which these data values occur, render traditional privacy mechanisms inapplicable. In this work, we define a new behavioral privacy metric based on differential privacy and propose a novel data substitution mechanism to protect behavioral privacy. We evaluate the efficacy of our scheme using 660 hours of ECG, respiration, and activity data collected from 43 participants and demonstrate that it is possible to retain meaningful utility, in terms of inference accuracy (90%), while simultaneously preserving the privacy of sensitive behaviors.
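The claim that plain noise addition is inapplicable can be illustrated with a toy contrast (this is not the paper's substitution mechanism): with per-sample sensitivity spanning a plausible physiological range, Laplace noise at a moderate epsilon pushes a large fraction of heart-rate samples outside physiologically possible bounds. The assumed range and rates below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
heart_rate = rng.normal(72, 5, size=1000)   # plausible resting HR series (bpm)

# Classic Laplace mechanism: noise scale = sensitivity / epsilon, where the
# sensitivity must cover the full assumed physiological range (30-180 bpm).
sensitivity = 180.0 - 30.0
epsilon = 1.0
noisy = heart_rate + rng.laplace(0.0, sensitivity / epsilon, size=heart_rate.shape)

# Fraction of perturbed samples outside physiologically possible bounds --
# an adversary (or a downstream inference model) can immediately tell these
# values are synthetic, and utility collapses.
implausible = np.mean((noisy < 30) | (noisy > 180))
```

This is the tension the abstract points to: clamping the noise back into range would break the differential privacy guarantee, which motivates substituting realistic data segments instead of adding noise.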
  4. While embracing various machine learning techniques to make effective decisions in the big data era, preserving the privacy of sensitive data poses significant challenges. In this paper, we develop a privacy-preserving distributed machine learning algorithm to address this issue. Given that each data provider owns a dataset of a different sample size, our goal is to learn a common classifier over the union of all the local datasets in a distributed way, without leaking any sensitive information about the data samples. Such an algorithm must jointly consider efficient distributed learning and effective privacy preservation. In the proposed algorithm, we extend the stochastic alternating direction method of multipliers (ADMM) to a distributed setting. To preserve privacy during the iterative process, we combine differential privacy with stochastic ADMM. In particular, we propose a novel stochastic-ADMM-based privacy-preserving distributed machine learning (PS-ADMM) algorithm that perturbs the gradient updates, providing a differential privacy guarantee at low computational cost. We theoretically establish the convergence rate and utility bound of PS-ADMM under strongly convex objectives. Through experiments on real-world datasets, we show that PS-ADMM outperforms other differentially private ADMM algorithms under the same differential privacy guarantee.
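The gradient-perturbation idea can be sketched as a generic differentially private gradient step: clip per-sample gradients to bound sensitivity, average, and add Gaussian noise before updating. This is a hedged sketch of the general technique, not the exact PS-ADMM update, which also carries ADMM dual variables; the logistic loss and all sizes are illustrative assumptions.

```python
import numpy as np

def dp_gradient_step(w, grad_fn, batch, lr, clip, sigma, rng):
    """One gradient-perturbed update in the spirit of PS-ADMM's primal step.

    Per-sample gradients are clipped to L2 norm `clip` (bounding one
    sample's influence), averaged, and Gaussian noise of scale
    `sigma * clip / batch_size` is added before the descent step.
    """
    grads = np.stack([grad_fn(w, sample) for sample in batch])
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip)      # per-sample clipping
    noisy = grads.mean(axis=0) + rng.normal(0.0, sigma * clip / len(batch), size=w.shape)
    return w - lr * noisy

# Hypothetical logistic-loss gradient for one labeled sample (x, y in {-1, +1}).
def logistic_grad(w, sample):
    x, y = sample
    return -y * x / (1.0 + np.exp(y * (w @ x)))

rng = np.random.default_rng(1)
w = np.zeros(5)
batch = [(rng.normal(size=5), rng.choice([-1.0, 1.0])) for _ in range(32)]
w = dp_gradient_step(w, logistic_grad, batch, lr=0.1, clip=1.0, sigma=1.0, rng=rng)
```

Because only the noisy averaged gradient influences the shared model, each iteration releases a differentially private quantity; the privacy cost then composes across iterations.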
  5. null (Ed.)
    Privacy concerns over sharing sensitive data across institutions are particularly paramount in the medical domain, which hinders the research and development of many applications, such as cohort construction for cross-institution observational studies and disease surveillance. Moreover, the large volume and heterogeneity of patient data pose great challenges for retrieval and analysis. To address these challenges, we propose a Federated Patient Hashing (FPH) framework, which collaboratively trains a retrieval model stored in shared memory while keeping all patient-level information in the local institutions. Specifically, the objective function is constructed by minimizing a similarity-preserving loss and a heterogeneity-digging loss, which together preserve both inter-data and intra-data relationships. Then, by leveraging the concept of Bregman divergence, we implement the optimization in a federated manner, in both centralized and decentralized learning settings, without accessing the raw training data across institutions. We also analyze the convergence rate of the FPH framework. Extensive experiments on a real-world clinical dataset from critical care demonstrate the effectiveness of the proposed method for similar-patient matching across institutions.