skip to main content

Title: Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization
Distributed learning allows a group of independent data owners to collaboratively learn a model over their data sets without exposing their private data. We present a distributed learning approach that combines differential privacy with secure multi-party computation. We explore two popular methods of differential privacy, output perturbation and gradient perturbation, and advance the state-of-the-art for both methods in the distributed learning setting. In our output perturbation method, the parties combine local models within a secure computation and then add the required differential privacy noise before revealing the model. In our gradient perturbation method, the data owners collaboratively train a global model via an iterative learning algorithm. At each iteration, the parties aggregate their local gradients within a secure computation, adding sufficient noise to ensure privacy before the gradient updates are revealed. For both methods, we show that the noise can be reduced in the multi-party setting by adding the noise inside the secure computation after aggregation, asymptotically improving upon the best previous results. Experiments on real world data sets demonstrate that our methods provide substantial utility gains for typical privacy requirements.
Award ID(s):
Publication Date:
Journal Name:
Advances in neural information processing systems
Sponsoring Org:
National Science Foundation
More Like this
  1. Preserving differential privacy has been well studied under the centralized setting. However, it’s very challenging to preserve differential privacy under multiparty setting, especially for the vertically partitioned case. In this work, we propose a new framework for differential privacy preserving multiparty learning in the vertically partitioned setting. Our core idea is based on the functional mechanism that achieves differential privacy of the released model by adding noise to the objective function. We show the server can simply dissect the objective function into single-party and cross-party sub-functions, and allocate computation and perturbation of their polynomial coefficients t o l ocal p arties. Our method n eeds o nly o ne r ound of noise addition and secure aggregation. The released model in our framework achieves the same utility as applying the functional mechanism in the centralized setting. Evaluation on real-world and synthetic datasets for linear and logistic regressions shows the effectiveness of our proposed method.
  2. Matrix factorization (MF) approximates unobserved ratings in a rating matrix, whose rows correspond to users and columns correspond to items to be rated, and has been serving as a fundamental building block in recommendation systems. This paper comprehensively studies the problem of matrix factorization in different federated learning (FL) settings, where a set of parties want to cooperate in training but refuse to share data directly. We first propose a generic algorithmic framework for various settings of federated matrix factorization (FMF) and provide a theoretical convergence guarantee. We then systematically characterize privacy-leakage risks in data collection, training, and publishing stages for three different settings and introduce privacy notions to provide end-to-end privacy protections. The first one is vertical federated learning (VFL), where multiple parties have the ratings from the same set of users but on disjoint sets of items. The second one is horizontal federated learning (HFL), where parties have ratings from different sets of users but on the same set of items. The third setting is local federated learning (LFL), where the ratings of the users are only stored on their local devices. We introduce adapted versions of FMF with the privacy notions guaranteed in the three settings. Inmore »particular, a new private learning technique called embedding clipping is introduced and used in all the three settings to ensure differential privacy. For the LFL setting, we combine differential privacy with secure aggregation to protect the communication between user devices and the server with a strength similar to the local differential privacy model, but much better accuracy. We perform experiments to demonstrate the effectiveness of our approaches.« less
  3. Secure multi-party computation (MPC) allows multiple parties to jointly compute the output of a function while preserving the privacy of any individual party's inputs to that function. As MPC protocols transition from research prototypes to real-world applications, the usability of MPC-enabled applications is increasingly critical to their successful deployment and wide adoption. Our Web-MPC platform, designed with a focus on usability, has been deployed for privacy-preserving data aggregation initiatives with the City of Boston and the Greater Boston Chamber of Commerce. After building and deploying an initial version of this platform, we conducted a heuristic evaluation to identify additional usability improvements and implemented corresponding application enhancements. However, it is difficult to gauge the effectiveness of these changes within the context of real-world deployments using traditional web analytics tools without compromising the security guarantees of the platform. This work consists of two contributions that address this challenge: (1) the Web-MPC platform has been extended with the capability to collect web analytics using existing MPC protocols, and (2) this capability has been leveraged to conduct a usability study comparing the two version of Web-MPC (before and after the heuristic evaluation and associated improvements). While many efforts have focused on ways to enhancemore »the usability of privacy-preserving technologies, this study can serve as a model for using a privacy-preserving data-driven approach in evaluating or enhancing the usability of privacy-preserving websites and applications deployed in real-world scenarios. The data collected in this study yields insights about the interplay between usability and security that can help inform future implementations of applications that employ MPC.« less
  4. Differential privacy offers a formal privacy guarantee for individuals, but many deployments of differentially private systems require a trusted third party (the data curator). We propose DuetSGX, a system that uses secure hardware (Intel’s SGX) to eliminate the need for a trusted data curator. Data owners submit encrypted data that can be decrypted only within a secure enclave running the DuetSGX system, ensuring that sensitive data is never available to the data curator. Analysts submit queries written in the Duet language, which is specifically designed for verifying that programs satisfy differential privacy; DuetSGX uses the Duet typechecker to verify that each query satisfies differential privacy before running it. DuetSGX therefore provides the benefits of local differential privacy and central differential privacy simultaneously: noise is only added to final results, and there is no trusted third party. We have implemented a proof-of-concept implementation of DuetSGX and we release it as open-source.
  5. Preserving privacy in machine learning on multi-party data is of importance to many domains. In practice, existing solutions suffer from several critical limitations, such as significantly reduced utility under privacy constraints or excessive communication burden between the information fusion center and local data providers. In this paper, we propose and implement a new distributed deep learning framework that addresses these shortcomings and preserves privacy more efficiently than previous methods. During the stochastic gradient descent training of a deep neural network, we focus on the parameters with large absolute gradients in order to save privacy budget consumption. We adopt a generalization of the Report-Noisy-Max algorithm in differential privacy to select these gradients and prove its privacy guarantee rigorously. Inspired by the recent novel idea of Terngrad, we also quantize the released gradients to ternary levels {-B, 0, B}, where B is the bound of gradient clipping. Applying Terngrad can significantly reduce the communication cost without incurring severe accuracy loss. Furthermore, we evaluate the performance of our method on a real-world credit card fraud detection data set consisting of millions of transactions.