skip to main content


Title: OpBoost: a vertical federated tree boosting framework based on order-preserving desensitization
Vertical Federated Learning (FL) is a new paradigm that enables users with non-overlapping attributes of the same data samples to jointly train a model without directly sharing the raw data. Nevertheless, recent works show that it's still not sufficient to prevent privacy leakage from the training process or the trained model. This paper focuses on studying the privacy-preserving tree boosting algorithms under the vertical FL. The existing solutions based on cryptography involve heavy computation and communication overhead and are vulnerable to inference attacks. Although the solution based on Local Differential Privacy (LDP) addresses the above problems, it leads to the low accuracy of the trained model. This paper explores to improve the accuracy of the widely deployed tree boosting algorithms satisfying differential privacy under vertical FL. Specifically, we introduce a framework called OpBoost. Three order-preserving desensitization algorithms satisfying a variant of LDP called distance-based LDP (dLDP) are designed to desensitize the training data. In particular, we optimize the dLDP definition and study efficient sampling distributions to further improve the accuracy and efficiency of the proposed algorithms. The proposed algorithms provide a trade-off between the privacy of pairs with large distance and the utility of desensitized values. Comprehensive evaluations show that OpBoost has a better performance on prediction accuracy of trained models compared with existing LDP approaches on reasonable settings. Our code is open source.  more » « less
Award ID(s):
2308730 2034870 2046335 2302689
NSF-PAR ID:
10407008
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
Volume:
16
Issue:
2
ISSN:
2150-8097
Page Range / eLocation ID:
202 to 215
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Machine Learning (ML) algorithms have shown quite promising applications in smart meter data analytics enabling intelligent energy management systems for the Advanced Metering Infrastructure (AMI). One of the major challenges in developing ML applications for the AMI is to preserve user privacy while allowing active end-users participation. This paper addresses this challenge and proposes Differential Privacy-enabled AMI with Federated Learning (DP-AMI-FL), framework for ML-based applications in the AMI. This framework provides two layers of privacy protection: first, it keeps the raw data of consumers hosting ML applications at edge devices (smart meters) with Federated Learning (FL), and second, it obfuscates the ML models using Differential Privacy (DP) to avoid privacy leakage threats on the models posed by various inference attacks. The framework is evaluated by analyzing its performance on a use case aimed to improve Short-Term Load Forecasting (STLF) for residential consumers having smart meters and home energy management systems. Extensive experiments demonstrate that the framework when used with Long Short-Term Memory (LSTM) recurrent neural network models, achieves high forecasting accuracy while preserving users data privacy. 
    more » « less
  2. Low precision deep neural network (DNN) training is one of the most effective techniques for boosting DNNs’ training efficiency, as it trims down the training cost from the finest bit level. While existing works mostly fix the model precision during the whole training process, a few pioneering works have shown that dynamic precision schedules help NNs converge to a better accuracy while leading to a lower training cost than their static precision training counterparts. However, existing dynamic low precision training methods rely on manually designed precision schedules to achieve advantageous efficiency and accuracy trade-offs, limiting their more comprehensive practical applications and achievable performance. To this end, we propose LDP, a Learnable Dynamic Precision DNN training framework that can automatically learn a temporally and spatially dynamic precision schedule during training towards optimal accuracy and efficiency trade-offs. It is worth noting that LDP-trained DNNs are by nature efficient during inference. Further more, we visualize the resulting temporal and spatial precision schedule and distribution of LDP trained DNNs on different tasks to better understand the corresponding DNNs’ characteristics at different training stages and DNN layers both during and after training, drawing insights for promoting further innovations. Extensive experiments and ablation studies (seven networks, five datasets, and three tasks) show that the proposed LDP consistently outperforms state-of-the-art (SOTA) low precision DNN training techniques in terms of training efficiency and achieved accuracy trade-offs. For example, in addition to having the advantage of being automated, our LDP achieves a 0.31% higher accuracy with a 39.1% lower computational cost when training ResNet-20 on CIFAR-10 as compared with the best SOTA method. 
    more » « less
  3. This paper studies a distributed optimization problem in the federated learning (FL) framework under differential privacy constraints, whereby a set of clients having local samples are connected to an untrusted server, who wants to learn a global model while preserving the privacy of clients’ local datasets. We propose a new client sampling called self-sampling that reflects the random availability of clients in the learning process in FL. We analyze the differential privacy of the SGD with client self-sampling by composing amplification by sub-sampling along with amplification by shuffling. Furthermore, we analyze the convergence of the proposed SGD algorithm showing that we can get a reasonable learning performance while preserving the privacy of clients’ data even with client self-sampling. 
    more » « less
  4. Providing privacy protection has been one of the primary motivations of Federated Learning (FL). Recently, there has been a line of work on incorporating the formal privacy notion of differential privacy with FL. To guarantee the client-level differential privacy in FL algorithms, the clients’ transmitted model updates have to be clipped before adding privacy noise. Such clipping operation is substantially different from its counterpart of gradient clipping in the centralized differentially private SGD and has not been well-understood. In this paper, we first empirically demonstrate that the clipped FedAvg can perform surprisingly well even with substantial data heterogeneity when training neural networks, which is partly because the clients’ updates become similar for several popular deep architectures. Based on this key observation, we provide the convergence analysis of a differential private (DP) FedAvg algorithm and highlight the relationship between clipping bias and the distribution of the clients’ updates. To the best of our knowledge, this is the first work that rigorously investigates theoretical and empirical issues regarding the clipping operation in FL algorithms. 
    more » « less
  5. In recent decades, the advance of information technology and abundant personal data facilitate the application of algorithmic personalized pricing. However, this leads to the growing concern of potential violation of privacy because of adversarial attack. To address the privacy issue, this paper studies a dynamic personalized pricing problem with unknown nonparametric demand models under data privacy protection. Two concepts of data privacy, which have been widely applied in practices, are introduced: central differential privacy (CDP) and local differential privacy (LDP), which is proved to be stronger than CDP in many cases. We develop two algorithms that make pricing decisions and learn the unknown demand on the fly while satisfying the CDP and LDP guarantee, respectively. In particular, for the algorithm with CDP guarantee, the regret is proved to be at most [Formula: see text]. Here, the parameter T denotes the length of the time horizon, d is the dimension of the personalized information vector, and the key parameter [Formula: see text] measures the strength of privacy (smaller ε indicates a stronger privacy protection). Conversely, for the algorithm with LDP guarantee, its regret is proved to be at most [Formula: see text], which is near optimal as we prove a lower bound of [Formula: see text] for any algorithm with LDP guarantee. 
    more » « less