- Award ID(s):
- 2220433
- PAR ID:
- 10467379
- Publisher / Repository:
- ACM
- Date Published:
- ISBN:
- 9781450394505
- Page Range / eLocation ID:
- 2249 to 2263
- Format(s):
- Medium: X
- Location:
- Los Angeles CA USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
Matrix factorization (MF) approximates unobserved ratings in a rating matrix, whose rows correspond to users and columns correspond to items to be rated, and has been serving as a fundamental building block in recommendation systems. This paper comprehensively studies the problem of matrix factorization in different federated learning (FL) settings, where a set of parties want to cooperate in training but refuse to share data directly. We first propose a generic algorithmic framework for various settings of federated matrix factorization (FMF) and provide a theoretical convergence guarantee. We then systematically characterize privacy-leakage risks in data collection, training, and publishing stages for three different settings and introduce privacy notions to provide end-to-end privacy protections. The first one is vertical federated learning (VFL), where multiple parties have the ratings from the same set of users but on disjoint sets of items. The second one is horizontal federated learning (HFL), where parties have ratings from different sets of users but on the same set of items. The third setting is local federated learning (LFL), where the ratings of the users are only stored on their local devices. We introduce adapted versions of FMF with the privacy notions guaranteed in the three settings. In particular, a new private learning technique called embedding clipping is introduced and used in all the three settings to ensure differential privacy. For the LFL setting, we combine differential privacy with secure aggregation to protect the communication between user devices and the server with a strength similar to the local differential privacy model, but much better accuracy. We perform experiments to demonstrate the effectiveness of our approaches.more » « less
-
Abstract While the practicality of secure multi-party computation (MPC) has been extensively analyzed and improved over the past decade, we are hitting the limits of efficiency with the traditional approaches of representing the computed functionalities as generic arithmetic or Boolean circuits. This work follows the design principle of identifying and constructing fast and provably-secure MPC protocols to evaluate useful high-level algebraic abstractions; thus, improving the efficiency of all applications relying on them. We present Polymath, a constant-round secure computation protocol suite for the secure evaluation of (multi-variate) polynomials of scalars and matrices, functionalities essential to numerous data-processing applications. Using precise natural precomputation and high-degree of parallelism prevalent in the modern computing environments, Polymath can make latency of secure polynomial evaluations of scalars and matrices independent of polynomial degree and matrix dimensions. We implement our protocols over the HoneyBadgerMPC library and apply it to two prominent secure computation tasks: privacy-preserving evaluation of decision trees and privacy-preserving evaluation of Markov processes. For the decision tree evaluation problem, we demonstrate the feasibility of evaluating high-depth decision tree models in a general n -party setting. For the Markov process application, we demonstrate that Poly-math can compute large powers of transition matrices with better online time and less communication.more » « less
-
Abstract Summary Predictive learning from medical data incurs additional challenge due to concerns over privacy and security of personal data. Federated learning, intentionally structured to preserve high level of privacy, is emerging to be an attractive way to generate cross-silo predictions in medical scenarios. However, the impact of severe population-level heterogeneity on federated learners is not well explored. In this article, we propose a methodology to detect presence of population heterogeneity in federated settings and propose a solution to handle such heterogeneity by developing a federated version of Deep Regression Forests. Additionally, we demonstrate that the recently conceptualized REpresentation of Features as Images with NEighborhood Dependencies CNN framework can be combined with the proposed Federated Deep Regression Forests to provide improved performance as compared to existing approaches.
Availability and implementation The Python source code for reproducing the main results are available on GitHub: https://github.com/DanielNolte/FederatedDeepRegressionForests.
Contact ranadip.pal@ttu.edu
Supplementary information Supplementary data are available at Bioinformatics Advances online.
-
Machine Learning (ML) algorithms have shown quite promising applications in smart meter data analytics enabling intelligent energy management systems for the Advanced Metering Infrastructure (AMI). One of the major challenges in developing ML applications for the AMI is to preserve user privacy while allowing active end-users participation. This paper addresses this challenge and proposes Differential Privacy-enabled AMI with Federated Learning (DP-AMI-FL), framework for ML-based applications in the AMI. This framework provides two layers of privacy protection: first, it keeps the raw data of consumers hosting ML applications at edge devices (smart meters) with Federated Learning (FL), and second, it obfuscates the ML models using Differential Privacy (DP) to avoid privacy leakage threats on the models posed by various inference attacks. The framework is evaluated by analyzing its performance on a use case aimed to improve Short-Term Load Forecasting (STLF) for residential consumers having smart meters and home energy management systems. Extensive experiments demonstrate that the framework when used with Long Short-Term Memory (LSTM) recurrent neural network models, achieves high forecasting accuracy while preserving users data privacy.more » « less
-
Many AI platforms, including traffic monitoring systems, use Federated Learning (FL) for decentralized sensor data processing for learning-based applications while preserving privacy and ensuring secured information transfer. On the other hand, applying supervised learning to large data samples, like high-resolution images requires intensive human labor to label different parts of a data sample. Multiple Instance Learning (MIL) alleviates this challenge by operating over labels assigned to the ’bag’ of instances. In this paper, we introduce Federated Multiple-Instance Learning (FedMIL). This framework applies federated learning to boost the training performance in video-based MIL tasks such as vehicle accident detection using distributed CCTV networks. However, data sources in decentralized settings are not typically Independently and Identically Distributed (IID), making client selection imperative to collectively represent the entire dataset with minimal clients. To address this challenge, we propose DPPQ, a framework based on the Determinantal Point Process (DPP) with a quality-based kernel to select clients with the most diverse datasets that achieve better performance compared to both random selection and current DPP-based client selection methods even with less data utilization in the majority of non-IID cases. This offers a significant advantage for deployment on edge devices with limited computational resources, providing a reliable solution for training AI models in massive smart sensor networks.more » « less