We consider a hierarchical inference system with multiple clients connected to a server via a shared communication resource. When necessary, clients with low-accuracy machine learning models can offload classification tasks to a server for processing on a high-accuracy model. We propose a distributed online offloading algorithm which maximizes the accuracy subject to a shared resource utilization constraint thus indirectly realizing accuracy-delay tradeoffs possible given an underlying network scheduler. The proposed algorithm, named Lyapunov-EXP4, introduces a loss structure based on Lyapunov-drift minimization techniques to the bandits with expert advice framework. We prove that the algorithm converges to a near-optimal threshold policy on the confidence of the clients’ local inference without prior knowledge of the system’s statistics and efficiently solves a constrained bandit problem with sublinear regret. We further consider settings where clients may employ multiple thresholds, allowing more aggressive optimization of overall accuracy at a possible loss in fairness. Extensive simulation results on real and synthetic data demonstrate convergence of Lyapunov-EXP4, and show the
more »
« less
Glance: A Generative Approach to Interactive Visualization of Voluminous Satellite Imagery
Challenges in interactive visualizations over satellite data collections stem primarily from their inherent data volumes. Enabling interactive visualizations of such data results in both processing and I/O (network and disk) on the server side. These are further exacerbated by multiple, concurrent requests issued by different clients. Hotspots may also arise when multiple users are interested in a particular geographical extent. We propose a novel methodology to support interactive visualizations over voluminous satellite imagery. Our system, codenamed Glance, generates models that once installed on the client side, substantially alleviate resource requirements on the server side. Our system dynamically generates imagery during zoom-in operations. Glance also supports image refinements using partial high-resolution information when available. Glance is based broadly on a deep Generative Adversarial Network, and our model is space-efficient to facilitate memory-residency at the clients. We supplement Glance with a module to estimate rendering errors when using the model to generate imagery as opposed to a resource-intensive query-and-retrieve operation to the server. Benchmarks to profile our methodology show substantive improvements in interactivity with up to 23x reduction in time lags without utilizing GPU and 297x-6627x reduction while harnessing GPU. Further, the perceptual quality of the images from our generative model is robust with PSNR values ranging from 32.2-40.5, depending on the scenario and upscale factor.
more »
« less
- Award ID(s):
- 1931363
- PAR ID:
- 10352215
- Date Published:
- Journal Name:
- IEEE International Conference on Big Data (Big Data)
- Page Range / eLocation ID:
- 359 to 367
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Interactive visual analytics over distributed systems housing voluminous datasets is hindered by three main factors - disk and network I/O, and data processing overhead. Requests over geospatial data are prone to erratic query load and hotspots due to users’ simultaneous interest over a small sub-domain of the overall data space at a time. Interactive analytics in a distributed setting is further hindered in cases of voluminous datasets with large/high-dimensional data objects, such as multi-spectral satellite imagery. The size of the data objects prohibits efficient caching mechanisms that could significantly reduce response latencies. Additionally, extracting information from these large data objects incurs significant data processing overheads and they often entail resource-intensive computational methods. Here, we present our framework, ARGUS, that extracts low- dimensional representation (embeddings) of high-dimensional satellite images during ingestion and houses them in the cache for use in model-driven analysis relating to wildfire detection. These embeddings are versatile and are used to perform model- based extraction of analytical information for a set of dif- ferent scenarios, to reduce the high computational costs that are involved with typical transformations over high-dimensional datasets. The models for each such analytical process are trained in a distributed manner in a connected, multi-task learning fashion, along with the encoder network that generates the original embeddings.more » « less
-
Ranzato, M.; Beygelzimer, A.; Liang, P.S.; Vaughan, J.W.; Dauphin, Y. (Ed.)Federated Learning (FL) is a distributed learning framework, in which the local data never leaves clients’ devices to preserve privacy, and the server trains models on the data via accessing only the gradients of those local data. Without further privacy mechanisms such as differential privacy, this leaves the system vulnerable against an attacker who inverts those gradients to reveal clients’ sensitive data. However, a gradient is often insufficient to reconstruct the user data without any prior knowledge. By exploiting a generative model pretrained on the data distribution, we demonstrate that data privacy can be easily breached. Further, when such prior knowledge is unavailable, we investigate the possibility of learning the prior from a sequence of gradients seen in the process of FL training. We experimentally show that the prior in a form of generative model is learnable from iterative interactions in FL. Our findings demonstrate that additional mechanisms are necessary to prevent privacy leakage in FL.more » « less
-
Deep learning models are prone to forgetting information learned in the past when trained on new data. This problem becomes even more pronounced in the context of federated learning (FL), where data is decentralized and subject to independent changes for each user. Continual Learning (CL) studies this so-called \textit{catastrophic forgetting} phenomenon primarily in centralized settings, where the learner has direct access to the complete training dataset. However, applying CL techniques to FL is not straightforward due to privacy concerns and resource limitations. This paper presents a framework for federated class incremental learning that utilizes a generative model to synthesize samples from past distributions instead of storing part of past data. Then, clients can leverage the generative model to mitigate catastrophic forgetting locally. The generative model is trained on the server using data-free methods at the end of each task without requesting data from clients. Therefore, it reduces the risk of data leakage as opposed to training it on the client's private data. We demonstrate significant improvements for the CIFAR-100 dataset compared to existing baselines.more » « less
-
Privacy-preserving machine learning (PPML) enables multiple distrusting parties to jointly train ML models on their private data without revealing any information beyond the final trained models. In this work, we study the client-aided two-server setting where two non-colluding servers jointly train an ML model on the data held by a large number of clients. By involving the clients in the training process, we develop efficient protocols for training algorithms including linear regression, logistic regression, and neural networks. In particular, we introduce novel approaches to securely computing inner product, sign check, activation functions (e.g., ReLU, logistic function), and division on secret shared values, leveraging lightweight computation on the client side. We present constructions that are secure against semi-honest clients and further enhance them to achieve security against malicious clients. We believe these new client-aided techniques may be of independent interest. We implement our protocols and compare them with the two-server PPML protocols presented in SecureML (Mohassel and Zhang, S&P’17) across various settings and ABY2.0 (Patra et al., Usenix Security’21) theoretically. We demonstrate that with the assistance of untrusted clients in the training process, we can significantly improve both the communication and computational efficiency by orders of magnitude. Our protocols compare favorably in all the training algorithms on both LAN and WAN networks.more » « less
An official website of the United States government

