NSF PAR Search | NSF Public Access Repository


Efficient Transmission and Reconstruction of Dependent Data Streams via Edge Sampling

https://doi.org/10.1109/IC2E55432.2022.00013

Wolfrath, Joel; Chandra, Abhishek (September 2022, 2022 IEEE International Conference on Cloud Engineering (IC2E))

Full Text Available
HACCS: Heterogeneity-Aware Clustered Client Selection for Accelerated Federated Learning

https://doi.org/10.1109/IPDPS53621.2022.00100

Wolfrath, Joel; Sreekumar, Nikhil; Kumar, Dhruv; Wang, Yuanli; Chandra, Abhishek (May 2022, 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Full Text Available
Towards WAN-aware join sampling over geo-distributed data

https://doi.org/10.1145/3517206.3526268

Kumar, Dhruv; Wolfrath, Joel; Chandra, Abhishek; Sitaraman, Ramesh K. (April 2022, EdgeSys '22: Proceedings of the 5th International Workshop on Edge Systems, Analytics and Networking)

Full Text Available
AggNet: Cost-Aware Aggregation Networks for Geo-distributed Streaming Analytics

https://doi.org/10.1145/3453142.3491276

Kumar, Dhruv; Ahmad, Sohaib; Chandra, Abhishek; Sitaraman, Ramesh K. (December 2021, IEEE/ACM Symposium on Edge Computing (SEC))

Large-scale real-time analytics services continuously collect and analyze data from end-user applications and devices distributed around the globe. Such analytics requires data to be transferred over the wide-area network (WAN) to data centers (DCs) capable of processing the data. Since WAN bandwidth is expensive and scarce, it is beneficial to reduce WAN traffic by partially aggregating the data closer to end-users. We propose aggregation networks for performing aggregation on a geo-distributed edge-cloud infrastructure consisting of edge servers, transit DCs, and destination DCs. We identify a rich set of research questions aimed at reducing the traffic costs in an aggregation network. We present an optimization formulation for solving these questions in a principled manner, and use insights from the optimization solutions to propose an efficient, near-optimal practical heuristic. We implement the heuristic in AggNet, built on top of Apache Flink. We evaluate our approach using a geo-distributed deployment on Amazon EC2 as well as a WAN-emulated local testbed. Our evaluation using real-world traces from Twitter and Akamai shows that our approach achieves a 47% to 83% reduction in traffic cost over existing baselines without any compromise in timeliness.
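The idea of partially aggregating data close to end-users can be illustrated with a small sketch. It is not AggNet's heuristic or its Flink implementation. The function names, the per-key sum, and the sample events are all assumptions made for illustration. Each edge server collapses its raw events into one partial sum per key, and the destination merges those partial sums, so fewer records cross the WAN.

```python
from collections import Counter

def edge_partial_aggregate(events):
    """Collapse raw (key, value) events at one edge server into a partial sum per key."""
    partial = Counter()
    for key, value in events:
        partial[key] += value
    return partial

def merge_partials(partials):
    """Combine the partial sums from all edge servers at the destination."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total

# Hypothetical events observed at two edge servers during one window.
edge_a = [("cat", 1), ("dog", 1), ("cat", 1), ("cat", 1)]
edge_b = [("dog", 1), ("dog", 1), ("cat", 1)]

partials = [edge_partial_aggregate(edge_a), edge_partial_aggregate(edge_b)]
result = merge_partials(partials)

raw_records = len(edge_a) + len(edge_b)        # records sent without edge aggregation
sent_records = sum(len(p) for p in partials)   # records sent with edge aggregation
print(dict(result), raw_records, sent_records)
```

In this toy window the final sums are unchanged while the number of records sent drops from 7 to 4. The saving in practice depends on how often keys repeat within a window.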
Full Text Available
DLion: Decentralized Distributed Deep Learning in Micro-Clouds

https://doi.org/10.1145/3431379.3460643

Hong, Rankyung; Chandra, Abhishek (June 2020, HPDC '21)
Deep learning (DL) is a popular technique for building models from large quantities of data such as pictures, videos, and messages generated by edge devices at a rapid pace all over the world. It is often infeasible to migrate large quantities of data from the edges to centralized data centers over WANs for training due to privacy, cost, and performance reasons. At the same time, training large DL models on edge devices is infeasible due to their limited resources. An attractive alternative for DL training on distributed data is to use micro-clouds, which are small-scale clouds deployed near edge devices in multiple locations. However, micro-clouds present the challenges of both computation and network resource heterogeneity as well as dynamism. In this paper, we introduce DLion, a new and generic decentralized distributed DL system designed to address the key challenges in micro-cloud environments, in order to reduce overall training time and improve model accuracy. We present three key techniques in DLion: (1) Weighted dynamic batching to maximize data parallelism for dealing with heterogeneous and dynamic compute capacity, (2) Per-link prioritized gradient exchange to reduce communication overhead for model updates based on available network capacity, and (3) Direct knowledge transfer to improve model accuracy by merging the best performing model parameters. We build a prototype of DLion on top of TensorFlow and show that DLion achieves up to 4.2X speedup in an Amazon GPU cluster, and up to 2X speedup and 26% higher model accuracy in a CPU cluster over four state-of-the-art distributed DL systems.
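One plausible reading of batching that accounts for heterogeneous compute capacity is to split a global batch across workers in proportion to their measured throughput. The sketch below shows only that reading. The function name `weighted_batch_sizes`, the proportional rule, and the throughput figures are assumptions, and the abstract does not confirm that DLion computes batch sizes this way.

```python
def weighted_batch_sizes(global_batch, throughputs):
    """Split global_batch across workers in proportion to their throughput.

    throughputs: hypothetical samples-per-second measurements, one per worker.
    """
    total = sum(throughputs)
    # Floor each proportional share so the sum never exceeds global_batch.
    sizes = [int(global_batch * t / total) for t in throughputs]
    # Give any leftover samples to the fastest workers, one each.
    remainder = global_batch - sum(sizes)
    fastest_first = sorted(range(len(throughputs)),
                           key=lambda i: throughputs[i], reverse=True)
    for i in fastest_first[:remainder]:
        sizes[i] += 1
    return sizes

even_split = weighted_batch_sizes(256, [4.0, 2.0, 2.0])
uneven_split = weighted_batch_sizes(100, [3.0, 2.0, 1.0])
print(even_split, uneven_split)
```

Re-running the function whenever new throughput measurements arrive would let the split follow changes in capacity over time, which is the "dynamic" part of the idea.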
Full Text Available
WASP: Wide-area Adaptive Stream Processing

https://doi.org/10.1145/3423211.3425668

Jonathan, Albert; Chandra, Abhishek; Weissman, Jon (January 2020, Middleware'20)
Adaptability is critical for stream processing systems to ensure stable, low-latency, and high-throughput processing of long-running queries. Such adaptability is particularly challenging for wide-area stream processing due to the highly dynamic nature of the wide-area environment, which includes unpredictable workload patterns, variable network bandwidth, stragglers, and failures. Unfortunately, existing adaptation techniques typically achieve these performance goals by compromising the quality or accuracy of the results, and they are often application-dependent. In this work, we rethink the adaptability property of wide-area stream processing systems and propose a resource-aware adaptation framework called WASP. WASP adapts queries through a combination of techniques (task re-assignment, operator scaling, and query re-planning) and applies them in a WAN-aware manner. It automatically determines which adaptation action to take depending on the type of queries, dynamics, and optimization goals. We have implemented a WASP prototype on Apache Flink. Experimental evaluation with the YSB benchmark and a real Twitter trace shows that WASP can handle various dynamics without compromising the quality of the results.
Full Text Available
A TTL-based Approach for Data Aggregation in Geo-distributed Streaming Analytics

https://doi.org/10.1145/3341617.3326144

Kumar, Dhruv; Li, Jian; Chandra, Abhishek; Sitaraman, Ramesh (June 2019, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

Full Text Available
Multi-Query Optimization in Wide-Area Streaming Analytics

https://doi.org/10.1145/3267809.3267842

Jonathan, Albert; Chandra, Abhishek; Weissman, Jon (October 2018, ACM Symposium on Cloud Computing)

Full Text Available
