NSF PAR Search | NSF Public Access Repository

Plexus: Optimizing Join Approximation for Geo-Distributed Data Analytics

https://doi.org/10.1145/3620678.3624643

Wolfrath, Joel; Chandra, Abhishek (October 2023, IC2E)
AggFirstJoin: Optimizing Geo-Distributed Joins using Aggregation-Based Transformations

https://doi.org/10.1109/CCGrid57682.2023.00046

Kumar, Dhruv; Ahmad, Sohaib; Chandra, Abhishek; Sitaraman, Ramesh K. (May 2023, IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid))

Efficient Transmission and Reconstruction of Dependent Data Streams via Edge Sampling

https://doi.org/10.1109/IC2E55432.2022.00013

Wolfrath, Joel; Chandra, Abhishek (September 2022, 2022 IEEE International Conference on Cloud Engineering (IC2E))

Network Cost-Aware Geo-Distributed Data Analytics System

https://doi.org/10.1109/tpds.2021.3108893

Oh, Kwangsung; Zhang, Minmin; Chandra, Abhishek; Weissman, Jon (June 2022, IEEE Transactions on Parallel and Distributed Systems)

HACCS: Heterogeneity-Aware Clustered Client Selection for Accelerated Federated Learning

https://doi.org/10.1109/IPDPS53621.2022.00100

Wolfrath, Joel; Sreekumar, Nikhil; Kumar, Dhruv; Wang, Yuanli; Chandra, Abhishek (May 2022, 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Towards WAN-aware join sampling over geo-distributed data

https://doi.org/10.1145/3517206.3526268

Kumar, Dhruv; Wolfrath, Joel; Chandra, Abhishek; Sitaraman, Ramesh K. (April 2022, EdgeSys '22: Proceedings of the 5th International Workshop on Edge Systems, Analytics and Networking)

AggNet: Cost-Aware Aggregation Networks for Geo-distributed Streaming Analytics

Kumar, Dhruv; Ahmad, Sohaib; Chandra, Abhishek; Sitaraman, Ramesh K. (January 2021, ACM/IEEE Symposium on Edge Computing (SEC'21))
Large-scale real-time analytics services continuously collect and analyze data from end-user applications and devices distributed around the globe. Such analytics requires data to be transferred over the wide-area network (WAN) to data centers (DCs) capable of processing the data. Since WAN bandwidth is expensive and scarce, it is beneficial to reduce WAN traffic by partially aggregating the data closer to end-users. We propose aggregation networks for performing aggregation on a geo-distributed edge-cloud infrastructure consisting of edge servers, transit and destination DCs. We identify a rich set of research questions aimed at reducing the traffic costs in an aggregation network. We present an optimization formulation for solving these questions in a principled manner, and use insights from the optimization solutions to propose an efficient, near-optimal practical heuristic. We implement the heuristic in AggNet, built on top of Apache Flink. We evaluate our approach using a geo-distributed deployment on Amazon EC2 as well as a WAN-emulated local testbed. Our evaluation using real-world traces from Twitter and Akamai shows that our approach is able to achieve 47% to 83% reduction in traffic cost over existing baselines without any compromise in timeliness.
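The core idea in this abstract, partially aggregating data near end-users so that fewer records cross the WAN, can be illustrated with a minimal sketch. This is not AggNet's heuristic or its Flink implementation; the edge names, the per-key count workload, and the helper functions `partial_aggregate` and `merge_at_dc` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def partial_aggregate(records):
    """Combine (key, count) records that share a key into one record per key."""
    totals = Counter()
    for key, count in records:
        totals[key] += count
    return list(totals.items())


def merge_at_dc(batches):
    """Final aggregation at the destination data center (hypothetical helper)."""
    totals = Counter()
    for batch in batches:
        for key, count in batch:
            totals[key] += count
    return dict(totals)


# Hypothetical raw event streams observed at two edge servers.
edge_inputs = {
    "edge-a": [("cat", 1), ("dog", 1), ("cat", 1), ("cat", 1)],
    "edge-b": [("dog", 1), ("dog", 1), ("cat", 1)],
}

# Option 1: ship every raw record over the WAN.
raw_records_sent = sum(len(records) for records in edge_inputs.values())

# Option 2: aggregate at each edge first, then ship one record per key.
aggregated = {edge: partial_aggregate(r) for edge, r in edge_inputs.items()}
aggregated_records_sent = sum(len(records) for records in aggregated.values())

# Both options produce the same final answer at the data center.
result_raw = merge_at_dc(edge_inputs.values())
result_agg = merge_at_dc(aggregated.values())
```

In this toy input the final counts are identical either way, while edge-side aggregation sends fewer records across the WAN. Where and when to aggregate in a multi-hop network is the harder question the paper addresses, and it is not modeled here.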
On the Future of Cloud Engineering

https://doi.org/10.1109/IC2E52221.2021.00044

Bermbach, David; Chandra, Abhishek; Krintz, Chandra; Gokhale, Aniruddha; Slominski, Aleksander; Thamsen, Lauritz; Cavalcante, Everton; Guo, Tian; Brandic, Ivona; Wolski, Rich (October 2021, IEEE International Conference on Cloud Engineering)

Ever since the commercial offerings of the Cloud started appearing in 2006, the landscape of cloud computing has been undergoing remarkable changes with the emergence of many different types of service offerings, developer productivity enhancement tools, and new application classes as well as the manifestation of cloud functionality closer to the user at the edge. The notion of utility computing, however, has remained constant throughout its evolution, which means that cloud users always seek to save costs of leasing cloud resources while maximizing their use. On the other hand, cloud providers try to maximize their profits while assuring service-level objectives of the cloud-hosted applications and keeping operational costs low. All these outcomes require systematic and sound cloud engineering principles. The aim of this paper is to highlight the importance of cloud engineering, survey the landscape of best practices in cloud engineering and its evolution, discuss many of the existing cloud engineering advances, and identify both the inherent technical challenges and research opportunities for the future of cloud computing in general and cloud engineering in particular.
DLion: Decentralized Distributed Deep Learning in Micro-Clouds

https://doi.org/10.1145/3431379.3460643

Hong, Rankyung; Chandra, Abhishek (June 2020, HPDC '21)
Deep learning (DL) is a popular technique for building models from large quantities of data such as pictures, videos, and messages generated by edge devices at a rapid pace all over the world. It is often infeasible to migrate large quantities of data from the edges to centralized data center(s) over WANs for training due to privacy, cost, and performance reasons. At the same time, training large DL models on edge devices is infeasible due to their limited resources. An attractive alternative for DL training on distributed data is to use micro-clouds: small-scale clouds deployed near edge devices in multiple locations. However, micro-clouds present the challenges of both computation and network resource heterogeneity as well as dynamism. In this paper, we introduce DLion, a new and generic decentralized distributed DL system designed to address the key challenges in micro-cloud environments, in order to reduce overall training time and improve model accuracy. We present three key techniques in DLion: (1) Weighted dynamic batching to maximize data parallelism for dealing with heterogeneous and dynamic compute capacity, (2) Per-link prioritized gradient exchange to reduce communication overhead for model updates based on available network capacity, and (3) Direct knowledge transfer to improve model accuracy by merging the best performing model parameters. We build a prototype of DLion on top of TensorFlow and show that DLion achieves up to 4.2X speedup in an Amazon GPU cluster, and up to 2X speedup and 26% higher model accuracy in a CPU cluster over four state-of-the-art distributed DL systems.
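As a rough illustration of the first technique named in this abstract, the sketch below splits a global batch across workers in proportion to their measured throughput. The abstract does not specify DLion's actual weighting rule, so the proportional split, the function name `weighted_batch_sizes`, the worker names, and the throughput figures are all assumptions made for this example.

```python
def weighted_batch_sizes(global_batch, throughputs):
    """Split global_batch across workers in proportion to throughput.

    throughputs maps a worker name to its measured samples per second.
    This proportional rule is an assumption, not DLion's published method.
    """
    total = sum(throughputs.values())
    # Start from the floor of each worker's proportional share.
    sizes = {w: int(global_batch * t / total) for w, t in throughputs.items()}
    # Hand any samples lost to rounding to the fastest workers first.
    leftover = global_batch - sum(sizes.values())
    fastest_first = sorted(throughputs, key=throughputs.get, reverse=True)
    for worker in fastest_first[:leftover]:
        sizes[worker] += 1
    return sizes


# Hypothetical heterogeneous micro-cloud with one fast and two slow nodes.
measured = {"gpu-node": 300.0, "cpu-node-1": 60.0, "cpu-node-2": 40.0}
sizes = weighted_batch_sizes(256, measured)
```

To handle the dynamic capacity the abstract mentions, a system along these lines could re-measure throughput periodically and call the function again, so that a node that slows down receives a smaller share in the next round.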
WASP: Wide-area Adaptive Stream Processing

https://doi.org/10.1145/3423211.3425668

Jonathan, Albert; Chandra, Abhishek; Weissman, Jon (January 2020, Middleware'20)
Adaptability is critical for stream processing systems to ensure stable, low-latency, and high-throughput processing of long-running queries. Such adaptability is particularly challenging for wide-area stream processing due to the highly dynamic nature of the wide-area environment, which includes unpredictable workload patterns, variable network bandwidth, occurrence of stragglers, and failures. Unfortunately, existing adaptation techniques typically achieve these performance goals by compromising the quality/accuracy of the results, and they are often application-dependent. In this work, we rethink the adaptability property of wide-area stream processing systems and propose a resource-aware adaptation framework, called WASP. WASP adapts queries through a combination of multiple techniques: task re-assignment, operator scaling, and query re-planning, and applies them in a WAN-aware manner. It is able to automatically determine which adaptation action to take depending on the type of queries, dynamics, and optimization goals. We have implemented a WASP prototype on Apache Flink. Experimental evaluation with the YSB benchmark and a real Twitter trace shows that WASP can handle various dynamics without compromising the quality of the results.
