Optimal Aggregation via Overlay Trees: Delay-MSE Tradeoffs under Failures

Hegde, Parikshit; de Veciana, Gustavo

doi:10.1145/3700423

This content will become publicly available on December 10, 2025

Many applications, e.g., federated learning, require the aggregation of information across a large number of distributed nodes. In this paper, we explore efficient schemes to do this at scale by leveraging aggregation at intermediate nodes across overlay trees. Files/updates are split into chunks, which are in turn aggregated simultaneously over different trees. For a synchronous setting with homogeneous communication capabilities and deterministic link delays, we develop a delay-optimal aggregation schedule. In the asynchronous setting, where delays are stochastic but i.i.d. across links, we show that the expected aggregation delay of an asynchronous implementation of the above schedule is near-optimal. We then consider the impact that network failures have on the Mean Square Error (MSE) of the estimated aggregates, and how it can be controlled through the addition of redundancy, reattempts, and optimal failure-aware estimates of the desired aggregate. Based on the analysis of a natural model of failures, we show how to choose parameters to optimize the trade-off between aggregation delay and MSE. We present simulation results exhibiting the above-mentioned trade-offs. We also consider a more general class of correlated failures and demonstrate via simulation the applicability of our techniques in those settings as well.
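As a rough illustration of the idea summarized in the abstract, the following Python sketch splits each node's update into chunks and aggregates each chunk up a different overlay tree, with links that may fail. It is not the paper's schedule or estimator. The two trees, the node updates, the independent per-link failure model, and the rescaling rule in `rescaled_estimate` are all assumptions made for illustration; in particular, the rescaling is a simple stand-in for the optimal failure-aware estimates the abstract mentions, and the chunks are processed one after another here rather than simultaneously.

```python
import random

def aggregate_over_tree(children, root, values, fail_prob, rng):
    """Sum values up one tree. Each child-to-parent link is assumed to fail
    independently with probability fail_prob (an illustrative failure model).
    Returns (partial_sum, count) as seen at the root."""
    def visit(node):
        total, count = values[node], 1
        for child in children.get(node, []):
            sub_total, sub_count = visit(child)
            if rng.random() >= fail_prob:  # link delivered the subtree aggregate
                total += sub_total
                count += sub_count
        return total, count
    return visit(root)

def rescaled_estimate(partial_sum, count, n_nodes):
    """Hypothetical estimator: scale the partial sum by the fraction of
    nodes whose contribution reached the root."""
    return partial_sum * n_nodes / count

# Two hypothetical overlay trees over nodes 0..3, one per chunk:
# each entry is (children map, root).
TREES = [
    ({0: [1, 2], 1: [3]}, 0),
    ({3: [2, 1], 2: [0]}, 3),
]
# Hypothetical per-node updates, already split into two chunks.
UPDATES = {0: [1.0, 10.0], 1: [2.0, 20.0], 2: [3.0, 30.0], 3: [4.0, 40.0]}

def aggregate_chunks(fail_prob, seed=0):
    """Aggregate chunk c over tree c and return one estimate per chunk."""
    rng = random.Random(seed)
    estimates = []
    for chunk, (children, root) in enumerate(TREES):
        chunk_values = {node: vec[chunk] for node, vec in UPDATES.items()}
        s, c = aggregate_over_tree(children, root, chunk_values, fail_prob, rng)
        estimates.append(rescaled_estimate(s, c, len(UPDATES)))
    return estimates
```

With `fail_prob=0.0` every link delivers, so `aggregate_chunks(0.0)` returns the exact per-chunk sums `[10.0, 100.0]`. With intermediate failure probabilities the estimates deviate from these sums, which is the kind of error the MSE analysis in the paper quantifies.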

Award ID(s): 2148224

PAR ID: 10570471

Author(s) / Creator(s): Hegde, Parikshit; de Veciana, Gustavo

Publisher / Repository: ACM

Date Published: 2024-12-10

Journal Name: Proceedings of the ACM on Measurement and Analysis of Computing Systems

Volume: 8

Issue: 3

ISSN: 2476-1249

Page Range / eLocation ID: 1 to 37

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Journal Article:
https://doi.org/10.1145/3700423