skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on December 10, 2025

Title: Optimal Aggregation via Overlay Trees: Delay-MSE Tradeoffs under Failures
Many applications, e.g., federated learning, require the aggregation of information across a large number of distributed nodes. In this paper, we explore efficient schemes to do this at scale leveraging aggregation at intermediate nodes across overlay trees. Files/updates are split into chunks which are in turn simultaneously aggregated over different trees. For a synchronous setting with homogeneous communications capabilities and deterministic link delays, we develop a delay optimal aggregation schedule. In the asynchronous setting, where delays are stochastic but i.i.d., across links, we show that for an asynchronous implementation of the above schedule, the expected aggregation delay is near-optimal. We then consider the impact that failures in the network have on the resulting Mean Square Error (MSE) for the estimated aggregates and how it can be controlled through the addition of redundancy, reattempts, and optimal failure-aware estimates for the desired aggregate. Based on the analysis of a natural model of failures, we show how to choose parameters to optimize the trade-off between aggregation delay and MSE. We present simulation results exhibiting the above mentioned tradeoffs. We also consider a more general class of correlated failures and demonstrate via simulation the applicability of our techniques in those settings as well.  more » « less
Award ID(s):
2148224
PAR ID:
10570471
Author(s) / Creator(s):
;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the ACM on Measurement and Analysis of Computing Systems
Volume:
8
Issue:
3
ISSN:
2476-1249
Page Range / eLocation ID:
1 to 37
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Biological neural computation is inherently asynchronous due to large variations in neuronal spike timing and transmission delays. So-far, most theoretical work on neural networks assumes the synchronous setting where neurons fire simultaneously in discrete rounds. In this work we aim at understanding the barriers of asynchronous neural computation from an algorithmic perspective. We consider an extension of the widely studied model of synchronized spiking neurons [Maass, Neural= Networks 97] to the asynchronous setting by taking into account edge and node delays. Edge Delays: We define an asynchronous model for spiking neurons in which the latency values (i.e., transmission delays) of non self-loop edges vary adversarially over time. This extends the recent work of [Hitron and Parter, ESA’19] in which the latency values are restricted to be fixed over time. Our first contribution is an impossibility result that implies that the assumption that self-loop edges have no delays (as assumed in Hitron and Parter) is indeed necessary. Interestingly, in real biological networks self-loop edges (a.k.a. autapse) are indeed free of delays, and the latter has been noted by neuroscientists to be crucial for network synchronization. To capture the computational challenges in this setting, we first consider the implementation of a single NOT gate. This simple function already captures the fundamental difficulties in the asynchronous setting. Our key technical results are space and time upper and lower bounds for the NOT function, our time bounds are tight. In the spirit of the distributed synchronizers [Awerbuch and Peleg, FOCS’90] and following [Hitron and Parter, ESA’19], we then provide a general synchronizer machinery. Our construction is very modular and it is based on efficient circuit implementation of threshold gates. The complexity of our scheme is measured by the overhead in the number of neurons and the computation time, both are shown to be polynomial in the largest latency value, and the largest incoming degree ∆ of the original network. Node Delays: We introduce the study of asynchronous communication due to variations in the response rates of the neurons in the network. In real brain networks, the round duration varies between different neurons in the network. Our key result is a simulation methodology that allows one to transform the above mentioned synchronized solution under edge delays into a synchronized under node delays while incurring a small overhead w.r.t space and time. 
    more » « less
  2. Biological neural computation is inherently asynchronous due to large variations in neuronal spike timing and transmission delays. So-far, most theoretical work on neural networks assumes the synchronous setting where neurons fire simultaneously in discrete rounds. In this work we aim at understanding the barriers of asynchronous neural computation from an algorithmic perspective. We consider an extension of the widely studied model of synchronized spiking neurons [Maass, Neural Networks 97] to the asynchronous setting by taking into account edge and node delays. - Edge Delays: We define an asynchronous model for spiking neurons in which the latency values (i.e., transmission delays) of non self-loop edges vary adversarially over time. This extends the recent work of [Hitron and Parter, ESA'19] in which the latency values are restricted to be fixed over time. Our first contribution is an impossibility result that implies that the assumption that self-loop edges have no delays (as assumed in Hitron and Parter) is indeed necessary. Interestingly, in real biological networks self-loop edges (a.k.a. autapse) are indeed free of delays, and the latter has been noted by neuroscientists to be crucial for network synchronization. To capture the computational challenges in this setting, we first consider the implementation of a single NOT gate. This simple function already captures the fundamental difficulties in the asynchronous setting. Our key technical results are space and time upper and lower bounds for the NOT function, our time bounds are tight. In the spirit of the distributed synchronizers [Awerbuch and Peleg, FOCS'90] and following [Hitron and Parter, ESA'19], we then provide a general synchronizer machinery. Our construction is very modular and it is based on efficient circuit implementation of threshold gates. The complexity of our scheme is measured by the overhead in the number of neurons and the computation time, both are shown to be polynomial in the largest latency value, and the largest incoming degree Δ of the original network. - Node Delays: We introduce the study of asynchronous communication due to variations in the response rates of the neurons in the network. In real brain networks, the round duration varies between different neurons in the network. Our key result is a simulation methodology that allows one to transform the above mentioned synchronized solution under edge delays into a synchronized under node delays while incurring a small overhead w.r.t space and time. 
    more » « less
  3. Paolo Spagnolo (Ed.)
    This paper discusses asynchronous distributed inference in object tracking. Unlike many studies, which assume that the delay in communication between partial estimators and the central station is negligible, our study focuses on the problem of asynchronous distributed inference in the presence of delays. We introduce an efficient data fusion method for combining the distributed estimates, where delay in communications is not negligible. To overcome the delay, predictions are made for the state of the system based on the most current available information from partial estimators. Simulation results show the efficacy of the methods proposed. 
    more » « less
  4. This paper investigates the impact of mobility on underwater acoustic communication networks in which the propagation delay is comparable to or larger than the packet duration. An underwater acoustic wireless network, consisting of static and mobile nodes, is studied for its link-layer channel utilization. Synchronous and asynchronous media access control (MAC) protocols are employed with ALOHA, TDMA (time-division multiple access), and artificial intelligence (AI) agent nodes. The simulation results of a multi-node network show that the asynchronous MAC protocols achieve up to 6.66× higher channel utilization than synchronous protocols by allowing time slots to be shorter than the maximum propagation delay among nodes and permitting asynchronous transmission time. The high mobility of a few mobile nodes also favors asynchronous protocols and increases the overall channel utilization. However, node mobility causes more difficulties for the AI node to learn the environment, which may be ineffective to achieve higher gains in channel utilization. 
    more » « less
  5. Understanding the characteristics of air-traffic delays and disruptions is critical for developing ways to mitigate their significant economic and environmental impacts. Conventional delay-performance metrics reflect only the magnitude of incurred flight delays at airports; in this work, we show that it is also important to characterize the spatial distribution of delays across a network of airports. We analyze graph-supported signals, leveraging techniques from spectral theory and graph-signal processing to compute analytical and simulation-driven bounds for identifying outliers in spatial distribution. We then apply these methods to the case of airport-delay networks and demonstrate the applicability of our methods by analyzing U.S. airport delays from 2008 through 2017. We also perform an airline-specific analysis, deriving insights into the delay dynamics of individual airline subnetworks. Through our analysis, we highlight key differences in delay dynamics between different types of disruptions, ranging from nor’easters and hurricanes to airport outages. We also examine delay interactions between airline subnetworks and the system-wide network and compile an inventory of outlier days that could guide future aviation operations and research. In doing so, we demonstrate how our approach can provide operational insights in an air-transportation setting. Our analysis provides a complementary metric to conventional aviation-delay benchmarks and aids airlines, traffic-flow managers, and transportation-system planners in quantifying off-nominal system performance. 
    more » « less