Machine learning today involves massive distributed computations running on cloud servers, which are highly susceptible to slowdown or straggling. Recent work has demonstrated the effectiveness of erasure codes in mitigating such slowdown for linear computations, by adding redundant computations such that the entire computation can be recovered as long as a subset of nodes finish their assigned tasks. However, most machine learning algorithms typically involve non-linear computations that cannot be directly handled by these coded computing approaches. In this work, we propose a coded computing strategy for mitigating the effect of stragglers on non-linear distributed computations. Our strategy relies on the observation that many expensive non-linear functions can be decomposed into sums of cheap non-linear functions. We show that erasure codes, specifically rateless codes can be used to generate and compute random linear combinations of these functions at the nodes such that the original function can be computed as long as a subset of nodes return their computations. Simulations and experiments on AWS Lambda demonstrate the superiority of our approach over various uncoded baselines.
more »
« less
This content will become publicly available on December 20, 2026
Reversible computations are computations
Causality serves as an abstract notion of time for concurrent systems. A computation is causal, or simply valid, if each observation of a computation event is preceded by the observation of its causes. The present work establishes that this simple requirement is equally relevant when the occurrence of an event is invertible. We propose a conservative extension of causal models for concurrency that accommodates reversible computations. We first model reversible computations using a symmetric residuation operation in the general model of configuration structures. We show that stable configuration structures, which correspond to prime algebraic domains, remain stable under the action of this residuation. We then derive a semantics of reversible computations for prime event structures, which is shown to coincide with a switch operation that dualizes conflict and causality.
more »
« less
- Award ID(s):
- 2242786
- PAR ID:
- 10658095
- Publisher / Repository:
- Electronic Notes in Theoretical Informatics and Computer Science
- Date Published:
- Journal Name:
- Electronic Notes in Theoretical Informatics and Computer Science
- Volume:
- Volume 5 - Proceedings of...
- ISSN:
- 2969-2431
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Streaming computations on massive data sets are an attractive candidate for parallelization, particularly when they exhibit independence (and hence data parallelism) between items in the stream. However, some streaming computations are stateful, which disrupts independence and can limit parallelism. In this work, we consider how to extract data parallelism from streaming computations with a common, limited form of statefulness. The stream is assumed to be divided into variably-sized regions, and items in the same region are processed in a common context of state. In general, the computation to be performed on a stream is also irregular, with each item potentially undergoing different, data-dependent processing. This work describes mechanisms to implement such computations efficiently on a SIMD-parallel architecture such as a GPU. We first develop a low-level protocol by which a data stream can be augmented with control signals that are delivered to each stage of a computation at precise points in the stream. We then describe an abstraction, enumeration and aggregation, by which an application developer can specify the behavior of a streaming application with region-based state. Finally, we study an implementation of our ideas as part of the MERCATOR system for irregular streaming computations on GPUs, investigating how the frequency of region boundaries in a stream impacts SIMD occupancy and hence application performance.more » « less
-
null (Ed.)When numerical and machine learning (ML) computations are expressed relationally, classical query execution strategies (hash-based joins and aggregations) can do a poor job distributing the computation. In this paper, we propose a two-phase execution strategy for numerical computations that are expressed relationally, as aggregated join trees (that is, expressed as a series of relational joins followed by an aggregation). In a pilot run, lineage information is collected; this lineage is used to optimally plan the computation at the level of individual records. Then, the computation is actually executed. We show experimentally that a relational system making use of this two-phase strategy can be an excellent platform for distributed ML computations.more » « less
-
Mezzina, Claudio Antares; Schmitt, Alan (Ed.)Concurrency and causality can be expressed within a labelled transition system by exploiting reversibility of transitions. It is natural to ask what behavioural equivalences can be captured by bisimulations in the reversible setting. In this paper we work with keyed configuration structures and CCSK, establish an operational correspondence between the two models, and give definitions of hereditary history-preserving bisimulation and history-preserving bisimulation in both models. We then present several characterisation results for the two bisimulations in terms of previously proposed, as well as new, “reverse” bisimulations.more » « less
-
We address the problem of slowdown caused by straggling nodes in distributed non-linear computations. Many common non-linear computations can be written as a sum of inexpensive non-linear functions (e.g. Taylor series). Based on this observation, we propose a new class of rateless codes called rateless sum-recovery codes whose aim is to recover the sum of source symbols, without necessarily recovering individual symbols. Source symbols correspond to individual inexpensive functions and each encoded symbol is the sum of a subset of source symbols. Encoded symbols are computed in a distributed fashion and for a computation that can be written as a sum of m inexpensive functions, successful sum-recovery is possible with high probability as long as slightly more than m encoded symbols are received. Our code is rateless, systematic and has sparse parities. Moreover, encoded symbols are constructed by sampling without replacement at individual nodes, thereby making decoding superfluous if the encoded symbols from any node cover all source symbols. We validate our claims through a range of simulations and also discuss open questions for future works.more » « less
An official website of the United States government
