
Title: ASYNC: A Cloud Engine with Asynchrony and History for Distributed Machine Learning
ASYNC is a framework that supports the implementation of asynchrony and history for optimization methods on distributed computing platforms. Asynchronous optimization methods have grown increasingly popular in distributed machine learning, but their applicability and practical experimentation on distributed systems are limited because current bulk-processing cloud engines do not provide robust support for asynchrony and history. By introducing three main modules and bookkeeping system-specific and application parameters, ASYNC gives practitioners a framework for implementing asynchronous machine learning methods. To demonstrate ease of implementation in ASYNC, the synchronous and asynchronous variants of two well-known optimization methods, stochastic gradient descent and SAGA, are implemented in ASYNC.
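The paper's own implementations are not reproduced in this listing. As a rough, hypothetical illustration of the kind of asynchrony the abstract refers to, a lock-free (Hogwild-style) asynchronous SGD loop on a toy least-squares problem might look like the following sketch (all names, sizes, and constants are invented, not ASYNC's API):

```python
import threading
import numpy as np

# Toy least-squares problem: minimize ||Ax - b||^2 over x (synthetic data).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
x_true = rng.standard_normal(5)
b = A @ x_true

x = np.zeros(5)   # shared parameters, updated lock-free (Hogwild-style)
STEP = 0.001

def worker(seed, iters=200, batch=16):
    """Repeatedly sample a minibatch and apply its gradient to the shared x.
    Reads of x may be stale relative to other workers' writes."""
    global x
    local_rng = np.random.default_rng(seed)
    for _ in range(iters):
        idx = local_rng.integers(0, 200, size=batch)
        grad = 2 * A[idx].T @ (A[idx] @ x - b[idx])
        x = x - STEP * grad   # unsynchronized update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(np.linalg.norm(x - x_true))
```

Because the linear system is consistent and the step size is small, the unsynchronized updates still drive the error toward zero; a synchronous variant would instead barrier and average gradients between rounds.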
Award ID(s):
1814888 1723085
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Page Range / eLocation ID:
429 to 439
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies multi-agent (convex and nonconvex) optimization over static digraphs. We propose a general distributed asynchronous algorithmic framework whereby i) agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination; and ii) they can perform their local computations using (possibly) delayed, out-of-sync information from their neighbors. Delays need not be known to the agents or obey any specific profile, and can also be time-varying (but bounded). The algorithm builds on a tracking mechanism that is robust against asynchrony (in the above sense), whose goal is to estimate locally the sum of agents' gradients. When applied to strongly convex functions, we prove that it converges at an R-linear (geometric) rate as long as the step-size is sufficiently small. A sublinear convergence rate is proved when nonconvex problems and/or diminishing, uncoordinated step-sizes are employed. To the best of our knowledge, this is the first distributed algorithm with a provable geometric convergence rate in such a general asynchronous setting.
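The abstract does not spell out the asynchronous tracking scheme itself; the sketch below shows only the standard synchronous gradient-tracking iteration on toy quadratics over a ring graph, which is the mechanism the paper makes robust to delays and uncoordinated updates (all names and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, T, step = 4, 3, 300, 0.1

# Private quadratics f_i(x) = 0.5 ||x - c_i||^2; the global minimizer is mean(c).
c = rng.standard_normal((n, d))
x_star = c.mean(axis=0)

# Doubly stochastic mixing matrix for a ring: each agent averages with neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros((n, d))
g = x - c           # local gradients evaluated at the local iterates
y = g.copy()        # trackers, initialized to the local gradients

for _ in range(T):
    x_new = W @ x - step * y     # consensus step plus tracked-gradient step
    g_new = x_new - c
    y = W @ y + g_new - g        # tracker update: estimates the average gradient
    x, g = x_new, g_new

print(np.linalg.norm(x - x_star, axis=1).max())
```

The key invariant is that the trackers `y` always sum to the sum of local gradients, so at consensus the iterates must sit at the global minimizer; with a sufficiently small constant step this yields the linear convergence the abstract refers to.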
  2. Coded distributed computation has become common practice for performing gradient descent on large datasets to mitigate stragglers and other faults. This paper proposes a novel algorithm that encodes the partial derivatives themselves and further optimizes the codes by performing lossy compression on the derivative codewords, maximizing the information contained in each codeword while minimizing the mutual information between codewords. The utility of this application of coding theory is a geometrical consequence of the observed fact in optimization research that noise is tolerable, and sometimes even helpful, in gradient-descent-based learning algorithms, since it helps avoid overfitting and local minima. This stands in contrast with much current work on distributed coded computation, which focuses on recovering all of the data from the workers. A second contribution is that the low-weight nature of the coding scheme allows for asynchronous gradient updates, since the code can be iteratively decoded; i.e., a worker's task can immediately be folded into the larger gradient. The directional derivative is always a linear function of the direction vectors; thus, our framework is robust in that it can apply linear coding techniques to general machine learning frameworks such as deep neural networks.
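The paper's optimized codes are not given in the abstract; a generic sketch of the underlying idea, encoding partial gradients with a random linear code so the full gradient can be recovered from only a subset of responding workers, might look like this (all names and sizes are hypothetical, and the paper's lossy compression step is omitted):

```python
import numpy as np

rng = np.random.default_rng(2)
k, m, d = 8, 12, 20        # data partitions, workers, gradient dimension

# Partial gradients g_j from each data partition (toy values).
G = rng.standard_normal((k, d))
g_full = G.sum(axis=0)

# Random linear encoding: worker l would transmit a combination of partials.
E = rng.standard_normal((m, k))
codewords = E @ G

# Suppose only a subset of workers respond (stragglers dropped).
S = rng.choice(m, size=9, replace=False)

# Decode: find weights a with a @ E[S] ≈ the all-ones row, so that
# a @ codewords[S] ≈ sum of partial gradients = g_full.
a, *_ = np.linalg.lstsq(E[S].T, np.ones(k), rcond=None)
g_hat = a @ codewords[S]

print(np.linalg.norm(g_hat - g_full) / np.linalg.norm(g_full))
```

With at least `k` responding workers a generic random code recovers the gradient exactly; with fewer, the least-squares decode yields the kind of noisy-but-useful approximate gradient the abstract argues is tolerable.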
  3. In general, the performance of parallel graph processing is determined by three pairs of critical parameters, namely synchronous or asynchronous execution mode (Sync or Async), Push or Pull communication mechanism (Push or Pull), and Data-driven or Topology-driven traversing scheme (DD or TD), which increases the complexity and sophistication of programming and system implementation on GPUs. Existing graph-processing frameworks mainly use a single combination for the entire execution of a given application, but we have observed that this yields variable and suboptimal performance. In this paper, we present SEP-Graph, a highly efficient software framework for graph processing on GPU. SEP-Graph automatically switches the hybrid execution mode among the three pairs of parameters, with the objective of achieving the shortest execution time in each iteration. We also apply a set of optimizations to SEP-Graph, considering the characteristics of graph algorithms and the underlying GPU architectures. We show the effectiveness of SEP-Graph through an intensive comparative performance evaluation on NVIDIA 1080, P100, and V100 GPUs. Compared with Groute and Gunrock, two representative GPU graph-processing frameworks, SEP-Graph reduces execution time by factors of up to 45.8 and 39.4, respectively.
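The abstract does not specify SEP-Graph's switching policy; the toy dispatcher below merely illustrates the idea of choosing one (execution mode, communication mechanism, traversal scheme) combination per iteration (the thresholds and heuristic are invented for illustration, not SEP-Graph's actual logic):

```python
import itertools

# The eight combinations SEP-Graph can switch among.
MODES = list(itertools.product(["sync", "async"], ["push", "pull"], ["dd", "td"]))

def choose(frontier_size, n_vertices):
    """Toy per-iteration heuristic: data-driven async push when the active
    frontier is sparse, topology-driven sync pull when it is dense.
    (The 10% threshold is made up.)"""
    if frontier_size < 0.1 * n_vertices:
        return ("async", "push", "dd")
    return ("sync", "pull", "td")

print(choose(50, 10_000))     # sparse frontier early in a BFS-like sweep
print(choose(5_000, 10_000))  # dense frontier mid-computation
```

A real implementation would base the choice on measured or modeled per-iteration cost rather than a fixed threshold, which is the shortest-execution-time objective the abstract describes.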
  4. Abstract

    Aim

    Climate variability threatens to destabilize production in many ecosystems. Asynchronous species dynamics may buffer against such variability when a decrease in performance by some species is offset by an increase in performance of others. However, high climatic variability can eliminate species through stochastic extinctions or cause similar stress responses among species that reduce buffering. Local conditions, such as soil nutrients, can also alter production stability directly or by influencing asynchrony. We test these hypotheses using a globally distributed sampling experiment.


    Location

    Grasslands in North America, Europe and Australia.

    Time period

    Annual surveys over 5 year intervals occurring between 2007 and 2014.

    Major taxa studied

    Herbaceous plants.


    Methods

    We sampled annually the per-species cover and aboveground community biomass [net primary productivity (NPP)], plus soil chemical properties, in 29 grasslands. We tested how soil conditions, combined with variability in precipitation and temperature, affect species richness, asynchrony and the temporal stability of primary productivity. We used bivariate relationships and structural equation modelling to examine proximate and ultimate relationships.
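The abstract does not name the asynchrony index used; a common choice in this literature is one minus the Loreau and de Mazancourt community synchrony statistic, sketched here alongside the usual mean-over-standard-deviation stability measure (the toy cover data are invented):

```python
import numpy as np

def asynchrony(cover):
    """1 minus the Loreau & de Mazancourt community synchrony index.
    cover: (years, species) array of per-species abundances."""
    var_total = cover.sum(axis=1).var(ddof=1)       # variance of community totals
    sd_sum = cover.std(axis=0, ddof=1).sum()        # sum of per-species SDs
    return 1.0 - var_total / sd_sum**2

def stability(cover):
    """Temporal stability of productivity: mean over SD of community totals."""
    total = cover.sum(axis=1)
    return total.mean() / total.std(ddof=1)

# Two species fluctuating nearly in antiphase over six years:
# high asynchrony, and a community total that is much steadier than either species.
pattern = np.array([3.0, -3.0, 3.0, -3.0, 3.0, -3.0])
cover = np.column_stack([10 + pattern, 10 - 0.8 * pattern])
print(asynchrony(cover), stability(cover))
```

The example makes the buffering mechanism in the Aim concrete: when one species' decrease is offset by another's increase, the total varies far less than the parts, so asynchrony approaches 1 and stability is high.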


    Results

    Climate variability strongly predicted asynchrony, whereas NPP stability was more related to soil conditions. Species richness was structured by both climate variability and soils and, in turn, increased asynchrony. Variability in temperature and precipitation caused a unimodal asynchrony response, with asynchrony being lowest at low and high climate variability. Climate impacted stability indirectly, through its effect on asynchrony, with stability increasing at higher asynchrony owing to lower inter‐annual variability in NPP. Soil conditions had no detectable effect on asynchrony but increased stability by increasing the mean NPP, especially when soil organic matter was high.

    Main conclusions

    We found globally consistent evidence that climate modulates species asynchrony but that the direct effect on stability is low relative to local soil conditions. Nonetheless, our observed unimodal responses to variability in temperature and precipitation suggest asynchrony thresholds, beyond which there are detectable destabilizing impacts of climate on primary productivity.

  5.
    Regularization by denoising (RED) is a recently developed framework for solving inverse problems by integrating advanced denoisers as image priors. Recent work has shown its state-of-the-art performance when combined with pre-trained deep denoisers. However, current RED algorithms are inadequate for parallel processing on multicore systems. We address this issue by proposing a new asynchronous RED (ASYNC-RED) algorithm that enables asynchronous parallel processing of data, making it significantly faster than its serial counterparts on large-scale inverse problems. The computational complexity of ASYNC-RED is further reduced by using a random subset of measurements at every iteration. We present a complete theoretical analysis of the algorithm, establishing its convergence under explicit assumptions on the data fidelity and the denoiser. We validate ASYNC-RED on image recovery using pre-trained deep denoisers as priors.
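The serial iteration that ASYNC-RED parallelizes follows the usual RED fixed-point update x ← x − γ(∇g(x) + τ(x − D(x))), where g is the data-fidelity term and D a denoiser. A minimal serial sketch on a toy linear inverse problem, with a moving-average filter standing in for the pre-trained deep denoiser (all sizes and constants are made up, and the asynchronous parallelism and measurement subsampling are omitted):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 64, 128

# Toy linear inverse problem y = Hx + noise (synthetic stand-in for imaging).
t = np.linspace(0, 4 * np.pi, n)
x_true = np.sin(t) + 0.5 * np.cos(2 * t)         # smooth ground-truth signal
H = rng.standard_normal((m, n)) / np.sqrt(m)
y = H @ x_true + 0.01 * rng.standard_normal(m)

def denoise(x):
    """Stand-in denoiser: a 3-tap moving average (a real RED solver would
    call a pre-trained deep denoiser here instead)."""
    return np.convolve(x, np.ones(3) / 3, mode="same")

gamma, tau = 0.4, 0.1
x = np.zeros(n)
for _ in range(600):
    grad_fid = H.T @ (H @ x - y)                          # data-fidelity gradient
    x = x - gamma * (grad_fid + tau * (x - denoise(x)))   # serial RED update

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

ASYNC-RED would instead let multiple workers apply such updates on random measurement subsets without waiting for one another; the convergence analysis in the paper covers that asynchronous regime.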