

Title: Distributed Stochastic Multi-Task Learning with Graph Regularization
We propose methods for distributed graph-based multi-task learning that are based on weighted averaging of messages from other machines. Uniform averaging or a diminishing stepsize in these methods would yield consensus (single-task) learning. We show how simply skewing the averaging weights or controlling the stepsize allows learning different, but related, tasks on the different machines.
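The update described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's exact algorithm: each machine takes a local stochastic gradient step on its own task and then mixes the resulting parameter vectors with the other machines' messages through a row-stochastic weight matrix W. A uniform W recovers consensus (single-task) learning, while skewing W toward each machine's own parameters yields distinct but related tasks.

```python
import numpy as np

def distributed_mtl_step(params, grads, W, stepsize):
    """One synchronous round of weighted-averaging multi-task learning.

    params   : (M, d) array; row m is machine m's current task parameters
    grads    : (M, d) array; row m is machine m's stochastic gradient
    W        : (M, M) row-stochastic matrix of averaging weights
    stepsize : scalar SGD stepsize
    """
    local = params - stepsize * grads  # local stochastic gradient step on each machine
    return W @ local                   # weighted averaging of messages from other machines

# Example averaging weights for M = 3 machines:
M = 3
W_consensus = np.full((M, M), 1.0 / M)                    # uniform averaging -> one shared task
W_skewed = 0.8 * np.eye(M) + (0.2 / M) * np.ones((M, M))  # skewed toward self -> related but distinct tasks
```

Both matrices are row-stochastic; only the degree of skew toward the diagonal changes how strongly each machine retains its own task.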
Award ID(s):
1718970
PAR ID:
10085930
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
arXiv.org
ISSN:
2331-8422
Page Range / eLocation ID:
arXiv:1802.03830
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Stich, Sebastian U (Ed.)
    Cyclic and randomized stepsizes are widely used in deep learning practice and can often outperform standard choices such as a constant stepsize in SGD. Despite their empirical success, little is currently known about when and why they can theoretically improve generalization performance. We consider a general class of Markovian stepsizes for learning, which contains i.i.d. random, cyclic, and constant stepsizes as special cases. Motivated by the literature showing that heaviness of the tails (measured by the so-called "tail-index") in the SGD iterates is correlated with generalization, we study the tail-index and provide a number of theoretical results that demonstrate how it varies with the stepsize schedule. Our results bring a new understanding of the benefits of cyclic and randomized stepsizes over a constant stepsize in terms of tail behavior. We illustrate our theory on linear regression experiments and show through deep learning experiments that Markovian stepsizes can achieve an even heavier tail and are a viable alternative to cyclic and i.i.d. randomized stepsize rules.
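A minimal sketch of SGD driven by a Markovian stepsize schedule, as described in abstract 1 above. The toy quadratic objective and function names are illustrative assumptions, not the authors' setup; the point is that particular transition matrices recover constant, cyclic, and i.i.d. random schedules as special cases.

```python
import numpy as np

def sgd_markovian_stepsize(grad_fn, w0, etas, P, n_iters, seed=None):
    """Run SGD where the stepsize follows a Markov chain on the finite set `etas`.

    Special cases of the transition matrix P:
      P = identity                  -> constant stepsize
      P = cyclic permutation matrix -> cyclic stepsize
      P with identical rows         -> i.i.d. random stepsize
    """
    rng = np.random.default_rng(seed)
    w, state = w0.copy(), 0
    for _ in range(n_iters):
        w -= etas[state] * grad_fn(w, rng)          # SGD step with the current stepsize
        state = rng.choice(len(etas), p=P[state])   # Markov transition to the next stepsize
    return w

# Example: noisy gradients of a 1-D quadratic with a cyclic schedule over two stepsizes.
etas = np.array([0.05, 0.5])
P_cyclic = np.array([[0.0, 1.0], [1.0, 0.0]])
grad_fn = lambda w, rng: w + 0.1 * rng.standard_normal(w.shape)
w_final = sgd_markovian_stepsize(grad_fn, np.array([5.0]), etas, P_cyclic, 1000)
```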
  2. Despite the recent success of Graph Neural Networks (GNNs), training GNNs on large graphs remains challenging. The limited resource capacities of existing servers, the dependency between nodes in a graph, and the privacy concerns of centralized storage and model learning have spurred the need for an effective distributed algorithm for GNN training. However, existing distributed GNN training methods impose either excessive communication costs or large memory overheads that hinder their scalability. To overcome these issues, we propose a communication-efficient distributed GNN training technique named Learn Locally, Correct Globally (LLCG). To reduce the communication and memory overhead, each local machine in LLCG first trains a GNN on its local data, ignoring the dependency between nodes on different machines, and then sends the locally trained model to the server for periodic model averaging. However, ignoring node dependency can result in significant performance degradation. To address this degradation, we propose to apply global corrections on the server to refine the locally learned models. We rigorously analyze the convergence of distributed methods with periodic model averaging for training GNNs and show that naively applying periodic model averaging while ignoring the dependency between nodes suffers from an irreducible residual error. However, this residual error can be eliminated by utilizing the proposed global corrections, yielding a fast convergence rate. Extensive experiments on real-world datasets show that LLCG can significantly improve efficiency without hurting performance. One-sentence summary: We propose LLCG, a communication-efficient distributed algorithm for training GNNs.
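The training pattern described in abstract 2 (local training, periodic model averaging, and a server-side correction) can be sketched roughly as below. This is PyTorch-flavored pseudocode under our own assumptions, not the authors' implementation; in particular, `server_correct` is a placeholder for the paper's global correction step, whose exact form is not given in the abstract.

```python
import copy
import torch
import torch.nn.functional as F

def llcg_style_training(local_models, local_loaders, server_correct, period, rounds, lr=0.01):
    """Local training with periodic model averaging and a server-side correction hook.

    local_models  : list of models, one per machine (assumed to share an architecture)
    local_loaders : list of iterators yielding (inputs, labels) from each machine's local subgraph
    server_correct: placeholder callable standing in for the server-side "global correction"
    """
    opts = [torch.optim.SGD(m.parameters(), lr=lr) for m in local_models]
    for r in range(rounds):
        # Each machine trains on its local data, ignoring cross-machine node dependencies.
        for model, loader, opt in zip(local_models, local_loaders, opts):
            x, y = next(loader)
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Periodically, the server averages the locally trained models ...
        if (r + 1) % period == 0:
            avg = copy.deepcopy(local_models[0].state_dict())
            for key in avg:  # assumes all state_dict entries are float tensors
                avg[key] = torch.stack([m.state_dict()[key] for m in local_models]).mean(dim=0)
            # ... and refines the averaged model to compensate for the ignored
            # node dependencies (the abstract's "global correction").
            avg = server_correct(avg)
            for m in local_models:
                m.load_state_dict(avg)
    return local_models
```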
  3. Abstract
    Background

    Human-human (HH) interaction mediated by machines (e.g., robots or passive sensorized devices), which we call human-machine-human (HMH) interaction, has been studied with increasing interest in the last decade. The use of machines allows the implementation of different forms of audiovisual and/or physical interaction in dyadic tasks. HMH interaction between two partners can improve the dyad’s ability to accomplish a joint motor task (task performance) beyond either partner’s ability to perform the task solo. It can also be used to more efficiently train an individual to improve their solo task performance (individual motor learning). We review recent research on the impact of HMH interaction on task performance and individual motor learning in the context of motor control and rehabilitation, and we propose future research directions in this area.

    Methods

    A systematic search was performed on the Scopus, IEEE Xplore, and PubMed databases. The search query was designed to find studies that involve HMH interaction in motor control and rehabilitation settings. Studies that do not investigate the effect of changing the interaction conditions were filtered out. Thirty-one studies met our inclusion criteria and were used in the qualitative synthesis.

    Results

    Studies are analyzed based on their results related to the effects of interaction type (e.g., audiovisual communication and/or physical interaction), interaction mode (collaborative, cooperative, co-active, and competitive), and partner characteristics. Visuo-physical interaction generally results in better dyadic task performance than visual interaction alone. In cases where the physical interaction between humans is described by a spring, there are conflicting results as to the effect of the stiffness of the spring. In terms of partner characteristics, having a more skilled partner improves dyadic task performance more than having a less skilled partner. However, conflicting results were observed in terms of individual motor learning.

    Conclusions

    Although it is difficult to draw clear conclusions as to which interaction type, mode, or partner characteristic may lead to optimal task performance or individual motor learning, these results show the possibility for improved outcomes through HMH interaction. Future work that focuses on selecting the optimal personalized interaction conditions and exploring their impact on rehabilitation settings may facilitate the transition of HMH training protocols to clinical implementations.

     
  4. Function-as-a-Service (FaaS) is a popular delivery model of serverless computing in which developers upload code to be executed in the cloud as short-running stateless functions. Decomposing larger tasks or workflows into smaller functions raises the question of how to instrument application control flow to orchestrate the overall task or workflow. In this paper, we examine the implications of using different methods to orchestrate the control flow of a serverless data processing pipeline composed of independent FaaS functions. We performed experiments on the AWS Lambda FaaS platform and compared how four different control-flow patterns impact the cost and performance of the pipeline. We investigate control flow using client orchestration, microservice controllers, event-based triggers, and state machines. Overall, we found that asynchronous methods led to lower orchestration costs and that event-based orchestration incurred a performance penalty.
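As one concrete instance of the patterns compared in abstract 4, here is a hedged sketch of the "client orchestration" pattern on AWS Lambda: the client drives the pipeline by synchronously invoking each function and forwarding the result to the next. The function names are hypothetical; the paper also evaluates microservice controllers, event-based triggers, and state machines, which this sketch does not cover.

```python
import json
import boto3

lambda_client = boto3.client("lambda")

def invoke(function_name, payload):
    """Synchronously invoke one Lambda function and return its decoded result."""
    resp = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",   # synchronous; "Event" would be asynchronous
        Payload=json.dumps(payload).encode(),
    )
    return json.loads(resp["Payload"].read())

def run_pipeline(record):
    # Client-side control flow: each stage is an independent FaaS function
    # (hypothetical names), chained by the client rather than by the platform.
    extracted = invoke("pipeline-extract", record)
    transformed = invoke("pipeline-transform", extracted)
    return invoke("pipeline-load", transformed)
```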
  5. We consider the distributed statistical learning problem in a high-dimensional adversarial scenario. At each iteration, $m$ worker machines compute stochastic gradients and send them to a master machine. However, an $\alpha$-fraction of the $m$ worker machines, called Byzantine machines, may act adversarially and send faulty gradients. To guard against faulty information sharing, we develop a distributed robust learning algorithm based on Nesterov's dual averaging. This algorithm is provably robust against Byzantine machines whenever $\alpha\in[0, 1/2)$. For smooth convex functions, we show that running the proposed algorithm for $T$ iterations achieves a statistical error bound of $\tilde{O}\big(1/\sqrt{mT}+\alpha/\sqrt{T}\big)$. This result holds for a large class of normed spaces and matches the known statistical error bound for Byzantine stochastic gradient in the Euclidean space setting. A key feature of the algorithm is that the dimension dependence of the bound scales with the dual norm of the gradient; in particular, for the probability simplex, we show that it depends logarithmically on the problem dimension $d$. Such a weak dependence on the dimension is desirable in high-dimensional statistical learning; it has been known to hold for classical mirror descent, but appears to be new for the Byzantine gradient scenario.
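A minimal sketch of the setting in abstract 5: distributed dual averaging on the probability simplex with robust aggregation of worker gradients. The coordinate-wise median used here is a stand-in, since the abstract does not specify the paper's exact aggregation rule, and the diminishing stepsize is an illustrative choice. With the entropic regularizer, the dual-averaging update has the closed form x ∝ exp(-η_t z_t).

```python
import numpy as np

def robust_dual_averaging_simplex(worker_grads_fn, d, T, eta=0.1):
    """Dual averaging on the probability simplex with a robust gradient aggregator.

    worker_grads_fn(x) returns an (m, d) array of worker gradients at x;
    some rows may be faulty (Byzantine).
    """
    x = np.full(d, 1.0 / d)       # start at the uniform distribution
    z = np.zeros(d)               # running sum of aggregated gradients
    for t in range(T):
        grads = worker_grads_fn(x)
        g = np.median(grads, axis=0)        # robust aggregation (stand-in rule)
        z += g
        step = eta / np.sqrt(t + 1)         # diminishing stepsize schedule
        logits = -step * z
        x = np.exp(logits - logits.max())   # entropic mirror map onto the simplex
        x /= x.sum()
    return x
```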