We study strongly convex distributed optimization problems in which a set of agents collaboratively solves a separable optimization problem. In this article, we propose and study a two-time-scale decentralized gradient descent algorithm that accommodates a broad class of lossy information sharing over time-varying graphs. One time-scale fades out the (lossy) incoming information from neighboring agents, while the other regulates the steps along the local loss functions' gradients. We show that, under a proper choice of step-size sequences, certain connectivity conditions, and bounded gradients along the trajectory of the dynamics, the agents' estimates converge to the optimal solution at a rate of O(T^{-1/2}). We also provide novel tools for studying distributed optimization with diminishing averaging weights over time-varying graphs.
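To make the two-time-scale structure concrete, here is a minimal sketch of one plausible form of such an update, written with a static mixing matrix W for simplicity; the diminishing sequences alpha and beta and the exact update form are illustrative assumptions, not the article's precise dynamics.

```python
import numpy as np

def two_time_scale_dgd(grads, W, x0, T, alpha, beta):
    """Sketch of a two-time-scale decentralized gradient descent loop.

    grads : list of callables; grads[i](x) is agent i's local gradient
    W     : (n, n) mixing matrix encoding the (possibly lossy) graph
    x0    : (n, d) initial estimates, one row per agent
    alpha : callable k -> gradient step size, e.g. k -> c1 / (k + 1) ** 0.5
    beta  : callable k -> diminishing averaging weight
    """
    x = x0.copy()
    n = x.shape[0]
    for k in range(T):
        mixed = W @ x  # (lossy) information received from neighbors
        for i in range(n):
            # beta(k) fades out the incoming neighbor information,
            # alpha(k) regulates the local gradient correction
            x[i] = (1 - beta(k)) * x[i] + beta(k) * mixed[i] - alpha(k) * grads[i](x[i])
    return x
```

Intuitively, letting beta(k) decay more slowly than alpha(k) separates the consensus dynamics from the optimization dynamics, which is what the two time scales are meant to capture.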
Decentralized Dictionary Learning Over Time-Varying Digraphs
This paper studies Dictionary Learning problems wherein the learning task is distributed over a multi-agent network, modeled as a time-varying directed graph. This formulation is relevant, for instance, in Big Data scenarios where massive amounts of data are collected/stored in different locations (e.g., sensors, clouds) and aggregating and/or processing all data in a fusion center might be inefficient or infeasible, due to resource limitations, communication overheads or privacy issues. We develop a unified decentralized algorithmic framework for this class of nonconvex problems, which is proved to converge to stationary solutions at a sublinear rate. The new method hinges on Successive Convex Approximation techniques, coupled with a decentralized tracking mechanism that locally estimates the gradient of the smooth part of the sum-utility. To the best of our knowledge, this is the first provably convergent decentralized algorithm for Dictionary Learning and, more generally, bi-convex problems over (time-varying) (di)graphs.
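As a rough illustration of the tracking mechanism the framework hinges on, the sketch below implements plain decentralized gradient tracking; the mixing matrix W, the fixed step size, and the use of a simple gradient step in place of a successive convex approximation are all simplifying assumptions.

```python
import numpy as np

def gradient_tracking_round(x, y, g_prev, grads, W, step):
    """One round of decentralized gradient tracking (illustrative sketch).

    x, y   : (n, d) local variables and gradient trackers; initialize y
             (and g_prev) to the local gradients at the starting point
    grads  : grads[i](x_i) returns agent i's local smooth gradient
    W      : (n, n) stochastic mixing matrix of the (di)graph
    """
    n = x.shape[0]
    x_new = W @ x - step * y  # consensus step plus a move along the tracked gradient
    g_new = np.stack([grads[i](x_new[i]) for i in range(n)])
    # mix neighbors' trackers and add the local gradient increment; the average
    # of the y's then tracks the gradient of the smooth part of the sum-utility
    y_new = W @ y + g_new - g_prev
    return x_new, y_new, g_new
```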
- PAR ID: 10124391
- Date Published:
- Journal Name: Journal of Machine Learning Research
- Volume: 20
- ISSN: 1532-4435
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
In real-world multi-robot systems, performing high-quality, collaborative behaviors requires robots to asynchronously reason about high-level action selection at varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions, with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive actions and the scalability of our approaches.
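As a rough sketch of what a macro-action trajectory replay buffer might store, the snippet below (hypothetical, not the paper's exact design) collapses the per-step rewards observed during a macro-action into a single discounted return together with the macro-action's duration tau:

```python
import random
from collections import deque

class MacroActionReplayBuffer:
    """Illustrative sketch of a macro-action replay buffer: each entry records
    one *completed* macro-action, from the observation at which it started to
    the observation at which it terminated."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def push(self, obs, macro_action, rewards, next_obs, done, gamma=0.99):
        # collapse the per-step rewards collected while the macro-action ran
        # into one discounted return, and remember its (variable) duration tau
        tau = len(rewards)
        ret = sum(gamma ** t * r for t, r in enumerate(rewards))
        self.buf.append((obs, macro_action, ret, tau, next_obs, done))

    def sample(self, batch_size):
        return random.sample(list(self.buf), batch_size)
```

A bootstrapped target built from such samples would discount by gamma ** tau rather than a fixed gamma, which is one way variable-duration macro-actions break the standard (synchronous) DQN recipe.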
Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approximate representation of complex data sets in terms of a reduced number of extracted features. Convergence guarantees for most of the OMF algorithms in the literature assume independence between data matrices, and the case of dependent data streams remains largely unexplored. In this paper, we show that a non-convex generalization of the well-known OMF algorithm for i.i.d. streams of data in (Mairal et al., 2010) converges almost surely to the set of critical points of the expected loss function, even when the data matrices are functions of some underlying Markov chain satisfying a mild mixing condition. This allows one to extract features more efficiently from dependent data streams, as there is no need to subsample the data sequence to approximately satisfy the independence assumption. As the main application, by combining online non-negative matrix factorization and a recent MCMC algorithm for sampling motifs from networks, we propose a novel framework of Network Dictionary Learning, which extracts “network dictionary patches” from a given network in an online manner that encodes main features of the network. We demonstrate this technique and its application to network denoising problems on real-world network data.
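For reference, here is a compact sketch of the classical online step from (Mairal et al., 2010) that the paper generalizes to Markovian data, with a nonnegative least-squares coding step standing in for the regularized sparse-coding subproblem:

```python
import numpy as np
from scipy.optimize import nnls

def omf_step(D, A, B, x):
    """One online matrix factorization step (simplified sketch).

    D : (d, r) current dictionary with nonnegative atoms
    x : (d,) newly arrived data sample
    A : (r, r), B : (d, r) running sufficient statistics
    """
    h, _ = nnls(D, x)      # code the new sample against the current dictionary
    A += np.outer(h, h)    # aggregate the sufficient statistics
    B += np.outer(x, h)
    for j in range(D.shape[1]):            # block coordinate dictionary update
        if A[j, j] > 1e-12:
            D[:, j] += (B[:, j] - D @ A[:, j]) / A[j, j]
            D[:, j] = np.maximum(D[:, j], 0.0)            # keep atoms nonnegative
            D[:, j] /= max(np.linalg.norm(D[:, j]), 1.0)  # bound atom norms
    return D, h
```

Because only the aggregates A and B are kept, each sample is touched once, which is what makes the scheme suitable for streams.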
Recently, decentralized optimization has attracted much attention in machine learning because it is more communication-efficient than the centralized fashion. Quantization is a promising method to reduce the communication cost by cutting down the budget of each single communication via gradient compression. To further improve the communication efficiency, some quantized decentralized algorithms have recently been studied. However, quantized decentralized algorithms for nonconvex constrained machine learning problems are still limited. The Frank-Wolfe (a.k.a. conditional gradient, or projection-free) method is very efficient for many constrained optimization tasks, such as training low-rank or sparsity-constrained models. In this paper, to fill this gap in decentralized quantized constrained optimization, we propose a novel communication-efficient Decentralized Quantized Stochastic Frank-Wolfe (DQSFW) algorithm for non-convex constrained learning models. We first design a new counterexample to show that the vanilla decentralized quantized stochastic Frank-Wolfe algorithm usually diverges. We therefore equip DQSFW with a gradient-tracking technique to guarantee that the method converges to a stationary point of the non-convex problem. In our theoretical analysis, we prove that DQSFW achieves the same gradient complexity as the standard stochastic Frank-Wolfe and centralized Frank-Wolfe algorithms to reach a stationary point, but with much lower communication cost. Experiments on matrix completion and model compression applications demonstrate the efficiency of our new algorithm.
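The sketch below illustrates the ingredients this abstract combines: one round of Frank-Wolfe with quantized communication and gradient tracking. The toy uniform quantizer, the l1-ball linear minimization oracle, and the fixed step gamma are illustrative assumptions, not the authors' exact DQSFW updates.

```python
import numpy as np

def quantize(v, levels=256):
    """Toy uniform quantizer standing in for a generic compression operator."""
    scale = np.max(np.abs(v)) + 1e-12
    return np.round(v / scale * (levels // 2)) * scale / (levels // 2)

def lmo_l1(g, radius=1.0):
    """Linear minimization oracle over the l1 ball (a sparsity constraint):
    argmin_{||s||_1 <= radius} <g, s> is a signed, scaled coordinate vector."""
    s = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    s[i] = -radius * np.sign(g[i])
    return s

def dqsfw_round(x, y, g_prev, grads, W, gamma):
    """One illustrative round: agents exchange quantized copies of their
    variables and gradient trackers, then take a projection-free step."""
    n = x.shape[0]
    qx = np.stack([quantize(xi) for xi in x])
    qy = np.stack([quantize(yi) for yi in y])
    g_new = np.stack([grads[i](x[i]) for i in range(n)])  # stochastic gradients
    y_new = W @ qy + g_new - g_prev                       # gradient tracking
    s = np.stack([lmo_l1(y_new[i]) for i in range(n)])    # projection-free step
    x_new = W @ qx + gamma * (s - x)                      # mix + Frank-Wolfe move
    return x_new, y_new, g_new
```

The gradient-tracking correction (y_new) is the piece the paper adds to keep the quantized scheme from diverging.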
Dictionary learning, aiming at representing a signal in terms of the atoms of a dictionary, has gained popularity in a wide range of applications, including, but not limited to, image denoising, face recognition, remote sensing, medical imaging and feature extraction. Dictionary learning can be seen as a possible data-driven alternative to solve inverse problems by identifying the data with possible outputs that are either generated numerically using a forward model or the results of earlier observations of controlled experiments. Sparse dictionary learning is particularly interesting when the underlying signal is known to be representable in terms of a few vectors in a given basis. In this paper, we propose to use hierarchical Bayesian models for sparse dictionary learning that can capture features of the underlying signals, e.g. sparse representation and nonnegativity. The same framework can be employed to reduce the dimensionality of an annotated dictionary through feature extraction, thus reducing the computational complexity of the learning task. Computed examples where our algorithms are applied to hyperspectral imaging and classification of electrocardiogram data are also presented.
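As a schematic of the hierarchical Bayesian idea, the sketch below alternates a conditionally Gaussian update of the coefficients with an update of per-coefficient prior variances; the specific variance update is a simplified stand-in, not the hierarchical models proposed in the paper.

```python
import numpy as np

def hierarchical_sparse_coding(A, b, n_iter=50, eta=1.0, theta_star=1e-3):
    """Alternating sketch: x | theta is a ridge-type solve with per-coefficient
    prior variances theta; theta | x uses a simple fixed-point stand-in for the
    closed-form update a gamma-type hyperprior would give."""
    n = A.shape[1]
    x = np.zeros(n)
    theta = np.full(n, theta_star)
    for _ in range(n_iter):
        # conditionally Gaussian coefficient update (unit noise variance assumed)
        x = np.linalg.solve(A.T @ A + np.diag(1.0 / theta), A.T @ b)
        # a small x_j pulls theta_j toward theta_star, which shrinks x_j further;
        # this feedback loop is what promotes sparse representations
        theta = theta_star * eta + 0.5 * x ** 2
    return x, theta
```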