NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Nonasymptotic Concentration Rates in Cooperative Learning—Part II: Inference on Compact Hypothesis Sets

https://doi.org/10.1109/TCNS.2022.3140698

Uribe, Cesar A.; Olshevsky, Alexander; Nedic, Angelia (September 2022, IEEE Transactions on Control of Network Systems)

Full Text Available
Nonasymptotic Concentration Rates in Cooperative Learning–Part I: Variational Non-Bayesian Social Learning

https://doi.org/10.1109/TCNS.2022.3140683

Uribe, Cesar A.; Olshevsky, Alexander; Nedic, Angelia (September 2022, IEEE Transactions on Control of Network Systems)

Full Text Available
Asymptotic Network Independence and Step-Size for a Distributed Subgradient Method

Alex Olshevsky (January 2022, Journal of machine learning research)

We consider whether distributed subgradient methods can achieve a linear speedup over a centralized subgradient method. While it might be hoped that distributed network of n nodes that can compute n times more subgradients in parallel compared to a single node might, as a result, be n times faster, existing bounds for distributed optimization methods are often consistent with a slowdown rather than speedup compared to a single node. We show that a distributed subgradient method has this “linear speedup” property when using a class of square-summable-but-not-summable step-sizes which include 1/t^β when β ∈ (1/2,1); for such step-sizes, we show that after a transient period whose size depends on the spectral gap of the network, the method achieves a performance guarantee that does not depend on the network or the number of nodes. We also show that the same method can fail to have this “asymptotic network independence” property under the optimally decaying step-size 1/t^{1/2} and, as a consequence, can fail to provide a linear speedup compared to a single node with 1/t^{1/2} step-size.
more » « less
Full Text Available
A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent

https://doi.org/10.1109/TAC.2021.3126253

Pu, Shi; Olshevsky, Alexander; Paschalidis, Ioannis Ch. (November 2021, IEEE Transactions on Automatic Control)

Full Text Available
Temporal Difference Learning as Gradient Splitting

Liu, Rui; Olshevsky, Alex (November 2021, International Conference on Machine Learning (ICML))

Temporal difference learning with linear function approximation is a popular method to obtain a low-dimensional approximation of the value function of a policy in a Markov Decision Process. We give a new interpretation of this method in terms of a splitting of the gradient of an appropriately chosen function. As a consequence of this interpretation, convergence proofs for gradient descent can be applied almost verbatim to temporal difference learning. Beyond giving a new, fuller explanation of why temporal difference works, our interpretation also yields improved convergence times. We consider the setting with 1/T^{1/2} step-size, where previous comparable finite-time convergence time bounds for temporal difference learning had the multiplicative factor 1/(1-\gamma) in front of the bound, with γ being the discount factor. We show that a minor variation on TD learning which estimates the mean of the value function separately has a convergence time where 1/(1-\gamma) only multiplies an asymptotically negligible term.
more » « less
Full Text Available
Asymptotic Convergence Rate of Alternating Minimization for Rank One Matrix Completion

https://doi.org/10.1109/LCSYS.2020.3016626

Liu, Rui; Olshevsky, Alex (October 2021, IEEE Control Systems Letters)
null (Ed.)
Full Text Available
Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers

Ma, Yao; Olshevsky, Alex; Saligrama, Venkatesh; Czepesvari, Csoba (June 2021, Journal of machine learning research)

We consider worker skill estimation for the single-coin Dawid-Skene crowdsourcing model. In practice, skill-estimation is challenging because worker assignments are sparse and irregular due to the arbitrary and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlation between workers. We show that the correlation matrix can be successfully recovered and skills are identifiable if and only if the sampling matrix (observed components) does not have a bipartite connected component. We then propose a projected gradient descent scheme and show that skill estimates converge to the desired global optima for such sampling matrices. Our proof is original and the results are surprising in light of the fact that even the weighted rank-one matrix factorization problem is NP-hard in general. Next, we derive sample complexity bounds in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-art performance on a number of real-world datasets.
more » « less
Full Text Available
Communication-efficient SGD: From Local SGD to One-Shot Averaging

Spiridonoff, Artin; Olshevsky, Alex; Paschalidis, Ioannis Ch. (January 2021, Advances in neural information processing systems)

Full Text Available
Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning: Examining Distributed and Centralized Stochastic Gradient Descent

https://doi.org/10.1109/MSP.2020.2975212

Pu, Shi; Olshevsky, Alex; Paschalidis, Ioannis Ch. (May 2020, IEEE Signal Processing Magazine)

Full Text Available
Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions

Spiridonoff, Artin; Olshevsky, Alex; Paschalidis, Ioannis Ch. (April 2020, Journal of machine learning research)

Full Text Available

« Prev Next »

Search for: All records