We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learn ing (FL) framework. We propose a distributed communicationefficient and local differentially private stochastic gradient descent (CLDPSGD) algorithm and analyze its communication, privacy, and convergence tradeoffs. Since each iteration of the CLDP SGD aggregates the clientside local gradients, we develop (optimal) communicationefficient schemes for mean estimation for several lp spaces under local differential privacy (LDP). To overcome performance limitation of LDP, CLDPSGD takes advantage of the inherent privacy amplification provided by client sub sampling and data subsampling at each se lected client (through SGD) as well as the recently developed shuffled model of privacy. For convex loss functions, we prove that the proposed CLDPSGD algorithm matches the known lower bounds on the centralized private ERM while using a finite number of bits per iteration for each client, i.e., effectively get ting communication efficiency for “free”. We also provide preliminary experimental results supporting the theory.
more »
« less
ParameterFree Locally Differentially Private Stochastic Subgradient Descent
We consider the problem of minimizing a convex risk with stochastic subgradients guaranteeing $\epsilon$locally differentially private ($\epsilon$LDP). While it has been shown that stochastic optimization is possible with $\epsilon$LDP via the standard SGD, its convergence rate largely depends on the learning rate, which must be tuned via repeated runs. Further, tuning is detrimental to privacy loss since it significantly increases the number of gradient requests. In this work, we propose BANCO (Betting Algorithm for Noisy COins), the first $\epsilon$LDP SGD algorithm that essentially matches the convergence rate of the tuned SGD without any learning rate parameter, reducing privacy loss and saving privacy budget.
more »
« less
 Award ID(s):
 1908111
 NSFPAR ID:
 10208411
 Date Published:
 Journal Name:
 Workshop on Privacy in Machine Learning at NeurIPS'19
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical analysis of convergence of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al.,2016) a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such bound always holds. Moreover, we propose an alternative convergence analysis of SGD with diminishing learning rate regime, which is results in more relaxed conditions that those in (Bottou et al.,2016). We then move on the asynchronous parallel setting, and prove convergence of the Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of diminished learning rate.more » « less

The matrix completion problem seeks to recover a $d\times d$ ground truth matrix of low rank $r\ll d$ from observations of its individual elements. Realworld matrix completion is often a hugescale optimization problem, with $d$ so large that even the simplest fulldimension vector operations with $O(d)$ time complexity become prohibitively expensive. Stochastic gradient descent (SGD) is one of the few algorithms capable of solving matrix completion on a huge scale, and can also naturally handle streaming data over an evolving ground truth. Unfortunately, SGD experiences a dramatic slowdown when the underlying ground truth is illconditioned; it requires at least $O(\kappa\log(1/\epsilon))$ iterations to get $\epsilon$close to ground truth matrix with condition number $\kappa$. In this paper, we propose a preconditioned version of SGD that preserves all the favorable practical qualities of SGD for hugescale online optimization while also making it agnostic to $\kappa$. For a symmetric ground truth and the Root Mean Square Error (RMSE) loss, we prove that the preconditioned SGD converges to $\epsilon$accuracy in $O(\log(1/\epsilon))$ iterations, with a rapid linear convergence rate as if the ground truth were perfectly conditioned with $\kappa=1$. In our numerical experiments, we observe a similar acceleration for illconditioned matrix completion under the 1bit crossentropy loss, as well as pairwise losses such as the Bayesian Personalized Ranking (BPR) loss.more » « less

We propose a tuningfree dynamic SGD step size formula, which we call Distance over Gradients (DoG). The DoG step sizes depend on simple empirical quantities (distance from the initial point and norms of gradients) and have no “learning rate” parameter. Theoretically, we show that, for stochastic convex optimization, a slight variation of the DoG formula enjoys strong, highprobability parameterfree convergence guarantees and iterate movement bounds. Empirically, we consider a broad range of vision and language transfer learning tasks, and show that DoG’s performance is close to that of SGD with tuned learning rate. We also propose a perlayer variant of DoG that generally outperforms tuned SGD, approaching the performance of tuned Adam. A PyTorch implementation of our algorithms is available at https://github.com/formll/dog.more » « less

Ruiz, Francisco and (Ed.)Motivated by personalized healthcare and other applications involving sensitive data, we study online exploration in reinforcement learning with differential privacy (DP) constraints. Existing work on this problem established that noregret learning is possible under joint differential privacy (JDP) and local differential privacy (LDP) but did not provide an algorithm with optimal regret. We close this gap for the JDP case by designing an $\epsilon$JDP algorithm with a regret of $\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$ which matches the informationtheoretic lower bound of nonprivate learning for all choices of $\epsilon> S^{1.5}A^{0.5} H^2/\sqrt{T}$. In the above, $S$, $A$ denote the number of states and actions, $H$ denotes the planning horizon, and $T$ is the number of steps. To the best of our knowledge, this is the first private RL algorithm that achieves privacy for free asymptotically as $T\rightarrow \infty$. Our techniques — which could be of independent interest — include privately releasing Bernsteintype exploration bonuses and an improved method for releasing visitation statistics. The same techniques also imply a slightly improved regret bound for the LDP case.more » « less