Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

In this work, we revisit the generalization error of stochastic mirror descent for quadratically bounded losses studied in Telgarsky (2022). Quadratically bounded losses is a broad class of loss functions, capturing both Lipschitz and smooth functions, for both regression and classification problems. We study the high probability generalization for this class of losses on linear predictors in both realizable and nonrealizable cases when the data are sampled IID or from a Markov chain. The prior work relies on an intricate coupling argument between the iterates of the original problem and those projected onto a bounded domain. This approach enables blackbox application of concentration inequalities, but also leads to suboptimal guarantees due in part to the use of a union bound across all iterations. In this work, we depart significantly from the prior work of Telgarsky (2022), and introduce a novel approach for establishing high probability generalization guarantees. In contrast to the prior work, our work directly analyzes the moment generating function of a novel supermartingale sequence and leverages the structure of stochastic mirror descent. As a result, we obtain improved bounds in all aforementioned settings. Specifically, in the realizable case and nonrealizable case with lighttailed subGaussian data, we improve the bounds by a $\log T$ factor, matching the correct rates of $1/T$ and $1/\sqrt{T}$, respectively. In the more challenging case of heavytailed polynomial data, we improve the existing bound by a $\mathrm{poly}\ T$ factor.more » « lessFree, publiclyaccessible full text available December 10, 2024

In this work, we study the convergence in high probability of clipped gradient methods when the noise distribution has heavy tails, i.e., with bounded $p$th moments, for some $1< p \leq 2$. Prior works in this setting follow the same recipe of using concentration inequalities and an inductive argument with union bound to bound the iterates across all iterations. This method results in an increase in the failure probability by a factor of $T$, where $T$ is the number of iterations. We instead propose a new analysis approach based on bounding the moment generating function of a well chosen supermartingale sequence. We improve the dependency on $T$ in the convergence guarantee for a wide range of algorithms with clipped gradients, including stochastic (accelerated) mirror descent for convex objectives and stochastic gradient descent for nonconvex objectives. Our high probability bounds achieve the optimal convergence rates and match the best currently known inexpectation bounds. Our approach naturally allows the algorithms to use timevarying step sizes and clipping parameters when the time horizon is unknown, which appears difficult or even impossible using existing techniques from prior works. Furthermore, we show that in the case of clipped stochastic mirror descent, several problem constants, including the initial distance to the optimum, are not required when setting step sizes and clipping parameters.more » « lessFree, publiclyaccessible full text available December 10, 2024

We consider the problem of clustering in the learningaugmented setting. We are given a data set in $d$dimensional Euclidean space, and a label for each data point given by a predictor indicating what subsets of points should be clustered together. This setting captures situations where we have access to some auxiliary information about the data set relevant for our clustering objective, for instance the labels output by a neural network. Following prior work, we assume that there are at most an $\alpha \in (0,c)$ for some $c<1$ fraction of false positives and false negatives in each predicted cluster, in the absence of which the labels would attain the optimal clustering cost $\mathrm{OPT}$. For a dataset of size $m$, we propose a deterministic $k$means algorithm that produces centers with an improved bound on the clustering cost compared to the previous randomized stateoftheart algorithm while preserving the $O( d m \log m)$ runtime. Furthermore, our algorithm works even when the predictions are not very accurate, i.e., our cost bound holds for $\alpha$ up to $1/2$, an improvement from $\alpha$ being at most $1/7$ in previous work. For the $k$medians problem we again improve upon prior work by achieving a biquadratic improvement in the dependence of the approximation factor on the accuracy parameter $\alpha$ to get a cost of $(1+O(\alpha))\mathrm{OPT}$, while requiring essentially just $O(md \log^3 m/\alpha)$ runtime.more » « less

In this paper, we study the finitesum convex optimization problem focusing on the general convex case. Recently, the study of variance reduced (VR) methods and their accelerated variants has made exciting progress. However, the step size used in the existing VR algorithms typically depends on the smoothness parameter, which is often unknown and requires tuning in practice. To address this problem, we propose two novel adaptive VR algorithms: Adaptive Variance Reduced Accelerated ExtraGradient (AdaVRAE) and Adaptive Variance Reduced Accelerated Gradient (AdaVRAG). Our algorithms do not require knowledge of the smoothness parameter. AdaVRAE uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta}{\epsilon}}\right)$ and AdaVRAG uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta\log\beta}{\epsilon}}\right)$ gradient evaluations to attain an $\mathcal{O}(\epsilon)$suboptimal solution, where $n$ is the number of functions in the finite sum and $\beta$ is the smoothness parameter. This result matches the bestknown convergence rate of nonadaptive VR methods and it improves upon the convergence of the state of the art adaptive VR method, AdaSVRG. We demonstrate the superior performance of our algorithms compared with previous methods in experiments on realworld datasets.more » « less

We develop new adaptive algorithms for variational inequalities with monotone operators, which capture many problems of interest, notably convex optimization and convexconcave saddle point problems. Our algorithms automatically adapt to unknown problem parameters such as the smoothness and the norm of the operator, and the variance of the stochastic evaluation oracle. We show that our algorithms are universal and simultaneously achieve the optimal convergence rates in the nonsmooth, smooth, and stochastic settings. The convergence guarantees of our algorithms improve over existing adaptive methods and match the optimal nonadaptive algorithms. Additionally, prior works require that the optimization domain is bounded. In this work, we remove this restriction and give algorithms for unbounded domains that are adaptive and universal. Our general proof techniques can be used for many variants of the algorithm using one or two operator evaluations per iteration. The classical methods based on the ExtraGradient/MirrorProx algorithm require two operator evaluations per iteration, which is the dominant factor in the running time in many settings.more » « less

We develop new adaptive algorithms for variational inequalities with monotone operators, which capture many problems of interest, notably convex optimization and convexconcave saddle point problems. Our algorithms automatically adapt to unknown problem parameters such as the smoothness and the norm of the operator, and the variance of the stochastic evaluation oracle. We show that our algorithms are universal and simultaneously achieve the optimal convergence rates in the nonsmooth, smooth, and stochastic settings. The convergence guarantees of our algorithms improve over existing adaptive methods and match the optimal nonadaptive algorithms. Additionally, prior works require that the optimization domain is bounded. In this work, we remove this restriction and give algorithms for unbounded domains that are adaptive and universal. Our general proof techniques can be used for many variants of the algorithm using one or two operator evaluations per iteration. The classical methods based on the ExtraGradient/MirrorProx algorithm require two operator evaluations per iteration, which is the dominant factor in the running time in many settings.more » « less

Given a data set of size n in d'dimensional Euclidean space, the kmeans problem asks for a set of k points (called centers) such that the sum of the l_2^2distances between the data points and the set of centers is minimized. Previous work on this problem in the local differential privacy setting shows how to achieve multiplicative approximation factors arbitrarily close to optimal, but suffers high additive error. The additive error has also been seen to be an issue in implementations of differentially private kmeans clustering algorithms in both the central and local settings. In this work, we introduce a new locally private kmeans clustering algorithm that achieves nearoptimal additive error whilst retaining constant multiplicative approximation factors and round complexity. Concretely, given any c>sqrt(2), our algorithm achieves O(k^(1 + O(1/(2c^21))) * sqrt(d' n) * log d' * poly log n) additive error with an O(c^2) multiplicative approximation factor.more » « less

We study the problem of maximizing a nonmonotone submodular function subject to a cardinality constraint in the streaming model. Our main contribution is a singlepass (semi)streaming algorithm that uses roughly $O(k / \varepsilon^2)$ memory, where $k$ is the size constraint. At the end of the stream, our algorithm postprocesses its data structure using any offline algorithm for submodular maximization, and obtains a solution whose approximation guarantee is $\frac{\alpha}{1+\alpha}\varepsilon$, where $\alpha$ is the approximation of the offline algorithm. If we use an exact (exponential time) postprocessing algorithm, this leads to $\frac{1}{2}\varepsilon$ approximation (which is nearly optimal). If we postprocess with the algorithm of \cite{buchbinder2019constrained}, that achieves the stateoftheart offline approximation guarantee of $\alpha=0.385$, we obtain $0.2779$approximation in polynomial time, improving over the previously best polynomialtime approximation of $0.1715$ due to \cite{feldman2018less}. It is also worth mentioning that our algorithm is combinatorial and deterministic, which is rare for an algorithm for nonmonotone submodular maximization, and enjoys a fast update time of $O(\frac{\log k + \log (1/\alpha {\varepsilon^2})$ per element.more » « less

In this work, we propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation. For universe size of k and with n users, our epsLDP algorithm has communication cost ceil(log_2 k) and computation cost O(n + k\exp(eps) log k) for the server to approximately reconstruct the frequency histogram, while achieve optimal privacyutility tradeoff. In many practical settings this is a significant improvement over the O (n+k^2) computation cost that is achieved by the recent PIRAPPOR algorithm (Feldman and Talwar; 2021). Our empirical evaluation shows a speedup of over 50x over PIRAPPOR while using approximately 75x less memory. In addition, the running time of our algorithm is comparable to that of HadamardResponse (Acharya, Sun, and Zhang; 2019) and RecursiveHadamardResponse (Chen, Kairouz, and Ozgur; 2020) which have significantly worse reconstruction error. The error of our algorithm essentially matches that of the communication and timeinefficient but utilityoptimal SubsetSelection (SS) algorithm (Ye and Barg; 2017). Our new algorithm is based on using Projective Planes over a finite field to define a small collection of sets that are close to being pairwise independent and a dynamic programming algorithm for approximate histogram reconstruction for the server.more » « less