Title: Between hard and soft thresholding: optimal iterative thresholding algorithms
Abstract: Iterative thresholding algorithms seek to optimize a differentiable objective function over a sparsity or rank constraint by alternating between gradient steps that reduce the objective and thresholding steps that enforce the constraint. This work examines the choice of the thresholding operator and asks whether stronger guarantees are achievable than those available with hard thresholding. We develop the notion of relative concavity of a thresholding operator, a quantity that characterizes the worst-case convergence performance of any thresholding operator on the target optimization problem. Surprisingly, we find that commonly used thresholding operators, such as hard thresholding and soft thresholding, are suboptimal in terms of worst-case convergence guarantees. Instead, a general class of thresholding operators, lying between hard thresholding and soft thresholding, is shown to be optimal with the strongest possible convergence guarantee among all thresholding operators. Examples of this general class include $$\ell_q$$ thresholding with appropriate choices of $$q$$ and a newly defined reciprocal thresholding operator. We also investigate the implications of the improved optimization guarantee in the statistical setting of sparse linear regression and show that this new class of thresholding operators attains the optimal rate for computationally efficient estimators, matching the Lasso.
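As a rough illustration of the template described in the abstract, the sketch below (a minimal sketch, assuming a generic differentiable objective with gradient `grad_f`, a sparsity level `s`, and a fixed step size) alternates a gradient step with a thresholding step that enforces s-sparsity. Hard thresholding and a sparsity-level form of soft thresholding are shown as examples; the optimal intermediate operators studied in the paper, such as the $$\ell_q$$ and reciprocal thresholding operators, are not reproduced here.

```python
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    keep = np.argsort(np.abs(v))[-s:]
    out = np.zeros_like(v)
    out[keep] = v[keep]
    return out

def soft_threshold_to_sparsity(v, s):
    """Shrink toward zero by the (s+1)-th largest magnitude, keeping s entries."""
    order = np.argsort(np.abs(v))[::-1]
    tau = np.abs(v[order[s]]) if s < len(v) else 0.0
    out = np.zeros_like(v)
    keep = order[:s]
    out[keep] = np.sign(v[keep]) * np.maximum(np.abs(v[keep]) - tau, 0.0)
    return out

def iterative_thresholding(grad_f, x0, s, step, threshold=hard_threshold, n_iters=200):
    """Generic iteration: a gradient step on the objective followed by a
    thresholding step that maps the iterate back to the s-sparse set."""
    x = x0.copy()
    for _ in range(n_iters):
        x = threshold(x - step * grad_f(x), s)
    return x
```

Swapping the `threshold` argument is the only change needed to compare operators, mirroring how the work varies the thresholding step while keeping the gradient step fixed.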
Award ID(s):
1654076
PAR ID:
10126886
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
ISSN:
2049-8764
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. For obtaining optimal first-order convergence guarantees for stochastic optimization, it is necessary to use a recurrent data sampling algorithm that samples every data point with sufficient frequency. Most commonly used data sampling algorithms (e.g., i.i.d., MCMC, random reshuffling) are indeed recurrent under mild assumptions. In this work, we show that for a particular class of stochastic optimization algorithms, we do not need any further property (e.g., independence, exponential mixing, and reshuffling) beyond recurrence in data sampling to guarantee optimal rate of first-order convergence. Namely, using regularized versions of Minimization by Incremental Surrogate Optimization (MISO), we show that for non-convex and possibly non-smooth objective functions with constraints, the expected optimality gap converges at an optimal rate $$O(n^{-1/2})$$ under general recurrent sampling schemes. Furthermore, the implied constant depends explicitly on the 'speed of recurrence', measured by the expected amount of time to visit a data point, either averaged ('target time') or supremized ('hitting time') over the starting locations. We discuss applications of our general framework to decentralized optimization and distributed non-negative matrix factorization.
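For concreteness, here is a minimal sketch of an incremental-surrogate loop in the spirit of MISO, restricted to a smooth, unconstrained finite-sum objective: each data index keeps a quadratic surrogate, the sampled index refreshes its surrogate at the current iterate, and the next iterate minimizes the average surrogate in closed form. The regularization, constraints, and non-smooth terms analyzed in the paper are omitted, and `grad_fi`, `sampler`, and the curvature constant `L` are illustrative assumptions; `sampler` stands in for any recurrent scheme (i.i.d., MCMC, random reshuffling, and so on).

```python
import numpy as np

def miso_sketch(grad_fi, n, dim, L, sampler, n_iters=10000, x0=None):
    """Incremental surrogate optimization sketch for (1/n) * sum_i f_i(x).

    Surrogate i is the quadratic f_i(z[i]) + <v[i], x - z[i]> + (L/2)||x - z[i]||^2,
    so the minimizer of the average surrogate is mean(z) - mean(v) / L.
    """
    x = np.zeros(dim) if x0 is None else x0.copy()
    z = np.tile(x, (n, 1))                                # surrogate anchor points
    v = np.array([grad_fi(i, x) for i in range(n)])       # stored gradients
    for _ in range(n_iters):
        i = sampler()                                     # recurrent index sampling
        z[i], v[i] = x, grad_fi(i, x)                     # refresh surrogate i
        x = z.mean(axis=0) - v.mean(axis=0) / L           # minimize average surrogate
        # (running means could be maintained incrementally to avoid the O(n) cost)
    return x
```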
  2. Regularized sparse learning with the ℓ0-norm is important in many areas, including statistical learning and signal processing. Iterative hard thresholding (IHT) methods are the state of the art for nonconvex-constrained sparse learning due to their capability of recovering the true support and their scalability to large datasets. The current theoretical analysis of IHT assumes the use of centralized IID data. In realistic large-scale scenarios, however, data are distributed, seldom IID, and private to edge computing devices at the local level. Consequently, it is necessary to study the properties of IHT in a federated environment, where local devices update the sparse model individually and communicate with a central server for aggregation infrequently, without sharing local data. In this paper, we propose the first group of federated IHT methods: Federated Hard Thresholding (Fed-HT) and Federated Iterative Hard Thresholding (FedIter-HT), both with theoretical guarantees. We prove that both algorithms have a linear convergence rate and guarantee recovery of the optimal sparse estimator, comparable to classic IHT methods, but with decentralized, non-IID, and unbalanced data. Empirical results demonstrate that Fed-HT and FedIter-HT outperform their competitor, a distributed IHT, in terms of reducing objective values with fewer communication rounds and lower bandwidth requirements.
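One plausible reading of the round structure is sketched below: each client takes a few local gradient steps, optionally hard-thresholding after every local step in the spirit of the iterative variant, and the server hard-thresholds the average of the returned models. This is not the paper's exact Fed-HT or FedIter-HT specification; the function names, the number of local steps, and the step size are all hypothetical.

```python
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude coordinates of v."""
    keep = np.argsort(np.abs(v))[-s:]
    out = np.zeros_like(v)
    out[keep] = v[keep]
    return out

def federated_iht_sketch(client_grads, x0, s, step, local_steps=5, rounds=50,
                         threshold_locally=False):
    """Hypothetical federated IHT round: local gradient steps on each client,
    then server-side averaging followed by hard thresholding."""
    x = x0.copy()
    for _ in range(rounds):
        client_models = []
        for grad in client_grads:                # one gradient callable per client
            w = x.copy()
            for _ in range(local_steps):
                w = w - step * grad(w)           # local gradient step on local data
                if threshold_locally:            # iterative variant: threshold locally too
                    w = hard_threshold(w, s)
            client_models.append(w)
        x = hard_threshold(np.mean(client_models, axis=0), s)  # server aggregation
    return x
```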
  3. We develop new adaptive algorithms for variational inequalities with monotone operators, which capture many problems of interest, notably convex optimization and convex-concave saddle point problems. Our algorithms automatically adapt to unknown problem parameters such as the smoothness and the norm of the operator, and the variance of the stochastic evaluation oracle. We show that our algorithms are universal and simultaneously achieve the optimal convergence rates in the non-smooth, smooth, and stochastic settings. The convergence guarantees of our algorithms improve over existing adaptive methods and match the optimal non-adaptive algorithms. Additionally, prior works require that the optimization domain is bounded. In this work, we remove this restriction and give algorithms for unbounded domains that are adaptive and universal. Our general proof techniques can be used for many variants of the algorithm using one or two operator evaluations per iteration. The classical methods based on the ExtraGradient/MirrorProx algorithm require two operator evaluations per iteration, which is the dominant factor in the running time in many settings.
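To make the two-evaluation structure concrete, here is a sketch of the classical extragradient iteration for an unconstrained monotone operator F; a simple AdaGrad-style step size is used as a stand-in for the paper's adaptive rule, which is not reproduced, and projections for constrained domains are omitted.

```python
import numpy as np

def extragradient_sketch(F, x0, n_iters=1000, eta0=1.0, eps=1e-8):
    """Classical extragradient: two evaluations of the operator F per iteration,
    one at the current point and one at the extrapolated point."""
    x = x0.copy()
    acc = 0.0                                  # accumulated squared operator norms
    for _ in range(n_iters):
        g = F(x)
        acc += float(np.dot(g, g))
        eta = eta0 / np.sqrt(eps + acc)        # heuristic adaptive step size (assumption)
        y = x - eta * g                        # extrapolation step using F(x)
        x = x - eta * F(y)                     # update step using F(y)
    return x
```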
  4. The question of fast convergence in the classical problem of high-dimensional linear regression has been extensively studied. Arguably, one of the fastest procedures in practice is Iterative Hard Thresholding (IHT). Still, IHT relies strongly on knowledge of the true sparsity parameter s. In this paper, we present a novel fast procedure for estimation in high-dimensional linear regression. Taking advantage of the interplay between estimation, support recovery, and optimization, we achieve both optimal statistical accuracy and fast convergence. The main advantage of our procedure is that it is fully adaptive, making it more practical than state-of-the-art IHT methods. Our procedure achieves optimal statistical accuracy faster than, for instance, classical algorithms for the Lasso. Moreover, we establish sharp optimal results for both estimation and support recovery. As a consequence, we present a new iterative hard thresholding algorithm for high-dimensional linear regression that is scaled minimax optimal (achieving the estimation error of the oracle that knows the sparsity pattern if possible), fast, and adaptive.
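For reference, plain IHT for sparse linear regression looks like the sketch below; note the explicit dependence on the sparsity level s, which is exactly what the adaptive procedure described above avoids. The step-size choice and iteration count are illustrative assumptions.

```python
import numpy as np

def iht_least_squares(X, y, s, step=None, n_iters=300):
    """Plain IHT for min 0.5 * ||y - X beta||^2 subject to ||beta||_0 <= s."""
    p = X.shape[1]
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1 / ||X||_2^2 step size
    beta = np.zeros(p)
    for _ in range(n_iters):
        beta = beta + step * X.T @ (y - X @ beta)     # gradient step on the loss
        keep = np.argsort(np.abs(beta))[-s:]          # hard threshold: keep top-s entries
        mask = np.zeros(p, dtype=bool)
        mask[keep] = True
        beta[~mask] = 0.0
    return beta
```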