skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: The slope robustly determines convex functions
We show that the deviation between the slopes of two convex functions controls the deviation between the functions themselves. This result reveals that the slope—a one dimensional construct—robustly determines convex functions, up to a constant of integration.  more » « less
Award ID(s):
2023166
PAR ID:
10435356
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the American Mathematical Society
ISSN:
0002-9939
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the problem of inferring the conditional independence graph (CIG) of high-dimensional Gaussian vectors from multi-attribute data. Most existing methods for graph estimation are based on single-attribute models where one associates a scalar random variable with each node. In multi-attribute graphical models, each node represents a random vector. In this paper we provide a unified theoretical analysis of multi-attribute graph learning using a penalized log-likelihood objective function. We consider both convex (sparse-group lasso) and sparse-group non-convex (log-sum and smoothly clipped absolute deviation (SCAD) penalties) penalty/regularization functions. An alternating direction method of multipliers (ADMM) approach coupled with local linear approximation to non-convex penalties is presented for optimization of the objective function. For non-convex penalties, theoretical analysis establishing local consistency in support recovery, local convexity and precision matrix estimation in high-dimensional settings is provided under two sets of sufficient conditions: with and without some irrepresentability conditions. We illustrate our approaches using both synthetic and real-data numerical examples. In the synthetic data examples the sparse-group log-sum penalized objective function significantly outperformed the lasso penalized as well as SCAD penalized objective functions with F1 -score and Hamming distance as performance metrics. 
    more » « less
  2. Switching between finitely many continuous-time autonomous steepest descent dynamics for convex functions is considered. Convergence of complete solutions to common minimizers of the convex functions, if such minimizers exist, is shown. The convex functions need not be smooth and may be subject to constraints. Since the common minimizers may represent consensus in a multi-agent system modeled by an undirected communication graph, several known results about asymptotic consensus are deduced as special cases. Extensions to time-varying convex functions and to dynamics given by set-valued mappings more general than subdifferentials of convex functions are included. 
    more » « less
  3. null (Ed.)
    Abstract We study a 1-dimensional discrete signal denoising problem that consists of minimizing a sum of separable convex fidelity terms and convex regularization terms, the latter penalize the differences of adjacent signal values. This problem generalizes the total variation regularization problem. We provide here a unified approach to solve the problem for general convex fidelity and regularization functions that is based on the Karush–Kuhn–Tucker optimality conditions. This approach is shown here to lead to a fast algorithm for the problem with general convex fidelity and regularization functions, and a faster algorithm if, in addition, the fidelity functions are differentiable and the regularization functions are strictly convex. Both algorithms achieve the best theoretical worst case complexity over existing algorithms for the classes of objective functions studied here. Also in practice, our C++ implementation of the method is considerably faster than popular C++ nonlinear optimization solvers for the problem. 
    more » « less
  4. Stochastic Gradient Descent (SGD) based methods have been widely used for training large-scale machine learning models that also generalize well in practice. Several explanations have been offered for this generalization performance, a prominent one being algorithmic stability [18]. However, there are no known examples of smooth loss functions for which the analysis can be shown to be tight. Furthermore, apart from the properties of the loss function, data distribution has also been shown to be an important factor in generalization performance. This raises the question: is the stability analysis of [18] tight for smooth functions, and if not, for what kind of loss functions and data distributions can the stability analysis be improved? In this paper we first settle open questions regarding tightness of bounds in the data-independent setting: we show that for general datasets, the existing analysis for convex and strongly-convex loss functions is tight, but it can be improved for non-convex loss functions. Next, we give a novel and improved data-dependent bounds: we show stability upper bounds for a large class of convex regularized loss functions, with negligible regularization parameters, and improve existing data-dependent bounds in the non-convex setting. We hope that our results will initiate further efforts to better understand the data-dependent setting under non-convex loss functions, leading to an improved understanding of the generalization abilities of deep networks. 
    more » « less
  5. We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter-out corrupt gradients. In order to be able to apply their filtering procedure in our heterogeneous data setting where workers compute stochastic gradients, we derive a new matrix concentration result, which may be of independent interest. We provide convergence analyses for smooth strongly-convex and non-convex objectives and show that our convergence rates match that of vanilla SGD in the Byzantine-free setting. In order to bound the heterogeneity, we assume that the gradients at different workers have bounded deviation from each other, and we also provide concrete bounds on this deviation in the statistical heterogeneous data model. 
    more » « less