Motivated by robust and quantile regression problems, we investigate the stochastic gradient descent (SGD) algorithm for minimizing an objective functionfthat is locally strongly convex with a sub--quadratic tail. This setting covers many widely used online statistical methods. We introduce a novel piecewise Lyapunov function that enables us to handle functionsfwith only first-order differentiability, which includes a wide range of popular loss functions such as Huber loss. Leveraging our proposed Lyapunov function, we derive finite-time moment bounds under general diminishing stepsizes, as well as constant stepsizes. We further establish the weak convergence, central limit theorem and bias characterization under constant stepsize, providing the first geometrical convergence result for sub--quadratic SGD. Our results have wide applications, especially in online statistical methods. In particular, we discuss two applications of our results. 1) Online robust regression: We consider a corrupted linear model with sub--exponential covariates and heavy--tailed noise. Our analysis provides convergence rates comparable to those for corrupted models with Gaussian covariates and noise. 2) Online quantile regression: Importantly, our results relax the common assumption in prior work that the conditional density is continuous and provide a more fine-grained analysis for the moment bounds.
more »
« less
Max-affine regression with universal parameter estimation for small-ball designs
We study the max-affine regression model, where the unknown regression function is modeled as a maximum of a fixed number of affine functions. In recent work [1], we showed that end-to-end parameter estimates were obtainable using this model with an alternating minimization (AM) algorithm provided the covariates (or designs) were normally distributed, and chosen independently of the underlying parameters. In this paper, we show that AM is significantly more robust than the setting of [1]: It converges locally under small-ball design assumptions (which is a much broader class, including bounded log-concave distributions), and even when the underlying parameters are chosen with knowledge of the realized covariates. Once again, the final rate obtained by the procedure is near-parametric and minimax optimal (up to a polylogarithmic factor) as a function of the dimension, sample size, and noise variance. As a by-product of our analysis, we obtain convergence guarantees on a classical algorithm for the (real) phase retrieval problem in the presence of noise under considerably weaker assumptions on the design distribution than was previously known.
more »
« less
- Award ID(s):
- 1654589
- PAR ID:
- 10271627
- Date Published:
- Journal Name:
- IEEE International Symposium on Information Theory
- Page Range / eLocation ID:
- 2706 to 2710
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We consider optimal transport-based distributionally robust optimization (DRO) problems with locally strongly convex transport cost functions and affine decision rules. Under conventional convexity assumptions on the underlying loss function, we obtain structural results about the value function, the optimal policy, and the worst-case optimal transport adversarial model. These results expose a rich structure embedded in the DRO problem (e.g., strong convexity even if the non-DRO problem is not strongly convex, a suitable scaling of the Lagrangian for the DRO constraint, etc., which are crucial for the design of efficient algorithms). As a consequence of these results, one can develop efficient optimization procedures that have the same sample and iteration complexity as a natural non-DRO benchmark algorithm, such as stochastic gradient descent.more » « less
-
K. Ellis, W. Ferrell (Ed.)Fused deposition modeling (FDM) is one of the widely used additive manufacturing (AM) processes but shares major shortcomings typical due to its layer-by-layer fabrication. These challenges (poor surface finishes, presence of pores, inconsistent mechanical properties, etc.) have been attributed to FDM input process parameters, machine parameters, and material properties. Deep learning, a type of machine learning algorithm has proven to help reveal complex and nonlinear input-output relationships without the need for the underlying physics. This research explores the power of multilayer perceptron deep learning algorithm to create a prediction model for critical input process parameters (layer thickness, extrusion temperature, build temperature, build orientation, and print speed) to predict three functional output parameters (dimension accuracy, porosity, and tensile strength) of FDM printed part. A fractional factorial design of experiment was performed and replicated three times per run (n=3). The number of neurons for the hidden layers, learning rate, and epoch were varied. The computational run time, loss function, and root mean square error (RMSE) were used to select the best prediction model for each FDM output parameter. The findings of this work are being extended to online monitoring and real-time control of the AM process enabling an AM digital twin.more » « less
-
K. Ellis, W. Ferrell (Ed.)Fused deposition modeling (FDM) is one of the widely used additive manufacturing (AM) processes but shares major shortcomings typical due to its layer-by-layer fabrication. These challenges (poor surface finishes, presence of pores, inconsistent mechanical properties, etc.) have been attributed to FDM input process parameters, machine parameters, and material properties. Deep learning, a type of machine learning algorithm has proven to help reveal complex and nonlinear input-output relationships without the need for the underlying physics. This research explores the power of multilayer perceptron deep learning algorithm to create a prediction model for critical input process parameters (layer thickness, extrusion temperature, build temperature, build orientation, and print speed) to predict three functional output parameters (dimension accuracy, porosity, and tensile strength) of FDM printed part. A fractional factorial design of experiment was performed and replicated three times per run (n=3). The number of neurons for the hidden layers, learning rate, and epoch were varied. The computational run time, loss function, and root mean square error (RMSE) were used to select the best prediction model for each FDM output parameter. The findings of this work are being extended to online monitoring and real-time control of the AM process enabling an AM digital twin.more » « less
-
Convex regression is the problem of fitting a convex function to a data set consisting of input-output pairs. We present a new approach to this problem called spectrahedral regression, in which we fit a spectrahedral function to the data, i.e., a function that is the maximum eigenvalue of an affine matrix expression of the input. This method represents a significant generalization of polyhedral (also called max-affine) regression, in which a polyhedral function (a maximum of a fixed number of affine functions) is fit to the data. We prove bounds on how well spectrahedral functions can approximate arbitrary convex functions via statistical risk analysis. We also analyze an alternating minimization algorithm for the nonconvex optimization problem of fitting the best spectrahedral function to a given data set. We show that this algorithm converges geometrically with high probability to a small ball around the optimal parameter given a good initialization. Finally, we demonstrate the utility of our approach with experiments on synthetic data sets as well as real data arising in applications such as economics and engineering design.more » « less
An official website of the United States government

