In deep learning, the training process often finds an interpolator (a solution with 0 training loss), yet the test loss remains low. This phenomenon, known as benign overfitting, is a major mystery that has received a lot of recent attention. One common mechanism for benign overfitting is implicit regularization, where the training process leads to additional properties of the interpolator, often characterized by minimizing certain norms. However, even for a simple sparse linear regression problem y = Ax + noise with sparse x, neither the minimum ℓ1 nor the minimum ℓ2 norm interpolator gives the optimal test loss. In this work, we give a different parametrization of the model which leads to a new implicit regularization effect that combines the benefits of the ℓ1 and ℓ2 interpolators. We show that training our new model via gradient descent leads to an interpolator with near-optimal test loss. Our result is based on a careful analysis of the training dynamics and provides another example of an implicit regularization effect that goes beyond norm minimization.
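The abstract does not spell out the parametrization, but the flavor of the result can be illustrated with the classic Hadamard-product reparametrization x = u⊙u − v⊙v, which is known to give plain gradient descent an ℓ1-like implicit bias when started from a small initialization. Below is a minimal numpy sketch of this idea; it is illustrative only, not the paper's actual model or proof setting, and all dimensions and step sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse linear regression: y = A @ x_star + noise, with s-sparse x_star.
n, d, s = 50, 200, 5
A = rng.standard_normal((n, d)) / np.sqrt(n)
x_star = np.zeros(d)
x_star[:s] = 1.0
y = A @ x_star + 0.01 * rng.standard_normal(n)

# Reparametrize x = u*u - v*v (elementwise). Gradient descent on the plain
# squared loss in (u, v), from a small initialization, is known to carry an
# l1-like implicit bias toward sparse interpolators. This is one classic
# reparametrization; the paper's construction may differ.
alpha, lr = 1e-3, 0.1
u = alpha * np.ones(d)
v = alpha * np.ones(d)

for _ in range(20000):
    x = u * u - v * v
    g = A.T @ (A @ x - y)      # gradient of 0.5 * ||A x - y||^2 in x
    u = u - lr * 2 * u * g     # chain rule: d x / d u =  2 u
    v = v + lr * 2 * v * g     # chain rule: d x / d v = -2 v

x = u * u - v * v
print("training loss :", 0.5 * np.sum((A @ x - y) ** 2))
print("recovery error:", np.linalg.norm(x - x_star))
```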
Model Misfit Minimization
Abstract In geophysical applications, solutions to ill-posed inverse problems Ax = b are often obtained by analyzing the trade-off between the data residue ‖Ax−b‖₂ and the model norm ‖x‖₂. In this study, we show that the traditional L-curve analysis does not lead to solutions closest to the true models, because the maximum curvature (or the corner of the L-curve) depends on the relative scaling between the data residue and the model norm. A Bayesian approach based on empirical risk function minimization using training datasets may be designed to find a statistically optimal solution, but its success depends on the true realization of the model. To overcome this limitation, we construct training models using eigenvectors of the matrix AᵀA, as well as spectral coefficients calculated from the correlation between observations and eigenvector-projected data. This approach accounts for the data noise level but does not require it as a priori knowledge. Using global tomography as an example, we show that the resulting solutions are closest to the true models.
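As background for the L-curve discussion, here is a minimal numpy sketch that traces the (data residue, model norm) trade-off for Tikhonov-regularized solutions of an ill-posed Ax = b via SVD filter factors. The toy spectrum and noise level are assumptions for illustration; this reproduces the traditional analysis the abstract critiques, not the proposed eigenvector-based training construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ill-posed toy problem: A has rapidly decaying singular values.
m, n = 100, 80
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** np.linspace(0, -6, n)        # decaying spectrum
A = (U[:, :n] * s) @ V.T
b = A @ (V @ rng.standard_normal(n)) + 1e-4 * rng.standard_normal(m)

# Tikhonov solutions x(lam) = argmin ||A x - b||^2 + lam^2 ||x||^2, written
# via the SVD filter factors f_i = s_i / (s_i^2 + lam^2):
beta = U[:, :n].T @ b
print(f"{'lam':>10} {'residue':>12} {'||x||':>12}")
for lam in 10.0 ** np.linspace(-8, 0, 9):
    x = V @ (s / (s**2 + lam**2) * beta)
    print(f"{lam:10.1e} {np.linalg.norm(A @ x - b):12.3e} "
          f"{np.linalg.norm(x):12.3e}")

# Plotting log-residue against log-model-norm traces the L-curve; its
# "corner" is the traditional pick for lam, but the corner location moves
# if either axis is rescaled, which is the pitfall the abstract points out.
```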
- Award ID(s): 1737737
- PAR ID: 10158242
- Date Published:
- Journal Name: Bulletin of the Seismological Society of America
- Volume: 109
- Issue: 5
- ISSN: 0037-1106
- Page Range / eLocation ID: 1729 to 1737
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We present a new class of preconditioned iterative methods for solving linear systems of the form Ax = b. Our methods are based on constructing a low-rank Nyström approximation to A using sparse random matrix sketching. This approximation is used to construct a preconditioner, which itself is inverted quickly using additional levels of random sketching and preconditioning. We prove that the convergence of our methods depends on a natural average condition number of A, which improves as the rank of the Nyström approximation increases. Concretely, this allows us to obtain faster runtimes for a number of fundamental linear algebraic problems:
    1. We show how to solve any n×n linear system that is well-conditioned except for k outlying large singular values in O~(n^2.065 + k^ω) time, improving on a recent result of [Dereziński, Yang, STOC 2024] for all k ≳ n^0.78.
    2. We give the first O~(n^2 + d_λ^ω) time algorithm for solving a regularized linear system (A + λI)x = b, where A is positive semidefinite with effective dimension d_λ = tr(A(A + λI)^{−1}). This problem arises in applications like Gaussian process regression.
    3. We give faster algorithms for approximating Schatten p-norms and other matrix norms. For example, for the Schatten 1-norm (nuclear norm), we give an algorithm that runs in O~(n^2.11) time, improving on an O~(n^2.18) method of [Musco et al., ITCS 2018].
    All results are proven in the real RAM model of computation. Interestingly, previous state-of-the-art algorithms for most of the problems above relied on stochastic iterative methods, like stochastic coordinate and gradient descent. Our work takes a completely different approach, instead leveraging tools from matrix sketching.
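To make the basic mechanism concrete, below is a minimal numpy sketch of a single-level randomized Nyström preconditioner used inside conjugate gradient for a regularized PSD system (A + λI)x = b. It uses a dense Gaussian sketch and the standard stabilized Nyström construction; the sparse sketches, recursive preconditioning levels, and runtime guarantees of the paper are not reproduced here, and the test matrix is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# PSD matrix with k large outlying eigenvalues over a flat unit tail.
n, k = 500, 20
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate([10.0 ** np.linspace(4, 1, k), np.ones(n - k)])
A = (Q * eigs) @ Q.T
lam = 1.0                                  # solve (A + lam I) x = b
b = rng.standard_normal(n)

# Rank-r randomized Nystrom approximation A_hat = Y (Omega^T Y)^{-1} Y^T,
# built with a dense Gaussian sketch and the usual stabilizing shift.
r = 40
Omega = rng.standard_normal((n, r))
Y = A @ Omega
nu = 1e-8 * np.linalg.norm(Y)              # shift for numerical stability
Y_nu = Y + nu * Omega
C = np.linalg.cholesky(Omega.T @ Y_nu)
B = np.linalg.solve(C, Y_nu.T).T           # A_hat = B @ B.T
Uh, sh, _ = np.linalg.svd(B, full_matrices=False)
eig_hat = np.maximum(sh**2 - nu, 0.0)      # approximate top eigenvalues

def apply_Pinv(x):
    """Preconditioner: exact inverse of (A_hat + lam I) on the captured
    subspace, 1/lam on its orthogonal complement."""
    c = Uh.T @ x
    return Uh @ (c / (eig_hat + lam) - c / lam) + x / lam

def pcg(matvec, b, Pinv, tol=1e-10, maxit=500):
    """Standard preconditioned conjugate gradient."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = Pinv(r); p = z.copy(); rz = r @ z
    for it in range(1, maxit + 1):
        Ap = matvec(p)
        a = rz / (p @ Ap)
        x += a * p; r -= a * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = Pinv(r); rz_new = r @ z
        p = z + (rz_new / rz) * p; rz = rz_new
    return x, maxit

M = A + lam * np.eye(n)
x, its = pcg(lambda v: M @ v, b, apply_Pinv)
print("PCG iterations:", its, " residual:", np.linalg.norm(M @ x - b))
```

Because the sketch rank r exceeds the number of outliers k, the preconditioner captures the large eigenvalues and CG converges in a handful of iterations, which is the "average condition number" effect the abstract describes.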
-
Registering functions (curves) using time warpings (re-parameterizations) is central to many computer vision and shape analysis solutions. While traditional registration methods minimize a penalized L2 norm, the elastic Riemannian metric and square-root velocity functions (SRVFs) have resulted in significant improvements in terms of theory and practical performance. This solution uses a dynamic programming algorithm to minimize the L2 norm between the SRVFs of given functions. However, the computational cost of this elastic dynamic programming framework, O(nT^2 k), where T is the number of time samples along curves, n is the number of curves, and k < T is a parameter, limits its use in applications involving big data. This paper introduces a deep-learning approach, named SRVF Registration Net or SrvfRegNet, to overcome these limitations. The SrvfRegNet architecture trains by optimizing the elastic metric-based objective function on the training data and then applies the trained network to the test data to perform fast registration. In case the training and test data are from different classes, it generalizes to the test data using transfer learning, i.e., retraining only the last few layers of the network. It achieves state-of-the-art alignment performance, albeit at much reduced computational cost. We demonstrate the efficiency and efficacy of this framework using several standard curve datasets.
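As a pointer to the underlying representation, the following numpy sketch computes the SRVF q = f′/√|f′| and the L2 distance between SRVFs that both the dynamic programming baseline and SrvfRegNet aim to minimize over warpings. The closed-form warp used here is an illustrative assumption; it stands in for the alignment a DP solver or the network would have to discover.

```python
import numpy as np

def srvf(f, t):
    """Square-root velocity function q(t) = f'(t) / sqrt(|f'(t)|)."""
    df = np.gradient(f, t)
    return df / np.sqrt(np.abs(df) + 1e-12)   # eps guards flat segments

def srvf_l2(f1, f2, t):
    """L2 distance between the SRVFs of f1 and f2 (no warping applied).
    Elastic registration minimizes this over warpings gamma, where the
    warped SRVF is (q o gamma) * sqrt(gamma')."""
    d = srvf(f1, t) - srvf(f2, t)
    return np.sqrt(np.sum(d * d) * (t[1] - t[0]))

t = np.linspace(0.0, 1.0, 400)
f1 = np.sin(2 * np.pi * t)
f2 = np.sin(2 * np.pi * t**1.5)            # f1 composed with gamma(t) = t^1.5
print("before registration:", srvf_l2(f1, f2, t))

# Undoing the warp (known in closed form for this toy pair) aligns the two
# curves, driving the SRVF distance toward zero.
f2_aligned = np.interp(t ** (2.0 / 3.0), t, f2)   # f2 o gamma^{-1}
print("after the true warp :", srvf_l2(f1, f2_aligned, t))
```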
-
Abstract We establish several useful estimates for a non-conforming 2-norm posed on piecewise linear surface triangulations with boundary, with the main result being a Poincaré inequality. We also obtain equivalence of the non-conforming 2-norm posed on the true surface with the norm posed on a piecewise linear approximation. Moreover, we allow for free boundary conditions. The true surface is assumed to be C^{2,1} when free conditions are present; otherwise, C^2 is sufficient. The framework uses tools from differential geometry and the closest point map (see [G. Dziuk, Finite elements for the Beltrami operator on arbitrary surfaces, Partial Differential Equations and Calculus of Variations, Lecture Notes in Math. 1357, Springer, Berlin (1988), 142–155]) for approximating the full surface Hessian operator. We also present a novel way of applying the closest point map when dealing with surfaces with boundary. Connections with surface finite element methods for fourth-order problems are also noted.
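Schematically, and with notation assumed rather than taken verbatim from the paper, the two headline estimates have the following shape in LaTeX:

```latex
% Gamma is the smooth surface, Gamma_h its piecewise linear triangulation,
% and \|.\|_{2,h} the non-conforming 2-norm built from element-wise
% (approximate) Hessians; constants and exact norm definitions are assumed.
\[
  \|u\|_{L^2(\Gamma_h)} \le C \, \|u\|_{2,h}
  \qquad \text{(Poincaré inequality, under suitable boundary conditions),}
\]
\[
  c \, \|u^{\ell}\|_{2,\Gamma} \le \|u\|_{2,h} \le C \, \|u^{\ell}\|_{2,\Gamma}
  \qquad \text{(equivalence with the norm on the true surface } \Gamma\text{),}
\]
% where u^l denotes the lift of u from Gamma_h to Gamma via the closest
% point map.
```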