Sparsity of a learning solution is a desirable feature in machine learning. Certain reproducing kernel Banach spaces (RKBSs) are appropriate hypothesis spaces for sparse learning methods. The goal of this paper is to understand what kinds of RKBSs can promote sparsity of learning solutions. We consider two typical learning models in an RKBS: the minimum norm interpolation (MNI) problem and the regularization problem. We first establish an explicit representer theorem for solutions of these problems, which represents the extreme points of the solution set by a linear combination of the extreme points of the subdifferential set of the norm function, and this representation is data-dependent. We then propose sufficient conditions on the RKBS under which the explicit representation of the solutions can be transformed into a sparse kernel representation having fewer terms than the number of observed data points. Under the proposed sufficient conditions, we investigate the role of the regularization parameter in the sparsity of the regularized solutions. We further show that two specific RKBSs, the sequence space ℓ1(ℕ) and the measure space, admit sparse representer theorems for both the MNI and regularization models.
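For concreteness, the two models typically take the following generic forms (a sketch in our own notation, not taken from the paper: B is the RKBS, the ν_j are continuous linear functionals encoding the m observations y_j, ψ is a convex loss, and λ > 0 is the regularization parameter):

```latex
% Minimum norm interpolation (MNI): smallest-norm interpolant of the data
\min_{f \in \mathcal{B}} \|f\|_{\mathcal{B}}
  \quad \text{subject to} \quad \nu_j(f) = y_j, \ j = 1, \dots, m.

% Regularization: balance data fidelity against the norm penalty
\min_{f \in \mathcal{B}} \ \sum_{j=1}^{m} \psi\bigl(\nu_j(f), y_j\bigr)
  + \lambda \|f\|_{\mathcal{B}}.
```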
Flexible Krylov methods for group sparsity regularization
Abstract: This paper introduces new solvers for efficiently computing solutions to large-scale inverse problems with group sparsity regularization, including both non-overlapping and overlapping groups. Group sparsity regularization refers to a type of structured sparsity regularization, where the goal is to impose additional structure in the regularization process by assigning variables to predefined groups that may represent graph or network structures. Special cases of group sparsity regularization include ℓ1 and isotropic total variation regularization. In this work, we develop hybrid projection methods based on flexible Krylov subspaces, where we first recast the group sparsity regularization term as a sequence of 2-norm penalization terms using adaptive regularization matrices in an iteratively reweighted norm fashion. Then we exploit flexible preconditioning techniques to efficiently incorporate the weight updates. The main advantages of these methods are that they are computationally efficient (leveraging the advantages of flexible methods), they are general (and therefore easily adaptable to new choices of regularization term), and they are able to select the regularization parameters automatically and adaptively (exploiting the advantages of hybrid methods). Extensions to multiple regularization terms and solution decomposition frameworks (e.g. for anomaly detection) are described, and a variety of numerical examples demonstrate both the efficiency and accuracy of the proposed approaches compared to existing solvers.
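The following is a minimal numerical sketch of the iteratively reweighted norm idea the abstract describes, for the non-overlapping case. It uses a dense direct solve in place of the paper's flexible Krylov machinery, and all function names and parameter values are illustrative, not the paper's:

```python
import numpy as np

def irn_group_sparsity(A, b, groups, lam=1e-1, n_iter=30, eps=1e-8):
    """Sketch of iteratively reweighted norm (IRN) for non-overlapping group
    sparsity: min_x ||A x - b||^2 + lam * sum_g ||x_g||_2.
    Each outer iteration replaces the group penalty by a weighted 2-norm and
    solves the resulting ridge-type subproblem directly; the paper's flexible
    Krylov solvers would replace this dense solve."""
    n = A.shape[1]
    x = np.zeros(n)
    for _ in range(n_iter):
        # one weight per group: w_g = 1 / max(||x_g||, eps)
        w = np.ones(n)
        for g in groups:
            w[g] = 1.0 / max(np.linalg.norm(x[g]), eps)
        # weighted 2-norm subproblem: (A^T A + (lam/2) diag(w)) x = A^T b
        x = np.linalg.solve(A.T @ A + 0.5 * lam * np.diag(w), A.T @ b)
    return x

# toy usage: recover a group-sparse signal from noisy measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 40))
groups = [list(range(i, i + 5)) for i in range(0, 40, 5)]  # 8 groups of 5
x_true = np.zeros(40); x_true[0:5] = 2.0; x_true[20:25] = -1.5
b = A @ x_true + 0.01 * rng.standard_normal(60)
print(np.round(irn_group_sparsity(A, b, groups, lam=5.0), 2))
```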
- Award ID(s): 2208294
- PAR ID: 10627398
- Publisher / Repository: IOP Publishing
- Date Published:
- Journal Name: Physica Scripta
- Volume: 99
- Issue: 12
- ISSN: 0031-8949
- Page Range / eLocation ID: 125006
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Understanding the fundamental mechanism behind the success of deep neural networks is one of the key challenges in the modern machine learning literature. Despite numerous attempts, a solid theoretical analysis is yet to be developed. In this paper, we develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization. We first show that the training of multiple three-layer ReLU sub-networks with weight decay regularization can be equivalently cast as a convex optimization problem in a higher-dimensional space, where sparsity is enforced via a group ℓ1-norm regularization. Consequently, ReLU networks can be interpreted as high-dimensional feature selection methods. More importantly, we then prove that the equivalent convex problem can be globally optimized by a standard convex optimization solver with polynomial-time complexity with respect to the number of samples and the data dimension when the width of the network is fixed. Finally, we numerically validate our theoretical results via experiments involving both synthetic and real datasets.
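For readers unfamiliar with the group ℓ1 norm mentioned above, the following hypothetical sketch shows the penalty and its proximal operator, which is what makes it enforce group-level sparsity; the paper's convex program has additional structure from the ReLU reformulation that is not reproduced here:

```python
import numpy as np

def group_l1_norm(x, groups):
    """Group l1 norm: sum over groups of the Euclidean norm of each block.
    Zeroing an entire block is 'cheap', so minimizers select few groups."""
    return sum(np.linalg.norm(x[g]) for g in groups)

def block_soft_threshold(x, groups, tau):
    """Proximal operator of tau * group_l1_norm: shrinks each block toward
    zero and kills blocks whose norm is at most tau (group-level sparsity)."""
    out = x.copy()
    for g in groups:
        nrm = np.linalg.norm(x[g])
        out[g] = 0.0 if nrm <= tau else (1 - tau / nrm) * x[g]
    return out
```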
Abstract: We consider a regularization problem whose objective function consists of a convex fidelity term and a regularization term determined by the ℓ1 norm composed with a linear transform. Empirical results show that regularization with the ℓ1 norm can promote sparsity of a regularized solution. The goal of this paper is to understand theoretically the effect of the regularization parameter on the sparsity of the regularized solutions. We establish a characterization of the sparsity of the solution under the transform matrix. When the objective function is block-separable, or when an error bound of the regularized solution to a known function is available, the resulting characterization can be taken as a regularization parameter choice strategy with which the regularization problem has a solution of a prescribed sparsity level. When the objective function is not block-separable, we propose an iterative algorithm which simultaneously determines the regularization parameter and its corresponding solution with a prescribed sparsity level. Moreover, we study choices of the regularization parameter so that the regularization term can alleviate the ill-posedness and promote sparsity of the resulting regularized solution. Numerical experiments demonstrate that the proposed algorithm is effective and efficient, and that the choices of the regularization parameters can balance the sparsity of the regularized solution against its approximation to the minimizer of the fidelity function.
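The link between the regularization parameter and the sparsity level is easiest to see in the separable special case with an identity transform, where the solution is given by soft thresholding. The toy sketch below (our own illustration, not the paper's algorithm) picks λ from the order statistics of the data so the solution has exactly k nonzeros:

```python
import numpy as np

# Separable special case: min_x 0.5*||x - y||_2^2 + lam*||x||_1 has the
# closed-form soft-thresholding solution, so the sparsity of the solution
# is exactly the number of |y_i| strictly exceeding lam. Setting lam to
# the (k+1)-th largest magnitude keeps exactly the k larger entries
# (assuming no ties) -- a toy instance of a parameter-choice strategy.
def soft_threshold(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([3.0, -0.2, 1.5, 0.05, -2.4])
k = 2                                  # desired number of nonzeros
lam = np.sort(np.abs(y))[::-1][k]      # (k+1)-th largest magnitude
x = soft_threshold(y, lam)
print(x, "nonzeros:", np.count_nonzero(x))
```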
Since the cost of labeling data keeps rising, we hope to make full use of the large amount of unlabeled data and improve image classification by adding unlabeled samples to the training set. In addition, we aim to handle two tasks in a unified way: clustering the unlabeled data and recognizing the query image. We achieve this goal by designing a novel sparse model based on the manifold assumption, which has been shown to work well in many tasks. Based on the assumptions that images of the same class lie on a sub-manifold and that an image can be approximately represented as a linear combination of its neighboring data, due to the local linear property of a manifold, we propose a sparse representation model on a manifold. Specifically, there are two regularizations: a variant of the trace lasso norm and a manifold Laplacian regularization. The first regularization term makes the representation coefficients sparse between groups and dense within a group. The second term is the manifold Laplacian regularization, by which labels can be accurately propagated from labeled to unlabeled data. An Augmented Lagrange Multiplier (ALM) scheme and a Gauss-Seidel Alternating Direction Method of Multipliers (GS-ADMM) are given to solve the problem numerically. We conduct experiments on three human face databases and compare the proposed work with several state-of-the-art methods. For each subject, some labeled face images are randomly chosen for training the supervised methods, and a small number of unlabeled images are added to form the training set of the proposed approach. All experiments show that our method obtains better classification results due to the addition of unlabeled samples.
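A minimal sketch of the manifold Laplacian ingredient is given below, assuming a Gaussian-weighted kNN graph as the discrete stand-in for the manifold; the trace-lasso term and the GS-ADMM solver from the paper are not reproduced, and all function names are illustrative:

```python
import numpy as np

def knn_laplacian(X, k=5, sigma=1.0):
    """Graph Laplacian L = D - W from a Gaussian-weighted kNN graph; the
    quadratic form f^T L f penalizes label differences between neighbors."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    W = np.zeros_like(d2)
    for i in range(len(X)):
        nbrs = np.argsort(d2[i])[1:k + 1]                # skip self
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                               # symmetrize
    return np.diag(W.sum(1)) - W

def propagate_labels(L, y, labeled, mu=1.0):
    """Solve min_f ||f_l - y_l||^2 + mu * f^T L f via its normal equations
    (S + mu L) f = S y: labels diffuse from labeled points along the graph."""
    n = L.shape[0]
    S = np.zeros((n, n))
    S[labeled, labeled] = 1.0          # indicator of labeled entries
    return np.linalg.solve(S + mu * L, S @ y)
```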
Abstract: We present a state-of-the-art calculation of the unpolarized pion valence-quark distribution in the framework of large-momentum effective theory (LaMET) with improved handling of systematic errors as well as two-loop perturbative matching. We use lattice ensembles generated by the MILC collaboration at lattice spacing a ≈ 0.09 fm, lattice volume 64³ × 96, N_f = 2 + 1 + 1 flavors of highly improved staggered quarks, and a physical pion mass. The LaMET matrix elements are calculated with pions boosted to momentum P_z ≈ 1.72 GeV with high statistics of O(10⁶) measurements. We study the pion PDF in both the hybrid-ratio and hybrid-regularization-independent momentum subtraction (hybrid-RI/MOM) schemes and also compare the systematic errors with and without the addition of leading-renormalon resummation (LRR) and renormalization-group resummation (RGR) in both the renormalization and the lightcone matching. The final lightcone PDF results are presented in the modified minimal-subtraction scheme at renormalization scale μ = 2.0 GeV. We show that the x-dependent PDFs are compatible between the hybrid-ratio and hybrid-RI/MOM renormalizations with the same improvements. We also show that the systematics are greatly reduced by the simultaneous inclusion of RGR and LRR, and that these methods are necessary if improved precision is to be reached with higher-order terms in the renormalization and matching.