Hiding Data Helps: On the Benefits of Masking for Sparse Coding
Sparse coding, which refers to modeling a signal as a sparse linear combination of the elements of a learned dictionary, has proven to be a successful (and interpretable) approach in applications such as signal processing, computer vision, and medical imaging. While this success has spurred much work on provable guarantees for dictionary recovery when the learned dictionary is the same size as the ground-truth dictionary, work on the setting where the learned dictionary is larger (or over-realized) than the ground truth is comparatively nascent. Existing theoretical results in this setting have been constrained to the case of noiseless data. We show in this work that, in the presence of noise, minimizing the standard dictionary learning objective can fail to recover the elements of the ground-truth dictionary in the over-realized regime, regardless of the magnitude of the signal in the data-generating process. Furthermore, drawing from the growing body of work on self-supervised learning, we propose a novel masking objective for which recovering the ground-truth dictionary is in fact optimal as the signal increases, for a large class of data-generating processes. We corroborate our theoretical results with experiments across several parameter regimes, showing that our proposed objective also enjoys better empirical performance than the standard reconstruction objective.
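The contrast between the two objectives can be sketched in a few lines of NumPy and scikit-learn. This is an illustrative toy, not the paper's algorithm: the particular masking scheme (fit each code on the visible coordinates, score the reconstruction on the hidden ones) and all names and parameters (`masked_loss`, the Lasso penalty `alpha`, the mask fraction) are assumptions made for the sketch.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, k_true, n = 20, 5, 200                      # signal dim, true atoms, samples

# Synthetic data: y = D* x + noise, with 1-sparse codes of amplitude 3.
D_true = rng.standard_normal((d, k_true))
D_true /= np.linalg.norm(D_true, axis=0)
codes = np.zeros((k_true, n))
codes[rng.integers(k_true, size=n), np.arange(n)] = 3.0
Y = D_true @ codes + 0.1 * rng.standard_normal((d, n))

def masked_loss(D, Y, mask_frac=0.3, alpha=0.01):
    """Fit each sparse code on the visible coordinates only, then score
    the reconstruction on the held-out (masked) coordinates."""
    loss = 0.0
    for y in Y.T:
        hidden = rng.random(len(y)) < mask_frac
        code = Lasso(alpha=alpha, fit_intercept=False,
                     max_iter=5000).fit(D[~hidden], y[~hidden]).coef_
        loss += np.sum((y[hidden] - D[hidden] @ code) ** 2)
    return loss / Y.shape[1]

# A random over-realized dictionary may fit visible entries but
# predicts the hidden entries poorly, so its masked loss is larger.
D_rand = rng.standard_normal((d, 2 * k_true))
D_rand /= np.linalg.norm(D_rand, axis=0)
l_true, l_rand = masked_loss(D_true, Y), masked_loss(D_rand, Y)
```

The point of the held-out score is that spurious atoms can always drive the plain reconstruction error down, but they do not help predict coordinates they never saw.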
- Award ID(s): 2307106
- PAR ID: 10477449
- Publisher / Repository: Proceedings of Machine Learning Research
- Date Published:
- Journal Name: Proceedings of the 40th International Conference on Machine Learning
- Volume: 202
- Page Range / eLocation ID: 5600–5615
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- It has recently been shown that periodicity in discrete-time data can be analyzed using Ramanujan sums and associated dictionaries. This paper explores the role of dictionary learning methods in the context of period estimation and periodic signal representation using dictionaries. It is shown that a well-known dictionary learning algorithm, namely K-SVD, is able to learn Ramanujan and Farey periodicity dictionaries from the noisy, sparse coefficient data generated from them, without imposing any periodicity structure in the learning stage. This similarity between the learned dictionary and the underlying original periodicity dictionary reaffirms the power of K-SVD in predicting the right dictionary from data without explicit application-specific constraints. The paper also examines how the choice of different parameter values affects the similarity of the learned dictionary to the underlying dictionary. Two versions of K-SVD, along with different initializations, are analyzed for their effect on representation and denoising error for the data.
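Since the abstract above centers on K-SVD's ability to recover a generating dictionary, a compact reference implementation helps fix ideas. The sketch below is a minimal textbook K-SVD (OMP coding step, per-atom rank-1 SVD update, dead-atom replacement), not the authors' exact setup; the function name and parameter choices are illustrative.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms, sparsity, n_iter=15, seed=0):
    """Minimal K-SVD: alternate OMP sparse coding with per-atom SVD updates."""
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    D = rng.standard_normal((d, n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        X = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)  # (n_atoms, n) codes
        for j in range(n_atoms):
            used = np.flatnonzero(X[j])
            if used.size == 0:                  # dead atom: reseed from worst sample
                resid = Y - D @ X
                D[:, j] = Y[:, np.argmax(np.sum(resid ** 2, axis=0))]
                D[:, j] /= np.linalg.norm(D[:, j])
                continue
            X[j, used] = 0.0
            E = Y[:, used] - D @ X[:, used]     # residual with atom j removed
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j], X[j, used] = U[:, 0], s[0] * Vt[0]
    return D, X

# Demo: noisy 2-sparse data from a random ground-truth dictionary.
rng = np.random.default_rng(1)
D0 = rng.standard_normal((16, 8))
D0 /= np.linalg.norm(D0, axis=0)
X0 = np.zeros((8, 300))
for i in range(300):
    X0[rng.choice(8, size=2, replace=False), i] = rng.standard_normal(2)
Y = D0 @ X0 + 0.01 * rng.standard_normal((16, 300))
D, X = ksvd(Y, n_atoms=8, sparsity=2)
rel_err = np.linalg.norm(Y - D @ X) / np.linalg.norm(Y)
```

Note that, as in the paper's experiments, no periodicity (or other application-specific) structure is imposed: the only constraints are unit-norm atoms and sparse codes.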
- Dictionary learning, aiming at representing a signal in terms of the atoms of a dictionary, has gained popularity in a wide range of applications, including, but not limited to, image denoising, face recognition, remote sensing, medical imaging, and feature extraction. Dictionary learning can be seen as a possible data-driven alternative to solve inverse problems by identifying the data with possible outputs that are either generated numerically using a forward model or the results of earlier observations of controlled experiments. Sparse dictionary learning is particularly interesting when the underlying signal is known to be representable in terms of a few vectors in a given basis. In this paper, we propose to use hierarchical Bayesian models for sparse dictionary learning that can capture features of the underlying signals, e.g., sparse representation and nonnegativity. The same framework can be employed to reduce the dimensionality of an annotated dictionary through feature extraction, thus reducing the computational complexity of the learning task. Computed examples where our algorithms are applied to hyperspectral imaging and classification of electrocardiogram data are also presented.
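The two structural priors named above (sparsity and nonnegativity) can also be imposed in a simple non-Bayesian way, which is a useful baseline when reading about the hierarchical models. The sketch below is a projected ISTA solver for nonnegative sparse coding; it is not the authors' hierarchical Bayesian algorithm, and the function name and parameters are chosen here for illustration.

```python
import numpy as np

def nn_sparse_code(D, y, lam=0.05, n_iter=1000):
    """Projected ISTA for  min_x 0.5*||y - D x||^2 + lam*||x||_1,  x >= 0."""
    L = np.linalg.norm(D, ord=2) ** 2              # Lipschitz constant of gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = np.maximum(x - (grad + lam) / L, 0.0)  # step + nonneg soft-threshold
    return x

# Demo: recover a 2-sparse nonnegative code from a noisy measurement.
rng = np.random.default_rng(0)
D = rng.standard_normal((30, 10))
D /= np.linalg.norm(D, axis=0)
x0 = np.zeros(10)
x0[[2, 7]] = 1.0
y = D @ x0 + 0.01 * rng.standard_normal(30)
x = nn_sparse_code(D, y)
```

The Bayesian formulation replaces the fixed penalty `lam` with hyperpriors that are learned from the data, which is what allows the hierarchical models to adapt the degree of sparsity per coefficient.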
- Graph representation learning is a fundamental technique for machine learning (ML) on complex networks. Given an input network, these methods represent the vertices by low-dimensional real-valued vectors, which can be used for a multitude of downstream ML tasks. We study one of the most important such tasks, link prediction. Much of the recent literature on graph representation learning has shown remarkable success in link prediction. On closer investigation, we observe that performance is measured by the AUC (area under the curve), which suffers from biases. Since the ground truth in link prediction is sparse, we design a vertex-centric measure of performance, called the VCMPR@k plots. Under this measure, we show that link predictors using graph representations score poorly: despite extremely high AUC scores, the predictors miss much of the ground truth. We identify a mathematical connection between this performance, the sparsity of the ground truth, and the low-dimensional geometry of the node embeddings. Under a formal theoretical framework, we prove that low-dimensional vectors cannot capture sparse ground truth using dot product similarities (the standard practice in the literature). Our results call into question existing results on link prediction and pose a significant scientific challenge for graph representation learning. The VCMPR plots identify specific scientific challenges for link prediction using low-dimensional node embeddings.
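The exact VCMPR@k definition is in the paper; the sketch below only captures the vertex-centric idea with a plain per-vertex recall@k under dot-product ranking, which already exposes the failure mode being described: a predictor can have high global AUC while most vertices recover few of their true neighbours in the top k. The function name and signature are illustrative, not the paper's.

```python
import numpy as np

def vertex_recall_at_k(emb, true_edges, k=10):
    """For each vertex, rank all others by dot-product score and measure
    what fraction of its true neighbours appear in the top k."""
    n = emb.shape[0]
    nbrs = [set() for _ in range(n)]
    for u, v in true_edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    scores = emb @ emb.T
    np.fill_diagonal(scores, -np.inf)          # exclude self-loops
    recalls = []
    for u in range(n):
        if not nbrs[u]:
            continue
        top = np.argpartition(-scores[u], k)[:k]
        recalls.append(len(nbrs[u] & set(top.tolist())) / len(nbrs[u]))
    return float(np.mean(recalls))

# Demo: vertices come in pairs sharing an embedding; each pair is an edge,
# so a dot-product ranker should place the partner at rank 1.
n_pairs, dim = 10, 10
emb = np.zeros((2 * n_pairs, dim))
for i in range(n_pairs):
    emb[2 * i] = emb[2 * i + 1] = np.eye(dim)[i]
edges = [(2 * i, 2 * i + 1) for i in range(n_pairs)]
score = vertex_recall_at_k(emb, edges, k=1)
```

Unlike AUC, which averages over all (mostly easy, non-edge) pairs, this score is driven entirely by whether each vertex's few true neighbours make it into its short ranked list.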
- Unsupervised denoising is a crucial challenge in real-world imaging applications. Unsupervised deep-learning methods have demonstrated impressive performance on benchmarks based on synthetic noise. However, no metrics are available to evaluate these methods in an unsupervised fashion. This is highly problematic for the many practical applications where ground-truth clean images are not available. In this work, we propose two novel metrics: the unsupervised mean squared error (MSE) and the unsupervised peak signal-to-noise ratio (PSNR), which are computed using only noisy data. We provide a theoretical analysis of these metrics, showing that they are asymptotically consistent estimators of the supervised MSE and PSNR. Controlled numerical experiments with synthetic noise confirm that they provide accurate approximations in practice. We validate our approach on real-world data from two imaging modalities: videos in raw format and transmission electron microscopy. Our results demonstrate that the proposed metrics enable unsupervised evaluation of denoising methods based exclusively on noisy data.
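The core identity behind such metrics can be sketched in a few lines. This is a simplified stand-in, not the paper's exact estimator: it assumes two independent noisy copies of the signal and a known noise variance. If y1 and y2 are independent noisy copies of x, then E||f(y1) - y2||^2 / d equals the true MSE of f(y1) plus the noise power, so subtracting the known variance yields an MSE estimate that never touches the clean signal.

```python
import numpy as np

def unsupervised_mse(denoised, second_noisy_copy, noise_var):
    """Estimate MSE(denoised, clean) from an independent noisy copy:
    E||f(y1) - y2||^2 / d = MSE + sigma^2, so subtract the known noise power."""
    d = denoised.size
    return float(np.sum((denoised - second_noisy_copy) ** 2) / d - noise_var)

# Demo with synthetic data and the trivial identity "denoiser".
rng = np.random.default_rng(0)
d, sigma = 200_000, 1.0
clean = rng.standard_normal(d)
y1 = clean + sigma * rng.standard_normal(d)
y2 = clean + sigma * rng.standard_normal(d)
f_y1 = y1                                   # trivial denoiser: do nothing
est = unsupervised_mse(f_y1, y2, sigma ** 2)
true_mse = float(np.sum((f_y1 - clean) ** 2) / d)
```

The key requirement is that the denoiser output is statistically independent of the noise in the second copy, which is what makes the cross term vanish in expectation.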