Title: Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods
Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders. Finally, while the results are proved for Gaussian data, numerical simulations on standard datasets display the universality of the theoretical predictions.
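As a rough, hedged illustration of the setting (not the paper's code), the sketch below builds a two-layer autoencoder with a sign activation in the proportional regime, where the representation size m = rd scales linearly with the input dimension d; all names and the rate r are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's implementation): a shallow
# autoencoder x -> V * sign(W x) compressing a Gaussian source in the
# proportional regime, where the representation size m scales as m = r * d.
rng = np.random.default_rng(0)
d = 512                  # input dimension
r = 0.5                  # compression rate m / d (illustrative value)
m = int(r * d)           # size of the representation

W = rng.standard_normal((m, d)) / np.sqrt(d)   # encoder weights
V = rng.standard_normal((d, m)) / np.sqrt(m)   # decoder weights

def encode(x):
    # Sign activation: each latent coordinate is one bit, so the encoder
    # acts as a lossy compressor of the Gaussian source at rate m/d.
    return np.sign(W @ x)

def decode(z):
    return V @ z

x = rng.standard_normal(d)            # Gaussian input
x_hat = decode(encode(x))
print("per-coordinate MSE:", np.mean((x - x_hat) ** 2))
```

With random (untrained) weights this only fixes the architecture; the paper's contribution is to characterize the weights minimizing the population risk and to show that gradient methods reach them.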
Award ID(s):
1910056
PAR ID:
10490384
Publisher / Repository:
Proceedings of Machine Learning Research (PMLR)
Journal Name:
Proceedings of the 40th International Conference on Machine Learning
Format(s):
Medium: X
Location:
Hawaii
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep neural networks (DNNs) are becoming deeper, wider, and more non-linear due to the growing demands on prediction accuracy and analysis quality. Training wide and deep neural networks requires large amounts of storage resources such as memory, because the intermediate activation data must be saved in memory during forward propagation and then restored for backward propagation. However, state-of-the-art accelerators such as GPUs are equipped with only very limited memory capacities due to hardware design constraints, which significantly limits the maximum batch size and hence the performance speedup when training large-scale DNNs. Traditional memory-saving techniques either suffer from performance overhead or are constrained by limited interconnect bandwidth or specific interconnect technology. In this paper, we propose a novel memory-efficient CNN training framework (called COMET) that leverages error-bounded lossy compression to significantly reduce the memory requirement for training, in order to allow training larger models or to accelerate training. Our framework purposely adopts error-bounded lossy compression with a strict error-controlling mechanism. Specifically, we perform a theoretical analysis of the compression error propagation from the altered activation data to the gradients, and empirically investigate the impact of altered gradients over the training process. Based on these analyses, we optimize the error-bounded lossy compression and propose an adaptive error-bound control scheme for activation data compression. Experiments demonstrate that our proposed framework can reduce training memory consumption by up to 13.5X over baseline training and by up to 1.8X over another state-of-the-art compression-based framework, with little or no accuracy loss.
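A hedged sketch of the core mechanism described above, using plain uniform quantization as a stand-in for a real error-bounded compressor such as SZ; the function names and the fixed error bound are illustrative assumptions (COMET adapts the bound during training).

```python
import numpy as np

def compress_activation(a, eb):
    # Uniform quantization with bin width 2*eb guarantees that the
    # element-wise reconstruction error is at most eb. A real compressor
    # would further entropy-code the integer codes q.
    return np.round(a / (2 * eb)).astype(np.int16)

def decompress_activation(q, eb):
    return q.astype(np.float32) * (2 * eb)

rng = np.random.default_rng(0)
act = rng.standard_normal((64, 256)).astype(np.float32)  # forward-pass activation
eb = 1e-2                                                # fixed here; adaptive in COMET

q = compress_activation(act, eb)              # keep only the compact form in memory
act_restored = decompress_activation(q, eb)   # restore for backward propagation

assert np.max(np.abs(act - act_restored)) <= eb + 1e-7   # strict error control
print(f"max error {np.max(np.abs(act - act_restored)):.4f}, "
      f"memory ratio {act.nbytes / q.nbytes:.1f}x")
```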
  2. Lossy compressors are increasingly adopted in scientific research, tackling volumes of data from experiments or parallel numerical simulations and facilitating data storage and movement. In contrast with the notion of entropy in lossless compression, no theoretical or data-based quantification of lossy compressibility exists for scientific data, so users rely on trial and error to assess lossy compression performance. As a strong data-driven effort toward quantifying the lossy compressibility of scientific datasets, we provide a statistical framework to predict the compression ratios of lossy compressors. Our method is a two-step framework where (i) compressor-agnostic predictors are computed and (ii) statistical prediction models relying on these predictors are trained on observed compression ratios. The proposed predictors exploit spatial correlations and notions of entropy and lossiness via the quantized entropy. We study 8+ compressors on 6 scientific datasets and achieve a median percentage prediction error of less than 12%, which is substantially smaller than that of other methods, while achieving at least an 8.8× speedup when searching for a specific compression ratio and a 7.8× speedup when determining the best compressor from a collection.
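The two-step framework might look roughly like the following sketch; the predictor definitions (a quantized entropy and a lag-1 correlation) and the synthetic training data are simplified assumptions, not the paper's exact features.

```python
import numpy as np

# Step (i): compressor-agnostic predictors (illustrative stand-ins).
def quantized_entropy(data, eb):
    # Entropy of the data after quantization at error bound eb,
    # a proxy for lossiness-aware compressibility.
    q = np.round(data / (2 * eb))
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def lag1_correlation(data):
    # Simple spatial-correlation predictor over the flattened field.
    flat = data.reshape(-1)
    return np.corrcoef(flat[:-1], flat[1:])[0, 1]

# Step (ii): regress observed compression ratios on the predictors.
# Compression ratios below are synthetic, for illustration only.
rng = np.random.default_rng(0)
fields = [rng.standard_normal((64, 64)) for _ in range(20)]
X = np.array([[quantized_entropy(f, 1e-3), lag1_correlation(f)] for f in fields])
y = 5.0 - 0.4 * X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.standard_normal(20)

coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(y)), X], y, rcond=None)
print("fitted model (intercept, entropy, correlation):", coef)
```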
  3. In many real-world applications (e.g., monitoring of individual health, climate, brain activity, and environmental exposures), the data of interest change smoothly over a continuum such as time, yielding multi-dimensional functional data. Solving clustering, classification, and regression problems with functional data calls for effective methods for learning compact representations of functional data. Existing methods for representation learning from functional data, e.g., functional principal component analysis, are generally limited to learning linear mappings from the data space to the representation space. However, in many applications such linear methods do not suffice. Hence, we study the novel problem of learning non-linear representations of functional data. Specifically, we propose functional autoencoders, which generalize neural network autoencoders so as to learn non-linear representations of functional data. We derive, from first principles, a functional gradient-based algorithm for training functional autoencoders. We present results of experiments that demonstrate that functional autoencoders outperform state-of-the-art baseline methods.
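A minimal sketch, under the assumption that the input functions are observed on a common grid: the encoder replaces the rows of a weight matrix with discretized weight functions and approximates functional inner products by quadrature. The weights here are random and untrained; all names are illustrative.

```python
import numpy as np

# Sketch of a functional autoencoder (assumptions, not the paper's code):
# each input is a function x(t) sampled on a grid t_1..t_T, and the first
# layer computes quadrature approximations of <w_j, x> = ∫ w_j(t) x(t) dt,
# generalizing the matrix product of a standard autoencoder.
rng = np.random.default_rng(0)
T, k, n = 100, 5, 32                    # grid size, latent dim, batch size
t = np.linspace(0.0, 1.0, T)
dt = t[1] - t[0]

W_enc = rng.standard_normal((k, T)) * 0.1   # discretized weight functions
W_dec = rng.standard_normal((T, k)) * 0.1

def encode(X):
    # <w_j, x> ≈ sum_i w_j(t_i) x(t_i) dt, followed by a non-linearity.
    return np.tanh(X @ (W_enc.T * dt))

def decode(Z):
    return Z @ W_dec.T

X = np.sin(2 * np.pi * np.outer(rng.uniform(1, 3, n), t))  # functional data
X_hat = decode(encode(X))
print("reconstruction MSE (untrained):", np.mean((X - X_hat) ** 2))
```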
  4. The objective of this work is to develop error-bounded lossy compression methods that preserve topological features in 2D and 3D vector fields. Specifically, we explore the preservation of critical points in piecewise linear and bilinear vector fields. We define the preservation of critical points as (1) keeping each critical point in its original cell and (2) retaining the type of each critical point (e.g., saddle or attracting node), without any false positives, false negatives, or false types in the decompressed data. The key to our method is to adapt a vertex-wise error bound for each grid point and to compress the input data together with the error bound field using a modified lossy compressor. Our compression algorithm can also be embarrassingly parallelized for large-data handling and in situ processing. We benchmark our method by comparing it with existing lossy compressors in terms of false positive/negative/type rates, compression ratio, and various vector field visualizations in several scientific applications.
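The acceptance test might be sketched as below: flag grid cells whose corner values admit a sign change in both vector components (a coarse stand-in for the paper's piecewise linear/bilinear critical point extraction), then compare the flags before and after compression to count false positives and negatives. All details are illustrative assumptions.

```python
import numpy as np

# Sketch: detect candidate critical cells of a 2D vector field (u, v) on a
# grid, and verify that decompression created no false positives/negatives.
def critical_cells(u, v):
    # Stack the four corner values of every grid cell.
    cu = np.stack([u[:-1, :-1], u[1:, :-1], u[:-1, 1:], u[1:, 1:]])
    cv = np.stack([v[:-1, :-1], v[1:, :-1], v[:-1, 1:], v[1:, 1:]])
    sign_change = lambda c: (c.min(axis=0) < 0) & (c.max(axis=0) > 0)
    # A cell may contain a critical point only if both components vanish.
    return sign_change(cu) & sign_change(cv)

rng = np.random.default_rng(0)
u, v = rng.standard_normal((2, 65, 65))          # toy vector field components
eb = 1e-3                                        # illustrative error bound
u_dec = u + rng.uniform(-eb, eb, u.shape)        # stand-in for decompressed data
v_dec = v + rng.uniform(-eb, eb, v.shape)

before, after = critical_cells(u, v), critical_cells(u_dec, v_dec)
print("false positives:", int((after & ~before).sum()),
      "| false negatives:", int((before & ~after).sum()))
```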
  5. As the scale and complexity of high-performance computing (HPC) systems keep growing, data compression techniques are often adopted to reduce data volume and processing time. While lossy compression is preferable to lossless compression because of its potential for high compression ratios, it is not worth the effort unless an optimal balance between volume reduction and information loss is found. Among the many lossy compression techniques, transform-based lossy algorithms utilize spatial redundancy better. However, transform-based lossy compressors have received relatively little attention because their compression performance on scientific datasets is not well understood. The insight of this paper is that, in transform-based lossy compressors, quantifying the dominant coefficients at the block level reveals the right balance, potentially impacting overall compression ratios. Motivated by this, we characterize three transform-based lossy compression mechanisms with different information compaction methods, using statistical features that capture the data characteristics. We then build several prediction models using these statistical features and the characteristics of the dominant coefficients, and evaluate the effectiveness of each model using six HPC datasets from three production-level simulations at scale. Our results demonstrate that a random forest classifier captures the behavior of dominant coefficients precisely, achieving nearly 99% prediction accuracy.
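An illustrative sketch of the modeling step, with synthetic blocks, stand-in statistical features, and a "dominance" label derived from a simple FFT proxy for the transform stage; none of these definitions are the paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Sketch (synthetic stand-ins, not the paper's features): predict whether a
# block's transform coefficients are dominated by a few large values, using
# block-level statistics as inputs to a random forest classifier.
rng = np.random.default_rng(0)

def dominance(block):
    c = np.abs(np.fft.rfft(block.ravel()))     # proxy for a transform stage
    return c.max() / (c.sum() + 1e-12)         # share of the top coefficient

def features(block):
    flat = block.ravel()
    return [flat.std(),                               # spread
            np.abs(np.diff(flat)).mean(),             # local roughness
            np.corrcoef(flat[:-1], flat[1:])[0, 1]]   # lag-1 correlation

smooth = rng.standard_normal((250, 16, 16)).cumsum(axis=1).cumsum(axis=2)
rough = rng.standard_normal((250, 16, 16))
blocks = np.concatenate([smooth, rough])

X = np.array([features(b) for b in blocks])
dom = np.array([dominance(b) for b in blocks])
y = (dom > np.median(dom)).astype(int)          # synthetic "dominant" label

perm = rng.permutation(len(blocks))             # shuffle before splitting
X, y = X[perm], y[perm]
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:400], y[:400])
print("held-out accuracy:", clf.score(X[400:], y[400:]))
```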