Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders. Finally, while the results are proved for Gaussian data, numerical simulations on standard datasets display the universality of the theoretical predictions.
Functional Autoencoders for Functional Data Representation Learning
In many real-world applications, e.g., monitoring of individual health, climate, brain activity, and environmental exposures, the data of interest change smoothly over a continuum, e.g., time, yielding multi-dimensional functional data. Solving clustering, classification, and regression problems with functional data calls for effective methods for learning compact representations of functional data. Existing methods for representation learning from functional data, e.g., functional principal component analysis, are generally limited to learning linear mappings from the data space to the representation space. However, in many applications, such linear methods do not suffice. Hence, we study the novel problem of learning non-linear representations of functional data. Specifically, we propose functional autoencoders, which generalize neural network autoencoders so as to learn non-linear representations of functional data. We derive, from first principles, a functional-gradient-based algorithm for training functional autoencoders. We present results of experiments which demonstrate that the functional autoencoders outperform the state-of-the-art baseline methods.
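As a rough illustration of what a non-linear representation of a curve might look like, the sketch below replaces the weighted sum of an ordinary neuron with an integral of the input curve against a weight function, approximated by the trapezoid rule on a sampling grid. The abstract does not spell out the architecture, so this integral form, the `tanh` activation, and all names (`trapezoid`, `functional_unit`, `decode`) are assumptions made for illustration, not the authors' method.

```python
import math
from typing import Callable, List, Sequence


def trapezoid(values: Sequence[float], grid: Sequence[float]) -> float:
    """Approximate the integral of sampled values over the grid (trapezoid rule)."""
    total = 0.0
    for i in range(1, len(grid)):
        total += 0.5 * (values[i] + values[i - 1]) * (grid[i] - grid[i - 1])
    return total


def functional_unit(
    x: Sequence[float],
    w: Callable[[float], float],
    grid: Sequence[float],
    bias: float = 0.0,
    activation: Callable[[float], float] = math.tanh,
) -> float:
    """Hypothetical encoder unit: activation(integral of w(t) * x(t) dt + bias)."""
    products = [w(t) * xt for t, xt in zip(grid, x)]
    return activation(trapezoid(products, grid) + bias)


def decode(code: float, v: Callable[[float], float], grid: Sequence[float]) -> List[float]:
    """Hypothetical decoder: map the scalar code back to a curve, code * v(t)."""
    return [code * v(t) for t in grid]


def weight(t: float) -> float:
    """Illustrative weight function; in practice this would be learned."""
    return math.sin(2 * math.pi * t)


# A curve observed at 101 equally spaced points on [0, 1].
grid = [i / 100 for i in range(101)]
curve = [math.sin(2 * math.pi * t) for t in grid]

code = functional_unit(curve, weight, grid)  # one-number representation
reconstruction = decode(code, weight, grid)  # curve rebuilt from the code
```

With the activation removed and the weight functions chosen as eigenfunctions of the covariance operator, a unit of this kind reduces to a linear score of the sort functional principal component analysis produces; the non-linearity is what distinguishes the autoencoder setting described above.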
- PAR ID: 10287274
- Date Published:
- Journal Name: Proceedings of the SIAM International Conference on Data Mining
- Page Range / eLocation ID:
- DOI: 10.1137/1.9781611976700.75
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Network alignment is a fundamental task in many high-impact applications. Most of the existing approaches either explicitly or implicitly consider the alignment matrix as a linear transformation to map one network to another, and might overlook the complicated alignment relationship across networks. On the other hand, node representation learning based alignment methods are hampered by the incomparability among the node representations of different networks. In this paper, we propose a unified semi-supervised deep model (ORIGIN) that simultaneously finds the non-rigid network alignment and learns node representations in multiple networks in a mutually beneficial way. The key idea is to learn node representations by the effective graph convolutional networks, which subsequently enable us to formulate network alignment as a point set alignment problem. The proposed method offers two distinctive advantages. First (node representations), unlike the existing graph convolutional networks that aggregate the node information within a single network, we can effectively aggregate the auxiliary information from multiple sources, achieving far-reaching node representations. Second (network alignment), guided by the high-quality node representations, our proposed non-rigid point set alignment approach overcomes the bottleneck of the linear transformation assumption. We conduct extensive experiments that demonstrate the proposed non-rigid alignment method is (1) effective, outperforming both the state-of-the-art linear transformation-based methods and node representation based methods, and (2) efficient, with a comparable computational time between the proposed multi-network representation learning component and its single-network counterpart.
-
Finding overcomplete latent representations of data has applications in data analysis, signal processing, machine learning, theoretical neuroscience and many other fields. In an overcomplete representation, the number of latent features exceeds the data dimensionality, which is useful when the data is undersampled by the measurements (compressed sensing or information bottlenecks in neural systems) or composed from multiple complete sets of linear features, each spanning the data space. Independent Components Analysis (ICA) is a linear technique for learning sparse latent representations, which typically has a lower computational cost than sparse coding, a linear generative model which requires an iterative, nonlinear inference step. While existing ICA algorithms are well suited for finding complete representations, we show that overcompleteness poses a challenge to them. Specifically, the coherence control used in existing ICA and other dictionary learning algorithms, necessary to prevent the formation of duplicate dictionary features, is ill-suited in the overcomplete case. We show that in the overcomplete case, several existing ICA algorithms have undesirable global minima that maximize coherence. We provide a theoretical explanation of these failures and, based on the theory, propose improved coherence control costs for overcomplete ICA algorithms. Further, by comparing ICA algorithms to the computationally more expensive sparse coding on synthetic data, we show that the limited applicability of overcomplete, linear inference can be extended with the proposed cost functions. Finally, when trained on natural images, we show that the coherence control biases the exploration of the data manifold, sometimes yielding suboptimal, coherent solutions. All told, this study contributes new insights into and methods for coherence control for linear ICA, some of which are applicable to many other nonlinear models.
-
Unsupervised anomaly detection plays a crucial role in many critical applications. Driven by the success of deep learning, recent years have witnessed growing interest in applying deep neural networks (DNNs) to anomaly detection problems. A common approach is using autoencoders to learn a feature representation for the normal observations in the data. The reconstruction error of the autoencoder is then used as an outlier score to detect the anomalies. However, due to the high complexity brought about by the over-parameterization of DNNs, the reconstruction error of the anomalies could also be small, which hampers the effectiveness of these methods. To alleviate this problem, we propose a robust framework using collaborative autoencoders to jointly identify normal observations from the data while learning its feature representation. We investigate the theoretical properties of the framework and empirically show its outstanding performance as compared to other DNN-based methods. Our experimental results also show the resiliency of the framework to missing values compared to other baseline methods.
-
In this paper, we propose a supervised graph representation learning method to model the relationship between brain functional connectivity (FC) and structural connectivity (SC) through a graph encoder-decoder system. The graph convolutional network (GCN) model is leveraged in the encoder to learn lower-dimensional node representations (i.e. node embeddings) integrating information from both node attributes and network topology. In doing so, the encoder manages to capture both direct and indirect interactions between brain regions in the node embeddings which later help reconstruct empirical FC networks. From node embeddings, graph representations are learnt to embed the entire graphs into a vector space. Our end-to-end model utilizes a multi-objective loss function to simultaneously learn node representations for FC network reconstruction and graph representations for subject classification. The experiment on a large population of non-drinkers and heavy drinkers shows that our model can provide a characterization of the population pattern in the SC-FC relationship, while also learning features that capture individual uniqueness for subject classification. The identified key brain subnetworks show significant between-group differences and support the promising prospect of GCN-based graph representation learning on brain networks to model human brain activity and function.
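The reconstruction-error scoring mentioned in the anomaly-detection abstract above can be sketched in a few lines. The example below uses a one-unit linear autoencoder as a stand-in: its optimal solution spans the top principal direction of the data, found here by power iteration. This is an assumption chosen to keep the sketch short and dependency-free; it is not the collaborative autoencoder framework that abstract proposes, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(points: Sequence[Vector]) -> Vector:
    """Coordinate-wise mean of the data."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def covariance(points: Sequence[Vector], mean: Vector) -> List[Vector]:
    """Sample covariance matrix (normalised by n) as nested lists."""
    n, d = len(points), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        c = [p[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += c[a] * c[b] / n
    return cov


def top_direction(cov: List[Vector], iterations: int = 200) -> Vector:
    """Leading eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def reconstruction_scores(points: Sequence[Vector]) -> List[float]:
    """Squared reconstruction error of each point, used as its outlier score."""
    mean = mean_vector(points)
    direction = top_direction(covariance(points, mean))
    scores = []
    for p in points:
        centred = [p[j] - mean[j] for j in range(len(mean))]
        code = sum(c * u for c, u in zip(centred, direction))  # encoder
        recon = [code * u for u in direction]  # decoder
        scores.append(sum((c - r) ** 2 for c, r in zip(centred, recon)))
    return scores


# Toy data: eleven points on the line y = 2x plus one off-line point (the last).
points = [[float(k), 2.0 * k] for k in range(-5, 6)] + [[0.0, 5.0]]
scores = reconstruction_scores(points)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

In this toy setting the off-line point receives the largest score. The concern raised in that abstract is that a heavily over-parameterized network can also reconstruct anomalies well, which is the situation where a plain reconstruction score of this kind becomes less reliable.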

