Generative models based on latent variables, such as generative adversarial networks (GANs) and variationalauto-encoders (VAEs), have gained lots of interests due to their impressive performance in many fields.However, many data such as natural images usually do not populate the ambient Euclidean space but insteadreside in a lower-dimensional manifold. Thus an inappropriate choice of the latent dimension fails to uncoverthe structure of the data, possibly resulting in mismatch of latent representations and poor generativequalities. Toward addressing these problems, we propose a novel framework called the latent WassersteinGAN (LWGAN) that fuses the Wasserstein auto-encoder and the Wasserstein GAN so that the intrinsicdimension of the data manifold can be adaptively learned by a modified informative latent distribution. Weprove that there exist an encoder network and a generator network in such a way that the intrinsic dimensionof the learned encoding distribution is equal to the dimension of the data manifold. We theoreticallyestablish that our estimated intrinsic dimension is a consistent estimate of the true dimension of the datamanifold. Meanwhile, we provide an upper bound on the generalization error of LWGAN, implying that weforce the synthetic data distribution to be similar to the real data distribution from a population perspective.Comprehensive empirical experiments verify our framework and show that LWGAN is able to identify thecorrect intrinsic dimension under several scenarios, and simultaneously generate high-quality syntheticdata by sampling from the learned latent distribution. Supplementary materials for this article are availableonline, including a standardized description of the materials available for reproducing the work. 
                        more » 
                        « less   
                    
                            
                            Differentiable VQ-VAE's for Robust White Matter Streamline Encodings
                        
                    
    
            Given the complex geometry of white matter streamlines, Autoencoders have been proposed as a dimension-reduction tool to simplify the analysis streamlines in a low-dimensional latent spaces. However, despite these recent successes, the majority of encoder architectures only perform dimension reduction on single streamlines as opposed to a full bundle of streamlines. This is a severe limitation of the encoder architecture that completely disregards the global geometric structure of streamlines at the expense of individual fibers. Moreover, the latent space may not be well structured which leads to doubt into their interpretability. In this paper we propose a novel Differentiable Vector Quantized Variational Autoencoder, which are engineered to ingest entire bundles of streamlines as single data-point and provides reliable trustworthy encodings that can then be later used to analyze streamlines in the latent space. Comparisons with several state of the art Autoencoders demonstrate superior performance in both encoding and synthesis. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 2015577
- PAR ID:
- 10561803
- Publisher / Repository:
- IEEE International Symposium on Biomedical Imaging (ISBI)
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Mathelier, Anthony (Ed.)Abstract Motivation Methods to model dynamic changes in gene expression at a genome-wide level are not currently sufficient for large (temporally rich or single-cell) datasets. Variational autoencoders offer means to characterize large datasets and have been used effectively to characterize features of single-cell datasets. Here, we extend these methods for use with gene expression time series data. Results We present RVAgene: a recurrent variational autoencoder to model gene expression dynamics. RVAgene learns to accurately and efficiently reconstruct temporal gene profiles. It also learns a low dimensional representation of the data via a recurrent encoder network that can be used for biological feature discovery, and from which we can generate new gene expression data by sampling the latent space. We test RVAgene on simulated and real biological datasets, including embryonic stem cell differentiation and kidney injury response dynamics. In all cases, RVAgene accurately reconstructed complex gene expression temporal profiles. Via cross validation, we show that a low-error latent space representation can be learnt using only a fraction of the data. Through clustering and gene ontology term enrichment analysis on the latent space, we demonstrate the potential of RVAgene for unsupervised discovery. In particular, RVAgene identifies new programs of shared gene regulation of Lox family genes in response to kidney injury. Availability and implementation All datasets analyzed in this manuscript are publicly available and have been published previously. RVAgene is available in Python, at GitHub: https://github.com/maclean-lab/RVAgene; Zenodo archive: http://doi.org/10.5281/zenodo.4271097. Supplementary information Supplementary data are available at Bioinformatics online.more » « less
- 
            Oh, Alice; Naumann, Tristan; Globerson, Amir; Saenko, Kate; Hardt, Moritz; Levine, Sergey (Ed.)Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have presented diffusion as an alternative to existing pretrained language models. We view diffusion and existing language models as complementary. We demonstrate that encoder-decoder language models can be utilized to efficiently learn high-quality language autoencoders. We then demonstrate that continuous diffusion models can be learned in the latent space of the language autoencoder, enabling us to sample continuous latent representations that can be decoded into natural language with the pretrained decoder. We validate the effectiveness of our approach for unconditional, class-conditional, and sequence-to-sequence language generation. We demonstrate across multiple diverse data sets that our latent language diffusion models are significantly more effective than previous diffusion language models. Our code is available at https://github.com/justinlovelace/latent-diffusion-for-language .more » « less
- 
            Autoencoders represent a significant category of deep learning models and are widely utilized for dimensionality reduction. However, standard Autoencoders are complicated architectures that normally have several layers and many hyper-parameters that require tuning. In this paper, we introduce a new type of autoencoder that we call dynamical system autoencoder (DSAE). Similar to classic autoencoders, DSAEs can effectively handle dimensionality reduction and denoising tasks, and they demonstrate strong performance in several benchmark tasks. However, DSAEs, in some sense, have a more flexible architecture than standard AEs. In particular, in this paper we study simple DSAEs that only have a single layer. In addition, DSAEs provide several theoretical and practical advantages arising from their implementation as iterative maps, which have been well studied over several decades. Beyond the inherent simplicity of DSAEs, we also demonstrate how to use sparse matrices to reduce the number of parameters for DSAEs without sacrificing the performance of our methods. Our simulation studies indicate that DSAEs achieved better performance than the classic autoencoders when the encoding dimension or training sample size was small. Additionally, we illustrate how to use DSAEs, and denoising autoencoders in general. to nerform sunervised learning tasks.more » « less
- 
            null (Ed.)Semi-supervised learning (SSL) is an appealing approach to resolve generalization problem for speech emotion recognition (SER) systems. By utilizing large amounts of unlabeled data, SSL is able to gain extra information about the prior distribution of the data. Typically, it can lead to better and robust recognition performance. Existing SSL approaches for SER include variations of encoder-decoder model structures such as autoencoder (AE) and variational autoencoders (VAEs), where it is difficult to interpret the learning mechanism behind the latent space. In this study, we introduce a new SSL framework, which we refer to as the DeepEmoCluster framework, for attribute-based SER tasks. The DeepEmoCluster framework is an end-to-end model with mel-spectrogram inputs, which combines a self-supervised pseudo labeling classification network with a supervised emotional attribute regressor. The approach encourages the model to learn latent representations by maximizing the emotional separation of K-means clusters. Our experimental results based on the MSP-Podcast corpus indicate that the DeepEmoCluster framework achieves competitive prediction performances in fully supervised scheme, outperforming baseline methods in most of the conditions. The approach can be further improved by incorporating extra unlabeled set. Moreover, our experimental results explicitly show that the latent clusters have emotional dependencies, enriching the geometric interpretation of the clusters.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    