Title: A Parallel Gumbel-Softmax VAE Framework with Performance-Based Tuning
Traditional training algorithms for Gumbel-Softmax Variational Autoencoders (GS-VAEs) typically rely on an annealing scheme that gradually reduces the softmax temperature τ according to a fixed schedule, which can lead to suboptimal results. To improve performance, we propose a parallel framework for GS-VAEs, which embraces dual latent layers and multiple sub-models with diverse temperature strategies. Instead of relying on a fixed function for adjusting τ, our training algorithm uses the loss difference as performance feedback to dynamically update each sub-model's temperature τ, inspired by the need to balance exploration and exploitation in learning. By combining diversity in temperature strategies with performance-based tuning, our design helps prevent sub-models from becoming trapped in local optima and finds the GS-VAE model that best fits the given dataset. In experiments on four classic image datasets, our model significantly surpasses a standard GS-VAE that employs a temperature annealing scheme across multiple tasks, including data reconstruction, generalization, anomaly detection, and adversarial robustness. Our implementation is publicly available at https://github.com/wxzg7045/Gumbel-Softmax-VAE-2024/tree/main.
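The abstract does not specify the feedback rule beyond "uses loss difference as performance feedback," so the following is a minimal sketch, not the authors' implementation: standard Gumbel-Softmax sampling plus one plausible loss-difference update for τ (cool on improvement to exploit, heat on regression to explore). The function names, step size, and temperature bounds are illustrative assumptions.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau, rng=None):
    """Draw a relaxed categorical sample: perturb the logits with
    Gumbel noise, then apply a temperature-scaled softmax."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.uniform(1e-20, 1.0, size=logits.shape)
    perturbed = (logits - np.log(-np.log(u))) / tau
    perturbed -= perturbed.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(perturbed)
    return probs / probs.sum(axis=-1, keepdims=True)

def update_temperature(tau, prev_loss, curr_loss,
                       step=0.05, tau_min=0.1, tau_max=5.0):
    """Hypothetical loss-difference feedback rule: if the loss improved,
    lower tau (sharper, more exploitative samples); otherwise raise it
    (smoother, more exploratory samples)."""
    if curr_loss < prev_loss:
        return max(tau_min, tau - step)
    return min(tau_max, tau + step)
```

In the paper's parallel framework, each sub-model would presumably maintain its own τ and apply its own strategy for updates of this kind; the clipping bounds here simply keep the relaxation from collapsing to a degenerate argmax or a uniform distribution.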
Award ID(s):
2245853
PAR ID:
10581841
Author(s) / Creator(s):
Publisher / Repository:
IOS Press
Date Published:
ISBN:
978-1-64368-548-9
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Chaudhuri, Kamalika (Ed.)
    While deep generative models have succeeded in image processing, natural language processing, and reinforcement learning, training that involves discrete random variables remains challenging due to the high variance of the gradient estimation process. Monte Carlo sampling is a common solution used in most variance-reduction approaches, but it involves time-consuming resampling and multiple function evaluations. We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead. This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax; we identify these properties and show via an ablation study that they are essential. Experiments demonstrate that the proposed GST estimator outperforms strong baselines on two discrete deep generative modeling tasks, MNIST-VAE and ListOps. 
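    For reference, the Straight-Through Gumbel-Softmax estimator that GST refines can be sketched as below. This is a forward-pass-only NumPy illustration under stated assumptions (function and parameter names are mine, not from the paper); in an autograd framework, the gradient is routed through the soft sample via `hard - stop_grad(soft) + soft`.

```python
import numpy as np

def st_gumbel_softmax(logits, tau, rng=None):
    """Straight-Through Gumbel-Softmax, forward pass only.
    Returns (hard, soft): the one-hot sample used in the forward
    computation, and the soft relaxation whose gradient the
    straight-through trick substitutes during backpropagation."""
    if rng is None:
        rng = np.random.default_rng(0)
    u = rng.uniform(1e-20, 1.0, size=logits.shape)
    perturbed = (logits - np.log(-np.log(u))) / tau   # Gumbel noise, scaled by tau
    perturbed -= perturbed.max(axis=-1, keepdims=True)  # numerical stability
    soft = np.exp(perturbed)
    soft /= soft.sum(axis=-1, keepdims=True)
    hard = np.eye(logits.shape[-1])[soft.argmax(axis=-1)]  # discretize
    return hard, soft
```

    The variance problem GST targets arises because the soft surrogate gradient is a biased, noisy stand-in for the true discrete gradient; GST closes part of that gap without the repeated resampling that Monte Carlo variance reduction requires.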
  2. Generative models, such as Variational Autoencoders (VAEs), are increasingly employed for atypical pattern detection in brain imaging. During training, these models learn to capture the underlying patterns within "normal" brain images and generate new samples from those patterns. Neurodivergent states can be observed by measuring the dissimilarity between the generated/reconstructed images and the input images. This paper leverages VAEs to conduct Functional Connectivity (FC) analysis from functional Magnetic Resonance Imaging (fMRI) scans of individuals with Autism Spectrum Disorder (ASD), aiming to uncover atypical interconnectivity between brain regions. In the first part of our study, we compare multiple VAE architectures—Conditional VAE, Recurrent VAE, and a hybrid of a CNN in parallel with an RNN VAE—aiming to establish the effectiveness of VAEs in application to FC analysis. Given the nature of the disorder, ASD exhibits a higher prevalence among males than females. Therefore, in the second part of this paper, we investigate whether introducing phenotypic data could improve the performance of VAEs and, consequently, FC analysis. We compare our results with the findings from previous studies in the literature. The results showed that the CNN-based VAE architecture is more effective for this application than the other models. 
  3. The underwater acoustic (UWA) channel is a complex and stochastic process with large spatial and temporal dynamics. This work studies the adaptation of the communication strategy to the channel dynamics. Specifically, a set of communication strategies is considered, including frequency shift keying (FSK), single-carrier communication, and multicarrier communication. Based on the channel condition, a reinforcement learning (RL) algorithm, the Deep Deterministic Policy Gradient (DDPG) method along with a Gumbel-Softmax scheme, is employed for intelligent and adaptive switching among those communication strategies. The adaptive switching is performed on a transmission block-by-block basis, with the goal of maximizing long-term system performance. The reward function is defined based on the energy efficiency and the spectral efficiency of the communication strategies. Simulation results reveal that the proposed method outperforms a random selection method in time-varying channels. 
  4.
    Disentangled generative models map a latent code vector to a target space, while enforcing that a subset of the learned latent codes are interpretable and associated with distinct properties of the target distribution. Recent advances have been dominated by Variational AutoEncoder (VAE)-based methods, while training disentangled generative adversarial networks (GANs) remains challenging. In this work, we show that the dominant challenges facing disentangled GANs can be mitigated through the use of self-supervision. We make two main contributions. First, we design a novel approach for training disentangled GANs with self-supervision: we propose a contrastive regularizer, inspired by a natural notion of disentanglement, latent traversal. This achieves higher disentanglement scores than state-of-the-art VAE- and GAN-based approaches. Second, we propose an unsupervised model selection scheme called ModelCentrality, which uses generated synthetic samples to compute the medoid (a multi-dimensional generalization of the median) of a collection of models. The current common practice of hyper-parameter tuning requires ground-truth samples, each labelled with known, perfectly disentangled latent codes. As real datasets are not equipped with such labels, we propose an unsupervised model selection scheme and show that it finds a model close to the best one, for both VAEs and GANs. Combining contrastive regularization with ModelCentrality, we improve upon the state-of-the-art disentanglement scores significantly, without accessing the supervised data. 
  5.
    Within the context of event modeling and understanding, we propose a new method for neural sequence modeling that takes partially-observed sequences of discrete, external knowledge into account. We construct a sequential neural variational autoencoder, which uses Gumbel-Softmax reparametrization within a carefully defined encoder, to allow for successful backpropagation during training. The core idea is to allow semi-supervised external discrete knowledge to guide, but not restrict, the variational latent parameters during training. Our experiments indicate that our approach not only outperforms multiple baselines and the state-of-the-art in narrative script induction, but also converges more quickly. 