skip to main content

Title: Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers
Deep neural networks are susceptible to shortcut learning, using simple features to achieve low training loss without discovering essential semantic structure. Contrary to prior belief, we show that generative models alone are not sufficient to prevent shortcut learning, despite an incentive to recover a more comprehensive representation of the data than discriminative approaches. However, we observe that shortcuts are preferentially encoded with minimal information, a fact that generative models can exploit to mitigate shortcut learning. In particular, we propose Chroma-VAE, a two-pronged approach where a VAE classifier is initially trained to isolate the shortcut in a small latent subspace, allowing a secondary classifier to be trained on the complementary, shortcut-free latent subspace. In addition to demonstrating the efficacy of Chroma-VAE on benchmark and real-world shortcut learning tasks, our work highlights the potential for manipulating the latent space of generative classifiers to isolate or interpret specific correlations.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Distributed devices such as mobile phones can produce and store large amounts of data that can enhance machine learning models; however, this data may contain private information specific to the data owner that prevents the release of the data. We wish to reduce the correlation between user-specific private information and data while maintaining the useful information. Rather than learning a large model to achieve privatization from end to end, we introduce a decoupling of the creation of a latent representation and the privatization of data that allows user-specific privatization to occur in a distributed setting with limited computation and minimal disturbance on the utility of the data. We leverage a Variational Autoencoder (VAE) to create a compact latent representation of the data; however, the VAE remains fixed for all devices and all possible private labels. We then train a small generative filter to perturb the latent representation based on individual preferences regarding the private and utility information. The small filter is trained by utilizing a GAN-type robust optimization that can take place on a distributed device. We conduct experiments on three popular datasets: MNIST, UCI-Adult, and CelebA, and give a thorough evaluation including visualizing the geometry of the latent embeddings and estimating the empirical mutual information to show the effectiveness of our approach. 
    more » « less
  2. A key advance in learning generative models is the use of amortized inference distributions that are jointly trained with the models. We find that existing training objectives for variational autoencoders can lead to inaccurate amortized inference distributions and, in some cases, improving the objective provably degrades the inference quality. In addition, it has been observed that variational autoencoders tend to ignore the latent variables when combined with a decoding distribution that is too flexible. We again identify the cause in existing training criteria and propose a new class of objectives (Info-VAE) that mitigate these problems. We show that our model can significantly improve the quality of the variational posterior and can make effective use of the latent features regardless of the flexibility of the decoding distribution. Through extensive qualitative and quantitative analyses, we demonstrate that our models outperform competing approaches on multiple performance metrics 
    more » « less
  3. Latent variable models for text, when trained successfully, accurately model the data distribution and capture global semantic and syntactic features of sentences. The prominent approach to train such models is variational autoencoders (VAE). It is nevertheless challenging to train and often results in a trivial local optimum where the latent variable is ignored and its posterior collapses into the prior, an issue known as posterior collapse. Various techniques have been proposed to mitigate this issue. Most of them focus on improving the inference model to yield latent codes of higher quality. The present work proposes a short run dynamics for inference. It is initialized from the prior distribution of the latent variable and then runs a small number (e.g., 20) of Langevin dynamics steps guided by its posterior distribution. The major advantage of our method is that it does not require a separate inference model or assume simple geometry of the posterior distribution, thus rendering an automatic, natural and flexible inference engine. We show that the models trained with short run dynamics more accurately model the data, compared to strong language model and VAE baselines, and exhibit no sign of posterior collapse. Analyses of the latent space show that interpolation in the latent space is able to generate coherent sentences with smooth transition and demonstrate improved classification over strong baselines with latent features from unsupervised pretraining. These results together expose a well-structured latent space of our generative model. 
    more » « less
  4. In this work, we present a new approach for latent system dynamics and remaining useful life (RUL) estimation of complex degrading systems using generative modeling and reinforcement learning. The main contributions of the proposed method are two-fold. First, we show how a deep generative model can approximate the functionality of high-fidelity simulators and, thus, is able to substitute expensive and complex physics-based models with data-driven surrogate ones. In other words, we can use the generative model in lieu of the actual system as a surrogate model of the system. Furthermore, we show how to use such surrogate models for predictive analytics. Our method follows two main steps. First, we use a deep variational autoencoder (VAE) to learn the distribution over the latent state-space that characterizes the dynamics of the system under monitoring. After model training, the probabilistic VAE decoder becomes the surrogate system model. Then, we develop a scalable reinforcement learning framework using the decoder as the environment, to train an agent for identifying adequate approximate values of the latent dynamics, as well as the RUL.To our knowledge, the method presented in this paper is the first in industrial prognostics that utilizes generative models and reinforcement learning in that capacity. While the process requires extensive data preprocessing and environment tailored design, which is not always possible, it demonstrates the ability of generative models working in conjunction with reinforcement learning to provide proper value estimations for system dynamics and their RUL. To validate the quality of the proposed method, we conducted numerical experiments using the train_FD002 dataset provided by the NASA CMAPSS data repository. Different subsets were used to train the VAE and the RL agent, and a leftover set was then used for model validation. The results shown prove the merit of our method and will further assist us in developing a data-driven RL environment that incorporates more complex latent dynamic layers, such as normal/faulty operating conditions and hazard processes. 
    more » « less
  5. Abstract In this manuscript we develop a deep learning algorithm to improve estimation of rates of progression and prediction of future patterns of visual field loss in glaucoma. A generalized variational auto-encoder (VAE) was trained to learn a low-dimensional representation of standard automated perimetry (SAP) visual fields using 29,161 fields from 3,832 patients. The VAE was trained on a 90% sample of the data, with randomization at the patient level. Using the remaining 10%, rates of progression and predictions were generated, with comparisons to SAP mean deviation (MD) rates and point-wise (PW) regression predictions, respectively. The longitudinal rate of change through the VAE latent space (e.g., with eight dimensions) detected a significantly higher proportion of progression than MD at two (25% vs. 9%) and four (35% vs 15%) years from baseline. Early on, VAE improved prediction over PW, with significantly smaller mean absolute error in predicting the 4 th , 6 th and 8 th visits from the first three (e.g., visit eight: VAE8: 5.14 dB vs. PW: 8.07 dB; P < 0.001). A deep VAE can be used for assessing both rates and trajectories of progression in glaucoma, with the additional benefit of being a generative technique capable of predicting future patterns of visual field damage. 
    more » « less