Robustness of conditional GANs to noisy labels
We study the problem of learning conditional generators from noisy labeled samples, where the labels are corrupted by random noise. Standard training of conditional GANs will not only produce samples with wrong labels, but also generate poor-quality samples. We consider two scenarios, depending on whether the noise model is known or not. When the distribution of the noise is known, we introduce a novel architecture which we call Robust Conditional GAN (RCGAN). The main idea is to corrupt the label of the generated sample before feeding it to the adversarial discriminator, forcing the generator to produce samples with clean labels. This approach of passing through a matching noisy channel is justified by accompanying multiplicative approximation bounds between the loss of the RCGAN and the distance between the clean real distribution and the generator distribution. This shows that the proposed approach is robust when used with a carefully chosen discriminator architecture, known as the projection discriminator. When the distribution of the noise is not known, we provide an extension of our architecture, which we call RCGANU, that learns the noise model simultaneously while training the generator. We show experimentally on the MNIST and CIFAR10 datasets that both approaches consistently improve upon baseline approaches, and that RCGANU closely matches the performance of RCGAN.
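The RCGAN idea of passing generated labels through the same noisy channel as the real labels can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the noise model is assumed to be a K×K row-stochastic confusion matrix `C` with `C[i, j]` = P(noisy label j | clean label i), the feature map φ(x) is replaced by placeholder arrays, and the discriminator score uses the projection form ψ(φ(x)) + ⟨V[y], φ(x)⟩ with a linear ψ. The helper names (`corrupt_labels`, `projection_logit`, `row_stochastic`) are hypothetical, and the softmax parameterization in `row_stochastic` is only one possible way to keep a learned noise model valid in the unknown-noise (RCGANU) setting.

```python
import numpy as np


def corrupt_labels(labels, confusion, rng):
    """Send clean class labels through a noisy channel.

    confusion[i, j] is taken to be P(noisy label = j | clean label = i),
    so each row must sum to one (assumed noise model).
    """
    num_classes = confusion.shape[1]
    # One categorical draw per label, from the row of its clean class.
    return np.array([rng.choice(num_classes, p=confusion[y]) for y in labels])


def projection_logit(features, labels, embed, w_psi):
    """Projection-discriminator score psi(phi(x)) + <V[y], phi(x)>.

    features stands in for phi(x) with shape (n, d); embed is the (K, d)
    class-embedding matrix V; w_psi defines a linear psi. All placeholders.
    """
    return features @ w_psi + np.sum(embed[labels] * features, axis=1)


def row_stochastic(logits):
    """Map unconstrained (K, K) logits to a row-stochastic confusion matrix.

    Hypothetical parameterization for a learned noise model (RCGANU setting).
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
K, d, n = 3, 4, 5

# Hypothetical symmetric noise: keep the label w.p. 0.8,
# otherwise flip to each other class w.p. 0.1.
C = np.full((K, K), 0.1) + np.eye(K) * 0.7

embed = rng.normal(size=(K, d))
w_psi = rng.normal(size=d)

# Real pairs: labels are used exactly as observed (already noisy).
real_features = rng.normal(size=(n, d))
real_noisy_labels = rng.integers(0, K, size=n)
real_scores = projection_logit(real_features, real_noisy_labels, embed, w_psi)

# Fake pairs: the generator is conditioned on a clean label, but the
# discriminator only ever sees that label after it passes through C.
fake_features = rng.normal(size=(n, d))
fake_clean_labels = rng.integers(0, K, size=n)
fake_noisy_labels = corrupt_labels(fake_clean_labels, C, rng)
fake_scores = projection_logit(fake_features, fake_noisy_labels, embed, w_psi)
```

Because both real and fake labels reach the discriminator through the same channel, the fake pairs can only match the noisy real pairs if the generator's clean labels are correct. Sampling with `rng.choice` is not differentiable, so learning `C` jointly (as RCGANU does) would need a different treatment in an autodiff framework; how the paper handles this is not specified here.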
Award ID(s): 1929955
Publication Date:
NSF-PAR ID: 10105782
Journal Name: Advances in Neural Information Processing Systems
ISSN: 1049-5258
Sponsoring Org: National Science Foundation
More like this


Generative adversarial networks (GANs) are innovative techniques for learning generative models of complex data distributions from samples. Despite remarkable recent improvements in generating realistic images, one of their major shortcomings is that, in practice, they tend to produce samples with little diversity, even when trained on diverse datasets. This phenomenon, known as mode collapse, has been the main focus of several recent advances in GANs. Yet there is little understanding of why mode collapse happens and why recently proposed approaches are able to mitigate it. We propose a principled approach to handling mode collapse, which we call packing. The main idea is to modify the discriminator to make decisions based on multiple samples from the same class, either real or artificially generated. We borrow analysis tools from binary hypothesis testing (in particular the seminal result of Blackwell [6]) to prove a fundamental connection between packing and mode collapse. We show that packing naturally penalizes generators with mode collapse, thereby favoring generator distributions with less mode collapse during the training process. Numerical experiments on benchmark datasets suggest that packing provides significant improvements in practice as well.
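The packing idea, a discriminator that judges several samples from the same source at once, can be sketched as a reshaping step applied before the discriminator. The abstract does not say how the samples are combined; concatenating each group of m samples along the channel axis is assumed here as one natural choice, and the function name `pack` and the random stand-in batches are illustrative only.

```python
import numpy as np


def pack(samples, m):
    """Turn a batch of single samples into a batch of m-sample 'packs'.

    samples: (n, C, H, W) array drawn from ONE source (all real or all
    generated). Consecutive groups of m samples are concatenated along the
    channel axis, giving (n // m, m * C, H, W); any remainder n % m is dropped.
    """
    n_c, rest = samples.shape[1], samples.shape[2:]
    n = (samples.shape[0] // m) * m
    # (n, C, H, W) -> (n/m, m, C, H, W): group consecutive samples into packs.
    grouped = samples[:n].reshape(n // m, m, n_c, *rest)
    # Fold the pack axis into channels: sample j, channel c -> channel j*C + c.
    return grouped.reshape(n // m, m * n_c, *rest)


rng = np.random.default_rng(0)
real = rng.normal(size=(8, 3, 4, 4))   # stand-in for a batch of real images
fake = rng.normal(size=(8, 3, 4, 4))   # stand-in for generator outputs
packed_real = pack(real, m=2)          # shape (4, 6, 4, 4); each pack is "real"
packed_fake = pack(fake, m=2)          # shape (4, 6, 4, 4); each pack is "fake"
```

In this sketch the discriminator is an ordinary network whose first layer accepts m·C input channels and outputs one real/fake decision per pack; the generator and the adversarial loss are left unchanged.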

Noisy labels are inevitable in large real-world datasets. In this work, we explore an area understudied by previous works: how the network's architecture impacts its robustness to noisy labels. We provide a formal framework connecting the robustness of a network to the alignments between its architecture and target/noise functions. Our framework measures a network's robustness via the predictive power in its representations: the test performance of a linear model trained on the learned representations using a small set of clean labels. We hypothesize that a network is more robust to noisy labels if its architecture is more aligned with the target function than with the noise. To support our hypothesis, we provide both theoretical and empirical evidence across various neural network architectures and different domains. We also find that when the network is well-aligned with the target function, its predictive power in representations could improve upon state-of-the-art (SOTA) noisy-label-training methods in terms of test accuracy and even outperform sophisticated methods that use clean labels.
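The robustness measure described above (test accuracy of a linear model fit on learned representations with a small clean label set) is essentially a linear probe. The sketch below assumes the representations have already been extracted as NumPy arrays and uses closed-form ridge-regularized least squares onto one-hot labels as the linear model; the abstract does not specify the model or its loss, so this choice, the `ridge` value, the function name `linear_probe_accuracy`, and the synthetic `aligned_reps` are all illustrative assumptions.

```python
import numpy as np


def linear_probe_accuracy(train_feats, train_labels, test_feats, test_labels,
                          num_classes, ridge=1e-3):
    """Fit a linear classifier on frozen representations; return test accuracy.

    The linear model is ridge-regularized least squares onto one-hot labels,
    solved in closed form (an illustrative choice of linear model).
    """
    def add_bias(x):
        # Append a constant column so the linear model has an intercept.
        return np.hstack([x, np.ones((x.shape[0], 1))])

    X = add_bias(train_feats)
    Y = np.eye(num_classes)[train_labels]   # one-hot *clean* labels
    # Closed form: W = (X^T X + ridge * I)^{-1} X^T Y
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
    preds = (add_bias(test_feats) @ W).argmax(axis=1)
    return float(np.mean(preds == test_labels))


rng = np.random.default_rng(0)
K = 3


def aligned_reps(labels):
    # Hypothetical representation that already separates the classes well.
    return 5.0 * np.eye(K)[labels] + 0.1 * rng.normal(size=(len(labels), K))


train_y = np.tile(np.arange(K), 10)        # small clean set: 10 per class
test_y = rng.integers(0, K, size=100)
acc = linear_probe_accuracy(aligned_reps(train_y), train_y,
                            aligned_reps(test_y), test_y, K)
```

In a real experiment the representations would come from a network trained on the noisy labels, and only the probe sees the small clean set; a higher probe accuracy is then read as more predictive power in the representations.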

Generative adversarial networks (GANs) are a technique for learning generative models of complex data distributions from samples. Despite remarkable advances in generating realistic images, a major shortcoming of GANs is the fact that they tend to produce samples with little diversity, even when trained on diverse datasets. This phenomenon, known as mode collapse, has been the focus of much recent work. We study a principled approach to handling mode collapse, which we call packing. The main idea is to modify the discriminator to make decisions based on multiple samples from the same class, either real or artificially generated. We draw analysis tools from binary hypothesis testing—in particular the seminal result of Blackwell [4]—to prove a fundamental connection between packing and mode collapse. We show that packing naturally penalizes generators with mode collapse, thereby favoring generator distributions with less mode collapse during the training process. Numerical experiments on benchmark datasets suggest that packing provides significant improvements.
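A toy calculation can make the packing intuition concrete. The hypothetical helper below computes, by brute force over a tiny support, the total variation distance between the m-fold products P^m and Q^m, which is the relevant distance when each discriminator input is a pack of m independent samples. The distributions are invented for illustration: both generators are at total variation 0.2 from a uniform target on four points, but `Q_collapsed` nearly drops one mode while `Q_spread` distributes its error more evenly. This is a numerical illustration only, not the paper's analysis.

```python
import itertools

import numpy as np


def tv_product(p, q, m):
    """Total variation distance between P^m and Q^m on a small finite support.

    Enumerates all len(p)**m outcomes, so it is only meant for tiny examples.
    """
    total = 0.0
    for idx in itertools.product(range(len(p)), repeat=m):
        total += abs(np.prod(p[list(idx)]) - np.prod(q[list(idx)]))
    return 0.5 * total


P = np.full(4, 0.25)                               # uniform target over 4 modes
Q_collapsed = np.array([0.45, 0.25, 0.25, 0.05])   # nearly misses the last mode
Q_spread = np.array([0.35, 0.35, 0.15, 0.15])      # error spread over modes

for m in (1, 2, 3):
    print(m, tv_product(P, Q_collapsed, m), tv_product(P, Q_spread, m))
```

At m = 1 the two generators are equally far from the target (0.2 each); at m = 2 the distance grows to 0.34 for `Q_collapsed` but only to 0.24 for `Q_spread`, consistent with the claim that packing penalizes mode collapse more heavily. The distance can never decrease as m grows, because discarding one sample from a pack cannot increase total variation.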

Daumé III, Hal; Singh, Aarti (Eds.)
Learning with noisy labels is a common challenge in supervised learning. Existing approaches often require practitioners to specify noise rates, i.e., a set of parameters controlling the severity of label noise in the problem, and these specifications are either assumed to be given or estimated using additional steps. In this work, we introduce a new family of loss functions, which we name peer loss functions, that enables learning from noisy labels without requiring a priori specification of the noise rates. Peer loss functions work within the standard empirical risk minimization (ERM) framework. We show that, under mild conditions, performing ERM with peer loss functions on the noisy data leads to the optimal or a near-optimal classifier, as if performing ERM over the clean training data, which we do not have access to. We pair our results with an extensive set of experiments. Peer loss provides a way to simplify model development when facing potentially noisy training labels, and can be promoted as a robust candidate loss function in such situations.
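The abstract does not spell out the loss itself. As we understand the peer-loss construction, the loss for sample n subtracts a "peer" term that pairs the prediction on a randomly drawn example n1 with the noisy label of an independently drawn example n2, i.e. ℓ_peer = ℓ(f(x_n), ỹ_n) - ℓ(f(x_{n1}), ỹ_{n2}). The NumPy sketch below evaluates this with cross-entropy on precomputed class probabilities; the choice of cross-entropy, drawing peers within a batch, and the names `cross_entropy` and `peer_loss` are assumptions for illustration. For training, the same expression would be written in an autodiff framework so gradients flow through both terms.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Per-sample negative log-likelihood of the given labels."""
    return -np.log(probs[np.arange(len(labels)), labels])


def peer_loss(probs, noisy_labels, rng):
    """Batch-average peer loss (sketch).

    For each sample n, draw two independent indices n1, n2 and subtract the
    loss of prediction n1 against noisy label n2 from the ordinary loss.
    """
    n = len(noisy_labels)
    n1 = rng.integers(0, n, size=n)   # peer examples for the predictions
    n2 = rng.integers(0, n, size=n)   # independently drawn peer labels
    base = cross_entropy(probs, noisy_labels)
    peer = cross_entropy(probs[n1], noisy_labels[n2])
    return float(np.mean(base - peer))


rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=8)   # stand-in model outputs
noisy_labels = rng.integers(0, 3, size=8)   # stand-in noisy labels
loss = peer_loss(probs, noisy_labels, rng)
```

Unlike plain cross-entropy, this quantity can be negative. A predictor that outputs the uniform distribution for every input scores exactly zero whatever the labels, since both terms then equal log K.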