Title: Dense Crowd Counting Convolutional Neural Networks with Minimal Data using Semi-Supervised Dual-Goal Generative Adversarial Networks
In this work, we generalize semi-supervised generative adversarial networks (GANs) from classification problems to regression for use in dense crowd counting. In the last several years, the importance of improving the training of neural networks using semi-supervised training has been thoroughly demonstrated for classification problems. This work presents a dual-goal GAN which seeks both to provide the number of individuals in a densely crowded scene and distinguish between real and generated images. This method allows the dual-goal GAN to benefit from unlabeled data in the training process, improving the predictive capabilities of the discriminating network compared to the fully-supervised version of the network. Typical semi-supervised GANs are unable to function in the regression regime due to biases introduced when using a single prediction goal. Using the proposed approach, the amount of data which needs to be annotated for dense crowd counting can be significantly reduced.
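As a rough illustration of the dual-goal idea described in the abstract, the sketch below combines two separate discriminator outputs into one loss: a count regression term on labeled images and a real-versus-generated term on unlabeled and generated images. The abstract does not state the paper's actual objective, so the function name `dual_goal_loss`, the squared-error and cross-entropy terms, and the `weight` parameter are all assumptions made for this example.

```python
import math


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dual_goal_loss(count_pred, count_true, real_logits, fake_logits, weight=1.0):
    """Hypothetical combined discriminator loss for one batch.

    count_pred, count_true: predicted and annotated counts for labeled images.
    real_logits: real-vs-generated logits for unlabeled real images.
    fake_logits: real-vs-generated logits for generated images.
    weight: assumed trade-off between the two goals.
    """
    # Goal 1: count regression on the labeled images (mean squared error).
    count_loss = sum((p - t) ** 2 for p, t in zip(count_pred, count_true)) / len(count_true)

    # Goal 2: real-vs-generated classification (binary cross-entropy on logits).
    # -log(sigmoid(z)) == softplus(-z) and -log(1 - sigmoid(z)) == softplus(z).
    real_loss = sum(_softplus(-z) for z in real_logits) / len(real_logits)
    fake_loss = sum(_softplus(z) for z in fake_logits) / len(fake_logits)

    # The unlabeled and generated images contribute only through goal 2,
    # so they need no count annotations.
    return count_loss + weight * (real_loss + fake_loss)
```

Because the count output and the real-versus-generated output are kept separate in this sketch, the unlabeled images never need a count label; they influence the shared network only through the second term.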
Award ID(s):
1827505 1737533
NSF-PAR ID:
10110611
Journal Name:
IEEE Conference on Computer Vision and Pattern Recognition: Learning with Imperfect Data Workshop
Page Range / eLocation ID:
21-28
Sponsoring Org:
National Science Foundation
More Like this
  1. In this work, we use a generative adversarial network (GAN) to train crowd counting networks using minimal data. We describe how GAN objectives can be modified to allow for the use of unlabeled data to benefit inference training in semi-supervised learning. More generally, we explain how these same methods can be used in more generic multiple regression target semi-supervised learning, with crowd counting being a demonstrative example. Given a convolutional neural network (CNN) with capabilities equivalent to the discriminator in the GAN, we provide experimental results which show that our GAN is able to outperform the CNN even when the CNN has access to significantly more labeled data. This presents the potential of training such networks to high accuracy with little data. Our primary goal is not to outperform the state-of-the-art using an improved method on the entire dataset, but instead we work to show that through semi-supervised learning we can reduce the data required to train an inference network to a given accuracy. To this end, systematic experiments are performed with various numbers of images and cameras to show under which situations the semi-supervised GANs can improve results. 
  2. A broad class of unsupervised deep learning methods such as Generative Adversarial Networks (GANs) involve training of overparameterized models where the number of parameters of the model exceeds a certain threshold. Indeed, most successful GANs used in practice are trained using overparameterized generator and discriminator networks, both in terms of depth and width. A large body of work in supervised learning has shown the importance of model overparameterization in the convergence of gradient descent (GD) to globally optimal solutions. In contrast, the unsupervised setting and GANs in particular involve non-convex concave min-max optimization problems that are often trained using Gradient Descent/Ascent (GDA). The role and benefits of model overparameterization in the convergence of GDA to a global saddle point in non-convex concave problems are far less understood. In this work, we present a comprehensive analysis of the importance of model overparameterization in GANs both theoretically and empirically. We theoretically show that in an overparameterized GAN model with a 1-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem. To the best of our knowledge, this is the first result for global convergence of GDA in such settings. Our theory is based on a more general result that holds for a broader class of nonlinear generators and discriminators that obey certain assumptions (including deeper generators and random feature discriminators). Our theory utilizes and builds upon a novel connection with the convergence analysis of linear time-varying dynamical systems, which may have broader implications for understanding the convergence behavior of GDA for non-convex concave problems involving overparameterized models. We also empirically study the role of model overparameterization in GANs using several large-scale experiments on the CIFAR-10 and Celeb-A datasets. Our experiments show that overparameterization improves the quality of generated samples across various model architectures and datasets. Remarkably, we observe that overparameterization leads to faster and more stable convergence behavior of GDA across the board.
  3. Generative adversarial networks (GANs) have been impactful on many problems and applications but suffer from unstable training. The Wasserstein GAN (WGAN) leverages the Wasserstein distance to avoid the caveats in the min-max two-player training of GANs but has other defects, such as mode collapse and the lack of a metric to detect convergence. We introduce a novel inferential Wasserstein GAN (iWGAN) model, which is a principled framework to fuse autoencoders and WGANs. The iWGAN model jointly learns an encoder network and a generator network motivated by the iterative primal-dual optimization process. The encoder network maps the observed samples to the latent space and the generator network maps the samples from the latent space to the data space. We establish the generalization error bound of the iWGAN to theoretically justify its performance. We further provide a rigorous probabilistic interpretation of our model under the framework of maximum likelihood estimation. The iWGAN, with a clear stopping criterion, has many advantages over other autoencoder GANs. The empirical experiments show that the iWGAN greatly mitigates the symptom of mode collapse, speeds up convergence, and is able to provide a quality check for each individual sample. We illustrate the ability of the iWGAN by obtaining competitive and stable performance on benchmark datasets.
  4. Pancreatic ductal adenocarcinoma (PDAC) presents a critical global health challenge, and early detection is crucial for improving the 5-year survival rate. Recent medical imaging and computational algorithm advances offer potential solutions for early diagnosis. Deep learning, particularly in the form of convolutional neural networks (CNNs), has demonstrated success in medical image analysis tasks, including classification and segmentation. However, the limited availability of clinical data for training purposes continues to represent a significant obstacle. Data augmentation, generative adversarial networks (GANs), and cross-validation are potential techniques to address this limitation and improve model performance, but effective solutions are still rare for 3D PDAC, where the contrast is especially poor, owing to the high heterogeneity in both tumor and background tissues. In this study, we developed a new GAN-based model, named 3DGAUnet, for generating realistic 3D CT images of PDAC tumors and pancreatic tissue, which can generate the inter-slice connection data that the existing 2D CT image synthesis models lack. The transition to 3D models allowed the preservation of contextual information from adjacent slices, improving efficiency and accuracy, especially for the poor-contrast challenging case of PDAC. PDAC’s characteristics, such as an iso-attenuating or hypodense appearance and lack of well-defined margins, make learning tumor shape and texture difficult. To overcome these challenges and improve the performance of 3D GAN models, our innovation was to develop a 3D U-Net architecture for the generator, to improve shape and texture learning for PDAC tumors and pancreatic tissue. Thorough examination and validation across many datasets were conducted on the developed 3D GAN model, to ascertain the efficacy and applicability of the model in clinical contexts. Our approach offers a promising path for tackling the urgent requirement for creative and synergistic methods to combat PDAC. The development of this GAN-based model has the potential to alleviate data scarcity issues, elevate the quality of synthesized data, and thereby facilitate the progression of deep learning models, to enhance the accuracy and early detection of PDAC tumors, which could profoundly impact patient outcomes. Furthermore, the model has the potential to be adapted to other types of solid tumors, hence making significant contributions to the field of medical imaging in terms of image processing models.
  5. Generative Adversarial Networks (GANs) have recently shown great promise for unsupervised learning, which aims to reduce the massive human effort of data labeling required by supervised learning algorithms. A GAN pits a generative model against a discriminative model in an adversarial setting to refine the abilities of both. Existing nonvolatile memory based machine learning accelerators, however, cannot support the computational needs of GAN training. Specifically, the generator utilizes a new operator, called transposed convolution, which introduces significant resource underutilization when executed on conventional neural network accelerators, as it inserts massive numbers of zeros into its input before a convolution operation. In this work, we propose a novel computational deformation technique that synergistically optimizes the forward and backward functions in transposed convolution to eliminate the large resource underutilization. In addition, we present dedicated control units, a dataflow mapper and an operation scheduler, to support the proposed execution model with high parallelism and low energy consumption. The resulting design, ZARA, is implemented with commodity ReRAM chips, and experimental results show that it improves GAN training performance by 1.6x to 23x on average over CMOS-based GAN accelerators. Compared to state-of-the-art ReRAM-based accelerator designs, ZARA also provides a 1.15x to 2.1x performance improvement.
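Item 5 above notes that transposed convolution inserts zeros into its input before a convolution. The following minimal 1D sketch in plain Python illustrates that equivalence under simple assumptions (no padding argument, a single channel); the function names are made up for this example and are not taken from the cited work or any accelerator.

```python
def transposed_conv1d_direct(x, kernel, stride):
    # Scatter form: each input value adds a scaled copy of the kernel
    # to the output, starting at position i * stride.
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for j, kj in enumerate(kernel):
            out[i * stride + j] += xi * kj
    return out


def zero_insert(x, stride):
    # Place (stride - 1) zeros between neighbouring input values.
    up = []
    for i, xi in enumerate(x):
        up.append(xi)
        if i < len(x) - 1:
            up.extend([0.0] * (stride - 1))
    return up


def transposed_conv1d_via_zero_insertion(x, kernel, stride):
    # Same result computed as an ordinary stride-1 convolution over the
    # zero-inserted, zero-padded input with the kernel flipped.
    k = len(kernel)
    padded = [0.0] * (k - 1) + zero_insert(x, stride) + [0.0] * (k - 1)
    flipped = kernel[::-1]
    return [
        sum(padded[p + j] * flipped[j] for j in range(k))
        for p in range(len(padded) - k + 1)
    ]


x = [1, 2, 3]
kernel = [1, 10, 100]
print(transposed_conv1d_direct(x, kernel, 2))
print(transposed_conv1d_via_zero_insertion(x, kernel, 2))
```

With a stride of 2 and a kernel of length 3, the padded input in this example is `[0, 0, 1, 0, 2, 0, 3, 0, 0]`, so six of the nine values fed to the convolution are zeros. Multiplications against those zeros do no useful work, which is the kind of underutilization the abstract describes.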