Title: Evaluating generative networks using Gaussian mixtures of image features
We develop a measure for evaluating the performance of generative networks given two sets of images. A popular performance measure currently used for this purpose is the Fréchet Inception Distance (FID). FID assumes that images featurized using the penultimate layer of Inception-v3 follow a Gaussian distribution, an assumption which cannot be violated if we wish to use FID as a metric. However, we show that Inception-v3 features of the ImageNet dataset are not Gaussian; in particular, no single marginal is Gaussian. To remedy this problem, we model the featurized images using Gaussian mixture models (GMMs) and compute the 2-Wasserstein distance restricted to GMMs. We define a performance measure, which we call WaM, on two sets of images by using Inception-v3 (or another classifier) to featurize the images, estimating a GMM for each set, and using the restricted 2-Wasserstein distance to compare the two GMMs. We experimentally demonstrate the advantages of WaM over FID, including that FID is more sensitive than WaM to imperceptible image perturbations. By modelling the non-Gaussian features obtained from Inception-v3 as GMMs and using a GMM metric, we can more accurately evaluate generative network performance.
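As a minimal sketch of the distance underlying WaM (our own illustration, not the authors' released code), the 2-Wasserstein distance restricted to GMMs reduces to a small discrete optimal-transport problem whose ground cost is the squared 2-Wasserstein distance between Gaussian components; the weights, means, and covariances are assumed to come from separately fitted mixtures, e.g. scikit-learn's GaussianMixture:

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog

def gaussian_w2_sq(mu1, cov1, mu2, cov2):
    """Squared 2-Wasserstein distance between two Gaussians (the Frechet/Bures
    quantity that FID is built on when each set is summarized by one Gaussian)."""
    root = np.real(sqrtm(cov1))
    cross = np.real(sqrtm(root @ cov2 @ root))
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2.0 * cross))

def gmm_w2(w1, mus1, covs1, w2, mus2, covs2):
    """2-Wasserstein distance restricted to GMMs: optimal transport between
    the component weights, with pairwise Gaussian W2^2 as the ground cost."""
    k1, k2 = len(w1), len(w2)
    cost = np.array([[gaussian_w2_sq(mus1[i], covs1[i], mus2[j], covs2[j])
                      for j in range(k2)] for i in range(k1)])
    # Transport plan flattened row-major; marginals must equal the weights.
    A_eq = np.zeros((k1 + k2, k1 * k2))
    for i in range(k1):
        A_eq[i, i * k2:(i + 1) * k2] = 1.0   # row sums equal w1
    for j in range(k2):
        A_eq[k1 + j, j::k2] = 1.0            # column sums equal w2
    b_eq = np.concatenate([w1, w2])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return float(np.sqrt(res.fun))
```

Fitting a mixture to each set of Inception-v3 features and passing its weights_, means_, and covariances_ attributes to gmm_w2 yields a WaM-style score; with a single component on each side, the value reduces to the 2-Wasserstein (Fréchet) distance between two Gaussians.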
Award ID(s):
1911094 1838177 1730574
NSF-PAR ID:
10466262
Journal Name:
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Page Range / eLocation ID:
279 to 288
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Batch Normalization (BN) is essential to effectively train state-of-the-art deep Convolutional Neural Networks (CNN). It normalizes the layer outputs during training using the statistics of each mini-batch. BN accelerates training by allowing the safe use of large learning rates and alleviates the need for careful parameter initialization. In this work, we study BN from the viewpoint of Fisher kernels that arise from generative probability models. We show that, assuming the samples within a mini-batch are drawn from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution. This means the batch normalizing transform can be explained in terms of kernels that naturally emerge from the probability density function modelling the generative process of the underlying data distribution. Consequently, it promises higher discrimination power for the batch-normalized mini-batch. However, given the rectifying non-linearities employed in CNN architectures, the distribution of the layer outputs is asymmetric. Therefore, for BN to fully benefit from the aforementioned properties, we propose approximating the underlying data distribution not with a single Gaussian density but with a mixture of Gaussian densities. Deriving the Fisher vector for a Gaussian Mixture Model (GMM) reveals that batch normalization can be improved by independently normalizing with respect to the statistics of disentangled sub-populations. We refer to our proposed soft piecewise version of batch normalization as Mixture Normalization (MN). Through an extensive set of experiments on CIFAR-10 and CIFAR-100, using both a 5-layer deep CNN and the modern Inception-V3 architecture, we show that mixture normalization reduces the number of gradient updates required to reach the maximum test accuracy of the batch-normalized model by ∼31%-47% across a variety of training scenarios. Replacing even a few BN modules with MN in the 48-layer deep Inception-V3 architecture is sufficient to obtain not only considerable training acceleration but also better final test accuracy. We show that similar observations hold for 40- and 100-layer deep DenseNet architectures as well. We complement our study by evaluating the application of mixture normalization to Generative Adversarial Networks (GANs), where "mode collapse" hinders the training process. We replace only a few batch normalization layers in the generator with our proposed mixture normalization. Our experiments using Deep Convolutional GAN (DCGAN) on CIFAR-10 show that the mixture-normalized DCGAN not only provides an acceleration of ∼58% but also reaches a lower (better) "Fréchet Inception Distance" (FID) of 33.35, compared to 37.56 for its batch-normalized counterpart.
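A minimal numpy sketch of the mixture-normalization idea (the function name and the diagonal-Gaussian simplification are our assumptions, not the paper's implementation): each activation is normalized against every component's statistics, and the results are blended by the GMM posterior responsibilities:

```python
import numpy as np

def mixture_normalize(x, means, variances, priors, eps=1e-5):
    """Soft piecewise normalization of a mini-batch x of shape (N, D).
    means, variances: (K, D) diagonal-Gaussian parameters per component;
    priors: (K,) mixing weights. In practice these statistics would be
    estimated from the mini-batch (e.g., by a few EM steps); they are
    taken as given here."""
    # Log responsibilities: r[n, k] ~ log prior_k + log N(x_n; mean_k, var_k)
    log_r = (np.log(priors)[None, :]
             - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
             - 0.5 * (((x[:, None, :] - means[None]) ** 2)
                      / variances[None]).sum(-1))
    r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # Normalize w.r.t. each component's statistics, aggregate by responsibility.
    z = (x[:, None, :] - means[None]) / np.sqrt(variances[None] + eps)
    return (r[:, :, None] * z).sum(axis=1)
```

With K = 1, this collapses to the ordinary batch normalization statistics, matching the paper's reading of BN as the Fisher vector of a single Gaussian.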
  2. Abstract

    One key challenge encountered in single-cell data clustering is to combine clustering results of data sets acquired from multiple sources. We propose to represent the clustering result of each data set by a Gaussian mixture model (GMM) and to produce an integrated result based on the notion of the Wasserstein barycenter. However, the precise barycenter of GMMs, a distribution on the same sample space, is computationally infeasible to compute. Moreover, it may not be a GMM with a reasonable number of components. We thus propose to use the minimized aggregated Wasserstein (MAW) distance to approximate the Wasserstein metric and develop a new algorithm for computing the barycenter of GMMs under MAW. Recent theoretical advances further justify using the MAW distance as an approximation of the Wasserstein metric between GMMs. We also prove that the MAW barycenter of GMMs has the same expectation as the Wasserstein barycenter. Our proposed algorithm for clustering integration scales well with the data dimension and the number of mixture components, with complexity independent of the data size. We demonstrate that the new method achieves better clustering results on several single-cell RNA-seq data sets than some other popular methods.
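One building block of such a computation is the 2-Wasserstein barycenter of the Gaussian components themselves, which admits a classical fixed-point iteration on the covariance. The sketch below shows only this subroutine under our own simplifications; the couplings and weights that MAW additionally optimizes are omitted:

```python
import numpy as np
from scipy.linalg import sqrtm, inv

def gaussian_barycenter(means, covs, lambdas, iters=50):
    """2-Wasserstein barycenter of Gaussians N(means[k], covs[k]) with
    barycentric weights lambdas (nonnegative, summing to 1)."""
    mu = sum(l * m for l, m in zip(lambdas, means))  # mean: weighted average
    S = sum(l * C for l, C in zip(lambdas, covs))    # covariance initialization
    for _ in range(iters):
        root = np.real(sqrtm(S))
        # T = sum_k lambda_k (S^{1/2} Sigma_k S^{1/2})^{1/2}
        T = sum(l * np.real(sqrtm(root @ C @ root))
                for l, C in zip(lambdas, covs))
        S = inv(root) @ T @ T @ inv(root)            # S <- S^{-1/2} T^2 S^{-1/2}
    return mu, S
```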

     
  3. In this paper, we study the 3D volumetric modeling problem by adopting the Wasserstein introspective neural networks method (WINN) that was previously applied to 2D static images. We name our algorithm 3DWINN; it enjoys the same properties as WINN in the 2D case: being simultaneously generative and discriminative. Compared to existing 3D volumetric modeling approaches, 3DWINN demonstrates competitive results on several benchmarks in both the generation and the classification tasks. In addition to the standard Inception score, the Fréchet Inception Distance (FID) metric is adopted to measure the quality of the 3D volumetric generations. We also study adversarial attacks for volumetric data and demonstrate the robustness of 3DWINN against adversarial examples while achieving appealing results in both classification and generation within a single model. 3DWINN is a general framework and can be applied to emerging tasks in 3D object and scene modeling.
  4. Abstract

    Remanufacturing sites often receive products with different brands, models, conditions, and quality levels. Proper sorting and classification of the waste stream is a primary step in efficiently recovering and handling used products. Correct classification is particularly crucial in future electronic waste (e-waste) management sites equipped with Artificial Intelligence (AI) and robotic technologies. Robots should be equipped with proper algorithms to recognize and classify products with different features and prepare them for assembly and disassembly tasks. In this study, two categories of techniques, Machine Learning (ML) and Deep Learning (DL), are used to classify consumer electronics. The ML models include Naïve Bayes with Bernoulli, Gaussian, and Multinomial distributions, and Support Vector Machine (SVM) algorithms with four kernels: Linear, Radial Basis Function (RBF), Polynomial, and Sigmoid. The DL models include VGG-16, GoogLeNet, Inception-v3, Inception-v4, and ResNet-50. The above-mentioned models are used to classify three laptop brands: Apple, HP, and ThinkPad. First, the Edge Histogram Descriptor (EHD) and Scale Invariant Feature Transform (SIFT) are used to extract features as inputs to the ML models for classification. The DL models use laptop images without a separate feature-extraction step. The trained models overfit slightly due to the limited dataset and the complexity of the model parameters. Despite the slight overfitting, the models can identify each brand. The findings show that the DL models outperform the ML models; among the DL models, GoogLeNet has the highest performance in identifying the laptop brands.
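A hedged sketch of the ML branch described above (the pooling choice, file handling, and hyperparameters are illustrative assumptions, not the study's exact pipeline): SIFT descriptors are pooled into a fixed-length vector per image and fed to an RBF-kernel SVM:

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def sift_feature(image_path):
    """Average-pool an image's SIFT descriptors into one 128-D vector."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(img, None)
    if desc is None:              # no keypoints detected
        return np.zeros(128)
    return desc.mean(axis=0)

# X_train: stacked feature vectors, y_train: brand labels
# (e.g., 0 = Apple, 1 = HP, 2 = ThinkPad)
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# prediction = clf.predict([sift_feature("laptop.jpg")])
```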

     