

Title: ERA: Entity Relationship Aware Video Summarization with Wasserstein GAN
Video summarization aims to simplify large-scale video browsing by generating concise, short summaries that differ from, but faithfully represent, the original video. Due to the scarcity of video annotations, recent progress in video summarization has concentrated on unsupervised methods, among which GAN-based methods are the most prevalent. These methods comprise a summarizer and a discriminator: the summary produced by the summarizer is accepted as the final output only if the video reconstructed from it cannot be distinguished from the original by the discriminator. GAN-based methods of this kind have two main problems. First, the summary obtained this way is a low-redundancy subset of the original video that contains high-priority events and entities, and this criterion alone is insufficient. Second, training the GAN framework is unstable. This paper proposes a novel Entity–relationship Aware video summarization method (ERA) to address these problems. Specifically, we introduce an Adversarial Spatio-Temporal network to construct the relationships among entities, which we believe should also be given high priority in the summary. The GAN training problem is addressed by introducing the Wasserstein GAN together with two newly proposed losses, a video-patch loss and a score-sum loss. In addition, the score-sum loss reduces the model's sensitivity to varying video lengths, an inherent problem for most current video analysis tasks. Our method substantially improves performance on the target benchmark datasets and exceeds the current state of the art. We hope our straightforward yet effective approach will shed some light on future research in unsupervised video summarization. The code is available online.
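As a rough illustration of the training objectives named above, the sketch below writes out the standard Wasserstein GAN critic and generator losses, plus a hypothetical length-normalized "score-sum" term. The abstract does not define the score-sum loss; the form used here (squared gap between the mean frame-importance score and an assumed `target_ratio`) is an assumption chosen only to show how normalizing by the number of frames removes the dependence on video length. All function names are hypothetical and are not taken from the ERA code.

```python
import numpy as np

def critic_loss(d_real, d_fake):
    """WGAN critic loss: minimizing E[D(fake)] - E[D(real)] widens the real/fake score gap."""
    return float(np.mean(d_fake) - np.mean(d_real))

def generator_loss(d_fake):
    """WGAN generator loss: push the critic's score on generated samples upward."""
    return float(-np.mean(d_fake))

def score_sum_loss(frame_scores, target_ratio=0.15):
    """HYPOTHETICAL score-sum term: compare the per-frame average score with a target ratio.

    Dividing the score sum by the number of frames keeps the loss on the same
    scale for short and long videos. target_ratio=0.15 is an assumed value.
    """
    s = np.asarray(frame_scores, dtype=float)
    return float((s.sum() / s.size - target_ratio) ** 2)
```

For instance, a 4-frame video scored [1, 0, 0, 0] and a 200-frame video repeating that pattern get the same `score_sum_loss`, whereas an unnormalized sum would differ 50-fold. A practical WGAN critic must also be kept approximately 1-Lipschitz (for example via weight clipping or a gradient penalty), which this sketch omits.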
Award ID(s):
1828576
NSF-PAR ID:
10311382
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
The 32nd British Machine Vision Conference 2021
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A Generative Adversarial Network (GAN) is an unsupervised generative framework that learns to generate samples whose distribution matches the data distribution. Recently, mixed-strategy multi-generator/discriminator GANs have been shown to outperform single-pair GANs. However, such mixed models suffer from linearly growing training time, and imbalanced training among the generators makes them difficult to parallelize. In this paper, we propose a balanced mix-generator GAN that works in parallel by mixing multiple disjoint generators to approximate the real distribution. The weights of the discriminator and the classifier are controlled by a balance strategy. We also present an efficient loss function that forces each generator to cover a few modes with high probability. Our model naturally adapts to large parallel computation frameworks: each generator can be trained asynchronously on multiple GPUs. We have performed extensive experiments on synthetic datasets, MNIST1000, CIFAR-10, and ImageNet. The results show that our model achieves state-of-the-art performance (in terms of mode coverage and inception score) with significantly reduced training time. We also show that the missing-mode problem can be alleviated by increasing the number of generators.
  2. A broad class of unsupervised deep learning methods, such as Generative Adversarial Networks (GANs), involve training overparameterized models, in which the number of parameters exceeds a certain threshold. Indeed, most successful GANs used in practice are trained with overparameterized generator and discriminator networks, in terms of both depth and width. A large body of work in supervised learning has shown the importance of model overparameterization for the convergence of gradient descent (GD) to globally optimal solutions. In contrast, the unsupervised setting, and GANs in particular, involve non-convex concave min-max optimization problems that are often trained using Gradient Descent/Ascent (GDA). The role and benefits of model overparameterization in the convergence of GDA to a global saddle point in non-convex concave problems are far less understood. In this work, we present a comprehensive analysis of the importance of model overparameterization in GANs, both theoretically and empirically. We theoretically show that in an overparameterized GAN model with a 1-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem. To the best of our knowledge, this is the first result on the global convergence of GDA in such settings. Our theory is based on a more general result that holds for a broader class of nonlinear generators and discriminators that obey certain assumptions (including deeper generators and random feature discriminators). It builds on a novel connection with the convergence analysis of linear time-varying dynamical systems, which may have broader implications for understanding the convergence behavior of GDA in non-convex concave problems involving overparameterized models. We also empirically study the role of model overparameterization in GANs through several large-scale experiments on the CIFAR-10 and Celeb-A datasets. Our experiments show that overparameterization improves the quality of generated samples across various model architectures and datasets. Remarkably, we observe that overparameterization leads to faster and more stable convergence of GDA across the board.
  3. Novel machine learning algorithms that make the best use of significantly less data are of great interest. For example, active learning (AL) addresses this problem by iteratively training a model on a small amount of labeled data, evaluating the trained model on all of the data, and then querying the labels of selected samples, which are then used to train a new model. This paper presents a fast and accurate data selection method in which the selected samples are optimized to span the subspace of all data. We propose a new selection algorithm, referred to as iterative projection and matching (IPM), which has linear complexity with respect to the number of data points and no parameters to tune. At each iteration of our algorithm, one selected sample captures the maximum information from the structure of the data, and this captured information is excluded from subsequent iterations by projecting the data onto the null space of the previously selected samples. Our proposed algorithm outperforms conventional methods in both computational efficiency and selection accuracy. Furthermore, the superiority of the proposed algorithm is demonstrated on active learning for video action recognition on UCF-101; learning using representatives on ImageNet; training a generative adversarial network (GAN) to generate multi-view images from a single-view input on the CMU Multi-PIE dataset; and video summarization on the UTE Egocentric dataset.
  4. In this paper, we introduce DuelGAN, a generative adversarial network (GAN) solution that improves the stability of the generated samples and mitigates mode collapse. Building on the vanilla GAN's two-player game between the discriminator D1 and the generator G, we introduce a peer discriminator D2 into the min-max game. As in previous work using two discriminators, the first role of both D1 and D2 is to distinguish generated samples from real ones, while the generator tries to generate high-quality samples that fool both discriminators. Unlike existing methods, we introduce a duel between D1 and D2 that discourages their agreement and therefore increases the diversity of the generated samples. This property alleviates early mode collapse by preventing D1 and D2 from converging too fast. We provide a theoretical analysis of the equilibrium of the min-max game formed among G, D1, and D2, and we characterize the convergence behavior of DuelGAN and the stability of the min-max game. Notably, DuelGAN operates in the unsupervised setting: the duel between D1 and D2 does not need any label supervision. Experimental results on a synthetic dataset and on real-world image datasets (MNIST, Fashion MNIST, CIFAR-10, STL-10, CelebA, VGG) demonstrate that DuelGAN outperforms competitive baselines in generating diverse, high-quality samples while introducing only negligible computation cost. Our code is publicly available at https://github.com/UCSC-REAL/DuelGAN.
  5. Generative adversarial networks (GANs) are powerful tools for learning generative models, but in practice their training may fail to converge. GANs are commonly viewed as a two-player zero-sum game between two neural networks. Here, we leverage this game-theoretic view to study the convergence behavior of the training process. Inspired by the fictitious play learning process, we introduce a novel training method, referred to as Fictitious GAN, which trains the deep neural networks using a mixture of historical models. Specifically, the discriminator (resp. generator) is updated as the best response to the mixture of outputs from a sequence of previously trained generators (resp. discriminators). We show that Fictitious GAN can effectively resolve some convergence issues that the standard training approach cannot, and we prove that, asymptotically, the average of the generator outputs has the same distribution as the data samples.
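Item 1 in the list above approximates the real distribution by mixing several disjoint generators. A minimal sketch of the sampling side of such a mixture might look like the following. It assumes each generator is a plain Python function of noise and that samples are assigned to generators uniformly at random, a simple stand-in for the paper's balance strategy, which the abstract does not specify. The toy one-dimensional generators are hypothetical.

```python
import numpy as np

def sample_mixture(generators, n, rng):
    """Draw n samples from a uniform mixture of generator functions."""
    # Pick which generator produces each sample (uniform assignment = balanced load).
    owner = rng.integers(len(generators), size=n)
    z = rng.standard_normal(n)  # noise input shared by all generators
    out = np.empty(n)
    for idx, g in enumerate(generators):
        mask = owner == idx
        # Each generator only fills its own share, so these calls are independent
        # and could in principle run in parallel on separate devices.
        out[mask] = g(z[mask])
    return out, owner

# Hypothetical toy generators, each covering one mode around a different mean.
MEANS = np.array([-3.0, -1.0, 1.0, 3.0])
GENERATORS = [lambda z, m=m: m + 0.1 * z for m in MEANS]

rng = np.random.default_rng(0)
samples, owner = sample_mixture(GENERATORS, 10_000, rng)
```

Because each generator is responsible for a disjoint slice of the output, no generator waits on another during sampling, which is the property that makes this kind of mixture easy to parallelize.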
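Item 2 in the list above analyzes Gradient Descent/Ascent (GDA) on min-max problems. The toy below is not that paper's setting: it uses a strongly-convex, strongly-concave quadratic rather than a non-convex concave GAN objective. It only shows the simultaneous GDA update and why GDA can be read as a linear dynamical system. Here the update is a fixed 2×2 linear map, [[1 - lr, -lr], [lr, 1 - lr]], whose spectral radius at lr = 0.1 is sqrt(0.82) ≈ 0.906 < 1, so the iterates contract to the saddle point (0, 0). The function name `gda` is hypothetical.

```python
def gda(x: float, y: float, lr: float = 0.1, steps: int = 200):
    """Simultaneous GDA on f(x, y) = 0.5*x**2 + x*y - 0.5*y**2 (min over x, max over y)."""
    for _ in range(steps):
        grad_x = x + y  # df/dx
        grad_y = x - y  # df/dy
        # Descend in x and ascend in y, both using gradients from the same iterate.
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y

x_star, y_star = gda(1.0, 1.0)
```

In this toy the update matrix never changes, so the system is time-invariant. The abstract instead relates GDA in the overparameterized GAN setting to linear time-varying systems, where the matrix changes from step to step.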
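Item 3 in the list above describes IPM only at a high level. The sketch below is one plausible reading of it under stated assumptions. The "maximum information" direction is taken to be the top right singular vector of the residual data matrix. The "matched" sample is the one with the largest normalized correlation to that direction. The residual is then projected onto the null space of the chosen sample. A full SVD is used for brevity, so this sketch does not achieve the linear complexity claimed for IPM. The function name `ipm_select` is hypothetical.

```python
import numpy as np

def ipm_select(X: np.ndarray, k: int) -> list[int]:
    """Pick up to k row indices of X (rows = samples) in the spirit of IPM."""
    R = np.array(X, dtype=float)  # residual data, updated after every pick
    selected = []
    for _ in range(k):
        # Assumed "maximum information" direction: top right singular vector of R.
        _, _, vt = np.linalg.svd(R, full_matrices=False)
        v = vt[0]
        # Matching step: normalized correlation of each residual sample with v.
        norms = np.linalg.norm(R, axis=1)
        scores = np.zeros(len(R))
        nonzero = norms > 1e-12
        scores[nonzero] = np.abs(R[nonzero] @ v) / norms[nonzero]
        if selected:
            scores[selected] = -1.0  # never pick the same sample twice
        i = int(np.argmax(scores))
        if scores[i] <= 0.0:
            break  # nothing informative left in the residual
        selected.append(i)
        # Projection step: remove the chosen sample's direction from every sample.
        u = R[i] / norms[i]
        R = R - np.outer(R @ u, u)
    return selected
```

For example, on `np.diag([4.0, 3.0, 2.0, 1.0])` with k = 3 the sketch picks rows 0, 1 and 2, in decreasing order of energy, because each projection zeroes out the row that was just chosen.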
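In item 4 of the list above, the generator tries to fool two discriminators while the duel discourages D1 and D2 from agreeing. The abstract does not give the exact duel loss, so the sketch below writes out only two ingredients, under assumptions. The first is a standard non-saturating generator loss summed over both discriminators. The second is a simple agreement rate: the fraction of samples on which D1 and D2 make the same real/fake call at a 0.5 threshold. The agreement rate is an illustrative statistic of what the duel is meant to reduce, not DuelGAN's actual objective. Both function names are hypothetical.

```python
import numpy as np

def two_discriminator_generator_loss(d1_fake, d2_fake, eps=1e-12):
    """Non-saturating generator loss against two discriminators (outputs in (0, 1])."""
    d1_fake = np.asarray(d1_fake, dtype=float)
    d2_fake = np.asarray(d2_fake, dtype=float)
    # The loss is lowest when *both* discriminators call the generated samples real.
    return float(-np.mean(np.log(d1_fake + eps)) - np.mean(np.log(d2_fake + eps)))

def agreement_rate(d1_out, d2_out, threshold=0.5):
    """Fraction of samples on which D1 and D2 make the same real/fake call."""
    d1_real = np.asarray(d1_out) > threshold
    d2_real = np.asarray(d2_out) > threshold
    return float(np.mean(d1_real == d2_real))
```

For example, with D1 outputs [0.9, 0.2, 0.7] and D2 outputs [0.8, 0.6, 0.1] the two discriminators agree only on the first sample, giving an agreement rate of 1/3.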
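Item 5 in the list above builds on fictitious play, in which each player best-responds to the empirical mixture of the opponent's past plays. The classic toy below is plain fictitious play on matching pennies, not Fictitious GAN itself. The most recent pair of actions keeps cycling, but the empirical (time-averaged) strategies approach the equilibrium (0.5, 0.5). This mirrors the abstract's point that it is the average over historical models that converges. The function name and the tie-breaking rule (argmax picks the first action) are assumptions of this sketch.

```python
import numpy as np

# Matching pennies, payoffs to player 1 (the "matcher"); player 2 receives the negative.
PAYOFF = np.array([[1.0, -1.0],
                   [-1.0, 1.0]])

def fictitious_play(steps: int):
    """Return both players' empirical action frequencies and the action history."""
    counts1 = np.zeros(2)  # times player 1 played heads (0) / tails (1)
    counts2 = np.zeros(2)  # times player 2 played heads (0) / tails (1)
    history = []
    for _ in range(steps):
        # Each player best-responds to the opponent's empirical mixture so far;
        # on ties (e.g. the very first step) np.argmax returns the first action.
        a1 = int(np.argmax(PAYOFF @ counts2))
        a2 = int(np.argmax(-(counts1 @ PAYOFF)))
        counts1[a1] += 1
        counts2[a2] += 1
        history.append((a1, a2))
    return counts1 / steps, counts2 / steps, history
```

Running `fictitious_play(20_000)` gives empirical frequencies close to 0.5 for both players, even though the latest actions never settle. This is the same "average, not last iterate" behavior that motivates training against a mixture of historical models.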