skip to main content

Search for: All records

Creators/Authors contains: "Singh, Krishna Kumar"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Our goal is to predict the camera wearer’s location and pose in his/her environment based on what’s captured by the camera wearer’s first-person wearable camera. Toward this goal, we first collect a new dataset in which the camera wearer performs various activities (e.g., opening a fridge, reading a book) in different scenes with time-synchronized first-person and stationary third-person cameras. We then propose a novel deep network architecture, which takes as input the first-person video frames and empty third-person scene image (without the camera wearer) to predict the location and pose of the camera wearer. We explore and compare our approach with several intuitive baselines and show initial promising results on this novel, challenging problem. 
    more » « less
  2. We propose a new approach for high resolution semantic image synthesis. It consists of one base image generator and multiple class-specific generators. The base generator generates high quality images based on a segmentation map. To further improve the quality of different objects, we create a bank of Generative Adversarial Networks (GANs) by separately training class-specific models. This has several benefits including – dedicated weights for each class; centrally aligned data for each model; additional training data from other sources, potential of higher resolution and quality; and easy manipulation of a specific object in the scene. Experiments show that our approach can generate high quality images in high resolution while having flexibility of object-level control by using class-specific generators. Project page: 
    more » « less
  3. We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars). The goal is to learn a generative model that learns an intermediate distribution, which borrows a subset of properties from each domain, enabling the generation of images that did not exist in any domain exclusively. This challenging problem requires an accurate disentanglement of object shape, appearance, and background from each domain, so that the appearance and shape factors from the two domains can be interchanged. We augment an existing approach that can disentangle factors within a single domain but struggles to do so across domains. Our key technical contribution is to represent object appearance with a differentiable histogram of visual features, and to optimize the generator so that two images with the same latent appearance factor but different latent shape factors produce similar histograms. On multiple multi-domain datasets, we demonstrate our method leads to accurate and consistent appearance and shape transfer across domains. 
    more » « less
  4. We propose PartGAN, a novel generative model that disentangles and generates background, object shape, object texture, and decomposes objects into parts without any mask or part annotations. To achieve object-level disentanglement, we build upon prior work and maximize the mutual information between the generated factors and sampled latent prior codes. To achieve part-level decomposition, we learn a part generator, which decomposes an object into parts that are spatially localized, disjoint, and consistent across instances. Extensive experiments on multiple datasets demonstrate that PartGAN discovers consistent object parts, which enable part-based controllable image generation. 
    more » « less
  5. We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. We build upon FineGAN, an unconditional generative model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to learn the latent factor encoders. MixNMatch requires bounding boxes during training to model background, but requires no other supervision. Through extensive experiments, we demonstrate MixNMatch's ability to accurately disentangle, encode, and combine multiple factors for mix-and-match image generation, including sketch2color, cartoon2img, and img2gif applications. Our code/models/demo can be found at 
    more » « less
  6. We propose a novel unsupervised generative model that learns to disentangle object identity from other low-level aspects in class-imbalanced data. We first investigate the issues surrounding the assumptions about uniformity made by InfoGAN [10], and demonstrate its ineffectiveness to properly disentangle object identity in imbalanced data. Our key idea is to make the discovery of the discrete latent factor of variation invariant to identity-preserving transformations in real images, and use that as a signal to learn the appropriate latent distribution representing object identity. Experiments on both artificial (MNIST, 3D cars, 3D chairs, ShapeNet) and real-world (YouTube-Faces) imbalanced datasets demonstrate the effectiveness of our method in disentangling object identity as a latent factor of variation. 
    more » « less
  7. null (Ed.)
  8. null (Ed.)