NSF PAR Search | NSF Public Access Repository


Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

https://doi.org/10.1109/CVPR42600.2020.01108

Singh, K. K.; Mahajan, D.; Grauman, K.; Lee, Y. J.; Feiszli, M.; Ghadiyaram, D. (June 2020, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation

https://doi.org/10.1109/CVPR42600.2020.00806

Li, Yuheng; Singh, Krishna Kumar; Ojha, Utkarsh; Lee, Yong Jae (June 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. We build upon FineGAN, an unconditional generative model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to learn the latent factor encoders. MixNMatch requires bounding boxes during training to model background, but requires no other supervision. Through extensive experiments, we demonstrate MixNMatch's ability to accurately disentangle, encode, and combine multiple factors for mix-and-match image generation, including sketch2color, cartoon2img, and img2gif applications. Our code/models/demo can be found at https://github.com/Yuheng-Li/MixNMatch
Full Text Available
Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection

https://doi.org/10.1109/CVPR42600.2020.01061

Ren, Zhongzheng; Yu, Zhiding; Yang, Xiaodong; Liu, Ming-Yu; Lee, Yong Jae; Schwing, Alexander G.; Kautz, Jan (June 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
Full Text Available
Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data

Ojha, Utkarsh; Singh, Krishna Kumar; Hsieh, Cho-Jui; Lee, Yong Jae (January 2020, NeurIPS)
Full Text Available
YOLACT++: Better Real-time Instance Segmentation

https://doi.org/10.1109/TPAMI.2020.3014297

Bolya, Daniel; Zhou, Chong; Xiao, Fanyi; Lee, Yong Jae (January 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence)
Full Text Available
Password-Conditioned Anonymization and Deanonymization with Face Identity Transformers

https://doi.org/10.1007/978-3-030-58592-1_43

Gu, Xiuye; Luo, Weixin; Ryoo, Michael; Lee, Yong Jae (January 2020, ECCV 2020)

Full Text Available
Delving Deeper into Anti-aliasing in ConvNets

Zou, Xueyan; Xiao, Fanyi; Yu, Zhiding; Lee, Yong Jae (January 2020, BMVC)
Full Text Available
FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery

https://doi.org/10.1109/CVPR.2019.00665

Singh, Krishna Kumar; Ojha, Utkarsh; Lee, Yong Jae (June 2019, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR))

We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories. To disentangle the factors without any supervision, our key idea is to use information theory to associate each factor to a latent code, and to condition the relationships between the codes in a specific way to induce the desired hierarchy. Through extensive experiments, we show that FineGAN achieves the desired disentanglement to generate realistic and diverse images belonging to fine-grained classes of birds, dogs, and cars. Using FineGAN's automatically learned features, we also cluster real images as a first attempt at solving the novel problem of unsupervised fine-grained object category discovery.
Full Text Available
You Reap What You Sow: Using Videos to Generate High Precision Object Proposals for Weakly-Supervised Object Detection

https://doi.org/10.1109/CVPR.2019.00964

Singh, Krishna Kumar; Lee, Yong Jae (June 2019, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR))

We propose a novel way of using videos to obtain high precision object proposals for weakly-supervised object detection. Existing weakly-supervised detection approaches use off-the-shelf proposal methods like edge boxes or selective search to obtain candidate boxes. These methods provide high recall but at the expense of thousands of noisy proposals. Thus, the entire burden of finding the few relevant object regions is left to the ensuing object mining step. To mitigate this issue, we focus instead on improving the precision of the initial candidate object proposals. Since we cannot rely on localization annotations, we turn to video and leverage motion cues to automatically estimate the extent of objects to train a Weakly-supervised Region Proposal Network (W-RPN). We use the W-RPN to generate high precision object proposals, which are in turn used to re-rank high recall proposals like edge boxes or selective search according to their spatial overlap. Our W-RPN proposals lead to significant improvement in performance for state-of-the-art weakly-supervised object detection approaches on PASCAL VOC 2007 and 2012.
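The re-ranking step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract says only that high-recall proposals are re-ranked "according to their spatial overlap" with the W-RPN proposals; scoring each candidate by its maximum intersection-over-union (IoU) with any high-precision box is an assumption made here for illustration, and the names `iou` and `rerank_by_overlap` are hypothetical rather than taken from the authors' code.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rerank_by_overlap(candidates: List[Box], precise: List[Box]) -> List[Box]:
    """Order high-recall candidates by their best overlap with any
    high-precision box (assumed scoring rule: maximum IoU)."""
    def score(box: Box) -> float:
        return max((iou(box, p) for p in precise), default=0.0)

    # sorted() is stable, so ties keep their original relative order.
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    edge_boxes = [(0, 0, 10, 10), (50, 50, 60, 60), (1, 1, 11, 11)]
    w_rpn_boxes = [(1, 1, 11, 11)]
    print(rerank_by_overlap(edge_boxes, w_rpn_boxes))
```

Under this assumed rule, candidates that coincide with a high-precision box move to the front of the list, and candidates with no overlap fall to the end.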
Full Text Available
A Visual Attention Grounding Neural Model for Multimodal Machine Translation

Zhou, Mingyang; Cheng, Runxiang; Lee, Yong Jae; Yu, Zhou (January 2018, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP))

We introduce a novel multimodal machine translation model that utilizes parallel visual and textual information. Our model jointly optimizes the learning of a shared visual-language embedding and a translator. The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics. Our approach achieves competitive state-of-the-art results on the Multi30K and the Ambiguous COCO datasets. We also collected a new multilingual multimodal product description dataset to simulate a real-world international online shopping scenario. On this dataset, our visual attention grounding model outperforms other methods by a large margin.
Full Text Available
