Toward Multimodal Image-to-Image Translation
- Award ID(s): 1633310
- NSF-PAR ID: 10072420
- Date Published:
- Journal Name: Advances in Neural Information Processing Systems
- ISSN: 1049-5258
- Page Range / eLocation ID: 465-476
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More like this
-
A self-driving car must be able to reliably handle adverse weather conditions (e.g., snow) to operate safely. In this paper, we investigate the idea of turning sensor inputs (i.e., images) captured in an adverse condition into a benign one (e.g., sunny), on which downstream tasks (e.g., semantic segmentation) can attain high accuracy. Prior work primarily formulates this as an unpaired image-to-image translation problem due to the lack of paired images captured under exactly the same camera poses and semantic layouts. While perfectly aligned images are not available, one can easily obtain coarsely paired images. For instance, many people drive the same routes daily in both good and adverse weather; thus, images captured at nearby GPS locations can form a pair. Though data from repeated traversals are unlikely to capture the same foreground objects, we posit that they provide rich contextual information to supervise the image translation model. To this end, we propose a novel training objective leveraging coarsely aligned image pairs. We show that our coarsely aligned training scheme leads to better image translation quality and improved performance on downstream tasks such as semantic segmentation, monocular depth estimation, and visual localization.
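The pairing idea above (images captured at nearby GPS locations form a coarse pair) can be sketched as a thresholded nearest-neighbor match. This is a minimal illustration only: the abstract specifies neither a distance measure nor a threshold, so the haversine distance, the hypothetical `Frame` record, the hypothetical `coarse_pairs` helper, and the 5 m `max_dist_m` cutoff are all assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: pair adverse-weather frames with benign-weather frames
# captured at nearby GPS locations. Names and the distance threshold are assumptions.
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass
class Frame:
    image_id: str
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_m(a: Frame, b: Frame) -> float:
    """Great-circle distance in meters between two frames' GPS fixes."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def coarse_pairs(adverse, benign, max_dist_m=5.0):
    """For each adverse frame, return (adverse_id, nearest_benign_id, distance_m)
    if the nearest benign frame lies within max_dist_m; unmatched frames are skipped."""
    pairs = []
    for a in adverse:
        best, best_d = None, max_dist_m
        for b in benign:
            d = haversine_m(a, b)
            if d <= best_d:  # keep the closest candidate inside the threshold
                best, best_d = b, d
        if best is not None:
            pairs.append((a.image_id, best.image_id, best_d))
    return pairs


if __name__ == "__main__":
    adverse = [Frame("snow_001", 42.44400, -76.50190)]
    benign = [
        Frame("sun_101", 42.44401, -76.50191),  # roughly 1.4 m away
        Frame("sun_102", 42.44600, -76.50190),  # roughly 220 m away
    ]
    print(coarse_pairs(adverse, benign))  # only sun_101 is paired
```

This brute-force match is O(N·M) in the number of frames; for large collections of repeated traversals, a spatial index would likely be a better fit.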
-
Recently, image-to-image translation (I2I) has met with great success in computer vision, but few works have paid attention to the geometric changes that occur during translation. Geometric changes are necessary to reduce the geometric gap between domains, but they come at the cost of breaking the correspondence between translated images and the original ground truth. We propose a novel geometry-aware semi-supervised method that preserves this correspondence while still allowing geometric changes. The proposed method takes a synthetic image-mask pair as input and produces a corresponding real pair. We also utilize an objective function to ensure consistent geometric movement of the image and mask through the translation. Extensive experiments show that our method yields an 11.23% higher mean Intersection-over-Union (mIoU) than current methods on the downstream eye segmentation task. The generated images also have a 15.9% lower Fréchet Inception Distance (FID), indicating higher image quality.
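The segmentation result above is reported as mean Intersection-over-Union. A minimal sketch of that metric for a single pair of flattened label maps follows. The function name `mean_iou` and the choice to skip classes absent from both maps are assumptions; the abstract does not state how its mIoU is aggregated across images, nor whether the 11.23% gain is absolute or relative.

```python
# Minimal sketch: mean Intersection-over-Union (mIoU) between a predicted and a
# ground-truth segmentation map, both flattened to sequences of class labels.
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average IoU over classes present in either map; absent classes are skipped."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    # Per-class IoU: class 0 = 1/3, class 1 = 2/3, class 2 = 1/2, so the mean is 0.5
    print(mean_iou(pred, target, num_classes=3))
```

Benchmark code often accumulates intersections and unions over a whole dataset before averaging per class, which can give a different number than averaging per-image scores.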