NSF PAR Search | NSF Public Access Repository


Adaptive Length Image Tokenization via Recurrent Allocation

Duggal, Shivam; Isola, Phillip; Torralba, Antonio; Freeman, William T (January 2025, Open Review)

Free, publicly-accessible full text available January 22, 2026
Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation

Shen, William; Yang, Ge; Yu, Alan; Wong, Jansen; Kaelbling, Leslie; Isola, Phillip (November 2023, Proceedings of Machine Learning Research: Conference on Robot Learning (CoRL) 2023)

Self-supervised and language-supervised image models contain rich knowledge of the world that is important for generalization. Many robotic tasks, however, require a detailed understanding of 3D geometry, which is often lacking in 2D image features. This work bridges this 2D-to-3D gap for robotic manipulation by leveraging distilled feature fields to combine accurate 3D geometry with rich semantics from 2D foundation models. We present a few-shot learning method for 6-DOF grasping and placing that harnesses these strong spatial and semantic priors to achieve in-the-wild generalization to unseen objects. Using features distilled from a vision-language model, CLIP, we present a way to designate novel objects for manipulation via free-text natural language, and demonstrate its ability to generalize to unseen expressions and novel categories of objects. Project website: https://f3rm.csail.mit.edu
Full Text Available
Learning To Generate Line Drawings That Convey Geometry and Semantics

https://doi.org/10.1109/CVPR52688.2022.00776

Chan, Caroline; Durand, Frédo; Isola, Phillip (June 2022, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction

Zhang, Richard; Isola, Phillip; Efros, Alexei (July 2017, IEEE Computer Society Conference on Computer Vision and Pattern Recognition)

We propose split-brain autoencoders, a straightforward modification of the traditional autoencoder architecture, for unsupervised representation learning. The method adds a split to the network, resulting in two disjoint sub-networks. Each sub-network is trained to perform a difficult task -- predicting one subset of the data channels from another. Together, the sub-networks extract features from the entire input signal. By forcing the network to solve cross-channel prediction tasks, we induce a representation within the network which transfers well to other, unseen tasks. This method achieves state-of-the-art performance on several large-scale transfer learning benchmarks.
Full Text Available
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Zhang, Richard; Isola, Phillip; Efros, Alexei; Shechtman, Eli; Wang, Oliver (January 2018, CVPR)

While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
Full Text Available
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Zhu, Jun-Yan; Park, Taesung; Isola, Phillip; Efros, Alexei A (October 2017, IEEE International Conference on Computer Vision)

Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain $$X$$ to a target domain $$Y$$ in the absence of paired examples. Our goal is to learn a mapping $$G: X \rightarrow Y$$ such that the distribution of images from $$G(X)$$ is indistinguishable from the distribution $$Y$$ using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping $$F: Y \rightarrow X$$ and introduce a cycle consistency loss to push $$F(G(X)) \approx X$$ (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
Full Text Available
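Illustrative sketch (not part of the repository record): the cycle consistency idea in the abstract above, pushing F(G(X)) toward X and G(F(Y)) toward Y, can be written as a reconstruction penalty. The version below assumes an L1 penalty averaged over each batch; the function name and the toy mappings `G`, `F_exact`, and `F_poor` are hypothetical stand-ins for learned networks, chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def cycle_consistency_loss(G, F, x_batch, y_batch):
    """Mean absolute reconstruction error summed over both cycle directions."""
    forward = np.mean(np.abs(F(G(x_batch)) - x_batch))   # F(G(x)) should recover x
    backward = np.mean(np.abs(G(F(y_batch)) - y_batch))  # G(F(y)) should recover y
    return forward + backward

# Toy stand-ins for learned mappings (assumptions for illustration only).
G = lambda x: 2.0 * x + 1.0          # maps domain X to domain Y
F_exact = lambda y: (y - 1.0) / 2.0  # exact inverse of G
F_poor = lambda y: y                 # identity map, not an inverse of G

x = np.array([[0.0, 1.0], [2.0, 3.0]])
y = np.array([[1.0, 3.0], [5.0, 7.0]])

loss_exact = cycle_consistency_loss(G, F_exact, x, y)  # 0.0: both cycles reconstruct perfectly
loss_poor = cycle_consistency_loss(G, F_poor, x, y)    # 7.5: neither cycle returns to its input
```

In this toy setting the loss is zero only when the second mapping inverts the first, which is the constraint the abstract pairs with the adversarial loss to narrow an otherwise under-constrained mapping.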
CyCADA: Cycle-Consistent Adversarial Domain Adaptation

Hoffman, Judy; Tzeng, Eric; Park, Taesung; Zhu, Jun-Yan; Isola, Phillip; Saenko, Kate; Efros, Alexei; Darrell, Trevor (January 2018, Proceedings of the 35th International Conference on Machine Learning)

Domain adaptation is critical for success in new, unseen environments. Adversarial adaptation models have shown tremendous progress towards adapting to new environments by focusing either on discovering domain invariant representations or by mapping between unpaired image domains. While feature space methods are difficult to interpret and sometimes fail to capture pixel-level and low-level domain shifts, image space methods sometimes fail to incorporate high level semantic knowledge relevant for the end task. We propose a model which adapts between domains using both generative image space alignment and latent representation space alignment. Our approach, Cycle-Consistent Adversarial Domain Adaptation (CyCADA), guides transfer between domains according to a specific discriminatively trained task and avoids divergence by enforcing consistency of the relevant semantics before and after adaptation. We evaluate our method on a variety of visual recognition and prediction settings, including digit classification and semantic segmentation of road scenes, advancing state-of-the-art performance for unsupervised adaptation from synthetic to real world driving domains.
Full Text Available
