NSF PAR Search | NSF Public Access Repository


LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation

PNVR, Koutilya; Singh, Bharat; Ghosh, Pallabi; Siddiquie, Behjat; Jacobs, David (October 2023, IEEE)

Full Text Available
Preserve your own correlation: A noise prior for video diffusion models.

Ge S; Nah S; Liu G; Poon T; Tao A; Catanzaro B; Jacobs D; Huang JB; Liu MY; Balaji Y. (October 2023, International Conference on Computer Vision)

Full Text Available
Measured Albedo in the Wild: Filling the Gap in Intrinsics Evaluation

https://doi.org/10.1109/ICCP56744.2023.10233761

Wu, Jiaye; Chowdhury, Sanjoy; Shanmugaraja, Hariharmano; Jacobs, David; Sengupta, Soumyadip (July 2023, IEEE)

Intrinsic image decomposition and inverse rendering are long-standing problems in computer vision. To evaluate albedo recovery, most algorithms report their quantitative performance with a mean Weighted Human Disagreement Rate (WHDR) metric on the IIW dataset. However, WHDR focuses only on relative albedo values and often fails to capture the overall quality of the albedo. In order to comprehensively evaluate albedo, we collect a new dataset, Measured Albedo in the Wild (MAW), and propose three new metrics that complement WHDR: intensity, chromaticity, and texture metrics. We show that existing algorithms often improve the WHDR metric but perform poorly on the other metrics. We then finetune different algorithms on our MAW dataset to significantly improve the quality of the reconstructed albedo both quantitatively and qualitatively. Since the proposed intensity, chromaticity, and texture metrics and the WHDR are all complementary, we further introduce a relative performance measure that captures average performance. By analysing existing algorithms, we show that there is significant room for improvement. Our dataset and evaluation metrics will enable researchers to develop algorithms that improve albedo reconstruction. Code and data are available at: https://measuredalbedo.github.io/
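
The abstract's point that WHDR depends only on relative albedo values can be seen in a minimal sketch of the metric. The version below assumes the usual IIW-style setup, which the abstract does not spell out: each human judgement compares two points and is labelled "1" (first point darker), "2" (second point darker), or "E" (about equal), with a confidence weight, and a predicted ratio within a threshold `delta` counts as equal. The function name, data layout, and the default `delta=0.10` are illustrative assumptions, not the MAW evaluation code.

```python
def whdr(reflectance, comparisons, delta=0.10):
    """Weighted Human Disagreement Rate over pairwise judgements.

    reflectance: dict mapping a point id to a predicted albedo intensity.
    comparisons: list of (p1, p2, human_label, weight) tuples, where
        human_label is "1" (p1 darker), "2" (p2 darker) or "E" (equal).
    delta: relative tolerance below which two values count as equal
        (assumed default, hypothetical for this sketch).
    """
    error = 0.0
    total = 0.0
    for p1, p2, human_label, weight in comparisons:
        # Clamp to avoid division by zero on fully black predictions.
        r1 = max(reflectance[p1], 1e-10)
        r2 = max(reflectance[p2], 1e-10)
        if r2 / r1 > 1.0 + delta:
            predicted = "1"   # first point is darker
        elif r1 / r2 > 1.0 + delta:
            predicted = "2"   # second point is darker
        else:
            predicted = "E"   # roughly equal
        total += weight
        if predicted != human_label:
            error += weight
    return error / total if total > 0 else 0.0


# Toy data: only the third judgement disagrees with the prediction.
albedo = {"a": 0.2, "b": 0.5, "c": 0.21}
judgements = [("a", "b", "1", 1.0), ("a", "c", "E", 2.0), ("b", "c", "1", 1.0)]
score = whdr(albedo, judgements)

# Rescaling every albedo value leaves the score unchanged.
scaled = {k: 3.0 * v for k, v in albedo.items()}
scaled_score = whdr(scaled, judgements)
```

Because only ratios enter the comparison, a globally too-dark or too-bright albedo scores the same as a correct one, which is the gap the intensity, chromaticity, and texture metrics are meant to cover.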
Hyperbolic Contrastive Learning for Visual Representations beyond Objects

https://doi.org/10.1109/CVPR52729.2023.00661

Ge, Songwei; Mishra, Shlok; Kornblith, Simon; Li, Chun-Liang; Jacobs, David (June 2023, IEEE)

Although self-/un-supervised methods have led to rapid progress in visual representation learning, these methods generally treat objects and scenes using the same lens. In this paper, we focus on learning representations for objects and scenes that preserve the structure among them. Motivated by the observation that visually similar objects are close in the representation space, we argue that the scenes and objects should instead follow a hierarchical structure based on their compositionality. To exploit such a structure, we propose a contrastive learning framework where a Euclidean loss is used to learn object representations and a hyperbolic loss is used to encourage representations of scenes to lie close to representations of their constituent objects in a hyperbolic space. This novel hyperbolic objective encourages the scene-object hypernymy among the representations by optimizing the magnitude of their norms. We show that when pretraining on the COCO and OpenImages datasets, the hyperbolic loss improves downstream performance of several baselines across multiple datasets and tasks, including image classification, object detection, and semantic segmentation. We also show that the properties of the learned representations allow us to solve various vision tasks that involve the interaction between scenes and objects in a zero-shot fashion.
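
As a rough illustration of why norms matter in a hyperbolic objective, the sketch below computes distances in the Poincaré ball model. The choice of the Poincaré ball is an assumption here (the abstract only says "a hyperbolic space"), and the function is a textbook distance formula, not the paper's loss.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball.

    Both points must have Euclidean norm strictly below 1.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# The same Euclidean gap of 0.1 costs more hyperbolic distance near the boundary.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
```

Since distance grows quickly toward the boundary, the norm of an embedding controls how "general" it is relative to its neighbours, which is consistent with the abstract's remark that the objective works by optimizing the magnitude of the norms.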
Full Text Available
HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions

https://doi.org/10.1109/CVPR52729.2023.01807

Shah, Anshul; Roy, Aniket; Shah, Ketul; Mishra, Shlok; Jacobs, David; Cherian, Anoop; Chellappa, Rama (June 2023, IEEE)

Supervised learning of skeleton sequence encoders for action recognition has received significant attention in recent times. However, learning such encoders without labels continues to be a challenging problem. While prior works have shown promising results by applying contrastive learning to pose sequences, the quality of the learned representations is often observed to be closely tied to data augmentations that are used to craft the positives. However, augmenting pose sequences is a difficult task as the geometric constraints among the skeleton joints need to be enforced to make the augmentations realistic for that action. In this work, we propose a new contrastive learning approach to train models for skeleton-based action recognition without labels. Our key contribution is a simple module, HaLP, to Hallucinate Latent Positives for contrastive learning. Specifically, HaLP explores the latent space of poses in suitable directions to generate new positives. To this end, we present a novel optimization formulation to solve for the synthetic positives with an explicit control on their hardness. We propose approximations to the objective, making them solvable in closed form with minimal overhead. We show via experiments that using these generated positives within a standard contrastive learning framework leads to consistent improvements across benchmarks such as NTU-60, NTU-120, and PKU-II on tasks like linear evaluation, transfer learning, and kNN evaluation. Our code can be found at https://github.com/anshulbshah/HaLP.
Full Text Available
On the spectral bias of convolutional neural tangent and gaussian process kernels

Geifman, A; Galun, M; Jacobs, D; Basri, R (December 2022, Advances in neural information processing systems)

Full Text Available
Object-aware cropping for self-supervised learning

Mishra, S; Shah, A; Bansal, A; Jagannatha, A; Sharma, A; Jacobs, D; Krishnan, D. (December 2022, Transactions on machine learning research)

Full Text Available
Autoregressive Perturbations for Data Poisoning

Sandoval-Segura, P; Singla, V; Geiping, J; Goldblum, M; Goldstein, T; Jacobs, D (November 2022, Thirty-sixth Conference on Neural Information Processing Systems)

The prevalence of data scraping from social media as a means to obtain datasets has led to growing concerns regarding unauthorized use of data. Data poisoning attacks have been proposed as a bulwark against scraping, as they make data "unlearnable" by adding small, imperceptible perturbations. Unfortunately, existing methods require knowledge of both the target architecture and the complete dataset so that a surrogate network can be trained, the parameters of which are used to generate the attack. In this work, we introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset. The proposed AR perturbations are generic, can be applied across different datasets, and can poison different architectures. Compared to existing unlearnable methods, our AR poisons are more resistant against common defenses such as adversarial training and strong data augmentations. Our analysis further provides insight into what makes an effective data poison.
Full Text Available
Fast Light-Weight Near-Field Photometric Stereo

Lichy, Daniel; Sengupta, Soumyadip (July 2022, IEEE Conference on Computer Vision and Pattern Recognition)

We introduce the first end-to-end learning-based solution to near-field Photometric Stereo (PS), where the light sources are close to the object of interest. This setup is especially useful for reconstructing large immobile objects. Our method is fast, producing a mesh from 52 images at 512x384 resolution in about 1 second on a commodity GPU, thus potentially unlocking several AR/VR applications. Existing approaches rely on optimization coupled with a far-field PS network operating on pixels or small patches. Using optimization makes these approaches slow and memory intensive (requiring 17GB of GPU memory and 27GB of CPU memory), while using only pixels or patches makes them highly susceptible to noise and calibration errors. To address these issues, we develop a recursive multi-resolution scheme to estimate surface normal and depth maps of the whole image at each step. The predicted depth map at each scale is then used to estimate 'per-pixel lighting' for the next scale. This design makes our approach almost 45x faster and 2 degrees more accurate (11.3 vs. 13.3 degrees Mean Angular Error) than the state-of-the-art near-field PS reconstruction technique, which uses iterative optimization.
Full Text Available
Shape and Material Capture at Home

https://doi.org/10.1109/CVPR46437.2021.00606

Lichy, Daniel; Wu, Jiaye; Sengupta, Soumyadip; Jacobs, David W. (June 2021, Computer Vision and Pattern Recognition)

In this paper, we present a technique for estimating the geometry and reflectance of objects using only a camera, flashlight, and optionally a tripod. We propose a simple data capture technique in which the user goes around the object, illuminating it with a flashlight and capturing only a few images. Our main technical contribution is the introduction of a recursive neural architecture, which can predict geometry and reflectance at 2^k × 2^k resolution given an input image at 2^k × 2^k and estimated geometry and reflectance from the previous step at 2^(k-1) × 2^(k-1). This recursive architecture, termed RecNet, is trained with 256×256 resolution but can easily operate on 1024×1024 images during inference. We show that our method produces more accurate surface normal and albedo, especially in regions of specular highlights and cast shadows, compared to previous approaches, given three or fewer input images.
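
The coarse-to-fine recursion described in this abstract can be sketched structurally as follows. This is only an outline of the control flow under stated assumptions: `refine` is a hypothetical stand-in for the learned network (here it just blends its two inputs), the 2x2 averaging and nearest-neighbour resampling are illustrative choices, and the base case (using the coarsest image as its own prior) is an assumption that the abstract does not specify.

```python
def downsample2x(img):
    """Halve resolution by averaging 2x2 blocks (square grid, even side)."""
    n = len(img) // 2
    return [
        [
            (img[2 * i][2 * j] + img[2 * i][2 * j + 1]
             + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def upsample2x(img):
    """Double resolution with nearest-neighbour copying."""
    n = 2 * len(img)
    return [[img[i // 2][j // 2] for j in range(n)] for i in range(n)]


def refine(image, prior):
    """Hypothetical placeholder for the network: blend image and prior."""
    return [
        [0.5 * a + 0.5 * b for a, b in zip(row_img, row_prior)]
        for row_img, row_prior in zip(image, prior)
    ]


def recursive_predict(image, min_size=2):
    """Predict at 2^k x 2^k using the estimate from 2^(k-1) x 2^(k-1)."""
    if len(image) <= min_size:
        # Assumed base case: no coarser estimate exists at the lowest scale.
        return refine(image, image)
    coarse = recursive_predict(downsample2x(image), min_size)
    return refine(image, upsample2x(coarse))


flat = [[1.0] * 8 for _ in range(8)]
result = recursive_predict(flat)
```

Because the same step is applied at every scale, nothing in the loop depends on the final resolution, which is one way to read the abstract's statement that a model trained at 256×256 can run on 1024×1024 inputs.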
Full Text Available
