NSF PAR Search | NSF Public Access Repository

MyTimeMachine: Personalized Facial Age Transformation

https://doi.org/10.1145/3731172

Qi, Luchao; Wu, Jiaye; Gong, Bang; Wang, Annie N; Jacobs, David W; Sengupta, Roni (August 2025, ACM Transactions on Graphics)

Facial aging is a complex process, highly dependent on multiple factors like gender, ethnicity, lifestyle, etc., making it extremely challenging to learn a global aging prior that accurately predicts aging for any individual. Existing techniques often produce realistic and plausible aging results, but the re-aged images often do not resemble the person's appearance at the target age and thus need personalization. In many practical applications of virtual aging, e.g., VFX in movies and TV shows, access to a personal photo collection of the user depicting aging in a small time interval (20 to 40 years) is often available. However, naive attempts to personalize global aging techniques on personal photo collections often fail. Thus, we propose MyTimeMachine (MyTM), a method that combines a global aging prior with a personalized photo collection (ranging from as few as 10 images, ideally 50) to learn individualized age transformations. We introduce a novel Adapter Network that combines personalized aging features with global aging features and generates a re-aged image with StyleGAN2. We also introduce three loss functions to personalize the Adapter Network with personalized aging loss, extrapolation regularization, and adaptive w-norm regularization. Our method demonstrates strong performance on fair-use imagery of widely recognizable individuals, producing photorealistic and identity-consistent age transformations that generalize well across diverse appearances. It also extends naturally to video, delivering high-quality, temporally consistent results that closely resemble actual appearances at target ages, outperforming state-of-the-art approaches.
Free, publicly-accessible full text available August 1, 2026
CALVIN: Improved Contextual Video Captioning via Instruction Tuning

https://doi.org/10.52202/079017-2952

Somepalli, Gowthami; Chowdhury, Arkabandhu; Basri, Ronen; Geiping, Jonas; Goldstein, Tom; Jacobs, David (December 2024, Neural Information Processing Systems Foundation, Inc. (NeurIPS))

The recent emergence of powerful Vision-Language models (VLMs) has significantly improved image captioning. Some of these models are extended to caption videos as well. However, their capabilities to understand complex scenes are limited, and the descriptions they provide for scenes tend to be overly verbose and focused on the superficial appearance of objects. Scene descriptions, especially in movies, require a deeper contextual understanding unlike general-purpose video captioning. To address this challenge, we propose a model, CALVIN, a specialized video LLM that leverages previous movie context to generate fully “contextual” scene descriptions. To achieve this, we train our model on a suite of tasks that integrate both image-based question-answering and video captioning within a unified framework, before applying instruction tuning to refine the model’s ability to provide scene captions. Lastly, we observe that our model responds well to prompt engineering and few-shot in-context learning techniques, enabling the user to adapt it to any new movie with very little additional annotation.
Full Text Available
Rethinking Score Distillation as a Bridge Between Image Distributions

https://doi.org/10.52202/079017-1064

McAllister, David; Ge, Songwei; Huang, Jia-Bin; Jacobs, David; Efros, Alexei; Holynski, Aleksander; Kanazawa, Angjoo (December 2024, Neural Information Processing Systems Foundation, Inc. (NeurIPS))

Score distillation sampling (SDS) has proven to be an important tool, enabling the use of large-scale diffusion priors for tasks operating in data-poor domains. Unfortunately, SDS has a number of characteristic artifacts that limit its usefulness in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an optimal-cost transport path from a source distribution to a target distribution. Under this new interpretation, these methods seek to transport corrupted images (source) to the natural image distribution (target). We argue that current methods’ characteristic artifacts are caused by (1) linear approximation of the optimal path and (2) poor estimates of the source distribution. We show that calibrating the text conditioning of the source distribution can produce high-quality generation and translation results with little extra overhead. Our method can be easily applied across many domains, matching or beating the performance of specialized methods. We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real. We compare our method to existing approaches for score distillation sampling and show that it can produce high-frequency details with realistic colors.
Full Text Available
CALVIN: Improved Contextual Video Captioning via Instruction Tuning

Somepalli, Gowthami; Chowdhury, Arkabandhu; Geiping, Jonas; Basri, Ronen; Goldstein, Tom; Jacobs, David W (November 2024, Advances in Neural Information Processing Systems)

Full Text Available
LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation

PNVR, Koutilya; Singh, Bharat; Ghosh, Pallabi; Siddiquie, Behjat; Jacobs, David (October 2023, IEEE)

Full Text Available
Measured Albedo in the Wild: Filling the Gap in Intrinsics Evaluation

https://doi.org/10.1109/ICCP56744.2023.10233761

Wu, Jiaye; Chowdhury, Sanjoy; Shanmugaraja, Hariharmano; Jacobs, David; Sengupta, Soumyadip (July 2023, IEEE)

Intrinsic image decomposition and inverse rendering are long-standing problems in computer vision. To evaluate albedo recovery, most algorithms report their quantitative performance with a mean Weighted Human Disagreement Rate (WHDR) metric on the IIW dataset. However, WHDR focuses only on relative albedo values and often fails to capture overall quality of the albedo. In order to comprehensively evaluate albedo, we collect a new dataset, Measured Albedo in the Wild (MAW), and propose three new metrics that complement WHDR: intensity, chromaticity and texture metrics. We show that existing algorithms often improve the WHDR metric but perform poorly on the other metrics. We then finetune different algorithms on our MAW dataset to significantly improve the quality of the reconstructed albedo both quantitatively and qualitatively. Since the proposed intensity, chromaticity, and texture metrics and WHDR are all complementary, we further introduce a relative performance measure that captures average performance. By analysing existing algorithms, we show that there is significant room for improvement. Our dataset and evaluation metrics will enable researchers to develop algorithms that improve albedo reconstruction. Code and data are available at: https://measuredalbedo.github.io/
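To illustrate why WHDR captures only relative albedo, here is a minimal sketch of the metric as it is commonly computed on pairwise human judgments. It is not the MAW evaluation code. The function name `whdr`, the tuple layout, the label strings ("1" for point 1 darker, "2" for point 2 darker, "E" for about equal) and the threshold `delta=0.1` are assumptions made for this example.

```python
def whdr(comparisons, delta=0.1):
    """Weighted Human Disagreement Rate over pairwise albedo comparisons.

    comparisons: iterable of (r1, r2, human_label, weight), where r1 and r2
    are positive predicted albedo values at two points and human_label is
    "1" (point 1 darker), "2" (point 2 darker) or "E" (about equal).
    This layout is an assumption made for the sketch.
    """
    total = 0.0
    wrong = 0.0
    for r1, r2, label, weight in comparisons:
        # Only the ratio of the two albedo values is ever inspected.
        if r2 / r1 > 1.0 + delta:
            predicted = "1"
        elif r1 / r2 > 1.0 + delta:
            predicted = "2"
        else:
            predicted = "E"
        total += weight
        if predicted != label:
            wrong += weight
    return wrong / total


pairs = [
    (0.2, 0.5, "1", 1.0),   # agrees with the human label
    (0.6, 0.3, "2", 0.5),   # agrees with the human label
    (0.4, 0.41, "1", 0.5),  # predicted "E", human said "1": disagreement
]
# Doubling every albedo value leaves all ratios, and so the score, unchanged.
scaled = [(2.0 * r1, 2.0 * r2, label, w) for r1, r2, label, w in pairs]
score = whdr(pairs)
scaled_score = whdr(scaled)
```

Because the score depends only on ratios, a prediction with the wrong overall intensity or color can still score well, which is the gap the abstract says the intensity, chromaticity and texture metrics are meant to fill.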
Hyperbolic Contrastive Learning for Visual Representations beyond Objects

https://doi.org/10.1109/CVPR52729.2023.00661

Ge, Songwei; Mishra, Shlok; Kornblith, Simon; Li, Chun-Liang; Jacobs, David (June 2023, IEEE)

Although self-/un-supervised methods have led to rapid progress in visual representation learning, these methods generally treat objects and scenes using the same lens. In this paper, we focus on learning representations for objects and scenes that preserve the structure among them. Motivated by the observation that visually similar objects are close in the representation space, we argue that the scenes and objects should instead follow a hierarchical structure based on their compositionality. To exploit such a structure, we propose a contrastive learning framework where a Euclidean loss is used to learn object representations and a hyperbolic loss is used to encourage representations of scenes to lie close to representations of their constituent objects in a hyperbolic space. This novel hyperbolic objective encourages the scene-object hypernymy among the representations by optimizing the magnitude of their norms. We show that when pretraining on the COCO and OpenImages datasets, the hyperbolic loss improves downstream performance of several baselines across multiple datasets and tasks, including image classification, object detection, and semantic segmentation. We also show that the properties of the learned representations allow us to solve various vision tasks that involve the interaction between scenes and objects in a zero-shot fashion.
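As a rough illustration of distances in a hyperbolic space, the sketch below computes the geodesic distance in the Poincaré ball model. The abstract does not say which hyperbolic model or loss form the authors use, so the choice of the Poincaré ball and the function name `poincare_distance` are assumptions of this example, not the paper's method.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Uses the standard Poincare ball formula:
    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


origin = [0.0, 0.0]
near_boundary_a = [0.9, 0.0]
near_boundary_b = [0.0, 0.9]

# Distance from the origin to a point depends only on that point's norm.
d_origin_a = poincare_distance(origin, near_boundary_a)
# Two large-norm points are much farther from each other than from the origin.
d_a_b = poincare_distance(near_boundary_a, near_boundary_b)
```

In this model a point with a small norm is relatively close to many large-norm points, which gives one intuition for why the norm of a representation can encode its position in a scene-object hierarchy.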
Full Text Available
HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions

https://doi.org/10.1109/CVPR52729.2023.01807

Shah, Anshul; Roy, Aniket; Shah, Ketul; Mishra, Shlok; Jacobs, David; Cherian, Anoop; Chellappa, Rama (June 2023, IEEE)

Supervised learning of skeleton sequence encoders for action recognition has received significant attention in recent times. However, learning such encoders without labels continues to be a challenging problem. While prior works have shown promising results by applying contrastive learning to pose sequences, the quality of the learned representations is often observed to be closely tied to data augmentations that are used to craft the positives. Augmenting pose sequences is a difficult task, however, as the geometric constraints among the skeleton joints need to be enforced to make the augmentations realistic for that action. In this work, we propose a new contrastive learning approach to train models for skeleton-based action recognition without labels. Our key contribution is a simple module, HaLP, to Hallucinate Latent Positives for contrastive learning. Specifically, HaLP explores the latent space of poses in suitable directions to generate new positives. To this end, we present a novel optimization formulation to solve for the synthetic positives with an explicit control on their hardness. We propose approximations to the objective, making them solvable in closed form with minimal overhead. We show via experiments that using these generated positives within a standard contrastive learning framework leads to consistent improvements across benchmarks such as NTU-60, NTU-120, and PKU-II on tasks like linear evaluation, transfer learning, and kNN evaluation. Our code can be found at https://github.com/anshulbshah/HaLP.
Full Text Available
Shape and Material Capture at Home

https://doi.org/10.1109/CVPR46437.2021.00606

Lichy, Daniel; Wu, Jiaye; Sengupta, Soumyadip; Jacobs, David W. (June 2021, Computer Vision and Pattern Recognition)

In this paper, we present a technique for estimating the geometry and reflectance of objects using only a camera, flashlight, and optionally a tripod. We propose a simple data capture technique in which the user goes around the object, illuminating it with a flashlight and capturing only a few images. Our main technical contribution is the introduction of a recursive neural architecture, which can predict geometry and reflectance at $2^k \times 2^k$ resolution given an input image at $2^k \times 2^k$ and estimated geometry and reflectance from the previous step at $2^{k-1} \times 2^{k-1}$. This recursive architecture, termed RecNet, is trained with 256×256 resolution but can easily operate on 1024×1024 images during inference. We show that our method produces more accurate surface normal and albedo, especially in regions of specular highlights and cast shadows, compared to previous approaches, given three or fewer input images.
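The coarse-to-fine recursion described above can be sketched as a loop over an image pyramid. This is only an illustration of the control flow under stated assumptions: `placeholder_step` stands in for the learned network and simply blends its inputs, and all function names, the 2x2 average pooling and the nearest-neighbor upsampling are hypothetical choices, not details of RecNet.

```python
def downsample(img):
    """Halve a square image (list of rows) by 2x2 average pooling."""
    n = len(img) // 2
    return [
        [
            (img[2 * i][2 * j] + img[2 * i][2 * j + 1]
             + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def upsample(img):
    """Double a square image by nearest-neighbor repetition."""
    n = 2 * len(img)
    return [[img[i // 2][j // 2] for j in range(n)] for i in range(n)]


def placeholder_step(image, previous):
    """Stand-in for the learned predictor (hypothetical, not RecNet).

    Takes the image at the current resolution and the estimate from the
    previous, half-resolution step, and returns an estimate at the
    current resolution.
    """
    if previous is None:
        return [row[:] for row in image]
    coarse = upsample(previous)
    return [
        [0.5 * (a + b) for a, b in zip(img_row, coarse_row)]
        for img_row, coarse_row in zip(image, coarse)
    ]


def recursive_estimate(image, min_size=2):
    """Run the step from the coarsest pyramid level up to full resolution."""
    pyramid = [image]
    while len(pyramid[-1]) > min_size:
        pyramid.append(downsample(pyramid[-1]))
    estimate = None
    for level in reversed(pyramid):  # coarsest level first
        estimate = placeholder_step(level, estimate)
    return estimate


image = [[1.0] * 8 for _ in range(8)]
result = recursive_estimate(image)
```

Since the same step is reused at every level, nothing in the loop ties it to a single resolution, which is consistent with the abstract's remark that a model trained at 256×256 can operate on 1024×1024 images.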
Full Text Available
Fossil Corals With Various Degrees of Preservation Can Retain Information About Biomineralization-Related Organic Material

https://doi.org/10.3389/feart.2021.643864

Drake, Jeana L.; Guillermic, Maxence; Eagle, Robert A.; Jacobs, David K. (June 2021, Frontiers in Earth Science)

Scleractinian corals typically form a robust calcium carbonate skeleton beneath their living tissue. This skeleton, through its trace element composition and isotope ratios, may record environmental conditions of water surrounding the coral animal. While bulk unrecrystallized aragonite coral skeletons can be used to reconstruct past ocean conditions, corals that have undergone significant diagenesis have altered geochemical signatures and are typically assumed to retain insufficient meaningful information for bulk or macrostructural analysis. However, partially recrystallized skeletons may retain organic molecular components of the skeletal organic matrix (SOM), which is secreted by the animal and directs aspects of the biomineralization process. Some SOM proteins can be retained in fossil corals and can potentially provide past oceanographic, ecological, and indirect genetic information. Here, we describe a dataset of scleractinian coral skeletons, aged from modern to Cretaceous plus a Carboniferous rugosan, characterized for their crystallography, trace element composition, and amino acid compositions. We show that some specimens that are partially recrystallized to calcite yield potentially useful biochemical information whereas complete recrystallization or silicification leads to significant alteration or loss of the SOM fraction. Our analysis is informative to biochemical-paleoceanographers as it suggests that previously discounted partially recrystallized coral skeletons may indeed still be useful at the microstructural level.
Full Text Available