How can one visually characterize photographs of people over time? In this work, we describe the
Separating an image into meaningful underlying components is a crucial first step for both editing and understanding images. We present a method capable of selecting the regions of a photograph exhibiting the same material as an artist-chosen area. Our proposed approach is robust to shading, specular highlights, and cast shadows, enabling selection in real images. As we do not rely on semantic segmentation (different woods or metal should not be selected together), we formulate the problem as a similarity-based grouping problem based on a user-provided image location. In particular, we propose to leverage the unsupervised DINO [Caron et al. 2021] features coupled with a proposed Cross-Similarity Feature Weighting module and an MLP head to extract material similarities in an image. We train our model on a new synthetic image dataset, that we release. We show that our method generalizes well to real-world images. We carefully analyze our model's behavior on varying material properties and lighting. Additionally, we evaluate it against a hand-annotated benchmark of 50 real photographs. We further demonstrate our model on a set of applications, including material editing, in-video selection, and retrieval of object photographs with similar materials.
more » « less- Award ID(s):
- 2019786
- PAR ID:
- 10505452
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- ACM Transactions on Graphics
- Volume:
- 42
- Issue:
- 4
- ISSN:
- 0730-0301
- Page Range / eLocation ID:
- 1 to 14
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Faces Through Time dataset, which contains over a thousand portrait images per decade from the 1880s to the present day. Using our new dataset, we devise a framework for resynthesizing portrait images across time, imagining how a portrait taken during a particular decade might have looked like had it been taken in other decades. Our framework optimizes a family of per‐decade generators that reveal subtle changes that differentiate decades—such as different hairstyles or makeup—while maintaining the identity of the input portrait. Experiments show that our method can more effectively resynthesizing portraits across time compared to state‐of‐the‐art image‐to‐image translation methods, as well as attribute‐based and language‐guided portrait editing models. Our code and data will be available at facesthroughtime.github.io. -
Abstract Despite the ubiquitous use of materials maps in modern rendering pipelines, their editing and control remains a challenge. In this paper, we present an example‐based material control method to augment input material maps based on user‐provided material photos. We train a tileable version of MaterialGAN and leverage its material prior to guide the appearance transfer, optimizing its latent space using differentiable rendering. Our method transfers the micro and meso‐structure textures of user provided target(s) photographs, while preserving the structure and quality of the input material. We show our methods can control existing material maps, increasing realism or generating new, visually appealing materials.
-
The success of image generative models has enabled us to build methods that can edit images based on text or other user input. However, these methods are bespoke, imprecise, require additional information, or are limited to only 2D image edits. We present GeoDiffuser, a zero-shot optimization-based method that unifies common 2D and 3D image-based object editing capabilities into a single method. Our key insight is to view image editing operations as geometric transformations. We show that these transformations can be directly incorporated into the attention layers in diffusion models to implicitly perform editing operations. Our training-free optimization method uses an objective function that seeks to preserve object style but generate plausible images, for instance with accurate lighting and shadows. It also inpaints disoccluded parts of the image where the object was originally located. Given a natural image and user input, we segment the foreground object using SAM and estimate a corresponding transform which is used by our optimization approach for editing. GeoDiffuser can perform common 2D and 3D edits like object translation, 3D rotation, and removal. We present quantitative results, including a perceptual study, that shows how our approach is better than existing methods.more » « less
-
Abstract Many vision‐based indoor localization methods require tedious and comprehensive pre‐mapping of built environments. This research proposes a mapping‐free approach to estimating indoor camera poses based on a 3D style‐transferred building information model (BIM) and photogrammetry technique. To address the cross‐domain gap between virtual 3D models and real‐life photographs, a CycleGAN model was developed to transform BIM renderings into photorealistic images. A photogrammetry‐based algorithm was developed to estimate camera pose using the visual and spatial information extracted from the style‐transferred BIM. The experiments demonstrated the efficacy of CycleGAN in bridging the cross‐domain gap, which significantly improved performance in terms of image retrieval and feature correspondence detection. With the 3D coordinates retrieved from BIM, the proposed method can achieve near real‐time camera pose estimation with an accuracy of 1.38 m and 10.1° in indoor environments.
-
Abstract We propose a regularization-based deblurring method that works efficiently for galaxy images. The spatial resolution of a ground-based telescope is generally limited by seeing conditions and is much worse than space-based telescopes. This circumstance has generated considerable research interest in the restoration of spatial resolution. Since image deblurring is a typical inverse problem and often ill-posed, solutions tend to be unstable. To obtain a stable solution, much research has adopted regularization-based methods for image deblurring, but the regularization term is not necessarily appropriate for galaxy images. Although galaxies have an exponential or Sérsic profile, the conventional regularization assumes the image profiles to behave linearly in space. The significant deviation between the assumption and real situations leads to blurring of the images and smoothing out the detailed structures. Clearly, regularization on logarithmic domain, i.e., magnitude domain, should provide a more appropriate assumption, which we explore in this study. We formulate a problem of deblurring galaxy images by an objective function with a Tikhonov regularization term on a magnitude domain. We introduce an iterative algorithm minimizing the objective function with a primal–dual splitting method. We investigate the feasibility of the proposed method using simulation and observation images. In the simulation, we blur galaxy images with a realistic point spread function and add both Gaussian and Poisson noise. For the evaluation with the observed images, we use galaxy images taken by the Subaru HSC-SSP. Both of these evaluations show that our method successfully recovers the spatial resolution of the deblurred images and significantly outperforms the conventional methods. The code is publicly available from the GitHub 〈https://github.com/kzmurata-astro/PSFdeconv_amag〉.