The ability to edit 3D assets with natural language presents a compelling paradigm to aid in the democratization of 3D content creation. However, while natural language is often effective at communicating general intent, it is poorly suited for specifying exact manipulation. To address this gap, we introduce ParSEL, a system that enablescontrollableediting of high-quality 3D assets with natural language. Given a segmented 3D mesh and an editing request, ParSEL produces aparameterizedediting program. Adjusting these parameters allows users to explore shape variations with exact control over the magnitude of the edits. To infer editing programs which align with an input edit request, we leverage the abilities of large-language models (LLMs). However, we find that although LLMs excel at identifying the initial edit operations, they often fail to infer complete editing programs, resulting in outputs that violate shape semantics. To overcome this issue, we introduce Analytical Edit Propagation (AEP), an algorithm which extends a seed edit with additional operations until a complete editing program has been formed. Unlike prior methods, AEP searches for analytical editing operations compatible with a range of possible user edits through the integration of computer algebra systems for geometric analysis. Experimentally, we demonstrate ParSEL's effectiveness in enabling controllable editing of 3D objects through natural language requests over alternative system designs.
more »
« less
GeoDiffuser: Geometry-Based Image Editing with Diffusion Models
The success of image generative models has enabled us to build methods that can edit images based on text or other user input. However, these methods are bespoke, imprecise, require additional information, or are limited to only 2D image edits. We present GeoDiffuser, a zero-shot optimization-based method that unifies common 2D and 3D image-based object editing capabilities into a single method. Our key insight is to view image editing operations as geometric transformations. We show that these transformations can be directly incorporated into the attention layers in diffusion models to implicitly perform editing operations. Our training-free optimization method uses an objective function that seeks to preserve object style but generate plausible images, for instance with accurate lighting and shadows. It also inpaints disoccluded parts of the image where the object was originally located. Given a natural image and user input, we segment the foreground object using SAM and estimate a corresponding transform which is used by our optimization approach for editing. GeoDiffuser can perform common 2D and 3D edits like object translation, 3D rotation, and removal. We present quantitative results, including a perceptual study, that shows how our approach is better than existing methods.
more »
« less
- Award ID(s):
- 2038897
- PAR ID:
- 10514735
- Publisher / Repository:
- arxiv 2024
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
This paper addresses the challenge of precisely swapping objects in videos, particularly those involved in hand-object interactions (HOI), using a single user-provided reference object image. While diffusion models have advanced video editing, they struggle with the complexities of HOI, often failing to generate realistic edits when object swaps involve changes in shape or functionality. To overcome this, the authors propose HOI-Swap, a novel diffusion-based video editing framework trained in a self-supervised manner. The framework operates in two stages: (1) single-frame object swapping with HOI awareness, where the model learns to adjust interaction patterns (e.g., hand grasp) based on object property changes; and (2) sequence-wide extension, where motion alignment is achieved by warping a sequence from the edited frame using sampled motion points and conditioning generation on the warped sequence. Extensive qualitative and quantitative evaluations demonstrate that HOI-Swap significantly outperforms prior methods, producing high-quality, realistic HOI video edits.more » « less
-
In the realm of neuroscience, mapping the three-dimensional (3D) neural circuitry and architecture of the brain is important for advancing our understanding of neural circuit organization and function. This study presents a novel pipeline that transforms mouse brain samples into detailed 3D brain models using a collaborative data analytics platform called “Texera.” The user-friendly Texera platform allows for effective interdisciplinary collaboration between team members in neuroscience, computer vision, and data processing. Our pipeline utilizes the tile images from a serial two-photon tomography/TissueCyte system, then stitches tile images into brain section images, and constructs 3D whole-brain image datasets. The resulting 3D data supports downstream analyses, including 3D whole-brain registration, atlas-based segmentation, cell counting, and high-resolution volumetric visualization. Using this platform, we implemented specialized optimization methods and obtained significant performance enhancement in workflow operations. We expect the neuroscience community can adopt our approach for large-scale image-based data processing and analysis.more » « less
-
Abstract Modern CAD tools represent 3D designs not only as geometry, but also as a program composed of geometric operations, each of which depends on a set of parameters. Program representations enable meaningful and controlled shape variations via parameter changes. However, achieving desired modifications solely through parameter editing is challenging when CAD models have not been explicitly authored to expose select degrees of freedom in advance. We introduce a novel bidirectional editing system for 3D CAD programs. In addition to editing the CAD program, users can directly manipulate 3D geometry and our system infers parameter updates to keep both representations in sync. We formulate inverse edits as a set of constrained optimization objectives, returning plausible updates to program parameters that both match user intent and maintain program validity. Our approach implements an automatically differentiable domain‐specific language for CAD programs, providing derivatives for this optimization to be performed quickly on any expressed program. Our system enables rapid, interactive exploration of a constrained 3D design space by allowing users to manipulate the program and geometry interchangeably during design iteration. While our approach is not designed to optimize across changes in geometric topology, we show it is expressive and performant enough for users to produce a diverse set of design variants, even when the CAD program contains a relatively large number of parameters.more » « less
-
Reasoning about 3D objects based on 2D images is challenging due to variations in appearance caused by viewing the object from different orientations. Tasks such as object classification are invariant to 3D rotations and other such as pose estimation are equivariant. However, imposing equivariance as a model constraint is typically not possible with 2D image input because we do not have an a priori model of how the image changes under out-of-plane object rotations. The only SO(3)-equivariant models that currently exist require point cloud or voxel input rather than 2D images. In this paper, we propose a novel architecture based on icosahedral group convolutions that reasons in SO(3) by learning a projection of the input image onto an icosahedron. The resulting model is approximately equivariant to rotation in SO(3). We apply this model to object pose estimation and shape classification tasks and find that it outperforms reasonable baselines.more » « less
An official website of the United States government
