NSF PAR Search | NSF Public Access Repository

PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

Zhang, Tianyuan; Yu, Hong-Xing; Wu, Rundi; Feng, Brandon Y; Zheng, Changxi; Snavely, Noah; Wu, Jiajun; Freeman, William T (October 2024, Springer Nature Link)

Full Text Available
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

Zhang, Tianyuan; Yu, Hong-Xing; Wu, Rundi; Feng, Brandon Y; Zheng, Changxi; Snavely, Noah; Wu, Jiajun; Freeman, William T (October 2024, European Conference on Computer Vision (ECCV))

Full Text Available
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis

Van Hoorick, Basile; Wu, Rundi; Ozguroglu, Ege; Sargent, Kyle; Liu, Ruoshi; Tokmakov, Pavel; Dave, Achal; Zheng, Changxi; Vondrick, Carl (September 2024, European Conference on Computer Vision)

Full Text Available
Learning to Generate 3D Shapes from a Single Example

https://doi.org/10.1145/3550454.3555480

Wu, Rundi; Zheng, Changxi (December 2022, ACM Transactions on Graphics)

Existing generative models for 3D shapes are typically trained on a large 3D dataset, often of a specific object category. In this paper, we investigate a deep generative model that learns from only a single reference 3D shape. Specifically, we present a multi-scale GAN-based model designed to capture the input shape's geometric features across a range of spatial scales. To avoid the large memory and computational cost induced by operating on the 3D volume, we build our generator atop the tri-plane hybrid representation, which requires only 2D convolutions. We train our generative model on a voxel pyramid of the reference shape, without the need for any external supervision or manual annotation. Once trained, our model can generate diverse and high-quality 3D shapes, possibly of different sizes and aspect ratios. The resulting shapes present variations across different scales, and at the same time retain the global structure of the reference shape. Through extensive evaluation, both qualitative and quantitative, we demonstrate that our model can generate 3D shapes of various types.
Full Text Available
DeepCAD: A Deep Generative Network for Computer-Aided Design Models

https://doi.org/10.1109/ICCV48922.2021.00670

Wu, Rundi; Xiao, Chang; Zheng, Changxi (October 2021, International Conference on Computer Vision)

Deep generative models of 3D shapes have received a great deal of research interest. Yet, almost all of them generate discrete shape representations, such as voxels, point clouds, and polygon meshes. We present the first 3D generative model for a drastically different shape representation—describing a shape as a sequence of computer-aided design (CAD) operations. Unlike meshes and point clouds, CAD models encode the user creation process of 3D shapes, widely used in numerous industrial and engineering design tasks. However, the sequential and irregular structure of CAD operations poses significant challenges for existing 3D generative models. Drawing an analogy between CAD operations and natural language, we propose a CAD generative network based on the Transformer. We demonstrate the performance of our model for both shape autoencoding and random shape generation. To train our network, we create a new CAD dataset consisting of 178,238 models and their CAD construction sequences. We have made this dataset publicly available to promote future research on this topic.
Full Text Available
Listening to Sounds of Silence for Speech Denoising

Xu, Ruilin; Wu, Rundi; Ishiwaka, Yuko; Vondrick, Carl; Zheng, Changxi (December 2020, Advances in Neural Information Processing Systems)

We introduce a deep learning model for speech denoising, a long-standing challenge in audio analysis arising in numerous applications. Our approach is based on a key observation about human speech: there is often a short pause between each sentence or word. In a recorded speech signal, those pauses introduce a series of time periods during which only noise is present. We leverage these incidental silent intervals to learn a model for automatic speech denoising given only mono-channel audio. Detected silent intervals over time expose not just pure noise but its time-varying features, allowing the model to learn noise dynamics and suppress it from the speech signal. Experiments on multiple datasets confirm the pivotal role of silent interval detection for speech denoising, and our method outperforms several state-of-the-art denoising methods, including those that accept only audio input (like ours) and those that denoise based on audiovisual input (and hence require more information). We also show that our method enjoys excellent generalization properties, such as denoising spoken languages not seen during training.
Full Text Available
