We present a method to reconstruct dynamic scenes from monocular continuous-wave time-of-flight (C-ToF) cameras using raw sensor samples that achieves similar or better accuracy than neural volumetric approaches and is 100×faster. Quickly achieving high-fidelity dynamic 3D reconstruction from a single viewpoint is a significant challenge in computer vision. In C-ToF radiance field reconstruction, the property of interest—depth—is not directly measured, causing an additional challenge. This problem has a large and underappreciated impact upon the optimization when using a fast primitive-based scene representation like 3D Gaussian splatting, which is commonly used with multi-view data to produce satisfactory results and is brittle in its optimization otherwise. We incorporate two heuristics into the optimization to improve the accuracy of scene geometry represented by Gaussians. Experimental results show that our approach produces accurate reconstructions under constrained C-ToF sensing conditions, including for fast motions like swinging baseball bats. https://visual.cs.brown.edu/gftorf
more »
« less
This content will become publicly available on June 11, 2026
WonderWorld: Interactive 3D Scene Generation from a Single Image
We present WonderWorld, a novel framework for interactive 3D scene generation that enables users to interactively specify scene contents and layout and see the created scenes in low latency. The major challenge lies in achieving fast generation of 3D scenes. Existing scene generation approaches fall short of speed as they often require (1) progressively generating many views and depth maps, and (2) time-consuming optimization of the scene representations. Our approach does not need multiple views, and it leverages a geometry-based initialization that significantly reduces optimization time. Another challenge is generating coherent geometry that allows all scenes to be connected. We introduce the guided depth diffusion that allows partial conditioning of depth estimation. WonderWorld generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU, enabling real-time user interaction and exploration. Our interactive demo, full code, data, and software can be found at https://kovenyu.com/WonderWorld/
more »
« less
- Award ID(s):
- 1955864
- PAR ID:
- 10646692
- Publisher / Repository:
- IEEE / CVF CVPR 2025
- Date Published:
- Page Range / eLocation ID:
- 5916-5926
- Format(s):
- Medium: X
- Location:
- Nashville, TN
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Non-Line-Of-Sight (NLOS) imaging aims at recovering the 3D geometry of objects that are hidden from the direct line of sight. One major challenge with this technique is the weak available multibounce signal limiting scene size, capture speed, and reconstruction quality. To overcome this obstacle, we introduce a multipixel time-of-flight non-line-of-sight imaging method combining specifically designed Single Photon Avalanche Diode (SPAD) array detectors with a fast reconstruction algorithm that captures and reconstructs live low-latency videos of non-line-of-sight scenes with natural non-retroreflective objects. We develop a model of the signal-to-noise-ratio of non-line-of-sight imaging and use it to devise a method that reconstructs the scene such that signal-to-noise-ratio, motion blur, angular resolution, and depth resolution are all independent of scene depth suggesting that reconstruction of very large scenes may be possible.more » « less
-
Visual scenes are often remembered as if they were observed from a different viewpoint. Some scenes are remembered as farther than they appeared, and others as closer. These memory distortions—also known as boundary extension and contraction—are strikingly consistent for a given scene, but their cause remains unknown. We tested whether these distortions can be explained by an inferential process that adjusts scene memories toward high-probability views, using viewing depth as a test case. We first carried out a large-scale analysis of depth maps of natural indoor scenes to quantify the statistical probability of views in depth. We then assessed human observers’ memory for these scenes at various depths and found that viewpoint judgments were consistently biased toward the modal depth, even when just a few seconds elapsed between viewing and reporting. Thus, scenes closer than the modal depth showed a boundary-extension bias (remembered as farther-away), and scenes farther than the modal depth showed a boundary-contraction bias (remembered as closer). By contrast, scenes at the modal depth did not elicit a consistent bias in either direction. This same pattern of results was observed in a follow-up experiment using tightly controlled stimuli from virtual environments. Together, these findings show that scene memories are biased toward statistically probable views, which may serve to increase the accuracy of noisy or incomplete scene representations.more » « less
-
We propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it has the ability to represent scenes in a both photorealistic and geometrically accurate manner. However, in real-world, online robotic scenes where the number of views is limited given efficiency requirements, random view selection for 3DGS becomes impractical as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2) that we supplement with Pearson depth and surface normal loss to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes, and extend depth-based FisherRF to them, where we demonstrate both qualitative and quantitative improvements on challenging robot scenes. For more information, please see our project page at arm.stanford.edu/next-best-sense.more » « less
-
The placement of vegetation plays a central role in the realism of virtual scenes. We introduce procedural placement models (PPMs) for vegetation in urban layouts. PPMs are environmentally sensitive to city geometry and allow identifying plausible plant positions based on structural and functional zones in an urban layout. PPMs can either be directly used by defining their parameters or learned from satellite images and land register data. This allows us to populate urban landscapes with complex 3D vegetation and enhance existing approaches for generating urban landscapes. Our framework’s effectiveness is shown through examples of large-scale city scenes and close-ups of individually grown tree models. We validate the results generated with our framework with a perceptual user study and its usability based on urban scene design sessions with expert users.more » « less
An official website of the United States government
