Advances in neural fields are enablling high-fidelity capture of shape and appearance of dynamic 3D scenes. However, this capbabilities lag behind those offered by conventional representations such as 2D videos because of algorithmic challenges and the lack of large-scale multi-view real-world datasets. We address the dataset limitations with DiVa-360, a real-world 360° dynamic visual dataset that contains synchronized high-resolution and long-duration multi-view video sequences of table-scale scenes captured using a customized low-cost system with 53 cameras. It contains 21 object-centric sequences categorized by different motion types, 25 intricate hand-object interaction sequences, and 8 long-duration sequences for a total of 17.4M frames. In addition, we provide foreground-background segmentation masks, synchronized audio, and text descriptions. We benchmark the state-of-the-art dynamic neural field methods on DiVa-360 and provide insights about existing methods and future challenges on long-duration neural field capture.
more »
« less
Skeletons, Object Shape, Statistics
Objects and object complexes in 3D, as well as those in 2D, have many possible representations. Among them skeletal representations have special advantages and some limitations. For the special form of skeletal representation called “s-reps,” these advantages include strong suitability for representing slabular object populations and statistical applications on these populations. Accomplishing these statistical applications is best if one recognizes that s-reps live on a curved shape space. Here we will lay out the definition of s-reps, their advantages and limitations, their mathematical properties, methods for fitting s-reps to single- and multi-object boundaries, methods for measuring the statistics of these object and multi-object representations, and examples of such applications involving statistics. While the basic theory, ideas, and programs for the methods are described in this paper and while many applications with evaluations have been produced, there remain many interesting open opportunities for research on comparisons to other shape representations, new areas of application and further methodological developments, many of which are explicitly discussed here.
more »
« less
- Award ID(s):
- 2113404
- PAR ID:
- 10436718
- Date Published:
- Journal Name:
- Frontiers in Computer Science
- Volume:
- 4
- ISSN:
- 2624-9898
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Advances in neural fields are enablling high-fidelity capture of shape and appearance of dynamic 3D scenes. However, this capbabilities lag behind those offered by conventional representations such as 2D videos because of algorithmic challenges and the lack of large-scale multi-view real-world datasets. We address the dataset limitations with DiVa-360, a real-world 360° dynamic visual dataset that contains synchronized high-resolution and long-duration multi-view video sequences of table-scale scenes captured using a customized low-cost system with 53 cameras. It contains 21 object-centric sequences categorized by different motion types, 25 intricate hand-object interaction sequences, and 8 long-duration sequences for a total of 17.4M frames. In addition, we provide foreground-background segmentation masks, synchronized audio, and text descriptions. We benchmark the state-of-the-art dynamic neural field methods on DiVa-360 and provide insights about existing methods and future challenges on long-duration neural field capture.more » « less
-
null (Ed.)Recent advances in deep generative models have led to immense progress in 3D shape synthesis. While existing models are able to synthesize shapes represented as voxels, point-clouds, or implicit functions, these methods only indirectly enforce the plausibility of the final 3D shape surface. Here we present a 3D shape synthesis framework (SurfGen) that directly applies adversarial training to the object surface. Our approach uses a differentiable spherical projection layer to capture and represent the explicit zero isosurface of an implicit 3D generator as functions defined on the unit sphere. By processing the spherical representation of 3D object surfaces with a spherical CNN in an adversarial setting, our generator can better learn the statistics of natural shape surfaces. We evaluate our model on large-scale shape datasets, and demonstrate that the end-to-end trained model is capable of generating high fidelity 3D shapes with diverse topology.more » « less
-
na (Ed.)Bayesian methods have been widely used in the last two decades to infer statistical proper- ties of spatially variable coefficients in partial differential equations from measurements of the solutions of these equations. Yet, in many cases the number of variables used to param- eterize these coefficients is large, and oobtaining meaningful statistics of their probability distributions is difficult using simple sampling methods such as the basic Metropolis– Hastings algorithm—in particular, if the inverse problem is ill-conditioned or ill-posed. As a consequence, many advanced sampling methods have been described in the literature that converge faster than Metropolis–Hastings, for example, by exploiting hierarchies of statistical models or hierarchies of discretizations of the underlying differential equation. At the same time, it remains difficult for the reader of the literature to quantify the advantages of these algorithms because there is no commonly used benchmark. This paper presents a benchmark Bayesian inverse problem—namely, the determination of a spatially variable coefficient, discretized by 64 values, in a Poisson equation, based on point mea- surements of the solution—that fills the gap between widely used simple test cases (such as superpositions of Gaussians) and real applications that are difficult to replicate for de- velopers of sampling algorithms. We provide a complete description of the test case and provide an open-source implementation that can serve as the basis for further experiments. We have also computed 2 × 10^11 samples, at a cost of some 30 CPU years, of the poste- rior probability distribution from which we have generated detailed and accurate statistics against which other sampling algorithms can be tested.more » « less
-
null (Ed.)The success of supervised learning requires large-scale ground truth labels which are very expensive, time- consuming, or may need special skills to annotate. To address this issue, many self- or un-supervised methods are developed. Unlike most existing self-supervised methods to learn only 2D image features or only 3D point cloud features, this paper presents a novel and effective self-supervised learning approach to jointly learn both 2D image features and 3D point cloud features by exploiting cross-modality and cross-view correspondences without using any human annotated labels. Specifically, 2D image features of rendered images from different views are extracted by a 2D convolutional neural network, and 3D point cloud features are extracted by a graph convolution neural network. Two types of features are fed into a two-layer fully connected neural network to estimate the cross-modality correspondence. The three networks are jointly trained (i.e. cross-modality) by verifying whether two sampled data of different modalities belong to the same object, meanwhile, the 2D convolutional neural network is additionally optimized through minimizing intra-object distance while maximizing inter-object distance of rendered images in different views (i.e. cross-view). The effectiveness of the learned 2D and 3D features is evaluated by transferring them on five different tasks including multi-view 2D shape recognition, 3D shape recognition, multi-view 2D shape retrieval, 3D shape retrieval, and 3D part-segmentation. Extensive evaluations on all the five different tasks across different datasets demonstrate strong generalization and effectiveness of the learned 2D and 3D features by the proposed self-supervised method.more » « less
An official website of the United States government

