
Title: Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering
In this paper, we aim to synthesize a free-viewpoint video of an arbitrary human performance using sparse multi-view cameras. Recently, several works have addressed this problem by learning person-specific neural radiance fields (NeRF) to capture the appearance of a particular human. In parallel, other works have proposed to use pixel-aligned features to generalize radiance fields to arbitrary new scenes and objects. Adopting such generalization approaches for humans, however, is highly challenging due to the heavy occlusions and dynamic articulations of body parts. To tackle this, we propose Neural Human Performer, a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture. Specifically, we first introduce a temporal transformer that aggregates tracked visual features based on the skeletal body motion over time. Moreover, a multi-view transformer is proposed to perform cross-attention between the temporally-fused features and the pixel-aligned features at each time step to integrate observations from multiple views on the fly. Experiments on the ZJU-MoCap and AIST datasets show that our method significantly outperforms recent generalizable NeRF methods on unseen identities and poses. The video results and code are available at https://youngjoongunc.github.io/nhp.
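To make the multi-view fusion step concrete, below is a minimal sketch of cross-attention between a temporally-fused query feature and pixel-aligned features from several views. The single-head formulation, module names, and feature width are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): single-head cross-attention
# that fuses pixel-aligned features from V views into one query feature.
import torch
import torch.nn as nn

class MultiViewCrossAttention(nn.Module):
    def __init__(self, dim=256):  # feature width is an assumption
        super().__init__()
        self.q = nn.Linear(dim, dim)  # query: temporally-fused feature
        self.k = nn.Linear(dim, dim)  # keys: per-view pixel-aligned features
        self.v = nn.Linear(dim, dim)  # values: per-view pixel-aligned features
        self.scale = dim ** -0.5

    def forward(self, temporal_feat, view_feats):
        # temporal_feat: (N, dim) for N query points
        # view_feats:    (N, V, dim), one feature per view per point
        q = self.q(temporal_feat).unsqueeze(1)         # (N, 1, dim)
        k, v = self.k(view_feats), self.v(view_feats)  # (N, V, dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (N, 1, V)
        attn = attn.softmax(dim=-1)
        return (attn @ v).squeeze(1)                   # (N, dim) fused feature
```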
Award ID(s): 1840131
PAR ID: 10390738
Author(s) / Creator(s): ; ; ;
Date Published:
Journal Name: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Neural Radiance Fields (NeRF) have become an increasingly popular representation for capturing high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of the network weight space. To address the limitations of existing work on generalization and multi-view consistency, and to improve quality, we propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. Rather than using hypernetworks to estimate only the weights of a NeRF, we estimate both the weights and the multi-resolution hash encodings, resulting in significant quality gains. To improve quality even further, we incorporate a denoise-and-finetune strategy that denoises images rendered from NeRFs estimated by the hypernetwork and finetunes it while retaining multi-view consistency. These improvements enable us to use HyP-NeRF as a generalizable prior for multiple downstream tasks, including NeRF reconstruction from single-view or cluttered scenes and text-to-NeRF. We provide qualitative comparisons and evaluate HyP-NeRF on three tasks: generalization, compression, and retrieval, demonstrating state-of-the-art results.
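As a hedged sketch of the hypernetwork idea described above, the code below maps a latent code to both a layer of MLP weights and a block of hash-encoding entries. The shapes, names, and single-layer scope are assumptions for illustration, not HyP-NeRF's actual architecture.

```python
# Illustrative sketch (not HyP-NeRF's code): a hypernetwork that predicts
# both MLP weights and multi-resolution hash-encoding entries from a latent.
import torch
import torch.nn as nn

class NeRFHypernetwork(nn.Module):
    def __init__(self, latent_dim=64, hidden=128, table_size=2**14, feat=2):
        super().__init__()
        # Weights for one hidden layer of the target NeRF MLP (assumed shape).
        self.to_weights = nn.Linear(latent_dim, hidden * hidden)
        # Entries of one level of a multi-resolution hash table (assumed shape).
        self.to_hash = nn.Linear(latent_dim, table_size * feat)
        self.hidden, self.table_size, self.feat = hidden, table_size, feat

    def forward(self, z):
        # z: (latent_dim,) per-instance latent code
        w = self.to_weights(z).view(self.hidden, self.hidden)
        h = self.to_hash(z).view(self.table_size, self.feat)
        # Plug into a NeRF whose layer and encoding match these shapes.
        return w, h
```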
  2. Recent advances in Neural Radiance Field (NeRF)-based methods have enabled high-fidelity novel view synthesis for video with dynamic elements. However, these methods often require expensive hardware, take days to process a second-long video, and do not scale well to longer videos. We create an end-to-end pipeline for creating dynamic 3D video from a monocular video that can be run on consumer hardware in minutes per second of footage, not days. Our pipeline handles the estimation of the camera parameters, depth maps, 3D reconstruction of the dynamic foreground and static background elements, and the rendering of the 3D video on a computer or VR headset. We use a state-of-the-art visual transformer model to estimate depth maps, which we use to scale COLMAP poses and enable RGB-D fusion with the estimated depth data. In our preliminary experiments, we rendered the output in a VR headset and visually compared the method against ground-truth datasets and state-of-the-art NeRF-based methods.
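The pose-scaling step above is a standard alignment problem; the following sketch shows one plausible version that median-scales monocular depth predictions to COLMAP's sparse depths. The function name and the median-ratio estimator are assumptions, not the authors' exact procedure.

```python
# Illustrative sketch: align monocular depth predictions to COLMAP's scale
# using the median ratio over pixels that have a sparse COLMAP depth.
import numpy as np

def align_depth_to_colmap(pred_depth, sparse_depth):
    """pred_depth: (H, W) monocular depth; sparse_depth: (H, W) with
    COLMAP depths at triangulated pixels and 0 elsewhere."""
    mask = sparse_depth > 0
    scale = np.median(sparse_depth[mask] / pred_depth[mask])
    return pred_depth * scale  # now metrically consistent with COLMAP poses
```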
  3. Neural Radiance Field (NeRF) approaches learn the underlying 3D representation of a scene and generate photorealistic novel views with high fidelity. However, most proposed settings concentrate on modelling a single object or a single level of a scene. In the real world, we may capture a scene at multiple levels, resulting in a layered capture. For example, tourists usually capture a monument's exterior structure before capturing the inner structure. Modelling such scenes in 3D with seamless switching between levels can drastically improve immersive experiences, yet most existing techniques struggle to model them. We propose Strata-NeRF, a single neural radiance field that implicitly captures a scene with multiple levels. Strata-NeRF achieves this by conditioning the NeRFs on Vector Quantized (VQ) latent representations, which allow sudden changes in scene structure. We evaluate the effectiveness of our approach on a multi-layered synthetic dataset comprising diverse scenes and then further validate its generalization on the real-world RealEstate10K dataset. We find that Strata-NeRF effectively captures stratified scenes, minimizes artifacts, and synthesizes high-fidelity views compared to existing approaches.
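To make the VQ conditioning concrete, here is a hedged sketch that quantizes a level embedding against a learned codebook with a straight-through gradient, producing the discrete latent a NeRF could be conditioned on. The codebook size and module names are illustrative assumptions, not Strata-NeRF's implementation.

```python
# Illustrative sketch: vector-quantize a continuous level embedding against
# a learned codebook; the quantized code then conditions a NeRF MLP.
import torch
import torch.nn as nn

class VQConditioner(nn.Module):
    def __init__(self, num_codes=64, dim=32):  # sizes are assumptions
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):
        # z: (B, dim) continuous level embedding
        d = torch.cdist(z, self.codebook.weight)  # (B, num_codes) distances
        q = self.codebook(d.argmin(dim=-1))       # nearest codebook entry
        # Straight-through estimator: forward uses q, gradient flows to z.
        return z + (q - z).detach()
```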
  4. Methods based on upward canopy gap fractions are widely employed to measure in-situ effective LAI (Le) as an alternative to destructive sampling. However, these measurements are limited to the point level and are not practical to scale up to larger areas. To address the point-to-landscape gap, this study introduces an innovative approach, named NeRF-LAI, for corn and soybean Le estimation that combines gap-fraction theory with neural radiance field (NeRF) technology, an emerging neural-network-based method for implicitly representing 3D scenes from multi-angle 2D images. The trained NeRF-LAI can render downward photorealistic hemispherical depth images from an arbitrary viewpoint in the 3D scene and then calculate gap fractions to estimate Le. To investigate the intrinsic difference between upward and downward gap estimations, initial tests on virtual corn fields demonstrated that the downward Le matches the upward Le well, and that Le estimation is insensitive to viewpoint height for a homogeneous field. Furthermore, we conducted intensive real-world experiments on controlled plots and farmer-managed fields to test the effectiveness and transferability of NeRF-LAI in real-world scenarios, where multi-angle UAV oblique images from different phenological stages were collected for corn and soybeans. Results showed that NeRF-LAI is able to render photorealistic synthetic images with an average peak signal-to-noise ratio (PSNR) of 18.94 for the controlled corn plots and 19.10 for the controlled soybean plots. We further explored three methods to estimate Le from the calculated gap fractions: the 57.5° method, the five-ring-based method, and the cell-based method. Among these, the cell-based method achieved the best performance, with r2 ranging from 0.674 to 0.780 and RRMSE ranging from 1.95% to 5.58%. The Le estimates are sensitive to viewpoint height in heterogeneous fields due to differences in the observable foliage volume, but they are less sensitive in relatively homogeneous fields. Additionally, cross-site testing for pixel-level LAI mapping showed that NeRF-LAI significantly outperforms VI-based models, with a small variation in RMSE (0.71 to 0.95 m2/m2) across spatial resolutions from 0.5 m to 2.0 m. This study extends gap-fraction-based Le estimation from a discrete point scale to a continuous field scale by leveraging the implicit 3D neural representations learned by NeRF. The NeRF-LAI method can map Le from raw multi-angle 2D images without prior information, offering a more flexible and efficient alternative to the traditional in-situ plant canopy analyzer.
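As a worked sketch of the 57.5° method mentioned above: at a view zenith angle of 57.5°, the foliage projection function G is close to 0.5 for most leaf-angle distributions, so inverting the gap-fraction model P(θ) = exp(−G·Le / cos θ) gives Le = −cos(57.5°)·ln P / 0.5 ≈ −1.07·ln P. The helper below is an illustrative assumption, not the paper's implementation.

```python
# Illustrative sketch: effective LAI (Le) from the gap fraction at 57.5°,
# where G(57.5°) ≈ 0.5, so Le = -cos(57.5°) * ln(P) / 0.5.
import math

def le_from_gap_fraction(gap_fraction, theta_deg=57.5, g=0.5):
    theta = math.radians(theta_deg)
    return -math.cos(theta) * math.log(gap_fraction) / g

# e.g., a gap fraction of 0.30 at 57.5° gives Le ≈ 1.29
print(le_from_gap_fraction(0.30))
```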