Reconstructing 3D objects in natural environments requires solving the ill-posed problem of geometry, spatially-varying material, and lighting estimation. As such, many approaches impractically constrain to a dark environment, use controlled lighting rigs, or use few handheld captures but suffer reduced quality. We develop a method that uses just two smartphone exposures captured in ambient lighting to reconstruct appearance more accurately and practically than baseline methods. Our insight is that we can use a flash/no-flash RGB-D pair to pose an inverse rendering problem using point lighting. This allows efficient differentiable rendering to optimize depth and normals from a good initialization and so also the simultaneous optimization of diffuse environment illumination and SVBRDF material. We find that this reduces diffuse albedo error by 25%, specular error by 46%, and normal error by 30% against single and paired-image baselines that use learning-based techniques. Given that our approach is practical for everyday solid objects, we enable photorealistic relighting for mobile photography and easier content creation for augmented reality.
more »
« less
End-to-End 3D Face Reconstruction with Expressions and Specular Albedos from Single In-the-wild Images
Recovering 3D face models from in-the-wild face images has numerous potential applications. However, properly modeling complex lighting effects in reality, including specular lighting, shadows, and occlusions, from a single in-the-wild face image is still considered as a widely open research challenge. In this paper, we propose a convolutional neural network based framework to regress the face model from a single image in the wild. The outputted face model includes dense 3D shape, head pose, expression, diffuse albedo, specular albedo, and the corresponding lighting conditions. Our approach uses novel hybrid loss functions to disentangle face shape identities, expressions, poses, albedos, and lighting. Besides a carefully designed ablation study, we also conduct direct comparison experiments to show that our method can outperform state-of-art methods both quantitatively and qualitatively.
more »
« less
- Award ID(s):
- 2005430
- PAR ID:
- 10463606
- Date Published:
- Journal Name:
- Proceeding of ACM International Conference on Multimedia 2022
- Page Range / eLocation ID:
- 4694 to 4703
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Recent work on 3D-aware image synthesis has achieved compelling results using advances in neural rendering. However, 3D-aware synthesis of face dynamics hasn't received much attention. Here, we study how to explicitly control generative model synthesis of face dynamics exhibiting non-rigid motion (e.g., facial expression change), while simultaneously ensuring 3D-awareness. For this we propose a Controllable Radiance Field (CoRF): 1) Motion control is achieved by embedding motion features within the layered latent motion space of a style-based generator; 2) To ensure consistency of background, motion features and subject-specific attributes such as lighting, texture, shapes, albedo, and identity, a face parsing net, a head regressor and an identity encoder are incorporated. On head image/video data we show that CoRFs are 3D-aware while enabling editing of identity, viewing directions, and motion.more » « less
-
Camera-based heart rate measurement is becoming an attractive option as a non-contact modality for continuous remote health and engagement monitoring. However, reliable heart rate extraction from camera-based measurement is challenging in realistic scenarios, especially when the subject is moving. In this work, we develop a motion-robust algorithm, labeled RobustPPG, for extracting photoplethysmography signals (PPG) from face video and estimating the heart rate. Our key innovation is to explicitly model and generate motion distortions due to the movements of the person’s face. We use inverse rendering to obtain the 3D shape and albedo of the face and environment lighting from video frames and then render the human face for each frame. The rendered face is similar to the original face but does not contain the heart rate signal; facial movements alone cause pixel intensity variation in the generated video frames. Finally, we use the generated motion distortion to filter the motion-induced measurements. We demonstrate that our approach performs better than the state-of-the-art methods in extracting a clean blood volume signal with over 2 dB signal quality improvement and 30% improvement in RMSE of estimated heart rate in intense motion scenarios.more » « less
-
Face recognition (FR) systems are fast becoming ubiquitous. However, differential performance among certain demographics was identified in several widely used FR models. The skin tone of the subject is an important factor in addressing the differential performance. Previous work has used modeling methods to propose skin tone measures of subjects across different illuminations or utilized subjective labels of skin color and demographic information. However, such models heavily rely on consistent background and lighting for calibration, or utilize labeled datasets, which are time-consuming to generate or are unavailable. In this work, we have developed a novel and data-driven skin color measure capable of accurately representing subjects' skin tone from a single image, without requiring a consistent background or illumination. Our measure leverages the dichromatic reflection model in RGB space to decompose skin patches into diffuse and specular bases.more » « less
-
Single image 3D face reconstruction with accurate geometric details is a critical and challenging task due to the similar appearance on the face surface and fine details in organs. In this work, we introduce a self-supervised 3D face reconstruction approach from a single image that can recover detailed textures under different camera settings. The proposed network learns high-quality disparity maps from stereo face images during the training stage, while just a single face image is required to generate the 3D model in real applications. To recover fine details of each organ and facial surface, the framework introduces facial landmark spatial consistency to constrain the face recovering learning process in local point level and segmentation scheme on facial organs to constrain the correspondences at the organ level. The face shape and textures will further be refined by establishing holistic constraints based on the varying light illumination and shading information. The proposed learning framework can recover more accurate 3D facial details both quantitatively and qualitatively compared with state-of-the-art 3DMM and geometry-based reconstruction algorithms based on a single image.more » « less
An official website of the United States government

