Title: BioFace-3D: continuous 3D facial reconstruction through lightweight single-ear biosensors
Over the last decade, facial landmark tracking and 3D reconstruction have gained considerable attention due to their numerous applications, such as human-computer interaction, facial expression analysis, and emotion recognition. Traditional approaches require users to be confined to a particular location and to face a camera under constrained recording conditions (e.g., without occlusions and under good lighting). This highly restricted setting prevents them from being deployed in many application scenarios involving human motion. In this paper, we propose the first single-earpiece lightweight biosensing system, BioFace-3D, that can unobtrusively, continuously, and reliably sense entire facial movements, track 2D facial landmarks, and further render 3D facial animations. Our single-earpiece biosensing system uses a cross-modal transfer learning model to transfer the knowledge embodied in a high-grade visual facial landmark detection model to the low-grade biosignal domain. After training, BioFace-3D can directly perform continuous 3D facial reconstruction from the biosignals, without any visual input. Because it does not require a camera positioned in front of the user, this paradigm shift from visual sensing to biosensing introduces new opportunities in many emerging mobile and IoT applications. Extensive experiments involving 16 participants under various settings demonstrate that BioFace-3D can accurately track 53 major facial landmarks with only 1.85 mm average error and 3.38% normalized mean error, which is comparable with most state-of-the-art camera-based solutions. The rendered 3D facial animations, which are consistent with real human facial movements, also validate the system's capability for continuous 3D facial reconstruction.
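The abstract reports two landmark metrics: an average error in millimeters and a normalized mean error (NME). The record does not say which reference distance the authors normalize by, so the sketch below is only an illustration under that assumption: `norm_distance` is a hypothetical parameter (landmark papers often use a face-size measure such as the inter-ocular distance), and the toy coordinates are made up rather than taken from the paper.

```python
import math


def mean_landmark_error(pred, truth):
    """Average Euclidean distance between matching predicted and true landmarks."""
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and the same length")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def normalized_mean_error(pred, truth, norm_distance):
    """Mean landmark error divided by a reference distance.

    norm_distance is an assumed normalizer (for example, the inter-ocular
    distance); the record above does not state which one the paper uses.
    """
    if norm_distance <= 0:
        raise ValueError("norm_distance must be positive")
    return mean_landmark_error(pred, truth) / norm_distance


# Toy data in millimeters (illustrative only, not from the paper).
truth = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
pred = [(3.0, 4.0), (10.0, 0.0), (5.0, 9.0)]

mean_err = mean_landmark_error(pred, truth)       # per-point errors: 5, 0, 1
nme = normalized_mean_error(pred, truth, 50.0)    # 50 mm is an assumed normalizer
print(f"mean error = {mean_err:.2f} mm, NME = {nme:.2%}")
```

Dividing by a face-size reference makes the error comparable across subjects and image scales, which is why NME is typically reported alongside the absolute error.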
Award ID(s):
2132112
NSF-PAR ID:
10377880
Journal Name:
Proceedings of the 27th Annual International Conference on Mobile Computing and Networking (MobiCom '21)
Page Range / eLocation ID:
350 to 363
Sponsoring Org:
National Science Foundation
More Like this
  1. Single-image 3D face reconstruction with accurate geometric details is a critical and challenging task because of the similar appearance across the face surface and the fine details of facial organs. In this work, we introduce a self-supervised approach to 3D face reconstruction from a single image that can recover detailed textures under different camera settings. The proposed network learns high-quality disparity maps from stereo face images during training, while only a single face image is required to generate the 3D model in real applications. To recover fine details of each organ and of the facial surface, the framework introduces facial landmark spatial consistency to constrain the face recovery learning process at the local point level, and a segmentation scheme on facial organs to constrain the correspondences at the organ level. The face shape and textures are further refined by establishing holistic constraints based on varying illumination and shading information. The proposed learning framework recovers more accurate 3D facial details, both quantitatively and qualitatively, than state-of-the-art 3DMM and geometry-based single-image reconstruction algorithms.
  2. Low-cost 3D scanners and automatic photogrammetry software have brought the digitization of objects into 3D models within reach of consumers. However, existing digitization techniques are tedious, disruptive to the scanned object, or expensive. We create a novel 3D scanning system using consumer-grade hardware that revolves a camera around the object of interest. Our approach does not disturb the object during capture and allows us to scan delicate objects that can deform under motion, such as potted plants. Our system consists of a Raspberry Pi camera and computer, a stepper motor, a 3D-printed camera track, and control software. Our 3D scanner allows the user to gather image sets for 3D model reconstruction using photogrammetry software with minimal effort. We scale 3D scanning to objects of varying sizes by designing our scanner using programmatic modeling, which allows the user to change the physical dimensions of the scanner without redrawing each part.
  3. We present an end-to-end method for capturing the dynamics of 3D human characters and translating them to synthesize new, visually realistic motion sequences. Conventional methods employ sophisticated, but generic, control approaches for driving the joints of articulated characters, paying little attention to the distinct dynamics of human joint movements. In contrast, our approach attempts to synthesize human-like joint movements by exploiting a biologically plausible, compact network of spiking neurons that drive joint control in primates and rodents. We adapt the controller architecture by introducing learnable components and propose an evolutionary algorithm for training the spiking neural network architectures and capturing diverse joint dynamics. Our method requires only a few samples to capture the dynamic properties of a joint's motion and exploits the biologically inspired, trained controller for its reconstruction. More importantly, it can transfer the captured dynamics to new, visually plausible motion sequences. To enable user-dependent tailoring of the resulting motion sequences, we develop an interactive framework that allows for editing and real-time visualization of the controlled 3D character. We also demonstrate the applicability of our method to real human motion capture data by learning the hand joint dynamics from a gesture dataset and using our framework to reconstruct the gestures with our 3D animated character. The compact architecture of our joint controller, which emerges from its biologically realistic design, and the inherent capacity of our evolutionary learning algorithm for parallelization suggest that our approach could provide an efficient and scalable alternative for synthesizing 3D character animations with diverse and visually realistic motion dynamics.
  4. This work is motivated by the need to automate the analysis of parent-infant interactions to better understand the existence of any potential behavioral patterns useful for the early diagnosis of autism spectrum disorder (ASD). It presents an approach for synthesizing the facial expression exchanges that occur during parent-infant interactions. This is accomplished by developing a novel approach that uses landmarks when synthesizing changing facial expressions. The proposed model consists of two components: (i) The first is a landmark converter that receives a set of facial landmarks and the target emotion as input and outputs a set of new landmarks transformed to match the emotion. (ii) The second component involves an image converter that takes in an input image, a target landmark and a target emotion and outputs a face transformed to match the input emotion. The inclusion of landmarks in the generation process proves useful in the generation of baby facial expressions; babies have somewhat different facial musculature and facial dynamics than adults. This paper presents a realistic-looking matrix of changing facial expressions sampled from a 2-D emotion continuum (valence and arousal) and displays successfully transferred facial expressions from real-life mother-infant dyads to novel ones. 
  5. Heatmap regression-based models have significantly advanced facial landmark detection. However, the lack of structural constraints always generates inaccurate heatmaps, resulting in poor landmark detection performance. While hierarchical structure modeling methods have been proposed to tackle this issue, they all rely heavily on manually designed tree structures. The designed hierarchical structure is likely to be completely corrupted by missing or inaccurate landmark predictions. To the best of our knowledge, in the context of deep learning, no prior work has investigated how to automatically model proper structures for facial landmarks by discovering their inherent relations. In this paper, we propose a novel Hierarchical Structured Landmark Ensemble (HSLE) model for learning robust facial landmark detection, using it as a structural constraint. Unlike existing approaches that design structures manually, our proposed HSLE model is constructed automatically by discovering the most robust patterns, so HSLE can robustly depict both local and holistic landmark structures simultaneously. Our proposed HSLE can be readily plugged into any existing facial landmark detection baseline for further performance improvement. Extensive experimental results demonstrate that our approach outperforms the baseline by a large margin and achieves state-of-the-art performance.
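Item 5 builds on heatmap regression, in which a model predicts one 2D response map per landmark and the landmark position is read off from the peak. As a rough illustration of that decoding step only (not of the HSLE model, whose details are not given in this record), the following sketch takes plain nested lists as heatmaps; the function names and the toy values are hypothetical.

```python
def decode_heatmap(heatmap):
    """Return the (x, y) position of the largest response in a 2D heatmap."""
    if not heatmap or not heatmap[0]:
        raise ValueError("heatmap must be a non-empty 2D grid")
    best_value = None
    best_xy = (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            # Keep the first strictly larger response seen so far.
            if best_value is None or value > best_value:
                best_value = value
                best_xy = (x, y)
    return best_xy


def decode_landmarks(heatmaps):
    """Decode one (x, y) landmark per heatmap by taking each map's peak."""
    return [decode_heatmap(h) for h in heatmaps]


# Two toy 3x3 heatmaps, one per landmark (values are illustrative).
heatmaps = [
    [[0.0, 0.1, 0.0],
     [0.0, 0.9, 0.2],
     [0.0, 0.0, 0.0]],
    [[0.7, 0.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.3]],
]
landmarks = decode_landmarks(heatmaps)
print(landmarks)
```

Because each landmark is decoded independently here, nothing ties neighboring points together, which is the kind of missing structural constraint the abstract describes.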