Significance: Non-contact, camera-based heart rate variability (HRV) estimation is desirable in numerous applications, including medical, automotive, and entertainment. Unfortunately, camera-based HRV accuracy and reliability suffer from two challenges: (a) darker skin tones result in lower SNR and (b) relative motion induces measurement artifacts. Aim: We propose an algorithm, HRVCam, that provides robustness to the low SNR and motion-induced artifacts commonly present in imaging photoplethysmography (iPPG) signals. Approach: HRVCam computes camera-based HRV from the instantaneous frequency of the iPPG signal. HRVCam uses automatic adaptive bandwidth filtering along with discrete energy separation to estimate the instantaneous frequency. The parameters of HRVCam are set based on the observed characteristics of HRV and iPPG signals. Results: We capture a new dataset containing 16 participants with diverse skin tones. We demonstrate that HRVCam significantly reduces the error in camera-based HRV metrics (by more than 50%) for videos with dark skin and face motion. Conclusion: HRVCam can be used on top of iPPG estimation algorithms to provide robust HRV measurements, making camera-based HRV practical. Keywords: heart rate variability, imaging photoplethysmography, noncontact HRV, pulse frequency demodulation
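The core step described above, recovering HRV from the instantaneous frequency of the iPPG waveform via discrete energy separation, can be sketched with the standard Teager-Kaiser operator and the DESA-1 estimator. This is a minimal illustration of the general technique under our own assumptions, not HRVCam itself: the paper's automatic adaptive bandwidth filtering and parameter choices are omitted, and the function names are ours.

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator: Psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    psi = np.zeros_like(x, dtype=float)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def desa1_instantaneous_frequency(x, fs):
    """DESA-1 estimate of instantaneous frequency (Hz) for a narrow-band signal x sampled at fs."""
    y = np.diff(x, prepend=x[0])               # backward difference y(n) = x(n) - x(n-1)
    psi_x = teager_kaiser(x)
    psi_y = teager_kaiser(y)
    psi_y_next = np.roll(psi_y, -1)            # Psi[y](n+1); last sample wraps and is unreliable
    ratio = 1.0 - (psi_y + psi_y_next) / (4.0 * psi_x + 1e-12)
    omega = np.arccos(np.clip(ratio, -1.0, 1.0))   # radians per sample
    return omega * fs / (2.0 * np.pi)              # instantaneous frequency in Hz
```

Applied to a band-pass-filtered iPPG waveform (DESA assumes an approximately single-component signal, which is what the adaptive bandwidth filtering is meant to ensure), the beat-to-beat fluctuations of the returned frequency are what HRV metrics are computed from.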
RobustPPG: camera-based robust heart rate estimation using motion cancellation
Camera-based heart rate measurement is becoming an attractive option as a non-contact modality for continuous remote health and engagement monitoring. However, reliable heart rate extraction from camera-based measurements is challenging in realistic scenarios, especially when the subject is moving. In this work, we develop a motion-robust algorithm, labeled RobustPPG, for extracting photoplethysmography (PPG) signals from face video and estimating the heart rate. Our key innovation is to explicitly model and generate the motion distortions caused by movements of the person's face. We use inverse rendering to obtain the 3D shape and albedo of the face and the environment lighting from the video frames and then render the face for each frame. The rendered face is similar to the original face but does not contain the heart rate signal; facial movements alone cause the pixel intensity variation in the generated video frames. Finally, we use the generated motion distortion to filter the motion-induced measurements. We demonstrate that our approach performs better than state-of-the-art methods in extracting a clean blood volume signal, with over 2 dB signal quality improvement and a 30% improvement in the RMSE of the estimated heart rate in intense motion scenarios.
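The last step described in the abstract, using the rendered, pulse-free face to cancel motion-induced intensity variation, can be illustrated with a per-pixel least-squares regression against the rendered trace. This is only a hedged sketch of that idea: the inverse rendering itself is not shown, the exact filtering used by RobustPPG may differ, and the function name is hypothetical.

```python
import numpy as np

def remove_motion_component(measured, rendered):
    """Project out the motion-only reference trace from a measured skin-pixel trace.

    measured : (T,) intensity trace of a skin pixel/region from the real video
    rendered : (T,) intensity trace of the same pixel/region from the rendered,
               pulse-free face (motion and lighting changes only)
    Returns the residual, which should retain the pulse component.
    """
    m = measured - measured.mean()
    r = rendered - rendered.mean()
    gain = np.dot(r, m) / (np.dot(r, r) + 1e-12)   # least-squares fit of the motion reference
    return m - gain * r                             # residual ~ pulse + remaining noise
```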
- Award ID(s):
- 1730574
- PAR ID:
- 10397301
- Date Published:
- Journal Name:
- Biomedical Optics Express
- Volume:
- 13
- Issue:
- 10
- ISSN:
- 2156-7085
- Page Range / eLocation ID:
- 5447
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Over the last few years, camera-based estimation of vital signs, referred to as imaging photoplethysmography (iPPG), has garnered significant attention due to the relative simplicity, ease, unobtrusiveness, and flexibility offered by such measurements. It is expected that iPPG may be integrated into a host of emerging applications in areas as diverse as autonomous cars, neonatal monitoring, and telemedicine. In spite of this potential, the primary challenge of non-contact camera-based measurements is the relative motion between the camera and the subjects. Current techniques employ 2D feature tracking to reduce the effect of subject and camera motion, but they are limited to handling translational and in-plane motion. In this paper, we study, for the first time, the utility of 3D face tracking to allow iPPG to retain robust performance even in the presence of out-of-plane and large relative motions. We use an RGB-D camera to obtain 3D information from the subjects and use the spatial and depth information to fit a 3D face model and track the model over the video frames. This allows us to estimate correspondence over the entire video with pixel-level accuracy, even in the presence of out-of-plane or large motions. We then estimate iPPG from the warped video data, which ensures per-pixel correspondence over the entire window length used for estimation. Our experiments demonstrate improved robustness when head motion is large.
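Once the tracked 3D face model provides per-pixel correspondence across frames, the iPPG estimate reduces to sampling each frame at the corresponded skin locations and spatially averaging. Below is a minimal sketch of that sampling step, assuming the correspondences are already available as per-frame pixel coordinates; the tracking itself is not shown and the names are ours.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def ippg_trace_from_tracked_points(frames, points):
    """Average the same tracked skin points in every frame to get one iPPG sample per frame.

    frames : (T, H, W) single-channel video (e.g., the green channel)
    points : (T, N, 2) per-frame (row, col) locations of N skin points, e.g., obtained by
             projecting a tracked 3D face model into each frame
    """
    trace = np.empty(len(frames))
    for t, frame in enumerate(frames):
        rows, cols = points[t, :, 0], points[t, :, 1]
        samples = map_coordinates(frame.astype(float), [rows, cols], order=1)  # bilinear sampling
        trace[t] = samples.mean()                                              # spatial average
    return trace
```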
-
In this paper, we propose a machine learning-based multi-stream framework to recognize American Sign Language (ASL) manual signs and nonmanual gestures (face and head movements) in real time from RGB-D videos. Our approach is based on 3D Convolutional Neural Networks (3D CNNs), fusing multi-modal features including hand gestures, facial expressions, and body poses from multiple channels (RGB, Depth, Motion, and Skeleton joints). To learn the overall temporal dynamics of a video, a proxy video is generated by selecting a subset of frames from each video; these proxy videos are then used to train the proposed 3D CNN model. We collected a new ASL dataset, ASL-100-RGBD, which contains 42 RGB-D videos captured by a Microsoft Kinect V2 camera. Each video consists of 100 ASL manual signs, along with the RGB channel, Depth maps, Skeleton joints, Face features, and HD face. The dataset is fully annotated for each semantic region (i.e., the time duration of each sign performed by the signer). Our proposed method achieves 92.88% accuracy for recognizing 100 ASL sign glosses in our newly collected ASL-100-RGBD dataset. The effectiveness of our framework for recognizing hand gestures from RGB-D videos is further demonstrated on a large-scale dataset, ChaLearn IsoGD, achieving state-of-the-art results.
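The proxy-video idea mentioned above, training the 3D CNN on a fixed-length subset of frames that spans the whole clip, can be sketched with simple uniform temporal sampling. The abstract does not state the actual selection rule or frame count, so the uniform strategy and the num_frames value below are assumptions.

```python
import numpy as np

def make_proxy_video(frames, num_frames=32):
    """Uniformly sample a fixed number of frames so a 3D CNN sees the whole clip.

    frames : (T, H, W, C) array for one sign/gesture clip
    Returns a (num_frames, H, W, C) proxy video.
    """
    idx = np.linspace(0, len(frames) - 1, num_frames).round().astype(int)
    return frames[idx]
```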
-
Attention networks perform well on diverse computer vision tasks. The core idea is that the signal of interest is stronger in some pixels ("foreground"), and by selectively focusing computation on these pixels, networks can extract subtle information buried in noise and other sources of corruption. Our paper is based on one key observation: in many real-world applications, many sources of corruption, such as illumination and motion, are often shared between the "foreground" and the "background" pixels. Can we utilize this to our advantage? We propose the utility of inverse attention networks, which focus on extracting information about these shared sources of corruption. We show that this helps to effectively suppress shared covariates and amplify signal information, resulting in improved performance. We illustrate this on the task of camera-based physiological measurement, where the signal of interest is weak and global illumination variations and motion act as significant shared sources of corruption. We perform experiments on three datasets and show that our approach of inverse attention produces state-of-the-art results, increasing the signal-to-noise ratio by up to 5.8 dB, reducing heart rate and breathing rate estimation errors by as much as 30%, recovering subtle waveform dynamics, and generalizing from RGB to NIR videos without retraining.
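At the signal level, the inverse-attention idea can be sketched as follows: pixels down-weighted by the attention mask see the same shared corruption (illumination, motion) but carry little pulse, so a trace formed from them can be subtracted from the foreground trace. This is a simplified, non-learned sketch under our own assumptions; the paper's network-based formulation is richer, and the function name is hypothetical.

```python
import numpy as np

def inverse_attention_trace(frames, attention):
    """Subtract a shared-corruption trace (inverse-attention pixels) from the skin trace.

    frames    : (T, H, W) single-channel video
    attention : (H, W) soft mask in [0, 1] highlighting skin "foreground" pixels
    """
    fg_w = attention / (attention.sum() + 1e-12)                  # foreground weights
    bg_w = (1.0 - attention) / ((1.0 - attention).sum() + 1e-12)  # inverse-attention weights
    fg = np.tensordot(frames, fg_w, axes=([1, 2], [0, 1]))        # (T,) pulse + shared corruption
    bg = np.tensordot(frames, bg_w, axes=([1, 2], [0, 1]))        # (T,) mostly shared corruption
    return fg - bg                                                 # corruption-suppressed trace
```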
-
Imaging photoplethysmography (iPPG) could greatly improve driver safety systems by enabling capabilities ranging from identifying driver fatigue to unobtrusive early heart failure detection. Unfortunately, the driving context poses unique challenges to iPPG, including illumination and motion. First, drastic illumination variations present during driving can overwhelm the small intensity-based iPPG signals. Second, significant driver head motion, as well as camera motion (e.g., vibration), makes it challenging to recover iPPG signals. To address these two challenges, we present two innovations. First, we demonstrate that we can reduce most outside light variations using narrow-band near-infrared (NIR) video recordings and obtain reliable heart rate estimates. Second, we present a novel optimization algorithm, which we call AutoSparsePPG, that leverages the quasi-periodicity of iPPG signals and achieves better performance than state-of-the-art methods. In addition, we release the first publicly available driving dataset that contains both NIR and RGB video recordings of a passenger's face with simultaneous ground-truth pulse oximeter recordings.
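One simple way to exploit the quasi-periodicity mentioned above is to look for a sparse representation of the iPPG window over a dictionary of sinusoids restricted to plausible heart rates, solved here with plain ISTA. This only illustrates the general sparsity idea; AutoSparsePPG's actual objective and solver are not given in the abstract, and all parameter values below are assumptions.

```python
import numpy as np

def sparse_heart_rate_spectrum(x, fs, f_lo=0.7, f_hi=4.0, lam=0.1, iters=200):
    """Recover a sparse frequency representation of a quasi-periodic signal via ISTA.

    Solves min_c 0.5*||x - D c||^2 + lam*||c||_1 over a cosine/sine dictionary D
    restricted to plausible heart rates (f_lo..f_hi Hz).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    t = np.arange(n) / fs
    freqs = np.arange(f_lo, f_hi, fs / n)                       # DFT-resolution frequency grid
    D = np.hstack([np.cos(2 * np.pi * np.outer(t, freqs)),
                   np.sin(2 * np.pi * np.outer(t, freqs))])
    L = np.linalg.norm(D, 2) ** 2                               # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(iters):                                      # ISTA iterations
        grad = D.T @ (D @ c - x)
        z = c - grad / L
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-thresholding
    power = c[:len(freqs)] ** 2 + c[len(freqs):] ** 2
    return freqs, power                                         # dominant peak ~ heart rate
```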