


Title: Can Shadows Reveal Biometric Information?
We study the problem of extracting biometric information of individuals by looking at shadows of objects cast on diffuse surfaces. We show that the biometric information leakage from shadows can be sufficient for reliable identity inference under representative scenarios via a maximum likelihood analysis. We then develop a learning-based method that demonstrates this phenomenon in real settings, exploiting the subtle cues in the shadows that are the source of the leakage without requiring any labeled real data. In particular, our approach relies on building synthetic scenes composed of 3D face models obtained from a single photograph of each identity. We transfer what we learn from the synthetic data to the real data using domain adaptation in a completely unsupervised way. Our model is able to generalize well to the real domain and is robust to several variations in the scenes. We report high classification accuracies in an identity classification task that takes place in a scene with unknown geometry and occluding objects.
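The maximum likelihood analysis mentioned in the abstract can be illustrated with a toy sketch (this is not the paper's actual model): suppose each identity is characterized by a Gaussian distribution over a one-dimensional shadow feature, and an observation is assigned to the identity under which it is most likely. All means and variances below are made-up numbers for illustration only.

```python
import numpy as np

def ml_identity(observation, means, variances):
    """Return the index of the identity whose Gaussian best explains the observation."""
    # log-likelihood of the observation under each identity's Gaussian
    log_lik = -0.5 * (np.log(2 * np.pi * variances)
                      + (observation - means) ** 2 / variances)
    return int(np.argmax(log_lik))

# hypothetical per-identity feature statistics (assumed, not from the paper)
means = np.array([0.2, 0.5, 0.8])
variances = np.array([0.01, 0.01, 0.01])
```

With equal variances this reduces to nearest-mean classification; an observation of 0.52 would be assigned to the second identity.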
Award ID(s):
1816209
PAR ID:
10482933
Publisher / Repository:
IEEE
Journal Name:
Proc. Winter Conf. Appl. Comp. Vision (WACV)
ISSN:
2642-9381
ISBN:
978-1-6654-9346-8
Page Range / eLocation ID:
869 to 879
Format(s):
Medium: X
Location:
Waikoloa, HI, USA
Sponsoring Org:
National Science Foundation
More Like this
  1.
    We present a novel technique for transcribing crowds in video scenes that allows extracting the positions of moving objects in video frames. The technique can be used as a more precise alternative to image processing methods, such as background-removal or automated pedestrian detection based on feature extraction and classification. By manually projecting pedestrian actors on a two-dimensional plane and translating screen coordinates to absolute real-world positions using the cross ratio, we provide highly accurate and complete results at the cost of increased processing time. We are able to completely avoid most errors found in other automated annotation techniques, resulting from sources such as noise, occlusion, shadows, view angle or the density of pedestrians. It is further possible to process scenes that are difficult or impossible to transcribe by automated image processing methods, such as low-contrast or low-light environments. We validate our model by comparing it to the results of both background-removal and feature extraction and classification in a variety of scenes. 
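The cross-ratio step described above can be sketched generically (this is an illustration, not the authors' implementation): the cross ratio of four collinear points is invariant under projection, so three reference points with known real-world positions let us recover the world position of a fourth point from its screen coordinate. Coordinates here are hypothetical scalar positions along a line.

```python
def cross_ratio(a, b, c, d):
    # cross ratio of four collinear points, given as scalar positions along a line
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def world_position(sa, sb, sc, sd, wa, wb, wc):
    # recover the world position of the fourth point: the cross ratio computed
    # from screen coordinates equals the cross ratio of the world coordinates,
    # which yields a linear equation in the unknown world position wd
    cr = cross_ratio(sa, sb, sc, sd)
    k = cr * (wc - wb)
    return (k * wa - (wc - wa) * wb) / (k - (wc - wa))
```

For example, screen positions 0, 1, 2, 3 with world reference positions 0, 2, 4 recover a world position of 6 for the fourth point, and the result is unchanged under any projective distortion of the screen coordinates.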
  2. A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets. While augmenting real scenes with virtual objects holds promise to improve both the diversity and quantity of the objects, it remains elusive due to the lack of an effective 3D object insertion method in complex real captured scenes. In this work, we study augmenting complex real indoor scenes with virtual objects for monocular 3D object detection. The main challenge is to automatically identify plausible physical properties for virtual assets (e.g., locations, appearances, sizes, etc.) in cluttered real scenes. To address this challenge, we propose a physically plausible indoor 3D object insertion approach to automatically copy virtual objects and paste them into real scenes. The resulting objects in scenes have 3D bounding boxes with plausible physical locations and appearances. In particular, our method first identifies physically feasible locations and poses for the inserted objects to prevent collisions with the existing room layout. Subsequently, it estimates spatially-varying illumination for the insertion location, enabling the immersive blending of the virtual objects into the original scene with plausible appearances and cast shadows. We show that our augmentation method significantly improves existing monocular 3D object models and achieves state-of-the-art performance. For the first time, we demonstrate that a physically plausible 3D object insertion, serving as a generative data augmentation technique, can lead to significant improvements for discriminative downstream tasks such as monocular 3D object detection. 
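The collision-avoidance step of such an insertion pipeline can be sketched with a simple axis-aligned bounding-box test (a deliberate simplification; the paper reasons about the full room layout): a candidate placement is physically feasible only if its box intersects no existing object's box.

```python
def aabb_overlap(box_a, box_b):
    # boxes given as (min_corner, max_corner) pairs of (x, y, z) tuples
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    return all(a_min[i] < b_max[i] and b_min[i] < a_max[i] for i in range(3))

def feasible(candidate, existing_boxes):
    # a virtual-object placement is kept only if it collides with nothing
    return not any(aabb_overlap(candidate, b) for b in existing_boxes)
```

A real system would additionally check support surfaces and estimate local illumination for shading and cast shadows; this sketch covers only the collision test.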
  3. Accurate pose estimation is often a requirement for robust robotic grasping and manipulation of objects placed in cluttered, tight environments, such as a shelf with multiple objects. When deep learning approaches are employed to perform this task, they typically require a large amount of training data. However, obtaining precise 6-degree-of-freedom ground-truth poses can be prohibitively expensive. This work therefore proposes an architecture and a training process to solve this issue. More precisely, we present a weak object detector that enables localizing objects and estimating their 6D poses in cluttered and occluded scenes. To minimize the human labor required for annotations, the proposed detector is trained with a combination of synthetic and a few weakly annotated real images (as few as 10 images per object), for which a human provides only a list of objects present in each image (no time-consuming annotations, such as bounding boxes, segmentation masks and object poses). To close the gap between real and synthetic images, we use multiple domain classifiers trained adversarially. During the inference phase, the resulting class-specific heatmaps of the weak detector are used to guide the search of 6D poses of objects. Our proposed approach is evaluated on several publicly available datasets for pose estimation. We also evaluated our model on classification and localization in unsupervised and semi-supervised settings. The results clearly indicate that this approach could provide an efficient way toward fully automating the training process of computer vision models used in robotics.
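As a minimal illustration of using class-specific heatmaps to guide a pose search (hypothetical code, not the authors' detector): the strongest heatmap activation gives a 2D seed pixel, around which 6D pose hypotheses could then be sampled and scored.

```python
import numpy as np

def heatmap_seed(heatmap):
    # locate the strongest activation of a class-specific heatmap;
    # a downstream search would sample 6D pose hypotheses around this pixel
    idx = int(np.argmax(heatmap))
    return divmod(idx, heatmap.shape[1])

h = np.zeros((4, 5))
h[2, 3] = 1.0  # synthetic peak for the sketch
```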
    Ground truth depth information is necessary for many computer vision tasks. Collecting this information is challenging, especially for outdoor scenes. In this work, we propose utilizing single-view depth prediction neural networks pre-trained on synthetic scenes to generate relative depth, which we call pseudo-depth. This approach is a less expensive option as the pre-trained neural network obtains accurate depth information from synthetic scenes, which does not require any expensive sensor equipment and takes less time. We measure the usefulness of pseudo-depth from pre-trained neural networks by training indoor/outdoor binary classifiers with and without it. We also compare the difference in accuracy between using pseudo-depth and ground truth depth. We experimentally show that adding pseudo-depth to training achieves a 4.4% performance boost over the non-depth baseline model on DIODE, a large standard test dataset, retaining 63.8% of the performance boost achieved from training a classifier on RGB and ground truth depth. It also boosts performance by 1.3% on another dataset, SUN397, for which ground truth depth is not available. Our result shows that it is possible to take information obtained from a model pre-trained on synthetic scenes and successfully apply it beyond the synthetic domain to real-world data.
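Appending pseudo-depth as a fourth input channel can be sketched as follows (a generic illustration; `depth_model` stands in for any pre-trained single-view depth network, and the stand-in below is a trivial placeholder, not a real depth predictor):

```python
import numpy as np

def add_pseudo_depth(rgb, depth_model):
    # rgb: H x W x 3 image; depth_model: callable mapping the image
    # to an H x W map of relative (pseudo-)depth
    depth = depth_model(rgb)
    return np.concatenate([rgb, depth[..., None]], axis=-1)  # H x W x 4

# placeholder "depth network" for the sketch: per-pixel mean intensity
fake_depth = lambda img: img.mean(axis=-1)
```

The resulting 4-channel tensor feeds the classifier in place of the plain RGB input.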
  5.
    It is generally accepted that relatively more permanent (i.e., more temporally persistent) traits are more valuable for biometric performance than less permanent traits. Although this finding is intuitive, there is no current work identifying exactly where in the biometric analysis temporal persistence makes a difference. In this paper, we answer this question. In a recent report, we introduced the intraclass correlation coefficient (ICC) as an index of temporal persistence for such features. Here, we present a novel approach using synthetic features to study which aspects of a biometric identification study are influenced by the temporal persistence of features. What we show is that using more temporally persistent features produces effects on the similarity score distributions that explain why this quality is so key to biometric performance. The results identified with the synthetic data are largely reinforced by an analysis of two datasets, one based on eye-movements and one based on gait. There was one difference between the synthetic and real data, related to the intercorrelation of features in real data. Removing these intercorrelations for real datasets with a decorrelation step produced results which were very similar to that obtained with synthetic features. 
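The intraclass correlation coefficient referenced above has several standard forms; a one-way ICC(1,1) can be computed as below (a textbook formula, not code from the paper). Rows are subjects, columns are repeated sessions of one feature; values near 1 indicate a temporally persistent feature.

```python
import numpy as np

def icc_1_1(data):
    """One-way ICC(1,1) for an n_subjects x k_sessions matrix of one feature."""
    n, k = data.shape
    grand = data.mean()
    # between-subject mean square: spread of subject means around the grand mean
    msb = k * ((data.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    # within-subject mean square: spread of sessions around each subject's mean
    msw = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

A perfectly persistent feature (identical sessions per subject) yields an ICC of 1, while a feature whose subject means are indistinguishable yields a low or negative ICC.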