Title: Resource Aware Person Re-identification across Multiple Resolutions
Not all people are equally easy to identify: color statistics might be enough for some cases, while others might require careful reasoning about high- and low-level details. However, prevailing person re-identification (re-ID) methods use one-size-fits-all high-level embeddings from deep convolutional networks for all cases. This might limit their accuracy on difficult examples or make them needlessly expensive for the easy ones. To remedy this, we present a new person re-ID model that combines effective embeddings built on multiple convolutional network layers, trained with deep supervision. On traditional re-ID benchmarks, our method improves substantially over the previous state-of-the-art results on all five datasets that we evaluate on. We then propose two new formulations of the person re-ID problem under resource constraints, and show how our model can be used to effectively trade off accuracy and computation in the presence of resource constraints.
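The core idea of combining embeddings built on multiple convolutional layers can be sketched as below. This is a minimal numpy illustration, not the paper's architecture: the projection matrices and per-stage fusion weights (which would be learned end-to-end under deep supervision) are stand-ins, and the stage dimensions are arbitrary.

```python
import numpy as np

def combine_stage_embeddings(stage_features, proj_mats, weights):
    """Fuse globally pooled features from successive network stages
    into a single unit-norm embedding.

    stage_features: list of 1-D arrays, one per tapped stage (dims may differ).
    proj_mats:      list of (d, dim_s) matrices projecting each stage to a
                    common dimension d (hypothetical, learned in practice).
    weights:        per-stage fusion weights (hypothetical, learned).
    """
    parts = []
    for f, W, w in zip(stage_features, proj_mats, weights):
        e = W @ f                             # project to common dimension
        e = e / (np.linalg.norm(e) + 1e-12)   # L2-normalize per stage
        parts.append(w * e)
    fused = np.sum(parts, axis=0)
    return fused / (np.linalg.norm(fused) + 1e-12)

# toy example: three stages with different feature dims, common dim 4
rng = np.random.default_rng(0)
feats = [rng.standard_normal(d) for d in (8, 16, 32)]
mats = [rng.standard_normal((4, d)) for d in (8, 16, 32)]
emb = combine_stage_embeddings(feats, mats, weights=[0.2, 0.3, 0.5])
```

A cheap early-stage embedding can then answer the easy queries on its own, which is what makes the accuracy/computation trade-off possible.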
Award ID(s):
1724282
PAR ID:
10064657
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Date Published:
Journal Name:
CVPR 2018
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Paaßen, Benjamin; Demmans Epp, Carrie (Eds.)
    With the goal of supporting real-time AI-based agents to facilitate student collaboration, as well as to enable educational data mining of group discussions, multimodal classroom analytics, and social network analysis, we investigate how to identify who-is-where-when in classroom videos. We take a person re-identification (re-id) approach, and we explore different methods of improving re-id accuracy in the challenging environments of school classrooms. Our results on a multi-grade classroom (MGC) dataset suggest that (1) fine-tuning off-the-shelf person re-id models such as AGW can deliver sizable accuracy gains (from 70.4% to 76.7% accuracy); (2) clustering, rather than nearest-neighbor identification, can yield accuracy improvements (76.7% to 79.4%) in identifying each detected person, especially when structural constraints are imposed; and (3) there is a strong benefit to re-id accuracy in obtaining multiple enrollment images from each student.
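The clustering-versus-nearest-neighbor distinction in finding (2) can be sketched as follows. This is an illustrative numpy sketch under assumed inputs (unit-normalized embeddings and known cluster assignments), not the paper's pipeline: `nn_identify` labels each detection independently, while `cluster_identify` imposes the structural constraint that all detections in a cluster share one label.

```python
import numpy as np

def nn_identify(dets, gallery):
    """Label each detection independently by its nearest enrolled identity
    (rows of `dets` and `gallery` are assumed unit-normalized)."""
    return (dets @ gallery.T).argmax(axis=1)

def cluster_identify(dets, cluster_ids, gallery):
    """Pool detections in the same cluster, then give the whole cluster the
    identity nearest its mean embedding (structural constraint: every
    detection in a cluster shares one label)."""
    labels = np.empty(len(dets), dtype=int)
    for c in np.unique(cluster_ids):
        mask = cluster_ids == c
        centroid = dets[mask].mean(axis=0)
        centroid /= np.linalg.norm(centroid) + 1e-12
        labels[mask] = int((gallery @ centroid).argmax())
    return labels

# toy example: two enrolled identities; one clean and one noisy detection of
# the same (clustered) person. NN mislabels the noisy one; clustering fixes it.
gallery = np.eye(2)
dets = np.array([[0.9, 0.1], [0.4, 0.6]])
dets = dets / np.linalg.norm(dets, axis=1, keepdims=True)
nn_labels = nn_identify(dets, gallery)
cl_labels = cluster_identify(dets, np.array([0, 0]), gallery)
```

Pooling evidence across a cluster lets clean detections outvote noisy ones, which is why the constrained variant can be more accurate per detection.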
  2. Long-term surveillance applications often involve having to re-identify individuals over several days or weeks. The task is made even more challenging by the lack of sufficient visibility of the subjects' faces. We address this problem by modeling the wardrobe of individuals using discriminative features and labels extracted from their clothing information in video sequences. In contrast to previous person re-id works, we exploit the facts that people typically own a limited amount of clothing and that knowing a person's wardrobe can be used as a soft biometric to distinguish identities. We a) present a new dataset consisting of more than 70,000 images of 25 identities recorded over 30 days; b) model clothing features using CNNs that minimize intra-garment variations while maximizing inter-garment differences; and c) build a reference wardrobe model that captures each person's set of clothes and can be used for re-id. We show that these models open new perspectives on the long-term person re-id problem using clothing information.
  3. In this paper, we propose a deep multimodal fusion network to fuse multiple modalities (face, iris, and fingerprint) for person identification. The proposed deep multimodal fusion algorithm consists of multiple streams of modality-specific Convolutional Neural Networks (CNNs), which are jointly optimized at multiple feature abstraction levels. Multiple features are extracted at several different convolutional layers from each modality-specific CNN for joint feature fusion, optimization, and classification. Features extracted at different convolutional layers of a modality-specific CNN represent the input at several different levels of abstraction. We demonstrate that efficient multimodal classification can be accomplished with a significant reduction in the number of network parameters by exploiting these multi-level abstract representations extracted from all the modality-specific CNNs. We demonstrate an increase in multimodal person identification performance by utilizing the proposed multi-level feature representations in our multimodal fusion, rather than using only the features from the last layer of each modality-specific CNN. We show that our deep multimodal CNNs with fusion at several different feature abstraction levels can significantly outperform unimodal representation accuracy. We also demonstrate that joint optimization of all the modality-specific CNNs outperforms score- and decision-level fusion of independently optimized CNNs.
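The fusion step this abstract describes, concatenating features tapped from several layers of each modality-specific network, can be sketched as below. This is a minimal numpy sketch under assumed inputs (already-pooled per-layer features); the modality names match the abstract, but the dimensions and per-level normalization are illustrative choices, not the paper's.

```python
import numpy as np

def fuse_multilevel(modality_features):
    """Concatenate features taken from several layers of each
    modality-specific network into one joint representation.

    modality_features: dict mapping modality name -> list of 1-D arrays,
    one per tapped convolutional layer (already globally pooled).
    """
    parts = []
    for name in sorted(modality_features):       # fixed modality order
        for level_feat in modality_features[name]:
            f = np.asarray(level_feat, dtype=float).ravel()
            parts.append(f / (np.linalg.norm(f) + 1e-12))  # per-level norm
    return np.concatenate(parts)

# toy example: two tapped layers per modality, low- and high-level
rng = np.random.default_rng(1)
feats = {
    "face":        [rng.standard_normal(8), rng.standard_normal(16)],
    "iris":        [rng.standard_normal(8), rng.standard_normal(16)],
    "fingerprint": [rng.standard_normal(8), rng.standard_normal(16)],
}
joint = fuse_multilevel(feats)  # a shared classifier head would consume this
```

Because early layers are small, tapping them adds few parameters relative to widening the final layer, which is the parameter-reduction argument the abstract makes.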
  4. In this paper we investigate image classification with computational resource limits at test time. Two such settings are: 1. anytime classification, where the network's prediction for a test example is progressively updated, facilitating the output of a prediction at any time; and 2. budgeted batch classification, where a fixed amount of computation is available to classify a set of examples and can be spent unevenly across "easier" and "harder" inputs. In contrast to most prior work, such as the popular Viola and Jones algorithm, our approach is based on convolutional neural networks. We train multiple classifiers with varying resource demands, which we adaptively apply during test time. To maximally re-use computation between the classifiers, we incorporate them as early exits into a single deep convolutional neural network and inter-connect them with dense connectivity. To facilitate high-quality classification early on, we use a two-dimensional multi-scale network architecture that maintains coarse- and fine-level features throughout the network. Experiments on three image-classification tasks demonstrate that our framework substantially improves the existing state-of-the-art in both settings.
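The adaptive early-exit mechanism can be sketched as a confidence-thresholded cascade. This is an illustrative sketch only: the exits here are independent callables, whereas in the paper they are attached inside one densely connected multi-scale network so that computation is shared, and the threshold is a hypothetical knob for the accuracy/compute trade-off.

```python
import numpy as np

def anytime_predict(classifiers, x, threshold=0.9):
    """Run early-exit classifiers in order of increasing cost and stop as
    soon as one is confident enough.

    classifiers: list of callables mapping x -> class-probability vector.
    Returns (predicted class, index of the exit that produced it)."""
    probs = None
    for i, clf in enumerate(classifiers):
        probs = clf(x)
        if probs.max() >= threshold:   # confident enough: exit early
            return int(probs.argmax()), i
    # budget exhausted: fall back to the last (most expensive) prediction
    return int(probs.argmax()), len(classifiers) - 1

# toy exits: a cheap classifier that is unsure, a costly one that is sure
weak   = lambda x: np.array([0.55, 0.45])
strong = lambda x: np.array([0.05, 0.95])
pred, exit_used = anytime_predict([weak, strong], x=None, threshold=0.9)
```

Lowering the threshold makes more inputs exit at the cheap classifier, which is how a fixed batch budget gets spent unevenly across easy and hard examples.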
  5. Emotions significantly impact human physical and mental health, and, therefore, emotion recognition has been a popular research area in neuroscience, psychology, and medicine. In this paper, we preprocess the raw signals acquired by millimeter-wave radar to obtain high-quality heartbeat and respiration signals. Then, we propose a deep learning model that combines a convolutional neural network and a gated recurrent unit neural network with human facial expression images. The model achieves a recognition accuracy of 84.5% in person-dependent experiments and 74.25% in person-independent experiments. The experiments show that it outperforms both a single deep learning model and traditional machine learning algorithms.