Title: Target tracking and classification using compressive sensing camera for SWIR videos
The pixel-wise code exposure (PCE) camera is a compressive sensing camera with several advantages, such as low power consumption and a high compression ratio. Moreover, one notable advantage is the capability to control the exposure time of individual pixels. Conventional approaches to using PCE cameras involve a time-consuming and lossy process to reconstruct the original frames, which are then used for target tracking and classification; without this reconstruction, conventional methods fail when applied directly to the compressive measurements. In this paper, we present a deep learning approach that performs target tracking and classification directly in the compressive measurement domain, without any frame reconstruction. Our approach has two parts: tracking and classification. Tracking is performed via detection using You Only Look Once (YOLO), and classification is achieved using a residual network (ResNet). Extensive simulations using short-wave infrared (SWIR) videos demonstrated the efficacy of our proposed approach.
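As a rough illustration of the two-part pipeline this abstract describes, the sketch below runs a YOLO detector on each coded frame and passes each detected chip to a ResNet classifier. The model variants, the number of target classes, and the preprocessing are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: YOLO proposes target boxes, ResNet classifies each chip.
import torch
import torchvision.transforms as T
from torchvision.models import resnet18

detector = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # stand-in detector
classifier = resnet18(num_classes=5).eval()  # 5 target types (assumed); weights would come from training

to_chip = T.Compose([T.ToPILImage(), T.Resize((224, 224)), T.ToTensor()])

def track_and_classify(frame):
    """frame: (H, W, 3) uint8 array, e.g. a PCE measurement rendered as an image."""
    boxes = detector(frame).xyxy[0]                  # rows: [x1, y1, x2, y2, conf, cls]
    results = []
    for x1, y1, x2, y2, conf, _ in boxes.tolist():
        chip = frame[int(y1):int(y2), int(x1):int(x2)]   # crop the detection
        with torch.no_grad():
            logits = classifier(to_chip(chip).unsqueeze(0))
        results.append(((x1, y1, x2, y2), int(logits.argmax())))  # box + class label
    return results
```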
Award ID(s):
1824198
PAR ID:
10108155
Author(s) / Creator(s):
Date Published:
Journal Name:
Signal, Image and Video Processing
ISSN:
1863-1703
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Compressive video measurements can save bandwidth and data storage. However, conventional approaches to target detection require the compressive measurements to be reconstructed before any detectors are applied. This is not only time consuming but may also lose information in the reconstruction process. In this paper, we summarize the application to human targets of a recent approach that performs vehicle detection and classification directly in the compressive measurement domain. The raw videos were collected using a pixel-wise code exposure (PCE) camera, which condenses multiple frames into one coded frame. A combination of two deep learning-based algorithms, You Only Look Once (YOLO) and residual network (ResNet), was used for detection and confirmation. Optical and mid-wave infrared (MWIR) videos from a well-known database (SENSIAC) were used in our experiments. Extensive experiments demonstrated that the proposed framework is feasible for target detection up to 1500 m, but target confirmation requires further research.
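The following is a minimal sketch of how the frame-condensing measurement of a PCE camera can be simulated in software: each pixel integrates the scene only during its own randomly placed exposure window, collapsing a T-frame video cube into a single coded frame. The window length and random placement are illustrative assumptions, not the actual camera hardware or the SENSIAC collection parameters.

```python
import numpy as np

def simulate_pce(video, exposure_len=8, seed=0):
    """video: (T, H, W) float array; returns one (H, W) coded measurement."""
    T, H, W = video.shape
    rng = np.random.default_rng(seed)
    start = rng.integers(0, T - exposure_len + 1, size=(H, W))  # per-pixel window start
    t = np.arange(T)[:, None, None]
    mask = (t >= start) & (t < start + exposure_len)            # per-pixel exposure code
    return (video * mask).sum(axis=0) / exposure_len            # integrate each pixel's window

# Example: condense a 32-frame synthetic cube into one coded frame.
coded = simulate_pce(np.random.rand(32, 64, 64))
```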
  2. Efficient, low-energy camera signal processing is critical for battery-supported sensing and surveillance applications. In this research, we develop a video object detection and tracking framework that adaptively down-samples frame pixels to minimize computation and memory costs, and thereby the energy consumed, while maintaining a high level of accuracy. Instead of always operating at the highest (compute-intensive) sensor pixel resolution, video frame content is down-sampled spatially to adapt to changing camera conditions, such as the size of the tracked object and the peak signal-to-noise ratio (PSNR) of the video frames. Object detection and tracking are supported by a novel resolution-aware adaptive hyperdimensional computing framework, which leverages a low-memory-overhead non-linear hypervector encoding scheme specifically tailored to handle multiple degrees of resolution. Previous classification decisions for a moving object, indexed by its tracking label, are used to improve tracking robustness. Energy savings of up to 1.6 orders of magnitude and compute speedups of up to an order of magnitude are obtained across a range of experiments performed on benchmark systems.
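For readers unfamiliar with hyperdimensional computing, the sketch below shows a basic hypervector encoding of an image patch: each pixel position and quantized intensity is bound to a random bipolar hypervector, and the bundled sum forms the patch encoding. This is only the generic technique; the paper's resolution-aware non-linear encoder is more sophisticated, and the dimension, quantization levels, and patch size here are illustrative assumptions.

```python
import numpy as np

D, LEVELS, SIDE = 4096, 16, 32                 # hv dimension, intensity levels, patch side (assumed)
rng = np.random.default_rng(0)
level_hv = rng.choice([-1, 1], size=(LEVELS, D))      # intensity codebook
pos_hv = rng.choice([-1, 1], size=(SIDE * SIDE, D))   # one hypervector per pixel position

def encode(patch):
    """patch: (SIDE, SIDE) uint8 image -> D-dim bipolar hypervector."""
    levels = (patch.reshape(-1).astype(int) * LEVELS) // 256
    bound = level_hv[levels] * pos_hv          # bind each intensity to its position
    return np.sign(bound.sum(axis=0))          # bundle all pixels into one hypervector

def classify(patch, class_hvs):
    """Nearest class prototype by dot-product similarity; class_hvs: (C, D)."""
    return int(np.argmax(class_hvs @ encode(patch)))
```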
  3. An efficient method for tracking a target using a single Pan-Tilt-Zoom (PTZ) camera is proposed. The proposed Scale-Invariant Optical Flow (SIOF) method estimates the motion of the target and rotates the camera accordingly to keep the target at the center of the image. SIOF also estimates the scale of the target and adjusts the focal length to change the Field of View (FoV) so that the target appears at the same size in all captured frames. SIOF is a feature-based tracking method: feature points are extracted and tracked using Optical Flow (OF) and the Scale-Invariant Feature Transform (SIFT), then combined into groups to achieve robust tracking. The feature points in these groups are used within a twist model to recover the 3D free motion of the target. The merits of the proposed method are (i) an efficient scale-invariant tracking method that keeps the target in the camera's FoV at a constant apparent size, and (ii) tracking with prediction and correction to speed up PTZ control and achieve smooth camera motion. Experiments on online video streams validated the efficiency of the proposed SIOF method compared with OF, SIFT, and other tracking methods. The proposed SIOF has around 36% lower average tracking error and around 70% less tracking overshoot than OF.
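A simplified sketch of the kind of feature-tracking loop behind such a method appears below: points are tracked with pyramidal Lucas-Kanade optical flow, pan/tilt corrections are derived from the centroid offset, and a zoom correction is derived from the change in point spread as a scale proxy. The gains and the spread-based scale measure are illustrative assumptions, not the paper's twist-model formulation.

```python
import cv2
import numpy as np

def ptz_step(prev_gray, gray, pts, ref_spread, k_pan=0.01, k_zoom=0.05):
    """pts: (N, 1, 2) float32 corners, e.g. from cv2.goodFeaturesToTrack."""
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good = nxt[status.ravel() == 1].reshape(-1, 2)   # keep successfully tracked points
    h, w = gray.shape
    pan  = k_pan * (good[:, 0].mean() - w / 2)       # steer target centroid to image center
    tilt = k_pan * (good[:, 1].mean() - h / 2)
    spread = good[:, 0].std() + good[:, 1].std()     # crude apparent-size proxy
    zoom = k_zoom * (ref_spread - spread)            # adjust FoV to restore apparent size
    return pan, tilt, zoom, good.reshape(-1, 1, 2)   # commands + points for the next step
```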
  4. Over the last few years, camera-based estimation of vital signs, referred to as imaging photoplethysmography (iPPG), has garnered significant attention due to the relative simplicity, ease, unobtrusiveness, and flexibility of such measurements. It is expected that iPPG may be integrated into a host of emerging applications in areas as diverse as autonomous cars, neonatal monitoring, and telemedicine. In spite of this potential, the primary challenge of non-contact camera-based measurement is the relative motion between the camera and the subject. Current techniques employ 2D feature tracking to reduce the effect of subject and camera motion, but they are limited to handling translational and in-plane motion. In this paper, we study, for the first time, the utility of 3D face tracking in allowing iPPG to retain robust performance even in the presence of out-of-plane and large relative motions. We use an RGB-D camera to obtain 3D information from the subjects, and use the spatial and depth information to fit a 3D face model and track it over the video frames. This allows us to estimate correspondence over the entire video with pixel-level accuracy, even in the presence of out-of-plane or large motions. We then estimate iPPG from the warped video data, which ensures per-pixel correspondence over the entire window length used for estimation. Our experiments demonstrate improved robustness when head motion is large.
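A bare-bones sketch of the downstream iPPG estimation step is shown below: once tracking provides motion-stabilized, per-pixel-corresponded skin patches, a pulse signal can be recovered from the mean green-channel intensity over time, band-passed to plausible heart-rate frequencies. The ROI handling and filter band are illustrative assumptions, not the paper's full method.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def ippg_from_roi(frames, fps=30.0):
    """frames: sequence of warped (H, W, 3) RGB skin patches, one per video frame."""
    trace = np.array([f[:, :, 1].mean() for f in frames])   # mean green channel per frame
    trace = (trace - trace.mean()) / (trace.std() + 1e-8)   # normalize the raw trace
    b, a = butter(3, [0.7, 4.0], btype='band', fs=fps)      # 42-240 bpm band (assumed)
    return filtfilt(b, a, trace)                            # pulse waveform

def heart_rate_bpm(pulse, fps=30.0):
    """Heart-rate estimate: dominant frequency of the filtered trace."""
    spec = np.abs(np.fft.rfft(pulse))
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fps)
    return 60.0 * freqs[np.argmax(spec)]
```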
  5. It is a common practice to think of a video as a sequence of images (frames) and to re-use deep neural network models that are trained only on images for similar analytics tasks on videos. In this paper, we show that this “leap of faith”, that deep learning models that work well on images will also work well on videos, is actually flawed. We show that even when a video camera is viewing a scene that is not changing in any human-perceptible way, and we control for external factors like video compression and environment (lighting), the accuracy of video analytics applications fluctuates noticeably. These fluctuations occur because successive frames produced by the video camera may look similar visually but are perceived quite differently by the video analytics applications. We observed that the root cause of these fluctuations is the dynamic camera parameter changes that a video camera automatically makes in order to capture and produce a visually pleasing video. The camera inadvertently acts as an “unintentional adversary” because these slight changes in the image pixel values of consecutive frames, as we show, have a noticeably adverse impact on the accuracy of insights from video analytics tasks that re-use image-trained deep learning models. To address this inadvertent adversarial effect from the camera, we explore the use of transfer learning techniques to improve learning in video analytics tasks through the transfer of knowledge from learning on image analytics tasks. Our experiments with a number of different cameras and a variety of video analytics tasks show that the camera's inadvertent adversarial effect can be noticeably offset by quickly re-training the deep learning models using transfer learning. In particular, we show that our newly trained YOLOv5 model reduces fluctuation in object detection across frames, which leads to better tracking of objects (∼40% fewer mistakes in tracking). Our paper also provides new directions and techniques to mitigate the camera's adversarial effect on deep learning models used for video analytics applications.
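An illustrative sketch of the transfer-learning remedy described above follows: quickly re-train an image-trained model on a small set of frames from the target camera, freezing the backbone and updating only a new head. The model, layer names, and hyperparameters are assumptions for illustration; the paper fine-tunes YOLOv5 for detection rather than this classification example.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT)   # image-trained backbone
for p in model.parameters():
    p.requires_grad = False                          # freeze transferred knowledge
model.fc = nn.Linear(model.fc.in_features, 10)       # new head for camera-specific classes (assumed)

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def finetune_step(frames, labels):
    """frames: (B, 3, 224, 224) tensors from the new camera; labels: (B,) class ids."""
    opt.zero_grad()
    loss = loss_fn(model(frames), labels)
    loss.backward()                                  # only the new head receives gradients
    opt.step()
    return loss.item()
```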