

Title: Design and FPGA Implementation of an Adaptive Video Subsampling Algorithm for Energy-Efficient Single Object Tracking
Image sensors with programmable region-of-interest (ROI) readout are a new sensing technology important for energy-efficient embedded computer vision. In particular, ROIs can reduce the number of pixels read out while performing single object tracking in a video. In this paper, we develop adaptive sampling algorithms which perform joint object tracking and predictive video subsampling. We use an object detector consisting of either mean shift tracking or a neural network, coupled with a Kalman filter for prediction. We show that our algorithms achieve mean average precision of 0.70 or higher on a dataset of 20 videos in software. Further, we implement hardware acceleration of mean shift tracking with Kalman filter adaptive subsampling on an FPGA. Hardware results show a 23× improvement in clock cycles and latency compared to baseline methods and achieve 38 FPS real-time performance. This research points to a new domain of hardware-software co-design for adaptive video subsampling in embedded computer vision.
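The predictive subsampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a constant-velocity motion model, one independent Kalman filter per image axis, and a fixed safety margin around the tracked box. The names `AxisKalman` and `predict_roi` and the noise values `q` and `r` are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (assumed model)."""
    p: float             # estimated position of the object centre (pixels)
    v: float = 0.0       # estimated velocity (pixels per frame)
    c_pp: float = 100.0  # covariance entries of [[c_pp, c_pv], [c_pv, c_vv]]
    c_pv: float = 0.0
    c_vv: float = 100.0
    q: float = 1.0       # process noise added to each diagonal entry (assumed)
    r: float = 4.0       # measurement noise variance (assumed)

    def predict(self) -> float:
        # State: x <- F x with F = [[1, 1], [0, 1]] (time step of one frame).
        self.p += self.v
        # Covariance: P <- F P F^T + Q.
        c_pp = self.c_pp + 2.0 * self.c_pv + self.c_vv + self.q
        c_pv = self.c_pv + self.c_vv
        c_vv = self.c_vv + self.q
        self.c_pp, self.c_pv, self.c_vv = c_pp, c_pv, c_vv
        return self.p

    def update(self, z: float) -> None:
        # Measurement model H = [1, 0]: the tracker reports position only.
        s = self.c_pp + self.r          # innovation variance
        k_p = self.c_pp / s             # Kalman gain, position component
        k_v = self.c_pv / s             # Kalman gain, velocity component
        y = z - self.p                  # innovation
        self.p += k_p * y
        self.v += k_v * y
        # Covariance: P <- (I - K H) P.
        c_pp = (1.0 - k_p) * self.c_pp
        c_pv = (1.0 - k_p) * self.c_pv
        c_vv = self.c_vv - k_v * self.c_pv
        self.c_pp, self.c_pv, self.c_vv = c_pp, c_pv, c_vv


def predict_roi(kx, ky, obj_w, obj_h, margin, frame_w, frame_h):
    """Advance both filters one frame and return the ROI (x0, y0, w, h)
    to read out next, clamped to the sensor frame."""
    cx, cy = kx.predict(), ky.predict()
    w = min(obj_w + 2 * margin, frame_w)
    h = min(obj_h + 2 * margin, frame_h)
    x0 = max(0, min(int(round(cx - w / 2)), frame_w - w))
    y0 = max(0, min(int(round(cy - h / 2)), frame_h - h))
    return x0, y0, w, h
```

In a tracking loop under these assumptions, each frame would call `predict_roi` to choose the pixels to read out, run the tracker (for example mean shift) inside that ROI, and then pass the measured centre to `update` on each axis. Only `w * h` pixels are read per frame instead of the full `frame_w * frame_h`.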
Award ID(s):
1909663
NSF-PAR ID:
10157898
Journal Name:
IEEE International Conference on Image Processing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. 1. Description of the objectives and motivation for the contribution to ECE education: The demand for wireless data transmission capacity is increasing rapidly, and this growth is expected to continue due to the ongoing prevalence of cellular phones and new and emerging bandwidth-intensive applications that encompass high-definition video, unmanned aerial systems (UAS), intelligent transportation systems (ITS) including autonomous vehicles, and others. Meanwhile, vital military and public safety applications also depend on access to the radio frequency spectrum. To meet these demands, the US federal government is beginning to move from the proven but inefficient model of exclusive frequency assignments to a more efficient, shared-spectrum approach in some bands of the radio frequency spectrum. A STEM workforce that understands the radio frequency spectrum and the applications that use it is needed to further increase spectrum efficiency and cost-effectiveness of wireless systems over the next several decades, to meet anticipated and unanticipated increases in wireless data capacity.
     2. Relevant background including literature search examples if appropriate: Cisco Systems’ annual survey indicates continued strong growth in demand for wireless data capacity. Meanwhile, undergraduate electrical and computer engineering courses in communication systems, electromagnetics, and networks tend to emphasize mathematical and theoretical fundamentals and higher-layer protocols, with less focus on fundamental concepts that are more specific to radio frequency wireless systems, including the physical and media access control layers of wireless communication systems and networks. An efficient way is needed to introduce basic RF system and spectrum concepts to undergraduate engineering students in courses such as those mentioned above who are unable to, or had not planned to, take a full course in radio frequency / microwave engineering or wireless systems and networks. We have developed a series of interactive online modules that introduce concepts fundamental to wireless communications, the radio frequency spectrum, and spectrum sharing, and seek to present these concepts in context. The modules include interactive, JavaScript-based simulation exercises intended to reinforce the concepts that are presented in the modules through narrated slide presentations, text, and external links. Additional modules in development will introduce advanced undergraduate and graduate students and STEM professionals to configuration and programming of adaptive frequency-agile radios and spectrum management systems that can operate efficiently in congested radio frequency environments. Simulation exercises developed for the advanced modules allow both manual and automatic control of simulated radio links in timed, game-like simulations, and some exercises will enable students to select from among multiple pre-coded controller strategies and optionally edit the code before running the timed simulation. Additionally, we have developed infrastructure for running remote laboratory experiments that can also be embedded within the online modules, including a web-based user interface, an experiment management framework, and software defined radio (SDR) application software that runs in a wireless testbed initially developed for research. Although these experiments rely on limited hardware resources and introduce additional logistical considerations, they provide additional realism that may further challenge and motivate students.
     3. Description of any assessment methods used to evaluate the effectiveness of the contribution: Each set of modules is preceded and followed by a survey. Each individual module is preceded by a quiz and followed by another quiz, with pre- and post-quiz questions drawn from the same pool. The pre-surveys allow students to opt in or out of having their survey and quiz results used anonymously in research.
     4. Statement of results: The initial modules have been and are being used by three groups of students: (1) students in an undergraduate Introduction to Communication Systems course; (2) an interdisciplinary group of engineering students, including computer science students, who are participating in a related undergraduate research project; and (3) students in a graduate-level communications course that includes both electrical and computer engineers. Analysis of results from the first group of students showed statistically significant increases from pre-quiz to post-quiz for each of four modules on fundamental wireless communication concepts. Results for the other students have not yet been analyzed, but also appear to show substantial pre-quiz to post-quiz increases in mean scores.
  2. Efficient and adaptive computer vision systems have been proposed to make computer vision tasks, such as image classification and object detection, optimized for embedded or mobile devices. These solutions, quite recent in their origin, focus on optimizing the model (a deep neural network, DNN) or the system by designing an adaptive system with approximation knobs. Despite several recent efforts, we show that existing solutions suffer from two major drawbacks. First, while mobile devices or systems-on-chips (SOCs) usually come with limited resources including battery power, most systems do not consider the energy consumption of the models during inference. Second, they do not consider the interplay between the three metrics of interest in their configurations, namely, latency, accuracy, and energy. In this work, we propose an efficient and adaptive video object detection system, Virtuoso, which is jointly optimized for accuracy, energy efficiency, and latency. Underlying Virtuoso is a multi-branch execution kernel that is capable of running at different operating points in the accuracy-energy-latency axes, and a lightweight runtime scheduler to select the best-fit execution branch to satisfy the user requirement. We position this work as a first step in understanding the suitability of various object detection kernels on embedded boards in the accuracy-latency-energy axes, opening the door for further development in solutions customized to embedded systems and for benchmarking such solutions. Virtuoso is able to achieve up to 286 FPS on the NVIDIA Jetson AGX Xavier board, which is up to 45 times faster than the baseline EfficientDet D3 and 15 times faster than the baseline EfficientDet D0. In addition, we also observe up to 97.2% energy reduction using Virtuoso compared to the baseline YOLO (v3), a widely used object detector designed for mobiles.
To fairly compare with Virtuoso, we benchmark 15 state-of-the-art or widely used protocols, including Faster R-CNN (FRCNN) [NeurIPS’15], YOLO v3 [CVPR’16], SSD [ECCV’16], EfficientDet [CVPR’20], SELSA [ICCV’19], MEGA [CVPR’20], REPP [IROS’20], FastAdapt [EMDL’21], and our in-house adaptive variants of FRCNN+, YOLO+, SSD+, and EfficientDet+ (our variants have enhanced efficiency for mobiles). With this comprehensive benchmark, Virtuoso has shown superiority to all the above protocols, leading the accuracy frontier at every efficiency level on NVIDIA Jetson mobile GPUs. Specifically, Virtuoso has achieved an accuracy of 63.9%, which is more than 10% higher than some of the popular object detection models, FRCNN at 51.1%, and YOLO at 49.5%.
  3. The advent of pervasive autonomous systems such as self-driving cars and drones has raised questions about their safety and trustworthiness. This is particularly relevant in the event of on-board subsystem errors or failures. In this research, we show how an encoded Extended Kalman Filter can be used to detect anomalous behaviors of critical components of nonlinear autonomous systems: sensors, actuators, state estimation algorithms and control software. As opposed to prior work that is limited to linear systems or requires the use of cumbersome machine-learned checks with fixed detection thresholds, the proposed approach necessitates the use of time-varying checks with dynamically adaptive thresholds. The method is lightweight in comparison to existing methods (it does not rely on machine learning paradigms) and achieves high coverage as well as low detection latency of errors. A quadcopter and an automotive steer-by-wire system are used as test vehicles for the research, and simulation and hardware results indicate the overhead, coverage and error detection latency benefits of the proposed approach.
  4. Most of the current solutions for autonomous flights in indoor environments rely on purely geometric maps (e.g., point clouds). There has been, however, a growing interest in supplementing such maps with semantic information (e.g., object detections) using computer vision algorithms. Unfortunately, there is a disconnect between the relatively heavy computational requirements of these computer vision solutions and the limited computation capacity available on mobile autonomous platforms. In this paper, we propose to bridge this gap with a novel Markov Decision Process framework that adapts the parameters of the vision algorithms to the incoming video data rather than fixing them a priori. As a concrete example, we test our framework on an object detection and tracking task, showing significant benefits in terms of energy consumption without considerable loss in accuracy, using a combination of publicly available and novel datasets.
  5. Vision Transformer (ViT) has demonstrated promising performance in various computer vision tasks and has recently attracted a lot of research attention. Many recent works have focused on proposing new architectures to improve ViT and deploying it into real-world applications. However, little effort has been made to analyze and understand ViT’s architecture design space and its hardware-cost implications on different devices. In this work, by simply scaling ViT’s depth, width, input size, and other basic configurations, we show that a scaled vanilla ViT model without bells and whistles can achieve a comparable or superior accuracy-efficiency trade-off compared with most of the latest ViT variants. Specifically, compared to DeiT-Tiny, our scaled model achieves 1.9% higher ImageNet top-1 accuracy under the same FLOPs and 3.7% higher ImageNet top-1 accuracy under the same latency on an NVIDIA Edge GPU TX2. Motivated by this, we further investigate the extracted scaling strategies from the following two aspects: (1) “can these scaling strategies be transferred across different real hardware devices?”; and (2) “can these scaling strategies be transferred to different ViT variants and tasks?”. For (1), our exploration, based on various devices with different resource budgets, indicates that the transferability effectiveness depends on the underlying device together with its corresponding deployment tool; for (2), we validate the effective transferability of the aforementioned scaling strategies obtained from a vanilla ViT model on top of an image classification task to the PiT model, a strong ViT variant targeting efficiency, as well as object detection and video classification tasks.
In particular, when transferred to PiT, our scaling strategies boost ImageNet top-1 accuracy from 74.6% to 76.7% (↑2.1%) under the same 0.7G FLOPs; and when transferred to the COCO object detection task, the average precision is boosted by 0.7% under a similar throughput on a V100 GPU.
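Item 2 in the list above describes a lightweight runtime scheduler that selects the best-fit execution branch for a user requirement. A minimal sketch of that kind of selection follows, assuming each branch has been profiled offline for accuracy, latency, and energy, and that "best fit" means the most accurate branch within the user's latency and energy budgets. The `Branch` type, the `select_branch` function, and this selection rule are hypothetical illustrations, not the published Virtuoso scheduler.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Branch:
    """One profiled operating point of a multi-branch detector (hypothetical)."""
    name: str
    accuracy: float    # e.g. mean average precision in [0, 1]
    latency_ms: float  # per-frame latency measured offline
    energy_mj: float   # per-frame energy measured offline


def select_branch(
    branches: Sequence[Branch],
    max_latency_ms: float,
    max_energy_mj: float,
) -> Optional[Branch]:
    """Return the most accurate branch within both budgets, or None if no
    branch is feasible. Ties on accuracy go to the lower-latency branch."""
    feasible = [
        b for b in branches
        if b.latency_ms <= max_latency_ms and b.energy_mj <= max_energy_mj
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda b: (b.accuracy, -b.latency_ms))
```

A caller would build the branch table from its own measurements on the target board and re-run `select_branch` whenever the latency or energy budget changes. A real scheduler would likely also account for the cost of switching branches and for content-dependent accuracy, which this sketch ignores.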