Title: Design and FPGA Implementation of an Adaptive Video Subsampling Algorithm for Energy-Efficient Single Object Tracking
Image sensors with programmable region-of-interest (ROI) readout are a new sensing technology important for energy-efficient embedded computer vision. In particular, ROIs can subsample the number of pixels read out while performing single object tracking in a video. In this paper, we develop adaptive sampling algorithms that perform joint object tracking and predictive video subsampling. We utilize an object detector, consisting of either mean shift tracking or a neural network, coupled with a Kalman filter for prediction. We show that our algorithms achieve a mean average precision of 0.70 or higher on a dataset of 20 videos in software. Further, we implement hardware acceleration of mean shift tracking with Kalman filter adaptive subsampling on an FPGA. Hardware results show a 23× improvement in clock cycles and latency compared to baseline methods, achieving real-time performance at 38 FPS. This research points to a new domain of hardware-software co-design for adaptive video subsampling in embedded computer vision.
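To illustrate the prediction step described above, here is a minimal Python sketch of a constant-velocity Kalman filter that predicts the object's centroid and selects the ROI to read out on the next frame. The state layout, noise covariances, and fixed square ROI size are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of Kalman-filter-driven ROI subsampling, assuming a
# constant-velocity state [cx, cy, vx, vy]; noise settings and ROI sizing
# are illustrative, not taken from the paper.
import numpy as np

class KalmanROIPredictor:
    def __init__(self, dt=1.0, roi_half=32):
        self.x = np.zeros(4)                    # state: [cx, cy, vx, vy]
        self.P = np.eye(4) * 100.0              # state covariance
        self.F = np.eye(4)                      # constant-velocity motion model
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                   # we observe [cx, cy] only
        self.Q = np.eye(4) * 1e-2               # process noise (assumed)
        self.R = np.eye(2)                      # measurement noise (assumed)
        self.roi_half = roi_half                # half-width of the square ROI

    def predict_roi(self, frame_shape):
        """Advance the state one step and return the ROI to read out."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        cx, cy = self.x[0], self.x[1]
        h, w = frame_shape[:2]
        x0 = int(np.clip(cx - self.roi_half, 0, w - 1))
        y0 = int(np.clip(cy - self.roi_half, 0, h - 1))
        x1 = int(np.clip(cx + self.roi_half, 1, w))
        y1 = int(np.clip(cy + self.roi_half, 1, h))
        return x0, y0, x1, y1                   # only these pixels are read out

    def update(self, measured_cxcy):
        """Correct the state with the centroid measured by the tracker."""
        z = np.asarray(measured_cxcy, dtype=float)
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

In the full loop, a mean shift tracker (or neural network detector) would localize the object inside the read-out ROI and feed the measured centroid back through update(), closing the track-predict-subsample cycle.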
Award ID(s):
1909663
PAR ID:
10157898
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE International Conference on Image Processing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a benchmark-driven experimental study of autonomous drone agility relative to edge offload pipeline attributes. This pipeline includes a monocular gimbal-actuated on-drone camera, hardware RTSP video encoding, 4G LTE wireless network transmission, and computer vision processing on a ground-based GPU-equipped cloudlet. Our parameterized and reproducible agility benchmarks stress the OODA (“Observe, Orient, Decide, Act”) loop of the drone on obstacle avoidance and object tracking tasks. We characterize the latency and throughput of components of this OODA loop through software profiling, and identify opportunities for optimization (a per-stage profiling sketch appears after this list).
  2. Video-based eye trackers increasingly have potential to improve on-screen magnification for low-vision computer users. Yet, little is known about the viability of eye-tracking hardware for gaze-guided magnification. We employed a magnification prototype to assess eye-tracking quality for low-vision users as they performed reading and search tasks. We show that a high degree of tracking loss prevents current video-based eye tracking from capturing gaze input for low-vision users. Our findings show that current technologies were not designed with low-vision users in mind, and we offer suggestions to improve gaze tracking for diverse eye input.
  3. Most of the current solutions for autonomous flights in indoor environments rely on purely geometric maps (e.g., point clouds). There has been, however, a growing interest in supplementing such maps with semantic information (e.g., object detections) using computer vision algorithms. Unfortunately, there is a disconnect between the relatively heavy computational requirements of these computer vision solutions and the limited computation capacity available on mobile autonomous platforms. In this paper, we propose to bridge this gap with a novel Markov Decision Process framework that adapts the parameters of the vision algorithms to the incoming video data rather than fixing them a priori. As a concrete example, we test our framework on an object detection and tracking task, showing significant benefits in terms of energy consumption without considerable loss in accuracy, using a combination of publicly available and novel datasets (a sketch of such a parameter-adaptation loop appears after this list).
  4. We introduce Vysics, a vision-and-physics framework for a robot to build an expressive geometry and dynamics model of a single rigid body, using a seconds-long RGBD video and the robot’s proprioception. While the computer vision community has built powerful visual 3D perception algorithms, cluttered environments with heavy occlusions can limit the visibility of objects of interest. However, the observed motion of partially occluded objects can imply that physical interactions took place, such as contact with a robot or the environment. These inferred contacts can supplement the visible geometry with "physible geometry," which best explains the observed object motion through physics. Vysics uses a vision-based tracking and reconstruction method, BundleSDF, to estimate the trajectory and the visible geometry from an RGBD video, and an odometry-based model learning method, the Physics Learning Library (PLL), to infer the "physible" geometry from the trajectory through implicit contact dynamics optimization. The visible and "physible" geometries jointly factor into optimizing a signed distance function (SDF) to represent the object shape. Vysics does not require pretraining, nor tactile or force sensors. Compared with vision-only methods, Vysics yields object models with higher geometric accuracy and better dynamics prediction in experiments where the object interacts with the robot and the environment under heavy occlusion.
  5. Vehicle flow estimation has many potential smart-city and transportation applications. Many cities have existing camera networks which broadcast image feeds; however, the resolution and frame rate are too low for existing computer vision algorithms to accurately estimate flow. In this work, we present a computer vision and deep learning framework for vehicle tracking. We demonstrate a novel tracking pipeline which enables accurate flow estimates in a range of environments under low-resolution and frame-rate constraints. We demonstrate that our system is able to track vehicles in New York City's traffic camera video feeds at a 1 Hz or lower frame rate, and produces higher traffic flow accuracy than popular open-source tracking frameworks (a minimal low frame-rate association sketch appears after this list).
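For item 1, a rough Python sketch of the kind of per-stage latency profiling described there; the stage names and stub callables (capture, encode, transmit, detect) are placeholders for the benchmark's actual pipeline components.

```python
# Hedged sketch of per-stage latency profiling for an offload pipeline,
# in the spirit of the OODA-loop characterization; stages are placeholders.
import time
from collections import defaultdict

def profile_pipeline(stages, frames):
    """Run frames through named stage callables, recording wall-clock
    latency per stage so OODA-loop bottlenecks can be identified."""
    timings = defaultdict(list)
    for frame in frames:
        data = frame
        for name, fn in stages:
            t0 = time.perf_counter()
            data = fn(data)                       # stage transforms the data
            timings[name].append(time.perf_counter() - t0)
    return {name: sum(ts) / len(ts) for name, ts in timings.items()}

# Stub stages standing in for the real pipeline components:
stages = [
    ("capture",  lambda f: f),    # gimbal camera frame grab
    ("encode",   lambda f: f),    # hardware RTSP video encode
    ("transmit", lambda f: f),    # 4G LTE uplink to the cloudlet
    ("detect",   lambda f: f),    # GPU vision processing
]
print(profile_pipeline(stages, frames=[None] * 10))
```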
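For item 3, a toy Python sketch of adapting one vision parameter (detector input width) online; for brevity the MDP is collapsed here to a single-state bandit, and the action set, energy costs, and reward trade-off are assumptions rather than the paper's formulation.

```python
# Illustrative single-state sketch of accuracy-vs-energy parameter
# adaptation; states, actions, and rewards are assumed, not the paper's.
import random

ACTIONS = [320, 480, 640]                  # candidate detector input widths
ENERGY = {320: 1.0, 480: 2.1, 640: 4.0}    # relative per-frame cost (assumed)

def reward(confidence, width, alpha=0.5):
    # Trade detection confidence against the energy spent on this frame.
    return confidence - alpha * ENERGY[width] / ENERGY[640]

Q = {w: 0.0 for w in ACTIONS}              # single-state value table

def choose(eps=0.1):
    if random.random() < eps:
        return random.choice(ACTIONS)      # explore
    return max(Q, key=Q.get)               # exploit the best width so far

def step(run_detector, frame, lr=0.2):
    w = choose()
    conf = run_detector(frame, w)          # returns confidence in [0, 1]
    Q[w] += lr * (reward(conf, w) - Q[w])  # incremental value update
    return w, conf
```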
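For item 5, a minimal Python sketch of greedy track-to-detection association across low frame-rate gaps by extrapolating each track's velocity; the distance gate and the greedy matching order are illustrative choices, not the paper's pipeline.

```python
# Minimal sketch of association across 1 Hz frames via velocity
# extrapolation; gate and matching strategy are assumed for illustration.
import numpy as np

def associate(tracks, detections, dt=1.0, gate=80.0):
    """tracks: list of dicts with 'pos' and 'vel' (np arrays, pixels);
    detections: list of np arrays. Returns list of (track_i, det_j)."""
    matches, used = [], set()
    for i, trk in enumerate(tracks):
        pred = trk["pos"] + trk["vel"] * dt        # extrapolate across the gap
        dists = [np.linalg.norm(pred - d) if j not in used else np.inf
                 for j, d in enumerate(detections)]
        if dists and min(dists) < gate:            # gate out large jumps
            j = int(np.argmin(dists))
            used.add(j)
            trk["vel"] = (detections[j] - trk["pos"]) / dt
            trk["pos"] = detections[j]
            matches.append((i, j))
    return matches
```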