

Search for: All records

Award ID contains: 1909663

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Energy-efficient visual sensing is of paramount importance for battery-backed, low-power IoT and mobile applications. Unfortunately, modern image sensors still consume hundreds of milliwatts of power, mainly due to analog readout: current systems always supply a fixed voltage to the sensor's analog circuitry, leading to high power profiles. In this work, we propose to aggressively scale the analog voltage supplied to the camera as a means to significantly reduce sensor power consumption. To that end, we characterize the power and fidelity implications of analog voltage scaling on three off-the-shelf image sensors. Our characterization reveals that analog voltage scaling reduces sensor power but also degrades image quality, and the degradation in image quality situationally affects the task accuracy of vision applications. We develop a visual streaming pipeline that flexibly allows application developers to adapt sensor voltage on a frame-by-frame basis, along with a voltage controller that programmatically generates the desired sensor voltage based on application requests. We integrate the voltage controller into an existing RPi-based IoT video-streaming pipeline and, on top of this, develop runtime support for flexible voltage specification from vision applications. Evaluating the system over a wide range of voltage-scaling policies on popular vision tasks reveals that Squint imaging can deliver up to 73% sensor power savings while maintaining reasonable task fidelity. A minimal sketch of this per-frame voltage adaptation appears after this list. Our artifacts are available at: https://gitlab.com/squint1/squint-ae-public
  2. High spatiotemporal resolution can offer high precision for vision applications, which is particularly useful for capturing the nuances of visual features, such as for augmented reality. Unfortunately, capturing and processing high-spatiotemporal-resolution visual frames generates energy-expensive memory traffic. On the other hand, low-resolution frames can reduce pixel memory throughput, but they also reduce the opportunities for high-precision visual sensing. Our intuition, however, is that not all parts of the scene need to be captured at a uniform resolution: selectively and opportunistically reducing resolution for different regions of image frames can yield high-precision visual computing at energy-efficient memory data rates. To this end, we develop a visual sensing pipeline architecture that flexibly allows application developers to dynamically adapt the spatial resolution and update rate of different “rhythmic pixel regions” in the scene. We develop a system that ingests pixel streams from commercial image sensors with their standard raster-scan pixel readout patterns, but only encodes relevant pixels prior to storing them in memory. We also present streaming hardware to decode the stored rhythmic pixel region stream into traditional frame-based representations that feed into standard computer vision algorithms. We integrate our encoding and decoding hardware modules into existing video pipelines and, on top of this, develop runtime support allowing developers to flexibly specify the region labels. Evaluating our system on a Xilinx FPGA platform over three vision workloads shows a 43-64% reduction in interface traffic and memory footprint, while providing controllable task accuracy. A minimal software sketch of the encode/decode idea appears after this list.
  3. Visual simultaneous localization and mapping (SLAM) is an emerging technology that enables low-power devices with a single camera to perform robotic navigation. However, most visual SLAM algorithms are tuned for images produced through an image signal processing (ISP) pipeline optimized for highly aesthetic photography. In this paper, we investigate the feasibility of varying sensor quantization on RAW images taken directly from the sensor to save energy for visual SLAM. In particular, we compare linear and logarithmic image quantization and show that visual SLAM is robust to the latter. Further, we introduce a new gradient-based image quantization scheme that outperforms logarithmic quantization's energy savings while preserving accuracy for feature-based visual SLAM algorithms. This work opens a new direction in energy-efficient image sensing for SLAM. A sketch of the linear and logarithmic quantization schemes appears after this list.
  4. Deep learning for computer vision depends on lossy image compression: it reduces the storage required for training and test data and lowers transfer costs in deployment. Mainstream datasets and imaging pipelines all rely on standard JPEG compression. In JPEG, the degree of quantization of frequency coefficients controls the lossiness: an 8×8 quantization table (Q-table) determines both the quality of the encoded image and the compression ratio. While a long history of work has sought better Q-tables, existing work either seeks to minimize image distortion or to optimize for models of the human visual system. This work asks whether JPEG Q-tables exist that are “better” for specific vision networks and can offer better quality-size trade-offs than ones designed for human perception or minimal distortion. We reconstruct an ImageNet test set at higher resolution to explore the effect of JPEG compression under novel Q-tables, and we attempt several approaches to tune a Q-table for a vision task. We find that a simple sorted random sampling method can exceed the performance of the standard JPEG Q-table; we also use hyper-parameter tuning techniques including bounded random search, Bayesian optimization, and composite heuristic optimization methods. The new Q-tables we obtain can improve the compression rate by 10% to 200% at fixed accuracy, or improve accuracy by up to 2% at the same compression rate. A sketch of the sorted random sampling idea appears after this list.
  5. Image sensors with programmable region-of-interest (ROI) readout are a new sensing technology important for energy-efficient embedded computer vision. In particular, ROIs can subsample the number of pixels being read out while performing single-object tracking in a video. In this paper, we develop adaptive sampling algorithms that perform joint object tracking and predictive video subsampling. We utilize an object detector consisting of either mean-shift tracking or a neural network, coupled with a Kalman filter for prediction. We show that our algorithms achieve a mean average precision of 0.70 or higher on a dataset of 20 videos in software. Further, we implement hardware acceleration of mean-shift tracking with Kalman-filter adaptive subsampling on an FPGA. Hardware results show a 23× improvement in clock cycles and latency compared to baseline methods, achieving real-time performance at 38 FPS. This research points to a new domain of hardware-software co-design for adaptive video subsampling in embedded computer vision. A simplified Kalman-filter ROI-prediction sketch appears after this list.
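The sketch below illustrates the per-frame voltage adaptation described in item 1. It is a minimal Python sketch, not the Squint implementation: the `camera`, `controller`, and `detector` objects, the voltage levels, and the confidence thresholds are all illustrative assumptions.

```python
NOMINAL_V = 2.8   # assumed nominal analog supply (volts)
MIN_V = 1.8       # assumed lowest usable analog supply (volts)

def choose_voltage(prev_confidence):
    """Toy policy: scale the analog supply down while the vision task is
    confident; back off toward nominal when confidence drops."""
    if prev_confidence > 0.9:
        return MIN_V
    if prev_confidence > 0.7:
        return (MIN_V + NOMINAL_V) / 2
    return NOMINAL_V

def stream(camera, controller, detector, n_frames=100):
    """Hypothetical frame-by-frame loop: request a voltage, capture a
    (possibly degraded) frame, run the task, and adapt from its confidence."""
    confidence = 1.0
    for _ in range(n_frames):
        controller.set_voltage(choose_voltage(confidence))
        frame = camera.capture()
        detections = detector(frame)
        confidence = max((d.score for d in detections), default=0.0)
```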
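For the rhythmic pixel regions pipeline in item 2, the NumPy sketch below shows one way to encode only labeled regions, each with its own spatial stride and temporal update period, and to decode the stored packets back into a dense frame. The region-label tuple format, the grayscale-frame assumption, and the nearest-neighbor fill are illustrative; the paper implements the encoder and decoder in streaming hardware.

```python
import numpy as np

def encode_regions(frame, frame_idx, regions):
    """Keep only pixels inside labeled regions of a grayscale frame.
    Each region is (y0, y1, x0, x1, stride, period)."""
    packets = []
    for (y0, y1, x0, x1, stride, period) in regions:
        if frame_idx % period != 0:          # region not due for an update
            continue
        patch = frame[y0:y1:stride, x0:x1:stride].copy()
        packets.append(((y0, x0, stride), patch))
    return packets                           # only these pixels reach memory

def decode_regions(packets, shape, canvas=None):
    """Expand packets into a dense frame for standard vision code
    (nearest-neighbor upsampling of each strided patch)."""
    if canvas is None:
        canvas = np.zeros(shape, dtype=np.uint8)
    for (y0, x0, stride), patch in packets:
        up = np.repeat(np.repeat(patch, stride, axis=0), stride, axis=1)
        h = min(up.shape[0], shape[0] - y0)
        w = min(up.shape[1], shape[1] - x0)
        canvas[y0:y0 + h, x0:x0 + w] = up[:h, :w]
    return canvas
```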
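Item 3 compares linear and logarithmic requantization of RAW pixels. The sketch below shows both on a RAW image reduced from an assumed 12 bits to 8 bits; the paper's gradient-based scheme is not reproduced here.

```python
import numpy as np

def quantize_linear(raw, in_bits=12, out_bits=8):
    """Uniform requantization: drop the least-significant bits."""
    return (raw >> (in_bits - out_bits)).astype(np.uint8)

def quantize_log(raw, in_bits=12, out_bits=8):
    """Logarithmic requantization: spend output codes on log intensity,
    which item 3 reports feature-based visual SLAM tolerates well."""
    max_in = (1 << in_bits) - 1
    max_out = (1 << out_bits) - 1
    codes = np.log1p(raw.astype(np.float32)) / np.log1p(max_in) * max_out
    return np.round(codes).astype(np.uint8)
```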
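One plausible reading of the “sorted random sampling” in item 4 is: draw 64 quantization values at random and assign them in ascending order along the JPEG zig-zag scan, so low-frequency coefficients get the smallest steps. The sketch below follows that reading; the paper's exact sampling distribution and search loop are not shown. A table generated this way could then be handed to an encoder that accepts custom quantization tables (for example, Pillow's `qtables` argument when saving a JPEG) and scored against a vision model.

```python
import numpy as np

def sorted_random_qtable(rng, lo=1, hi=255):
    """Draw 64 random quantization values and place them in ascending order
    along the JPEG zig-zag scan (smallest values on the lowest frequencies)."""
    values = np.sort(rng.integers(lo, hi + 1, size=64))
    zigzag = sorted(((y, x) for y in range(8) for x in range(8)),
                    key=lambda p: (p[0] + p[1],
                                   p[0] if (p[0] + p[1]) % 2 else p[1]))
    qtable = np.empty((8, 8), dtype=np.uint16)
    for value, (y, x) in zip(values, zigzag):
        qtable[y, x] = value
    return qtable

# Example: qtable = sorted_random_qtable(np.random.default_rng(0))
```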
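Item 5 pairs a tracker with a Kalman filter that predicts where to place the next readout region. The sketch below is a simplified constant-velocity filter over the bounding-box center plus ROI clamping; the noise parameters, ROI size, and sensor bounds are made-up defaults, and the mean-shift and neural-network detectors are not included.

```python
import numpy as np

class RoiPredictor:
    """Constant-velocity Kalman filter over the box center [cx, cy]."""
    def __init__(self, q=1.0, r=10.0):
        self.x = np.zeros(4)                        # state [cx, cy, vx, vy]
        self.P = np.eye(4) * 1e3                    # large initial uncertainty
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0           # position += velocity
        self.H = np.eye(2, 4)                       # observe position only
        self.Q = np.eye(4) * q                      # assumed process noise
        self.R = np.eye(2) * r                      # assumed measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                           # predicted ROI center

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

def roi_for(center, half=64, bounds=(480, 640)):
    """Clamp a square readout window around the predicted center."""
    cx, cy = int(center[0]), int(center[1])
    y0 = max(0, min(bounds[0] - 2 * half, cy - half))
    x0 = max(0, min(bounds[1] - 2 * half, cx - half))
    return y0, y0 + 2 * half, x0, x0 + 2 * half
```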