skip to main content


This content will become publicly available on September 24, 2024

Title: Enhanced Object Detection by Integrating Camera Parameters into Raw Image-Based Faster R-CNN
The rapid progress in intelligent vehicle technology has led to a significant reliance on computer vision and deep neural networks (DNNs) to improve road safety and driving experience. However, the image signal processing (ISP) steps required for these networks, including demosaicing, color correction, and noise reduction, increase the overall processing time and computational resources. To address this, our paper proposes an improved version of the Faster R-CNN algorithm that integrates camera parameters into raw image input, reducing dependence on complex ISP steps while enhancing object detection accuracy. Specifically, we introduce additional camera parameters, such as ISO speed rating, exposure time, focal length, and F-number, through a custom layer into the neural network. Further, we modify the traditional Faster R-CNN model by adding a new fully connected layer, combining these parameters with the original feature maps from the backbone network. Our proposed new model, which incorporates camera parameters, has a 4.2% improvement in mAP@[0.5,0.95] compared to the traditional Faster RCNN model for object detection tasks on raw image data.  more » « less
Award ID(s):
2152258
NSF-PAR ID:
10510626
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-9946-2
Page Range / eLocation ID:
4473 to 4478
Format(s):
Medium: X
Location:
Bilbao, Spain
Sponsoring Org:
National Science Foundation
More Like this
  1. Given the ability to directly manipulate image pixels in the digital input space, an adversary can easily generate imperceptible perturbations to fool a Deep Neural Network (DNN) image classifier, as demonstrated in prior work. In this work, we propose ShapeShifter, an attack that tackles the more challenging problem of crafting physical adversarial perturbations to fool image-based object detectors like Faster R-CNN. Attacking an object detector is more difficult than attacking an image classifier, as it needs to mislead the classification results in multiple bounding boxes with different scales. Extending the digital attack to the physical world adds another layer of difficulty, because it requires the perturbation to be robust enough to survive real-world distortions due to different viewing distances and angles, lighting conditions, and camera limitations. We show that the Expectation over Transformation technique, which was originally proposed to enhance the robustness of adversarial perturbations in image classification, can be successfully adapted to the object detection setting. ShapeShifter can generate adversarially perturbed stop signs that are consistently mis-detected by Faster R-CNN as other objects, posing a potential threat to autonomous vehicles and other safety-critical computer vision systems. 
    more » « less
  2. With the wider adoption of edge computing services, intelligent edge devices, and high-speed V2X communication, compute-intensive tasks for autonomous vehicles, such as object detection using camera, LiDAR, and/or radar data, can be partially offloaded to road-side edge servers. However, data privacy becomes a major concern for vehicular edge computing, as sensitive sensor data from vehicles can be observed and used by edge servers. We aim to address the privacy problem by protecting both vehicles’ sensor data and the detection results. In this paper, we present vehicle–edge cooperative deep-learning networks with privacy protection for object-detection tasks, named vePOD for short. In vePOD, we leverage the additive secret sharing theory to develop secure functions for every layer in an object-detection convolutional neural network (CNN). A vehicle’s sensor data is split and encrypted into multiple secret shares, each of which is processed on an edge server by going through the secure layers of a detection network. The detection results can only be obtained by combining the partial results from the participating edge servers. We have developed proof-of-concept detection networks with secure layers: vePOD Faster R-CNN (two-stage detection) and vePOD YOLO (single-stage detection). Experimental results on public datasets show that vePOD does not degrade the accuracy of object detection and, most importantly, it protects data privacy for vehicles. The execution of a vePOD object-detection network with secure layers is orders of magnitude faster than the existing approaches for data privacy. To the best of our knowledge, this is the first work that targets privacy protection in object-detection tasks with vehicle–edge cooperative computing. 
    more » « less
  3. null (Ed.)
    Most modern commodity imaging systems we use directly for photography—or indirectly rely on for downstream applications—employ optical systems of multiple lenses that must balance deviations from perfect optics, manufacturing constraints, tolerances, cost, and footprint. Although optical designs often have complex interactions with downstream image processing or analysis tasks, today’s compound optics are designed in isolation from these interactions. Existing optical design tools aim to minimize optical aberrations, such as deviations from Gauss’ linear model of optics, instead of application-specific losses, precluding joint optimization with hardware image signal processing (ISP) and highly parameterized neural network processing. In this article, we propose an optimization method for compound optics that lifts these limitations. We optimize entire lens systems jointly with hardware and software image processing pipelines, downstream neural network processing, and application-specific end-to-end losses. To this end, we propose a learned, differentiable forward model for compound optics and an alternating proximal optimization method that handles function compositions with highly varying parameter dimensions for optics, hardware ISP, and neural nets. Our method integrates seamlessly atop existing optical design tools, such as Zemax . We can thus assess our method across many camera system designs and end-to-end applications. We validate our approach in an automotive camera optics setting—together with hardware ISP post processing and detection—outperforming classical optics designs for automotive object detection and traffic light state detection. For human viewing tasks, we optimize optics and processing pipelines for dynamic outdoor scenarios and dynamic low-light imaging. We outperform existing compartmentalized design or fine-tuning methods qualitatively and quantitatively, across all domain-specific applications tested. 
    more » « less
  4. Given the ability to directly manipulate image pixels in the digital input space, an adversary can easily generate imperceptible perturbations to fool a deep neural network image classifier, as demonstrated in prior work. In this work, we tackle the more challenging problem of crafting physical adversarial perturbations to fool image-based object detectors like Faster R-CNN. Attacking an object detector is more difficult than attacking an image classifier, as it needs to mislead the classification results in multiple bounding boxes with different scales. Extending the digital attack to the physical world adds another layer of difficulty, because it requires the perturbation to be robust enough to survive real-world distortions due to different viewing distances and angles, lighting conditions, and camera limitations. In this showcase, we will demonstrate the first robust physical adversarial attack that can fool a state-of-the-art Faster R-CNN object detector. Specifically, we will show various perturbed stop signs that will be consistently mis-detected by an object detector as other target objects. The audience can test in real time the robustness of our adversarially crafted stop signs from different distances and angles. This work is a collaboration between Georgia Tech and Intel Labs and is funded by the Intel Science & Technology Center for Adversary-Resilient Security Analytics at Georgia Tech. 
    more » « less
  5. Bayer pattern is a widely used Color Filter Array (CFA) for digital image sensors, efficiently capturing different light wavelengths on different pixels without the need for a costly ISP pipeline. The resulting single-channel raw Bayer images offer benefits such as spectral wavelength sensitivity and low time latency. However, object detection based on Bayer images has been underexplored due to challenges in human observation and algorithm design caused by the discontinuous color channels in adjacent pixels. To address this issue, we propose the BayerDetect network, an end-to-end deep object detection framework that aims to achieve fast, accurate, and memory-efficient object detection. Unlike RGB color images, where each pixel encodes spectral context from adjacent pixels during ISP color interpolation, raw Bayer images lack spectral context. To enhance the spectral context, the BayerDetect network introduces a spectral frequency attention block, transforming the raw Bayer image pattern to the frequency domain. In object detection, clear object boundaries are essential for accurate bounding box predictions. To handle the challenges posed by alternating spectral channels and mitigate the influence of discontinuous boundaries, the BayerDetect network incorporates a spatial attention scheme that utilizes deformable convolutional kernels in multiple scales to explore spatial context effectively. The extracted convolutional features are then passed through a sparse set of proposal boxes for detection and classification. We conducted experiments on both public and self-collected raw Bayer images, and the results demonstrate the superb performance of the BayerDetect network in object detection tasks. 
    more » « less