Title: Spatio-Temporal Fusion of LiDAR and Camera Data for Omnidirectional Depth Perception
Abstract: Object recognition and depth perception are two tightly coupled tasks that are indispensable for situational awareness. Most autonomous systems perform these tasks by processing and integrating data streaming from a variety of sensors. The multiple hardware components and sophisticated software architectures required to operate these systems make them expensive to scale and operate. This paper implements a fast, monocular vision system that can be used for simultaneous object recognition and depth perception. We build on the architecture of a state-of-the-art object recognition system, YOLOv3, and extend it by incorporating distances and modifying its loss functions and prediction vectors so that it can perform both tasks simultaneously. The vision system is trained on a large database acquired by coupling LiDAR measurements with a complementary 360-degree camera to generate a high-fidelity labeled dataset. The performance of the multipurpose network is evaluated on a test dataset consisting of a total of 7,634 objects collected on a different road network. When compared with ground truth LiDAR data, the proposed network achieves a mean absolute percentage error of 11% for passenger cars within 10 m, and mean error rates of 7% and 9% for trucks within and beyond 10 m, respectively. We also observed that adding a second task (depth perception) to the network improved the accuracy of object detection by about 3%. The proposed multipurpose model can be used for the development of automated alert systems, traffic monitoring, and safety monitoring.
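The abstract describes extending a YOLOv3-style prediction vector with a per-object distance and adding a depth term to the loss so the network learns detection and depth jointly. The sketch below illustrates that idea in generic PyTorch, assuming one extra distance channel per anchor and a weighted L1 depth term; the module name, tensor layout, and the weighting factor lambda_depth are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a multitask (detection + depth) head, assuming a YOLO-style
# 1x1 convolutional output layer with one extra distance channel per anchor.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DetectionDepthHead(nn.Module):
    """Per anchor, predicts box (4), objectness (1), class scores (C), distance (1)."""

    def __init__(self, in_channels: int, num_anchors: int, num_classes: int):
        super().__init__()
        self.num_anchors = num_anchors
        self.num_classes = num_classes
        self.num_outputs = 4 + 1 + num_classes + 1  # box + objectness + classes + distance
        self.conv = nn.Conv2d(in_channels, num_anchors * self.num_outputs, kernel_size=1)

    def forward(self, x: torch.Tensor):
        b, _, h, w = x.shape
        out = self.conv(x).view(b, self.num_anchors, self.num_outputs, h, w)
        box, obj, cls, dist = torch.split(out, [4, 1, self.num_classes, 1], dim=2)
        return box, obj, cls, dist


def multitask_loss(pred: dict, target: dict, lambda_depth: float = 1.0) -> torch.Tensor:
    """Standard YOLO-style detection terms plus an L1 depth term.

    `pred` and `target` hold tensors flattened to one row per anchor:
    box (N, 4), obj/obj_logit (N,), cls/cls_logit (N, C), dist (N,).
    The depth term is supervised only on anchors matched to a ground-truth object.
    """
    pos = target["obj"] > 0.5  # anchors matched to ground-truth objects
    box_loss = F.mse_loss(pred["box"][pos], target["box"][pos])
    obj_loss = F.binary_cross_entropy_with_logits(pred["obj_logit"], target["obj"])
    cls_loss = F.binary_cross_entropy_with_logits(pred["cls_logit"][pos], target["cls"][pos])
    depth_loss = F.l1_loss(pred["dist"][pos], target["dist"][pos])
    return box_loss + obj_loss + cls_loss + lambda_depth * depth_loss
```

With this layout the distance output is supervised by the LiDAR-derived labels while the detection terms are left unchanged, which is consistent with the paper's observation that the added depth task slightly improved detection accuracy rather than degrading it.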
Award ID(s): 2045786
PAR ID: 10497822
Author(s) / Creator(s):
Publisher / Repository: SAGE JOURNALS
Date Published:
Journal Name: Transportation Research Record: Journal of the Transportation Research Board
ISSN: 0361-1981
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Our world offers a never-ending stream of visual stimuli, yet today’s vision systems only accurately recognize patterns within a few seconds. These systems understand the present, but fail to contextualize it in past or future events. In this paper, we study long-form video understanding. We introduce a framework for modeling long-form videos and develop evaluation protocols on large-scale datasets. We show that existing state-of-the-art short-term models are limited for long-form tasks. A novel object-centric transformer-based video recognition architecture performs significantly better on 7 diverse tasks. It also outperforms comparable state-of-the-art on the AVA dataset. 
  2. Arthropods’ eyes are effective biological vision systems for object tracking and a wide field of view because of their structural uniqueness; however, unlike mammalian eyes, they can hardly acquire the depth information of a static object because of their monocular cues. Therefore, most arthropods rely on motion parallax to track objects in three-dimensional (3D) space. Uniquely, the praying mantis (Mantodea) uses both compound structured eyes and a form of stereopsis and is capable of achieving object recognition in 3D space. Here, by mimicking the vision system of the praying mantis using stereoscopically coupled artificial compound eyes, we demonstrated spatiotemporal object sensing and tracking in 3D space with a wide field of view. Furthermore, to achieve a fast response with minimal latency, data storage/transportation, and power consumption, we processed the visual information at the edge of the system using a synaptic device and a federated split learning algorithm. The designed and fabricated stereoscopic artificial compound eye provides energy-efficient and accurate spatiotemporal object sensing and optical flow tracking. It exhibits a root mean square error of 0.3 centimeter, consuming only approximately 4 millijoules for sensing and tracking. These figures are more than 400 times lower than those of conventional complementary metal-oxide semiconductor-based imaging systems. Our biomimetic imager shows the potential of integrating nature’s unique design using hardware and software codesigned technology toward capabilities of edge computing and sensing.
  3. Object detection plays a pivotal role in autonomous driving by enabling vehicles to perceive and comprehend their environment, thereby making informed decisions for safe navigation. Camera data provides rich visual context and object recognition, while LiDAR data offers precise distance measurements and 3D mapping. Multi-modal object detection models that incorporate these data types are gaining prominence, as they support the comprehensive perception and situational awareness needed in autonomous vehicles. Although graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) are promising hardware options for this application, the complex knowledge required to efficiently adapt and optimize multi-modal detection models for FPGAs presents a significant barrier to their utilization on this versatile and efficient platform. In this work, we evaluate the performance of camera- and LiDAR-based detection models on GPU and FPGA hardware, aiming to provide a specialized understanding for translating multi-modal detection models to the unique architectures of heterogeneous hardware platforms in autonomous driving systems. We focus on critical metrics from both the system and model performance perspectives (a minimal latency-measurement sketch for the system side appears after this list). Based on our quantitative findings, we propose foundational insights and guidance for the design of camera- and LiDAR-based multi-modal detection models on diverse hardware platforms.
  4. Advances in perception for self-driving cars have accelerated in recent years due to the availability of large-scale datasets, typically collected at specific locations and under nice weather conditions. Yet, to achieve the high safety requirement, these perceptual systems must operate robustly under a wide variety of weather conditions, including snow and rain. In this paper, we present a new dataset to enable robust autonomous driving via a novel data collection process: data is repeatedly recorded along a 15 km route under diverse scene (urban, highway, rural, campus), weather (snow, rain, sun), time (day/night), and traffic conditions (pedestrians, cyclists, and cars). The dataset includes images and point clouds from cameras and LiDAR sensors, along with high-precision GPS/INS to establish correspondence across routes. The dataset includes road and object annotations using amodal masks to capture partial occlusions, as well as 3D bounding boxes. We demonstrate the uniqueness of this dataset by analyzing the performance of baselines in amodal segmentation of road and objects, depth estimation, and 3D object detection. The repeated routes open new research directions in object discovery, continual learning, and anomaly detection. Link to Ithaca365: https://ithaca365.mae.cornell.edu/
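As a complement to the GPU/FPGA comparison in item 3, below is a minimal sketch of the system-side measurement it refers to: per-batch inference latency and throughput for a detector on a GPU (or CPU). The detector used here (an off-the-shelf torchvision model with random weights), input sizes, and iteration counts are illustrative assumptions rather than the setup used in that work; FPGA measurements rely on vendor-specific toolchains and are not shown.

```python
# Hedged sketch: latency/throughput measurement for a detection model on GPU or CPU.
# The model and input shapes are placeholders, not the models evaluated in the paper.
import time

import torch
import torchvision


def benchmark(model, images, warmup: int = 5, iters: int = 20):
    """Return (mean per-batch latency in ms, throughput in images/s)."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):  # warm up kernels and allocator caches
            model(images)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for queued GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(images)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / iters, len(images) * iters / elapsed


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Any off-the-shelf detector serves for illustration (randomly initialized weights).
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None).to(device)
    images = [torch.rand(3, 640, 640, device=device) for _ in range(2)]
    latency_ms, throughput = benchmark(model, images)
    print(f"latency: {latency_ms:.1f} ms/batch, throughput: {throughput:.1f} img/s")
```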