Integrating multimodal data such as RGB and LiDAR from multiple views significantly increases computational and communication demands, which is challenging for resource-constrained autonomous agents that must still meet the time-critical deadlines of mission-critical applications. To address this challenge, we propose CoOpTex, a collaborative task execution framework designed for cooperative perception in distributed autonomous systems (DAS). CoOpTex's contribution is twofold: (a) it fuses multiview RGB images into a panoramic camera view for 2D object detection and uses 360° LiDAR for 3D object detection, improving accuracy with a lightweight Graph Neural Network (GNN) that integrates object coordinates from both perspectives; (b) to meet execution deadlines, it dynamically offloads computationally intensive image-stitching tasks to auxiliary devices when they are available and adjusts the RGB frame capture rate based on device mobility and processing capability. We implement CoOpTex in real time on static and mobile heterogeneous autonomous agents, where it eliminates deadline violations entirely while improving 2D-detection frame rates by 2.2 times in stationary and 2 times in mobile conditions, demonstrating its effectiveness in enabling real-time cooperative perception.
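The abstract does not specify CoOpTex's offloading policy beyond this description; the following is a minimal sketch of one plausible decision rule, in which every field name, threshold, and frame-rate cap is a hypothetical stand-in rather than the paper's actual logic:

```python
from dataclasses import dataclass

@dataclass
class DeviceState:
    # All fields are illustrative; CoOpTex's actual inputs are not given here.
    aux_available: bool     # is an auxiliary device currently reachable?
    aux_latency_ms: float   # estimated round-trip time for an offloaded stitch
    local_stitch_ms: float  # estimated time to stitch locally
    is_mobile: bool         # is the agent currently moving?

def plan_stitching(state: DeviceState, deadline_ms: float) -> tuple[str, float]:
    """Return (execution target, RGB capture rate in fps) for the next frame.

    Offload image stitching when an auxiliary device is available and the
    offload fits the deadline; otherwise run locally and throttle capture
    so frames are never produced faster than they can be processed.
    """
    if state.aux_available and state.aux_latency_ms < deadline_ms:
        target, budget_ms = "auxiliary", state.aux_latency_ms
    else:
        target, budget_ms = "local", state.local_stitch_ms
    max_fps = 10.0 if state.is_mobile else 15.0  # hypothetical mobility cap
    fps = min(max_fps, 1000.0 / budget_ms)
    return target, fps

print(plan_stitching(DeviceState(True, 40.0, 120.0, True), deadline_ms=100.0))
# -> ('auxiliary', 10.0)
```

The intuition sketched here is simply that the capture rate should never exceed the effective processing rate, which is what keeps deadline violations from queuing up.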
VINet: Lightweight, scalable, and heterogeneous cooperative perception for 3D object detection
Utilizing the latest advances in Artificial Intelligence (AI), the computer vision community is witnessing an unprecedented evolution in all kinds of perception tasks, particularly in object detection. Based on multiple spatially separated perception nodes, Cooperative Perception (CP) has emerged to significantly advance the perception of automated driving. However, current cooperative object detection methods mainly focus on ego-vehicle efficiency without considering the practical issue of system-wide costs. In this paper, we introduce VINet, a unified deep-learning-based CP network for scalable, lightweight, and heterogeneous cooperative 3D object detection. VINet is the first CP method designed from the standpoint of large-scale system-level implementation and can be divided into three main phases: (1) Global Pre-Processing and Lightweight Feature Extraction, which transform the data into a global representation and extract features for cooperation in a lightweight manner; (2) Two-Stream Fusion, which fuses the features from scalable and heterogeneous perception nodes; and (3) Central Feature Backbone and 3D Detection Head, which further process the fused features and generate cooperative detection results. An open-source experimental data platform is designed and developed for CP dataset acquisition and model evaluation. The experimental analysis shows that VINet reduces system-level computational cost by 84% and system-level communication cost by 94% while improving 3D detection accuracy.
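The exact fusion operator is not given in this summary; the sketch below shows one way a two-stream fusion could stay lightweight and scale to a variable number of heterogeneous nodes, with all tensor shapes and layer sizes assumed rather than taken from VINet:

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Illustrative two-stream fusion: one stream carries the ego node's BEV
    feature map, the other a variable number of cooperating nodes. The real
    VINet design differs in detail; shapes here are assumptions."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, ego: torch.Tensor, coop: torch.Tensor) -> torch.Tensor:
        # ego:  (B, C, H, W) ego-node features
        # coop: (B, N, C, H, W) features from N cooperating nodes, assumed
        #       already warped into the ego frame so element-wise max is valid
        coop_agg, _ = coop.max(dim=1)          # permutation-invariant over nodes
        x = torch.cat([ego, coop_agg], dim=1)  # (B, 2C, H, W)
        return self.fuse(x)                    # (B, C, H, W) fused features

fusion = TwoStreamFusion(64)
ego = torch.randn(1, 64, 100, 100)
coop = torch.randn(1, 3, 64, 100, 100)
print(fusion(ego, coop).shape)  # torch.Size([1, 64, 100, 100])
```

Aggregating the cooperative stream with an element-wise max keeps the module independent of the node count, which is one plausible reading of "scalable" here.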
- Award ID(s): 2152258
- PAR ID: 10510371
- Publisher / Repository: Elsevier
- Date Published:
- Journal Name: Mechanical Systems and Signal Processing
- Volume: 204
- Issue: C
- ISSN: 0888-3270
- Page Range / eLocation ID: 110723
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Interest in cooperative perception is growing quickly due to its remarkable performance in improving perception capabilities for connected and automated vehicles. This improvement is crucial, especially for automated driving scenarios in which perception performance is one of the main bottlenecks to the development of safety and efficiency. However, current cooperative perception methods typically assume that all collaborating vehicles have enough communication bandwidth to share all features with an identical spatial size, which is impractical for real-world scenarios. In this paper, we propose Adaptive Cooperative Perception, a new cooperative perception framework that is not limited by the aforementioned assumptions, aiming to enable cooperative perception under more realistic and challenging conditions. To support this, a novel feature encoder, the Pillar Attention Encoder, is proposed. A pillar attention mechanism is designed to extract the feature data while considering its significance for the perception task. An adaptive feature filter is proposed to adjust the size of the shared feature data according to each feature's importance value. Experiments are conducted for cooperative object detection from multiple vehicle-based and infrastructure-based LiDAR sensors under various communication conditions. Results demonstrate that our method can successfully handle dynamic communication conditions and improves mean Average Precision by 10.18% compared with the state-of-the-art feature encoder.
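This summary does not give the Pillar Attention Encoder's internals; as an illustration only, per-point attention inside a pillar could look like the following, with the point dimensionality, feature width, and padding scheme all assumed:

```python
import torch
import torch.nn as nn

class PillarAttentionEncoder(nn.Module):
    """Illustrative pillar encoder with per-point attention: points grouped
    into a vertical pillar are scored, and the pillar feature is their
    attention-weighted sum. Dimensions are assumptions, not the paper's."""

    def __init__(self, point_dim: int = 9, feat_dim: int = 64):
        super().__init__()
        self.embed = nn.Linear(point_dim, feat_dim)
        self.score = nn.Linear(feat_dim, 1)  # per-point significance score

    def forward(self, pillars: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # pillars: (P, N, D) up to N points per pillar; mask: (P, N) marks real points
        x = torch.relu(self.embed(pillars))                # (P, N, F)
        logits = self.score(x).squeeze(-1)                 # (P, N)
        logits = logits.masked_fill(~mask, float("-inf"))  # ignore padded slots
        attn = torch.softmax(logits, dim=1)                # weights within each pillar
        return (attn.unsqueeze(-1) * x).sum(dim=1)         # (P, F) one feature per pillar

enc = PillarAttentionEncoder()
pillars = torch.randn(128, 32, 9)    # 128 pillars, up to 32 points each
mask = torch.rand(128, 32) > 0.3     # ~70% of slots hold real points
print(enc(pillars, mask).shape)      # torch.Size([128, 64])
```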
-
Perceiving the environment is one of the most fundamental keys to enabling Cooperative Driving Automation, which is regarded as the revolutionary solution to addressing the safety, mobility, and sustainability issues of contemporary transportation systems. Although an unprecedented evolution is now happening in the area of computer vision for object perception, state-of-the-art perception methods still struggle with sophisticated real-world traffic environments due to the inevitable physical occlusion and limited receptive field of single-vehicle systems. Built on multiple spatially separated perception nodes, Cooperative Perception (CP) has emerged to unlock this perception bottleneck for driving automation. In this paper, we comprehensively review and analyze the research progress on CP, and we propose a unified CP framework. The architectures and taxonomy of CP systems based on different types of sensors are reviewed to give a high-level description of the workflow and the different structures of CP systems. The node structure, sensing modality, and fusion schemes are reviewed and analyzed with detailed explanations. A Hierarchical Cooperative Perception (HCP) framework is proposed, followed by a review of existing open-source tools that support CP development. The discussion highlights the current opportunities, open challenges, and anticipated future trends.
-
Object perception plays a fundamental role in Cooperative Driving Automation (CDA), which is regarded as a revolutionary promoter of next-generation transportation systems. However, vehicle-based perception suffers from limited sensing range and occlusion, as well as low penetration rates in connectivity. In this paper, we propose Cyber Mobility Mirror (CMM), a next-generation real-world object perception system for 3D object detection, tracking, localization, and reconstruction, to explore the potential of roadside sensors for enabling CDA in the real world. The CMM system consists of six main components: i) a data pre-processor to retrieve and preprocess the raw data; ii) a roadside 3D object detector to generate 3D detection results; iii) a multi-object tracker to identify detected objects across frames; iv) a global locator to generate geo-localization information; v) a mobile-edge-cloud-based communicator to transmit perception information to equipped vehicles; and vi) an onboard advisor to reconstruct and display real-time traffic conditions. An automatic perception evaluation approach is proposed to support the assessment of data-driven models without human labeling, and a CMM field-operational system is deployed at a real-world intersection to assess its performance. Results from field tests demonstrate that our CMM prototype achieves 96.99% precision and 83.62% recall for detection and 73.55% ID-recall for tracking. High-fidelity real-time traffic conditions (at the object level) can be geo-localized with a root-mean-square error (RMSE) of 0.69 m laterally and 0.33 m longitudinally, and displayed on the GUI of the equipped vehicle at a frequency of 3-4 Hz.
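As a rough illustration of how the six components might chain together, here is a toy skeleton in which every component is a stand-in stub and the metres-to-degrees conversion is a naive flat-earth offset from a hypothetical sensor origin, not the CMM implementation:

```python
from dataclasses import dataclass

@dataclass
class Track:
    track_id: int
    x: float  # metres east of the roadside sensor (assumed frame)
    y: float  # metres north of the roadside sensor

def to_geo(t: Track, origin=(33.9737, -117.3281)) -> tuple[float, float]:
    # iv) global locator: offset a hypothetical sensor origin (lat, lon);
    # a real deployment would use a calibrated projection.
    lat0, lon0 = origin
    return lat0 + t.y / 111_000.0, lon0 + t.x / 111_000.0

def cmm_step(raw_scan, preprocess, detect, track, publish):
    points = preprocess(raw_scan)       # i)   data pre-processor
    boxes = detect(points)              # ii)  roadside 3D object detector
    tracks = track(boxes)               # iii) multi-object tracker
    geo = [to_geo(t) for t in tracks]   # iv)  geo-localization
    publish(geo)                        # v)   edge/cloud communicator;
                                        # vi)  the onboard advisor renders `geo`

# Toy run with placeholder components:
cmm_step(
    raw_scan=None,
    preprocess=lambda s: s,
    detect=lambda pts: ["box"],
    track=lambda boxes: [Track(1, 12.0, -3.5)],
    publish=print,
)
```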
-
Perceiving the surrounding environment is critical to enabling cooperative driving automation, which is regarded as a transformative solution to improving our transportation systems. Cooperative perception, which combines information from spatially separated nodes, can innately unlock the bottleneck caused by physical occlusion and has become an important research topic. Although cooperative perception aims to resolve practical problems, most current research is designed under the default assumption that the collaborating perception entities have identical communication capacities. In this work, we introduce a fundamental approach, Dynamic Feature Sharing (DFS), for cooperative perception in a more pragmatic context. Specifically, a DFS-based cooperative perception framework is designed to dynamically reduce the feature data that must be shared among cooperating entities. Convolution-based Priority Filtering (CPF) is proposed to enable DFS under different communication constraints (e.g., bandwidth) by filtering the feature data according to designed priority values. Zero-shot experiments demonstrate that the proposed CPF method can improve cooperative perception performance by approximately +22% under a dynamic communication-capacity condition and by up to +130% when the communication bandwidth is reduced by 90%.
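Neither the CPF architecture nor its priority function is specified here; a hedged sketch of convolution-based priority filtering, with an assumed one-layer scoring head and top-k masking, might read:

```python
import torch
import torch.nn as nn

class PriorityFilter(nn.Module):
    """Sketch of convolution-based priority filtering: a small conv head
    scores each feature location, and only the highest-priority fraction
    allowed by the current bandwidth is shared. Layer sizes are assumptions,
    not the paper's configuration."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.priority_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor, bandwidth_ratio: float):
        # feats: (B, C, H, W); bandwidth_ratio in (0, 1] set per communication link
        scores = self.priority_head(feats).squeeze(1)            # (B, H, W)
        b, h, w = scores.shape
        k = max(1, int(bandwidth_ratio * h * w))
        thresh = scores.flatten(1).topk(k, dim=1).values[:, -1]  # per-sample cutoff
        mask = scores >= thresh.view(b, 1, 1)                    # keep top-k locations
        return feats * mask.unsqueeze(1), mask                   # masked payload to share

pf = PriorityFilter(64)
payload, mask = pf(torch.randn(2, 64, 50, 50), bandwidth_ratio=0.1)
print(mask.float().mean().item())  # ~0.1 of locations kept per sample
```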