After the 2017 TuSimple Lane Detection Challenge, its dataset and evaluation based on accuracy and F1 score have become the de facto standard to measure the performance of lane detection methods. While they have played a major role in improving the performance of lane detection methods, the validity of this evaluation method in downstream tasks has not been adequately researched. In this study, we design 2 new driving-oriented metrics for lane detection: End-to-End Lateral Deviation metric (E2E-LD) is directly formulated based on the requirements of autonomous driving, a core downstream task of lane detection; Per-frame Simulated Lateral Deviation metric (PSLD) is a lightweight surrogate metric of E2E-LD. To evaluate the validity of the metrics, we conduct a large-scale empirical study with 4 major types of lane detection approaches on the TuSimple dataset and our newly constructed dataset Comma2k19-LD. Our results show that the conventional metrics have strongly negative correlations (≤-0.55) with E2E-LD, meaning that some recent improvements purely targeting the conventional metrics may not have led to meaningful improvements in autonomous driving, but rather may actually have made it worse by overfitting to the conventional metrics. As autonomous driving is a security/safety-critical system, the underestimation of robustness hinders the sound development of practical lane detection models. We hope that our study will help the community achieve more downstream task-aware evaluations for lane detection.
Evaluation Metrics for Object Detection for Autonomous Systems
This paper studies the evaluation of learning-based object detection models in conjunction with model-checking of formal specifications defined on an abstract model of an autonomous system and its environment. In particular, we define two metrics – proposition-labeled and class-labeled confusion matrices – for evaluating object detection, and we incorporate these metrics to compute the satisfaction probability of system-level safety requirements. While confusion matrices have been effective for comparative evaluation of classification and object detection models, our framework fills two key gaps. First, we relate the performance of object detection to formal requirements defined over downstream high-level planning tasks. In particular, we provide empirical results showing that the choice of a good object detection algorithm, with respect to formal requirements on the overall system, depends significantly on the downstream planning and control design. Second, unlike the traditional confusion matrix, our metrics account for variations in performance with respect to the distance between the ego vehicle and the object being detected. We demonstrate this framework on a car-pedestrian example by computing the satisfaction probabilities for safety requirements formalized in Linear Temporal Logic (LTL).
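As a rough illustration of the distance-dependent idea described in the abstract, the following minimal sketch estimates a class-labeled confusion matrix separately for each distance bin. It is not the authors' implementation: the function name `distance_binned_confusion`, the sample tuple layout, the labels, and the 10 m bin edge are all assumptions made for this example.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def distance_binned_confusion(samples, bin_edges):
    """Estimate P(predicted | true) separately for each distance bin.

    samples: iterable of (distance, true_label, predicted_label) tuples
             (hypothetical layout chosen for this sketch).
    bin_edges: sorted interior edges; a distance d falls into bin
               bisect_right(bin_edges, d).
    """
    # (bin index, true label) -> counts of each predicted label
    counts = defaultdict(Counter)
    for distance, true_label, predicted_label in samples:
        bin_index = bisect_right(bin_edges, distance)
        counts[(bin_index, true_label)][predicted_label] += 1

    # Normalize each (bin, true label) row so its entries sum to 1.
    matrices = defaultdict(dict)
    for (bin_index, true_label), predictions in counts.items():
        total = sum(predictions.values())
        for predicted_label, n in predictions.items():
            matrices[bin_index][(true_label, predicted_label)] = n / total
    return dict(matrices)


# Made-up observations: pedestrians are detected reliably when close
# and missed half the time beyond the assumed 10 m edge.
samples = [
    (3.0, "ped", "ped"),
    (4.0, "ped", "ped"),
    (12.0, "ped", "ped"),
    (15.0, "ped", "empty"),
    (14.0, "empty", "empty"),
]
matrices = distance_binned_confusion(samples, bin_edges=[10.0])
print(matrices)
```

In a setup like the one the abstract outlines, per-bin probabilities of this kind could serve as transition probabilities in an abstract system model that is then checked against a safety specification; that step is outside the scope of this sketch.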
- Award ID(s): 2141153
- PAR ID: 10472521
- Publisher / Repository: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We propose a deductive synthesis framework for constructing reinforcement learning (RL) agents that provably satisfy temporal reach-avoid specifications over infinite horizons. Our approach decomposes these temporal specifications into a sequence of finite-horizon subtasks, for which we synthesize individual RL policies. Using formal verification techniques, we ensure that the composition of a finite number of subtask policies guarantees satisfaction of the overall specification over infinite horizons. Experimental results on a suite of benchmarks show that our synthesized agents outperform standard RL methods in both task performance and compliance with safety and temporal requirements.
-
Commonly used metrics for evaluation of object detection systems (precision, recall, mAP) do not give complete information about their suitability for use in safety-critical tasks, like obstacle detection for collision avoidance in Autonomous Vehicles (AV). This work introduces the Risk Ranked Recall ($R^3$) metrics for object detection systems. The $R^3$ metrics categorize objects within three ranks. Ranks are assigned based on an objective cyber-physical model for the risk of collision. Recall is measured for each rank…
-
Access to electricity positively correlates with many beneficial socioeconomic outcomes in the developing world, including improvements in education, health, and poverty. Efficient planning for electricity access requires information on the location of existing electric transmission and distribution infrastructure; however, data on existing infrastructure is often unavailable or expensive. We propose a deep learning based method to automatically detect electric transmission infrastructure from aerial imagery and quantify the results with traditional object detection performance metrics. In addition, we explore two challenges to applying these techniques at scale: (1) how models trained on particular geographies generalize to other locations and (2) how the spatial resolution of imagery impacts infrastructure detection accuracy. Our approach results in object detection performance with an F1 score of 0.53 (0.47 precision and 0.60 recall). Using training data that includes more diverse geographies improves performance across the 4 geographies that we examined. Image resolution significantly impacts object detection performance, which decreases precipitously as the image resolution decreases.
-
The operational safety of Automated Driving System (ADS)-Operated Vehicles (AVs) is a rising concern as AVs are deployed both as prototypes under test and commercially. The robustness of safety evaluation systems is essential in determining the operational safety of AVs as they interact with human-driven vehicles. Extending earlier works of the Institute of Automated Mobility (IAM) that explored the Operational Safety Assessment (OSA) metrics and infrastructure-based safety monitoring systems, in this work we compare the performance of an infrastructure-based Light Detection And Ranging (LIDAR) system to an onboard vehicle-based LIDAR system in testing at the Maricopa County Department of Transportation SMARTDrive testbed in Anthem, Arizona. The sensor modalities are located in infrastructure and onboard the test vehicles, and include LIDAR, cameras, a real-time differential GPS, and a drone with a camera. Bespoke localization and tracking algorithms are created for the LIDAR and cameras. In total, there are 26 different scenarios of the test vehicles navigating the testbed intersection; for this work, we consider only car-following scenarios. The LIDAR data collected from the infrastructure-based and onboard vehicle-based sensor systems are used to perform object detection and multi-target tracking to estimate the velocity and position of the test vehicles, and these values are used to compute OSA metrics. The comparison of the two systems involves the localization and tracking errors in calculating the position and the velocity of the subject vehicle, with the real-time differential GPS data serving as ground truth for velocity comparison and tracking results from the drone for OSA metrics comparison.