Task-Relevant Evaluation Metrics of Object Detection for Quantitative System-Level Analysis of Safety-Critical Autonomous Systems

Badithela, Apurva (ORCID: 0000-0002-9788-2702); Srivastav, Ranai (ORCID: 0009-0004-1301-3676); Wongpiromsarn, Tichakorn (ORCID: 0000-0002-3977-122X); Murray, Richard (ORCID: 0000-0002-5785-7481)

doi:10.1145/3771284

This content will become publicly available on October 15, 2026

In safety-critical robotic systems, perception represents the environment to guide decision-making and plays a crucial role in ensuring that the overall system meets its requirements. To quantitatively assess the impact of object detection and classification errors on system-level performance, we formalize a model of detection error and use it to reason probabilistically about the satisfaction of regular-safety temporal logic requirements at the system level. We also show how standard evaluation metrics for object detection, such as confusion matrices, can be represented as models of detection error, which makes it possible to compute the probability that system-level specifications are satisfied. However, traditional confusion matrices treat all detections equally, without considering their relevance to the system-level task. To address this limitation, we propose novel evaluation metrics for object detection that are informed by both the system-level task and the downstream control logic, enabling a more context-appropriate evaluation of detection models. We identify logic-based formulas relevant to the downstream control and the system-level specifications, and use these formulas to define a logic-based evaluation metric for object detection and classification. These logic-based metrics yield less conservative assessments of system-level performance than traditional confusion matrices. Finally, we demonstrate our approach on a car-pedestrian example using a leaderboard PointPillars model evaluated on the nuScenes dataset, and validate the probabilistic system-level guarantees in simulation.
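The chain the abstract describes (confusion matrix, then detection-error model, then the probability that a system-level requirement holds, plus a task-informed variant) can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' method: the labels, the made-up counts, the brake-on-any-object controller, the three-step observation window, and the assumption that per-step detections are independent are all introduced here, and a closed-form probability stands in for the paper's formal treatment of regular-safety temporal logic specifications. The class-based view counts only an exact "pedestrian" prediction as correct; the logic-based view asks only whether the proposition the hypothetical controller reacts to ("some object is detected") holds.

```python
# Hypothetical sketch: every name, count, and the controller below are illustrative assumptions.
from typing import Callable, Dict, List, Tuple

Labels = List[str]
Counts = Dict[Tuple[str, str], int]        # (true label, predicted label) -> count
ErrorModel = Dict[Tuple[str, str], float]  # (true label, predicted label) -> P(pred | true)


def error_model(counts: Counts, labels: Labels) -> ErrorModel:
    """Row-normalize a confusion matrix into conditional probabilities P(pred | true)."""
    model: ErrorModel = {}
    for t in labels:
        total = sum(counts.get((t, p), 0) for p in labels)
        for p in labels:
            model[(t, p)] = counts.get((t, p), 0) / total if total else 0.0
    return model


def prob_prop(model: ErrorModel, labels: Labels, true_label: str,
              prop: Callable[[str], bool]) -> float:
    """P(a proposition holds on the predicted label | true label)."""
    return sum(model[(true_label, p)] for p in labels if prop(p))


def prob_stop_in_time(p_step: float, n_steps: int) -> float:
    """P(at least one triggering detection in n_steps), assuming independent steps."""
    return 1.0 - (1.0 - p_step) ** n_steps


# Made-up counts for illustration only (not results from the paper or from nuScenes).
LABELS = ["pedestrian", "car", "empty"]
COUNTS: Counts = {
    ("pedestrian", "pedestrian"): 80, ("pedestrian", "car"): 15, ("pedestrian", "empty"): 5,
    ("car", "pedestrian"): 5, ("car", "car"): 90, ("car", "empty"): 5,
    ("empty", "pedestrian"): 2, ("empty", "car"): 3, ("empty", "empty"): 95,
}

model = error_model(COUNTS, LABELS)

# Class-based view: only an exact "pedestrian" prediction counts as a correct detection.
p_class = prob_prop(model, LABELS, "pedestrian", lambda p: p == "pedestrian")
# Logic-based view for a hypothetical controller that brakes on ANY detected object:
# confusing a pedestrian with a car does not change the controller's action.
p_logic = prob_prop(model, LABELS, "pedestrian", lambda p: p != "empty")

N_STEPS = 3  # hypothetical number of observations before the braking point
print(f"class-based: {prob_stop_in_time(p_class, N_STEPS):.6f}")
print(f"logic-based: {prob_stop_in_time(p_logic, N_STEPS):.6f}")
```

With these toy counts the class-based estimate is 0.992 and the logic-based one is 0.999875. Misclassifying a pedestrian as a car lowers the class-based score but, under the hypothetical controller, does not prevent braking; this is the sense in which a task-relevant metric gives a less conservative system-level assessment.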

Award ID(s): 2141153

PAR ID: 10645813

Publisher / Repository: Association for Computing Machinery

Date Published: 2025-10-15

Journal Name: ACM Transactions on Cyber-Physical Systems

ISSN: 2378-962X

Sponsoring Org: National Science Foundation
