Title: Toward Explainable Artificial Intelligence for Early Anticipation of Traffic Accidents
Traffic accident anticipation is a vital function of Automated Driving Systems (ADS) for providing a safe driving experience. An accident anticipation model aims to predict accidents promptly and accurately before they occur. Existing Artificial Intelligence (AI) models of accident anticipation lack a human-interpretable explanation of their decision making. Although these models perform well, they remain a black box to ADS users, who find it difficult to trust them. To this end, this paper presents a gated recurrent unit (GRU) network that learns spatio-temporal relational features for the early anticipation of traffic accidents from dashcam video data. A post-hoc attention mechanism named Grad-CAM (Gradient-weighted Class Activation Mapping) is integrated into the network to generate saliency maps as the visual explanation of the accident anticipation decision. An eye tracker captures human eye fixation points for generating human attention maps. The explainability of network-generated saliency maps is evaluated in comparison to human attention maps. Qualitative and quantitative results on a public crash data set confirm that the proposed explainable network can anticipate an accident on average 4.57 s before it occurs, with 94.02% average precision. Various post-hoc attention-based XAI methods are then evaluated and compared, confirming that Grad-CAM, the method chosen for this study, can generate high-quality, human-interpretable saliency maps (with 1.23 Normalized Scanpath Saliency) for explaining the crash anticipation decision. Importantly, the results confirm that the proposed AI model, with its human-inspired design, can outperform humans in accident anticipation.
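The abstract scores explanation quality with Normalized Scanpath Saliency (NSS) against eye-tracking data. Below is a minimal sketch of that metric, assuming a 2-D saliency heatmap and a binary fixation mask; the array shapes, names, and toy inputs are illustrative assumptions, not the authors' code.

```python
# Minimal NSS sketch: standardize the saliency map, then average its values
# at the human fixation locations. Higher values mean the model looks where
# humans looked.
import numpy as np

def nss(saliency_map: np.ndarray, fixation_mask: np.ndarray) -> float:
    """saliency_map : 2-D float array (e.g., a Grad-CAM heatmap resized to the frame)
    fixation_mask : 2-D binary array, 1 where an eye-tracker fixation landed
    """
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return float(s[fixation_mask.astype(bool)].mean())

# Toy usage: a random map scored against three fixation points.
rng = np.random.default_rng(0)
sal = rng.random((224, 224))
fix = np.zeros((224, 224))
fix[100, 100] = fix[50, 180] = fix[200, 30] = 1
print(f"NSS = {nss(sal, fix):.3f}")
```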
Award ID(s):
2026357
PAR ID:
10374013
Author(s) / Creator(s):
Date Published:
Journal Name:
Transportation Research Record: Journal of the Transportation Research Board
Volume:
2676
Issue:
6
ISSN:
0361-1981
Page Range / eLocation ID:
743 to 755
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a simple approach to make pre-trained Vision Transformers (ViTs) interpretable for fine-grained analysis, aiming to identify and localize the traits that distinguish visually similar categories, such as bird species. Pre-trained ViTs, such as DINO, have demonstrated remarkable capabilities in extracting localized, discriminative features. However, saliency methods like Grad-CAM often fail to identify these traits, producing blurred, coarse heatmaps that highlight entire objects instead. We propose a novel approach, Prompt Class Attention Map (Prompt-CAM), to address this limitation. Prompt-CAM learns class-specific prompts for a pre-trained ViT and uses the corresponding outputs for classification. To correctly classify an image, the true-class prompt must attend to unique image patches not present in other classes' images (i.e., traits). As a result, the true class's multi-head attention maps reveal traits and their locations. Implementation-wise, Prompt-CAM is almost a "free lunch," requiring only a modification to the prediction head of Visual Prompt Tuning (VPT). This makes Prompt-CAM easy to train and apply, in stark contrast to other interpretable methods that require designing specific models and training processes. Extensive empirical studies on a dozen datasets from various domains (e.g., birds, fishes, insects, fungi, flowers, food, and cars) validate the superior interpretation capability of Prompt-CAM. The source code and demo are available at https://github.com/Imageomics/Prompt_CAM.
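A hedged PyTorch sketch of the core Prompt-CAM idea: learnable class-specific prompt tokens ride along with a ViT's patch tokens, and each class is scored from its own prompt's output, so the true-class prompt's attention over patches localizes traits. All module names, sizes, and the stand-in transformer blocks here are illustrative assumptions; see the linked repository for the real implementation.

```python
# One learnable prompt per class is prepended to the patch tokens; each class
# logit comes from its own prompt's output after the transformer blocks.
import torch
import torch.nn as nn

class PromptCAMHead(nn.Module):
    def __init__(self, dim: int, num_classes: int, depth: int = 2, heads: int = 4):
        super().__init__()
        # Class-specific prompts: the core Prompt-CAM modification.
        self.class_prompts = nn.Parameter(torch.randn(num_classes, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)  # stand-in for frozen ViT blocks
        self.score = nn.Linear(dim, 1)  # each prompt output -> one class logit

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        b = patch_tokens.size(0)
        prompts = self.class_prompts.unsqueeze(0).expand(b, -1, -1)
        x = torch.cat([prompts, patch_tokens], dim=1)  # prompts attend to patches
        x = self.blocks(x)
        c = self.class_prompts.size(0)
        return self.score(x[:, :c]).squeeze(-1)       # (b, num_classes) logits

# Toy usage: 196 patch tokens of width 192, 10 classes.
logits = PromptCAMHead(dim=192, num_classes=10)(torch.randn(2, 196, 192))
print(logits.shape)  # torch.Size([2, 10])
```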
  2. The class activation map (CAM) represents the neural-network-derived region of interest, which can help clarify the mechanism of the convolutional neural network’s determination of any class of interest. In medical imaging, it can help medical practitioners diagnose diseases like COVID-19 or pneumonia by highlighting the suspicious regions in Computed Tomography (CT) or chest X-ray (CXR) film. Many contemporary deep learning techniques only focus on COVID-19 classification tasks using CXRs, while few attempt to make it explainable with a saliency map. To fill this research gap, we first propose a VGG-16-architecture-based deep learning approach in combination with image enhancement, segmentation-based region of interest (ROI) cropping, and data augmentation steps to enhance classification accuracy. Later, a multi-layer Gradient CAM (ML-Grad-CAM) algorithm is integrated to generate a class-specific saliency map for improved visualization in CXR images. We also define and calculate a Severity Assessment Index (SAI) from the saliency map to quantitatively measure infection severity. The trained model achieved an accuracy score of 96.44% for the three-class CXR classification task, i.e., COVID-19, pneumonia, and normal (healthy patients), outperforming many existing techniques in the literature. The saliency maps generated from the proposed ML-Grad-CAM algorithm are compared with those from the original Grad-CAM algorithm.
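A hedged sketch of the multi-layer idea behind ML-Grad-CAM: per-layer Grad-CAM heatmaps are resized to a common resolution and fused, and a severity index is read off the fused map over a lung mask. The averaging fusion rule and the SAI formula below are illustrative assumptions, not the paper's exact definitions.

```python
# Fuse CAMs from several layers, then score the share of the lung region
# flagged as salient (an assumed stand-in for the paper's SAI).
import numpy as np
from scipy.ndimage import zoom

def fuse_layer_cams(cams, out_size=(224, 224)) -> np.ndarray:
    """Upsample each layer's CAM to out_size and average them."""
    resized = [zoom(c, (out_size[0] / c.shape[0], out_size[1] / c.shape[1]))
               for c in cams]
    fused = np.mean(resized, axis=0)
    fused -= fused.min()
    return fused / (fused.max() + 1e-8)   # normalize to [0, 1]

def severity_index(fused_cam, lung_mask, thresh=0.5) -> float:
    """Assumed SAI: fraction of lung pixels whose saliency exceeds thresh."""
    return float((fused_cam[lung_mask.astype(bool)] > thresh).mean())

# Toy usage with fake CAMs from three VGG-16 stages (14x14, 28x28, 56x56).
rng = np.random.default_rng(1)
cams = [rng.random((s, s)) for s in (14, 28, 56)]
mask = np.zeros((224, 224))
mask[60:180, 40:190] = 1                  # crude lung-region stand-in
print(f"SAI = {severity_index(fuse_layer_cams(cams), mask):.2f}")
```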
  3. Autonomous vehicle (AV) technology is a major leap forward in mobility capability. To be effective, the current human-based vehicle safety infrastructure will have to be upgraded. A critical leg of this infrastructure is the automobile accident report. Conventional vehicle accident reports have evolved to a point where law enforcement has a reasonably standard approach focused on humans. However, with AVs there are no drivers to interview. Also, given their automation, a flaw found in an AV has the potential to be a systemic risk. In this respect, AVs must be handled more like airplanes in terms of post-accident procedures. In this paper, we explore the requirements for AV accident reports and the escalation procedures required to avoid systemic risks. Our methodology is to analyze all the information available (crash reports as well as press accounts) of AV accidents to date, with a special focus on the fatal accidents. As a result of this work, a recommendation of an AV crash report template, associated escalation procedure, and an infrastructure for accumulated learning is presented.
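To make the template-plus-escalation idea concrete, here is a hedged sketch of what a machine-readable AV crash report record might look like, following the abstract's argument that such reports need software and sensor fields plus an escalation path rather than driver interviews. All field names, escalation tiers, and the triage rule are illustrative assumptions, not the paper's recommended template.

```python
# Hypothetical AV crash report record with an escalation tier, sketching how
# a systemic-risk flag could be carried alongside conventional crash fields.
from dataclasses import dataclass, field
from enum import Enum

class Escalation(Enum):
    LOCAL_REVIEW = 1     # routine, handled by local authorities
    FLEET_AUDIT = 2      # pattern suspected across the operator's fleet
    SYSTEMIC_RECALL = 3  # flaw may affect every vehicle on this software build

@dataclass
class AVCrashReport:
    vehicle_id: str
    software_version: str           # exact build, for systemic-risk tracing
    automation_engaged: bool        # was the ADS in control at impact?
    sensor_log_uri: str             # pointer to camera/lidar/CAN recordings
    injuries: int = 0
    fatalities: int = 0
    escalation: Escalation = Escalation.LOCAL_REVIEW
    notes: list = field(default_factory=list)

# Toy triage: automation involvement or a fatality escalates past local review.
report = AVCrashReport("AV-0042", "stack-3.1.7", True, "s3://logs/av-0042/2024-05-01")
if report.fatalities or report.automation_engaged:
    report.escalation = Escalation.FLEET_AUDIT
print(report.escalation.name)
```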
  4. Given the widespread deployment of black box deep neural networks in computer vision applications, the interpretability aspect of these black box systems has recently gained traction. Various methods have been proposed to explain the results of such deep neural networks. However, some recent works have shown that such explanation methods are biased and do not produce consistent interpretations. Hence, rather than introducing a novel explanation method, we learn models that are encouraged to be interpretable given an explanation method. We use Grad-CAM as the explanation algorithm and encourage the network to learn consistent interpretations along with maximizing the log-likelihood of the correct class. We show that our method outperforms the baseline on the pointing game evaluation on both the ImageNet and MS-COCO datasets. We also introduce new evaluation metrics that penalize the saliency map if it lies outside the ground truth bounding box or segmentation mask, and show that our method outperforms the baseline on these metrics as well. Moreover, our model trained with interpretation consistency generalizes to other explanation algorithms on all the evaluation metrics.
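A hedged PyTorch sketch of training for interpretability as described here: the usual cross-entropy is combined with a penalty on Grad-CAM mass that falls outside the ground-truth mask, in the spirit of the paper's localization-aware metrics. The penalty form and weight are illustrative assumptions, not the paper's exact objective.

```python
# Combined loss: classification cross-entropy plus the fraction of Grad-CAM
# saliency mass that lands outside the ground-truth region.
import torch
import torch.nn.functional as F

def interpretable_loss(logits, labels, cam, gt_mask, lam=0.5):
    """cam: (B, H, W) Grad-CAM maps in [0, 1]; gt_mask: (B, H, W) binary masks."""
    ce = F.cross_entropy(logits, labels)
    outside = cam * (1.0 - gt_mask)                        # saliency outside the object
    penalty = outside.sum(dim=(1, 2)) / (cam.sum(dim=(1, 2)) + 1e-8)
    return ce + lam * penalty.mean()

# Toy usage with random tensors standing in for a real batch.
logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
cam = torch.rand(4, 14, 14)
mask = (torch.rand(4, 14, 14) > 0.5).float()
print(interpretable_loss(logits, labels, cam, mask).item())
```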
  5. Incorporating human-perceptual intelligence into model training has been shown to increase the generalization capability of models in several difficult biometric tasks, such as presentation attack detection (PAD) and detection of synthetic samples. After the initial collection phase, human visual saliency (e.g., eye-tracking data, or handwritten annotations) can be integrated into model training through attention mechanisms, augmented training samples, or through human perception-related components of loss functions. Despite their successes, a vital, but seemingly neglected, aspect of any saliency-based training is the level of salience granularity (e.g., bounding boxes, single saliency maps, or saliency aggregated from multiple subjects) necessary to find a balance between reaping the full benefits of human saliency and the cost of its collection. In this paper, we explore several different levels of salience granularity and demonstrate that increased generalization capabilities of PAD and synthetic face detection can be achieved by using simple yet effective saliency post-processing techniques across several different CNNs.
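A minimal sketch of two of the granularity levels this abstract contrasts: a coarse bounding-box map and a fine multi-subject aggregate, with a simple blur-and-normalize post-processing step. The blur width and toy inputs are illustrative assumptions, not the paper's pipeline.

```python
# Coarse granularity: a box painted as saliency. Fine granularity: an average
# over several subjects' maps, smoothed and rescaled to [0, 1].
import numpy as np
from scipy.ndimage import gaussian_filter

def normalize(m):
    m = m - m.min()
    return m / (m.max() + 1e-8)

def box_saliency(shape, box):
    """Coarsest granularity: a binary map from a (y0, y1, x0, x1) bounding box."""
    y0, y1, x0, x1 = box
    m = np.zeros(shape)
    m[y0:y1, x0:x1] = 1.0
    return m

def aggregate(subject_maps, sigma=4.0):
    """Finest granularity: multi-subject average, blurred and normalized."""
    return normalize(gaussian_filter(np.mean(subject_maps, axis=0), sigma))

# Toy usage with stand-ins for five subjects' eye-tracking maps.
rng = np.random.default_rng(2)
subjects = [rng.random((112, 112)) for _ in range(5)]
coarse = box_saliency((112, 112), (30, 80, 20, 90))
fine = aggregate(subjects)
print(coarse.sum(), fine.max())
```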