Title: Towards Driving-Oriented Metric for Lane Detection Models
After the 2017 TuSimple Lane Detection Challenge, its dataset and its evaluation based on accuracy and F1 score have become the de facto standard for measuring the performance of lane detection methods. While they have played a major role in improving lane detection methods, the validity of this evaluation for downstream tasks has not been adequately researched. In this study, we design 2 new driving-oriented metrics for lane detection: the End-to-End Lateral Deviation metric (E2E-LD), formulated directly from the requirements of autonomous driving, a core downstream task of lane detection, and the Per-frame Simulated Lateral Deviation metric (PSLD), a lightweight surrogate for E2E-LD. To evaluate the validity of the metrics, we conduct a large-scale empirical study with 4 major types of lane detection approaches on the TuSimple dataset and our newly constructed dataset, Comma2k19-LD. Our results show that the conventional metrics have strong negative correlations (≤ -0.55) with E2E-LD, meaning that some recent improvements purely targeting the conventional metrics may not have led to meaningful improvements in autonomous driving, but rather may actually have made it worse by overfitting to the conventional metrics. As autonomous driving is a security/safety-critical system, the underestimation of robustness hinders the sound development of practical lane detection models. We hope that our study will help the community achieve more downstream task-aware evaluations for lane detection.
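To make the idea of a lateral-deviation metric concrete, here is a toy closed-loop sketch: a proportional controller steers toward the lane center *as detected*, while deviation is measured against the ground-truth center. All names, parameters, and the kinematics are illustrative assumptions; the paper's actual E2E-LD/PSLD formulations are defined via simulation with a lane-keeping stack and vehicle motion model.

```python
import numpy as np

def simulated_lateral_deviation(pred_offset, gt_offset=0.0, steps=20,
                                dt=0.05, speed=10.0, k_p=0.5):
    """Toy illustration only: steer toward the detected lane center
    (pred_offset, m) and record the worst drift from the ground-truth
    center (gt_offset, m) over a short simulated horizon."""
    x, heading = 0.0, 0.0            # lateral position (m), heading error (rad)
    worst = 0.0
    for _ in range(steps):
        seen = x - pred_offset        # offset the (possibly wrong) detector sees
        heading += -k_p * seen * dt   # proportional lane-keeping steer
        x += speed * np.sin(heading) * dt   # simple kinematic update
        worst = max(worst, abs(x - gt_offset))
    return worst
```

If the detection is biased (pred_offset differs from gt_offset), the controller converges toward the detected center and the deviation from the true center grows, which is exactly the failure mode a driving-oriented metric is meant to expose.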
Award ID(s):
1929771 1932464 2145493
NSF-PAR ID:
10359468
Journal Name:
IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. State-of-the-art lane detection methods use a variety of deep learning techniques for lane feature extraction and prediction, demonstrating better performance than conventional lane detectors. However, deep learning approaches are computationally demanding and often fail to meet the real-time requirements of autonomous vehicles. This paper proposes a lane detection method that uses a lightweight convolutional neural network as a feature extractor, exploiting the potential of deep learning while meeting real-time needs. The model is trained on a dataset of small image patches of 16 × 64 pixels, and a non-overlapping sliding-window approach is employed to achieve fast inference. The predictions are then clustered and fitted with a polynomial to model the lane boundaries. The proposed method was tested on the KITTI and Caltech datasets and demonstrated acceptable performance. We also integrated the detector into the localization and planning system of our autonomous vehicle, where it runs at 28 fps on a CPU at an image resolution of 768 × 1024, meeting the real-time requirements of self-driving cars.
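The pipeline described in this abstract (patch classification over non-overlapping windows, clustering, polynomial fit) could be sketched roughly as follows; `patch_classifier` and the DBSCAN parameters are placeholders, not details from the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def detect_lanes(image, patch_classifier, ph=16, pw=64):
    """Hedged sketch: classify non-overlapping 16x64 patches with a small
    CNN, cluster positive patch centers into lane instances, and fit a
    polynomial per lane boundary."""
    H, W = image.shape[:2]
    hits = []
    for y in range(0, H - ph + 1, ph):        # non-overlapping rows
        for x in range(0, W - pw + 1, pw):    # non-overlapping columns
            if patch_classifier(image[y:y + ph, x:x + pw]) > 0.5:
                hits.append((x + pw / 2.0, y + ph / 2.0))  # patch center
    if not hits:
        return []
    hits = np.asarray(hits)
    labels = DBSCAN(eps=40, min_samples=5).fit_predict(hits)
    # fit each lane cluster with a 2nd-order polynomial x = f(y)
    return [np.polyfit(hits[labels == k][:, 1], hits[labels == k][:, 0], 2)
            for k in set(labels) if k != -1]
```

The non-overlapping stride is what makes inference cheap: each pixel is classified exactly once, at the cost of coarser localization that the polynomial fit smooths over.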
  2. RNN Transducer (RNN-T) technology is very popular for building deployable models for end-to-end (E2E) automatic speech recognition (ASR) and spoken language understanding (SLU). Since these are E2E models operating directly on speech, there remains potential to improve their performance using purely text-based models like BERT, which have strong language understanding capabilities. In this paper, we propose a new training criterion for RNN-T-based E2E ASR and SLU that transfers BERT's knowledge into these systems. In the first stage of the proposed mechanism, we improve ASR performance through fine-grained, token-wise knowledge transfer from BERT. In the second stage, we fine-tune the ASR model for SLU so that this knowledge is explicitly utilized by the RNN-T model for improved performance. Our techniques improve ASR performance on the Switchboard and CallHome test sets of the NIST Hub5 2000 evaluation and on the recently released SLURP dataset, on which we achieve new state-of-the-art performance. For SLU, we show significant improvements on the SLURP slot-filling task, outperforming HuBERT-base and approaching the performance of HuBERT-large. Compared to large transformer-based speech models like HuBERT, our model is significantly more compact and uses only 300 hours of speech pretraining data.
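One standard way to realize the token-wise knowledge transfer this abstract describes is a temperature-scaled distillation loss between the BERT teacher's and the student's per-token distributions; this is a generic recipe, not necessarily the paper's exact criterion.

```python
import torch.nn.functional as F

def tokenwise_distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic token-level knowledge distillation: match the student's
    per-token distribution to the teacher's via KL divergence.
    Tensors have shape (batch, seq_len, vocab)."""
    t = temperature
    teacher = F.softmax(teacher_logits / t, dim=-1)       # soft targets
    student = F.log_softmax(student_logits / t, dim=-1)   # log-probs
    # KL(teacher || student), scaled by t^2 as in standard distillation
    return F.kl_div(student, teacher, reduction="batchmean") * (t * t)
```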
  3. 3D object detection is an important yet demanding task that relies heavily on difficult-to-obtain 3D annotations. To reduce the required amount of supervision, we propose 3DIoUMatch, a novel semi-supervised method for 3D object detection applicable to both indoor and outdoor scenes. We leverage a teacher-student mutual learning framework to propagate information from the labeled to the unlabeled training set in the form of pseudo-labels. However, due to the high task complexity, we observe that the pseudo-labels suffer from significant noise and are thus not directly usable. To that end, we introduce a confidence-based filtering mechanism inspired by FixMatch. We set confidence thresholds based on the predicted objectness and class probability to filter out low-quality pseudo-labels. While effective, we observe that these two measures do not sufficiently capture localization quality. We therefore propose to use the estimated 3D IoU as a localization metric and set category-aware, self-adjusted thresholds to filter poorly localized proposals. We adopt VoteNet as our backbone detector on indoor datasets and PV-RCNN on the autonomous driving dataset KITTI. Our method consistently improves on state-of-the-art methods on both the ScanNet and SUN-RGBD benchmarks by significant margins under all label ratios (including the fully labeled setting). For example, when training with only 10% labeled data on ScanNet, 3DIoUMatch achieves a 7.7-point absolute improvement on mAP@0.25 and an 8.5-point absolute improvement on mAP@0.5 over the prior art. On KITTI, we are the first to demonstrate semi-supervised 3D object detection, and our method surpasses a fully supervised baseline by 1.8% to 7.6% across different label ratios and categories.
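The filtering step this abstract describes can be illustrated with a minimal sketch; the field names and default thresholds below are assumptions, and the paper's IoU thresholds are category-aware and self-adjusted rather than the fixed dictionary used here.

```python
def filter_pseudo_labels(proposals, obj_thr=0.9, cls_thr=0.9, iou_thr=None):
    """Keep a pseudo-label only if predicted objectness, class probability,
    and estimated 3D IoU all clear their thresholds. Each proposal is a
    dict with 'objectness', 'cls_prob', 'pred_iou', and 'label' keys
    (an illustrative data layout)."""
    iou_thr = iou_thr or {}
    kept = []
    for p in proposals:
        if p["objectness"] < obj_thr:          # weak objectness: drop
            continue
        if p["cls_prob"] < cls_thr:            # uncertain class: drop
            continue
        if p["pred_iou"] < iou_thr.get(p["label"], 0.25):
            continue                           # poorly localized: drop
        kept.append(p)
    return kept
```

The key observation motivating the third check is that classification confidence alone says nothing about how well a box is localized, which is what the estimated IoU captures.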
  4. Autonomous Vehicles (AVs) are an emerging and highly impactful technology on today's roads. When assessing the performance of AVs, it is useful to study their improvement relative to common metrics such as fuel economy/emissions, safety, and congestion. But metrics of the vehicle's performance alone may not be complete; an AV that is affecting and reacting to a smart traffic light, for example, may improve its own performance but cause the same intersection to degrade the performance of other vehicles around the AV. Similar concerns arise in nearly all AV topics: platooning, light pre-emption, lane tracking, etc. Thus, assessing the vehicle's impacts on surrounding traffic is important, possibly even more important than the improvements enabled on the AV alone. But what boundary, or factors, define the vehicles, equipment, etc. “surrounding” an AV? The goal of this work is to characterize the boundary of vehicles “surrounding” an AV, referred to as the Region of Influence (ROI). Specifically, this work asks: when a perturbation from an AV's decision is exerted on a traffic system, how far in time and space can it influence the surrounding system's behavior? To this end, we used AIMSUN, a microscopic traffic simulator, to perform baseline and perturbed simulations. The ROI was evaluated by comparing the trajectories of traffic surrounding the ego vehicle using several metrics: difference in trajectories, Euclidean distance, rate of change of Euclidean distance, total number of lane changes over the whole simulation space versus time, and total number of lane changes over the whole simulation time versus distance to the ego vehicle. The results show that the ROI can be viewed from different perspectives using these metrics, and that it depends on the speed variance of the traffic.
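One of the trajectory-comparison metrics listed in this abstract (per-vehicle Euclidean distance between the baseline and perturbed runs) could be sketched as below; the data layout and the `eps` threshold are assumptions, not details from the paper.

```python
import numpy as np

def region_of_influence(baseline, perturbed, eps=0.5):
    """Sketch: `baseline` and `perturbed` map vehicle id -> (T, 2) arrays
    of positions from the two simulation runs. A vehicle is counted as
    influenced if its trajectory gap ever exceeds eps meters."""
    influenced = {}
    for vid, base in baseline.items():
        gap = np.linalg.norm(base - perturbed[vid], axis=1)  # per-timestep gap
        if (gap > eps).any():
            influenced[vid] = gap.max()   # worst-case displacement
    return influenced
```

The spatial extent of the ROI then follows from how far (in distance to the ego vehicle, or in time) the influenced set reaches.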
  5. Radar-based solutions support practical and longitudinal respiration monitoring owing to their non-invasive nature. Nighttime respiration monitoring at home provides rich, high-quality data that is mostly free of motion disturbances, because the user is quasi-stationary during sleep, and covers 6–8 hours per day rather than tens of minutes, which is promising for longitudinal studies. However, most existing work was conducted in laboratory environments for short periods, so the environment, user motions, and postures can differ significantly from those in real homes. To understand how to obtain quality overnight respiration data in real homes, we conduct a thorough experimental study with 6 participants in various sleep postures over 9 nights in 4 real-home testbeds, each configured with 3–4 sensors around the bed. We first compare the performance of four typical sensor placements around the bed to understand which location yields the highest-quality data. We then explore methods to track range bins with high-quality signals, as occasional user motions change the distance and thus the signal quality, and examine different aspects of amplitude and phase data to further improve signal quality, using the periodicity-to-noise ratio (PNR) and end-to-end (e2e) accuracy as metrics. The experiments demonstrate that sensor placement is a vital factor and that the bedside is the optimal choice considering both accuracy and ease of deployment (2.65 bpm error at the 80th percentile), consistent across four typical sleep postures. We also observe that a proper range-bin selection method can improve the PNR by 2 dB at the 75th percentile and e2e accuracy by 0.9 bpm at the 80th percentile. Amplitude and phase data have comparable e2e accuracy, while phase is more sensitive to motion and thus suitable for nighttime movement detection. Based on these findings, we develop a few simple practical guidelines useful for the community to achieve high-quality, longitudinal, home-based overnight respiration monitoring.
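A PNR-style metric, as we read it from this abstract (the authors' exact definition may differ), can be sketched as the ratio of the strongest spectral peak in the respiration band (~6–30 breaths per minute, i.e. 0.1–0.5 Hz) to the average power elsewhere.

```python
import numpy as np

def periodicity_to_noise_ratio(signal, fs, band=(0.1, 0.5)):
    """Sketch: power at the strongest spectral peak within the respiration
    band divided by the mean power of the out-of-band bins, in dB.
    `signal` is one range bin's time series sampled at `fs` Hz."""
    spec = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak = spec[in_band].max()          # respiration peak power
    noise = spec[~in_band][1:].mean()   # out-of-band power, DC bin excluded
    return 10 * np.log10(peak / noise)
```

Under this reading, range-bin selection amounts to picking the bin whose time series maximizes this ratio.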