skip to main content

Title: Too Afraid to Drive: Systematic Discovery of Semantic DoS Vulnerability in Autonomous Driving Planning under Physical-World Attacks
In high-level Autonomous Driving (AD) systems, behavioral planning is in charge of making high-level driving decisions such as cruising and stopping, and thus highly securitycritical. In this work, we perform the first systematic study of semantic security vulnerabilities specific to overly-conservative AD behavioral planning behaviors, i.e., those that can cause failed or significantly-degraded mission performance, which can be critical for AD services such as robo-taxi/delivery. We call them semantic Denial-of-Service (DoS) vulnerabilities, which we envision to be most generally exposed in practical AD systems due to the tendency for conservativeness to avoid safety incidents. To achieve high practicality and realism, we assume that the attacker can only introduce seemingly-benign external physical objects to the driving environment, e.g., off-road dumped cardboard boxes. To systematically discover such vulnerabilities, we design PlanFuzz, a novel dynamic testing approach that addresses various problem-specific design challenges. Specifically, we propose and identify planning invariants as novel testing oracles, and design new input generation to systematically enforce problemspecific constraints for attacker-introduced physical objects. We also design a novel behavioral planning vulnerability distance metric to effectively guide the discovery. We evaluate PlanFuzz on 3 planning implementations from practical open-source AD systems, and find that it can effectively discover 9 more » previouslyunknown semantic DoS vulnerabilities without false positives. We find all our new designs necessary, as without each design, statistically significant performance drops are generally observed. We further perform exploitation case studies using simulation and real-vehicle traces. We discuss root causes and potential fixes. « less
; ; ; ; ; ;
Award ID(s):
1929771 2145493 1932464 1850533
Publication Date:
Journal Name:
ISOC Network and Distributed Systems Security (NDSS) Symposium
Sponsoring Org:
National Science Foundation
More Like this
  1. For high-level Autonomous Vehicles (AV), localization is highly security and safety critical. One direct threat to it is GPS spoofing, but fortunately, AV systems today predominantly use Multi-Sensor Fusion (MSF) algorithms that are generally believed to have the potential to practically defeat GPS spoofing. However, no prior work has studied whether today’s MSF algorithms are indeed sufficiently secure under GPS spoofing, especially in AV settings. In this work, we perform the first study to fill this critical gap. As the first study, we focus on a production-grade MSF with both design and implementation level representativeness, and identify two AV-specific attack goals, off-road and wrong-way attacks. To systematically understand the security property, we first analyze the upper-bound attack effectiveness, and discover a take-over effect that can fundamentally defeat the MSF design principle. We perform a cause analysis and find that such vulnerability only appears dynamically and non-deterministically. Leveraging this insight, we design FusionRipper, a novel and general attack that opportunistically captures and exploits take-over vulnerabilities. We evaluate it on 6 real-world sensor traces, and find that FusionRipper can achieve at least 97% and 91.3% success rates in all traces for off-road and wrongway attacks respectively. We also find that it ismore »highly robust to practical factors such as spoofing inaccuracies. To improve the practicality, we further design an offline method that can effectively identify attack parameters with over 80% average success rates for both attack goals, with the cost of at most half a day. We also discuss promising defense directions.« less
  2. In Autonomous Driving (AD) systems, perception is both security and safety critical. Despite various prior studies on its security issues, all of them only consider attacks on cameraor LiDAR-based AD perception alone. However, production AD systems today predominantly adopt a Multi-Sensor Fusion (MSF) based design, which in principle can be more robust against these attacks under the assumption that not all fusion sources are (or can be) attacked at the same time. In this paper, we present the first study of security issues of MSF-based perception in AD systems. We directly challenge the basic MSF design assumption above by exploring the possibility of attacking all fusion sources simultaneously. This allows us for the first time to understand how much security guarantee MSF can fundamentally provide as a general defense strategy for AD perception. We formulate the attack as an optimization problem to generate a physically-realizable, adversarial 3D-printed object that misleads an AD system to fail in detecting it and thus crash into it. To systematically generate such a physical-world attack, we propose a novel attack pipeline that addresses two main design challenges: (1) non-differentiable target camera and LiDAR sensing systems, and (2) non-differentiable cell-level aggregated features popularly used in LiDAR-basedmore »AD perception. We evaluate our attack on MSF algorithms included in representative open-source industry-grade AD systems in real-world driving scenarios. Our results show that the attack achieves over 90% success rate across different object types and MSF algorithms. Our attack is also found stealthy, robust to victim positions, transferable across MSF algorithms, and physical-world realizable after being 3D-printed and captured by LiDAR and camera devices. To concretely assess the end-to-end safety impact, we further perform simulation evaluation and show that it can cause a 100% vehicle collision rate for an industry-grade AD system. We also evaluate and discuss defense strategies.« less
  3. We present a framework for deformable object manipulation that interleaves planning and control, enabling complex manipulation tasks without relying on high-fidelity modeling or simulation. The key question we address is when should we use planning and when should we use control to achieve the task? Planners are designed to find paths through complex configuration spaces, but for highly underactuated systems, such as deformable objects, achieving a specific configuration is very difficult even with high-fidelity models. Conversely, controllers can be designed to achieve specific configurations, but they can be trapped in undesirable local minima owing to obstacles. Our approach consists of three components: (1) a global motion planner to generate gross motion of the deformable object; (2) a local controller for refinement of the configuration of the deformable object; and (3) a novel deadlock prediction algorithm to determine when to use planning versus control. By separating planning from control we are able to use different representations of the deformable object, reducing overall complexity and enabling efficient computation of motion. We provide a detailed proof of probabilistic completeness for our planner, which is valid despite the fact that our system is underactuated and we do not have a steering function. We thenmore »demonstrate that our framework is able to successfully perform several manipulation tasks with rope and cloth in simulation, which cannot be performed using either our controller or planner alone. These experiments suggest that our planner can generate paths efficiently, taking under a second on average to find a feasible path in three out of four scenarios. We also show that our framework is effective on a 16-degree-of-freedom physical robot, where reachability and dual-arm constraints make the planning more difficult.« less
  4. Recent studies demonstrated the vulnerability of control policies learned through deep reinforcement learning against adversarial attacks, raising concerns about the application of such models to risk-sensitive tasks such as autonomous driving. Threat models for these demonstrations are limited to (1) targeted attacks through real-time manipulation of the agent's observation, and (2) untargeted attacks through manipulation of the physical environment. The former assumes full access to the agent's states/observations at all times, while the latter has no control over attack outcomes. This paper investigates the feasibility of targeted attacks through visually learned patterns placed on physical objects in the environment, a threat model that combines the practicality and effectiveness of the existing ones. Through analysis, we demonstrate that a pre-trained policy can be hijacked within a time window, e.g., performing an unintended self-parking, when an adversarial object is present. To enable the attack, we adopt an assumption that the dynamics of both the environment and the agent can be learned by the attacker. Lastly, we empirically show the effectiveness of the proposed attack on different driving scenarios, perform a location robustness test, and study the tradeoff between the attack strength and its effectiveness Code is available at Targeted-Physical-Adversarial-Attacks-on-AD
  5. Background: Drivers gather most of the information they need to drive by looking at the world around them and at visual displays within the vehicle. Navigation systems automate the way drivers navigate. In using these systems, drivers offload both tactical (route following) and strategic aspects (route planning) of navigational tasks to the automated SatNav system, freeing up cognitive and attentional resources that can be used in other tasks (Burnett, 2009). Despite the potential benefits and opportunities that navigation systems provide, their use can also be problematic. For example, research suggests that drivers using SatNav do not develop as much environmental spatial knowledge as drivers using paper maps (Waters & Winter, 2011; Parush, Ahuvia, & Erev, 2007). With recent growth and advances of augmented reality (AR) head-up displays (HUDs), there are new opportunities to display navigation information directly within a driver’s forward field of view, allowing them to gather information needed to navigate without looking away from the road. While the technology is promising, the nuances of interface design and its impacts on drivers must be further understood before AR can be widely and safely incorporated into vehicles. Specifically, an impact that warrants investigation is the role of AR HUDS inmore »spatial knowledge acquisition while driving. Acquiring high levels of spatial knowledge is crucial for navigation tasks because individuals who have greater levels of spatial knowledge acquisition are more capable of navigating based on their own internal knowledge (Bolton, Burnett, & Large, 2015). Moreover, the ability to develop an accurate and comprehensive cognitive map acts as a social function in which individuals are able to navigate for others, provide verbal directions and sketch direction maps (Hill, 1987). Given these points, the relationship between spatial knowledge acquisition and novel technologies such as AR HUDs in driving is a relevant topic for investigation. Objectives: This work explored whether providing conformal AR navigational cues improves spatial knowledge acquisition (as compared to traditional HUD visual cues) to assess the plausibility and justification for investment in generating larger FOV AR HUDs with potentially multiple focal planes. Methods: This study employed a 2x2 between-subjects design in which twenty-four participants were counterbalanced by gender. We used a fixed base, medium fidelity driving simulator for where participants drove while navigating with one of two possible HUD interface designs: a world-relative arrow post sign and a screen-relative traditional arrow. During the 10-15 minute drive, participants drove the route and were encouraged to verbally share feedback as they proceeded. After the drive, participants completed a NASA-TLX questionnaire to record their perceived workload. We measured spatial knowledge at two levels: landmark and route knowledge. Landmark knowledge was assessed using an iconic recognition task, while route knowledge was assessed using a scene ordering task. After completion of the study, individuals signed a post-trial consent form and were compensated $10 for their time. Results: NASA-TLX performance subscale ratings revealed that participants felt that they performed better during the world-relative condition but at a higher rate of perceived workload. However, in terms of perceived workload, results suggest there is no significant difference between interface design conditions. Landmark knowledge results suggest that the mean number of remembered scenes among both conditions is statistically similar, indicating participants using both interface designs remembered the same proportion of on-route scenes. Deviance analysis show that only maneuver direction had an influence on landmark knowledge testing performance. Route knowledge results suggest that the proportion of scenes on-route which were correctly sequenced by participants is similar under both conditions. Finally, participants exhibited poorer performance in the route knowledge task as compared to landmark knowledge task (independent of HUD interface design). Conclusions: This study described a driving simulator study which evaluated the head-up provision of two types of AR navigation interface designs. The world-relative condition placed an artificial post sign at the corner of an approaching intersection containing a real landmark. The screen-relative condition displayed turn directions using a screen-fixed traditional arrow located directly ahead of the participant on the right or left side on the HUD. Overall results of this initial study provide evidence that the use of both screen-relative and world-relative AR head-up display interfaces have similar impact on spatial knowledge acquisition and perceived workload while driving. These results contrast a common perspective in the AR community that conformal, world-relative graphics are inherently more effective. This study instead suggests that simple, screen-fixed designs may indeed be effective in certain contexts.« less