<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Self-Driving Vehicles: Key Technical Challenges and Progress Off the Road</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>01/01/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10144525</idno>
					<idno type="doi">10.1109/MPOT.2019.2939376</idno>
					<title level='j'>IEEE Potentials</title>
<idno>0278-6648</idno>
<biblScope unit="volume">39</biblScope>
<biblScope unit="issue">1</biblScope>					

					<author>Michael Milford</author><author>Sam Anthony</author><author>Walter Scheirer</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In a period of fewer than 10 years, the quest for self-driving vehicles, also referred to as autonomous vehicles (AVs) or driverless cars, has become one of the biggest technology races in the world, with tens of billions of dollars poured into companies and start-ups. The goal is an on-road, consumer driverless car: whether owned by individuals or part of a centralized ride-sharing fleet, this is the area where the majority of investment has occurred. However, AVs have been around for much longer in other fields, such as mining, which share some but not all of the same technical challenges faced by on-road AVs. In this article, we provide an overview of the key technical challenges and solutions for both on- and off-road AVs, with a focus on one of the key unsolved challenges: interaction with vulnerable road users (VRUs).</p><p>solutions to the problem of autonomy. Larger vehicles are typically heavier and harder to stop (and more damaging when they hit something), but they can carry a greater number of onboard sensing and computing components. Energy storage also generally scales favorably with vehicle size, an important consideration that can enable better up-time percentages and utilization of more power-hungry computing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sensor suites</head><p>AV platforms have access to a range of sensing technologies (Fig. <ref type="figure">1</ref>).</p><p>Lidar- and laser-based range sensors provide accurate long-distance range-to-object information and can also use reflectance information to detect lane markings. However, in adverse weather, such as rain or smoke, their capabilities can be significantly degraded.</p><p>Modern camera technology provides very-high-resolution imagery of the environment, with good dynamic range (revealing detail both in bright and dark areas of an image simultaneously) and high frame rates (Fig. <ref type="figure">2</ref>). The information present in a camera image is much richer than that produced by any other sensing modality, provided it can be successfully extracted; the widely quoted proof of concept here is that humans can drive very well with primarily visual sensing alone. Cameras are often less expensive and require less power than lidar, but they are sensitive to changes in environmental appearance caused by factors such as day-night cycles (Fig. <ref type="figure">3</ref>).</p><p>Radar's primary purpose in most current AV applications is collision avoidance: although it does not have good acuity and, as a result, struggles to distinguish small objects, it is relatively resilient to environmental conditions, such as adverse weather, and can see through smoke and fog quite well. Finally, sensors such as GPS receivers provide positioning information (which can be disrupted by tunnels or tall buildings), while internal sensors deliver information such as linear acceleration, rotational rate, steering angle, and wheel speed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Computational hardware</head><p>Computer hardware provides the processing power to perform all of the onboard autonomy-related tasks, such as scene understanding, navigation, and high-level control. To maximize electric-vehicle range, recent hardware trends have focused on power usage per computing unit. Nvidia is a good example of a key player in this space, with power-efficient, highly capable systems such as its Jetson AGX Xavier. Offboard computing still has a useful role to play in AV applications, for example, in the consolidation and merging of the massive amounts of data uploaded by thousands of cars in a city on a daily basis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Key technical competencies: Software</head><p>The software operating on AVs performs a number of key technical competencies, including localization, planning, decision making, and scene understanding.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Mapping and localization</head><p>Mapping and localization are key pillars of AV operation. There are several subtypes of localization, each of which plays a different role in enabling autonomy on a vehicle (Fig. <ref type="figure">4</ref>).</p><p>Simultaneous localization and mapping has long been a major research field in robotics: how does a robot move through an environment, building up a map of that environment, while simultaneously localizing itself within that ever-changing map? Approximate localization (what you get on your phone's GPS) is typically used for overall route planning and is obtained from GPS or onboard localization systems. Automation-enabling higher-precision localization is typically provided by onboard localization within existing maps of the environment or, in the case of some autonomous mining vehicles, high-accuracy GPS.</p><p>Relative localization is also important, for example, knowing that the vehicle is currently located 0.73 m from the edge of the road. Accurate relative positioning (and velocities and accelerations) with respect to moving objects, such as an oncoming car, is critical for safe vehicle planning and control.</p></div>
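<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the distinction between lateral and downtrack localization error concrete, the following minimal sketch splits a 2-D position error into those two components. The function name, the planar coordinate frame, and the convention that heading is measured counterclockwise from the x axis are assumptions made for illustration; they do not come from any particular AV software stack.</p>
<p><![CDATA[
```python
import math

def decompose_error(est_xy, true_xy, heading_rad):
    """Split a 2-D position error into downtrack and lateral parts.

    heading_rad is the direction of travel (for example, the tunnel axis),
    measured counterclockwise from the +x axis (an assumed convention).
    """
    ex = est_xy[0] - true_xy[0]
    ey = est_xy[1] - true_xy[1]
    # Project the error onto the unit vector along the heading ...
    downtrack = ex * math.cos(heading_rad) + ey * math.sin(heading_rad)
    # ... and onto the unit vector pointing 90 degrees to its left.
    lateral = -ex * math.sin(heading_rad) + ey * math.cos(heading_rad)
    return downtrack, lateral

if __name__ == "__main__":
    # Vehicle travelling along +x; estimate is 2.0 m ahead and 0.5 m to the left.
    print(decompose_error((2.0, 0.5), (0.0, 0.0), 0.0))
```
]]></p>
<p>A controller that cares most about not hitting a wall or leaving the road would weight the lateral component more heavily than the downtrack one.</p></div>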
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Planning, decision making, and control</head><p>Just as critical to an AV's viability as sensing and mapping is what is then done with that information: how does the vehicle plan and then act, whether to accelerate, brake, turn, or activate a turning indicator? These processes play a critical role in safety; the planning system must continually plan safe actions, such as slowing down or suddenly changing lanes to avoid an unexpected obstacle when braking is not an option.</p><p>The planning and decision-making process also changes significantly for on-road delivery vehicles that carry goods rather than people, such as those used by Nuro. In accident situations with these AVs, there is no tension between protecting humans inside and outside the vehicle, so the safety of humans outside can be entirely prioritized.</p></div>
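<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of when braking "is not an option," the sketch below compares the distance needed to stop with the distance to an obstacle. The constant-deceleration model, the default deceleration of 7 m/s^2, the optional reaction delay, and the function names are all assumptions chosen for illustration; they are not figures from this article, and a real planner would use a far richer vehicle model.</p>
<p><![CDATA[
```python
def braking_distance_m(speed_mps, decel_mps2):
    """Distance to stop from speed_mps at constant deceleration: v^2 / (2a)."""
    if decel_mps2 <= 0:
        raise ValueError("deceleration must be positive")
    return speed_mps ** 2 / (2.0 * decel_mps2)

def can_stop_in_time(speed_mps, gap_m, decel_mps2=7.0, reaction_s=0.0):
    """True if the vehicle can stop within gap_m.

    decel_mps2 and reaction_s are assumed, illustrative values.
    """
    # Distance covered before the brakes bite, plus the braking distance itself.
    needed = speed_mps * reaction_s + braking_distance_m(speed_mps, decel_mps2)
    return needed <= gap_m

if __name__ == "__main__":
    # At 14 m/s the assumed model needs 14 m to stop.
    print(can_stop_in_time(14.0, 20.0))  # obstacle 20 m ahead
    print(can_stop_in_time(14.0, 10.0))  # obstacle 10 m ahead
```
]]></p>
<p>When the check fails, the planner has to consider an alternative maneuver, such as the lane change mentioned above.</p></div>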
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Interaction with VRUs</head><p>Vehicles that have reached Society of Automotive Engineers level 3 autonomy and above will have to know how to interact with humans. This includes human drivers, who (even in a world where AVs are rapidly adopted) will be on the roads for the foreseeable future. Bicyclists, pedestrians, motorcyclists, scooter riders: these categories of VRUs have an enduring claim to their share of the urban pavement. Interacting safely, explainably, and politely with VRUs is likely to remain an essential part of the AV's task (Fig. <ref type="figure">5</ref>). Pedestrians and cyclists are not predictable with standard techniques, such as Kalman filters. Simply stopping every time a VRU could potentially enter the vehicle's path results in vehicles that perform excessive and unnecessary emergency maneuvers. Overall, 86% of documented incidents with AVs are either rear-endings or sideswipings that result from a human's misunderstanding of an AV's behavior. Understanding VRUs is key to eliminating this failure mode.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Moving away from the trolley problem mind-set</head><p>Much of the attention devoted to interactions between AVs and VRUs has focused on ethical dilemmas. A famous thought experiment is the "trolley problem," in which a trolley driver is forced to choose which of two actions, both of which cause someone's death, is more morally acceptable; this has been held up as a model for the kinds of decisions AVs will have to make. Although it may someday be the case that AVs are sophisticated enough and have good enough information about the world that the primary concern with VRU interaction is how to behave ethically in the unlikely event that there is no option but to cause catastrophic bodily harm to a human, there are several reasons why this is not currently a primary concern to AV makers.</p><p>First, the starting goal for many vehicle makers is finding a motion plan that provably minimizes or eliminates any chance of a harm-causing interaction. The Intel division Mobileye has published work attempting to formalize risk analysis in motion planning to develop behavior plans where a negative interaction is impossible. Second, the types of ethical dilemmas discussed in most trolley-problem research rely on very fine-grained categorization of VRUs (an old person versus a young person, a pregnant woman versus a helmetless cyclist, and so on) that is largely out of reach for current perception systems in AVs. Third, much of the current focus in AVs is on minimizing harm in general, and one way to do that is to plan around the level of damage likely to be caused. For these reasons and others, the ethical considerations raised by the trolley problem are increasingly not being considered as the most immediate practical challenge for AVs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Key technical breakdown</head><p>VRUs are sometimes difficult to distinguish: from the waist up, a cyclist,</p><p>Fig. 5. Detecting and predicting the intent of VRUs, such as cyclists and pedestrians, is a critical challenge for AVs. (Source: Perceptive Automata; used with permission.)</p><p>Fig. 4. All errors are not created equal. For example, for second-to-second control in a mining tunnel, minimizing lateral error is more important than downtrack (along the length of the tunnel) error since the immediate risk is hitting the wall.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Recognizing VRUs</head><p>When we use the term recognition, we mean automatically determining what exactly is in the scene as sensed by the vehicle (Fig. <ref type="figure">6</ref>). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Recognizing VRU actions</head><p>After detection and recognition of a VRU comes activity recognition: what the VRU is doing. Take one scenario: traffic officers signaling cars to follow a detour by waving in a certain direction. With a correct determination of what the officer's action means, the vehicle can alter its course and safely proceed as directed. This is a nontrivial sequence of events that must unfold within seconds and be executed with a level of accuracy that matches that of a human driver. As with other areas of visual recognition within computer vision, great strides have been made in action recognition, but current approaches are not as robust as human drivers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Predicting VRU actions</head><p>Arguably, the most important aspect of interacting with VRUs is prediction. A motor vehicle that is traveling straight at 25 mi/h on a road and does not have its brake lights illuminated can be assumed to continue traveling at approximately 25 mi/h, at least momentarily. Compared with vehicles, VRUs have many fewer constraints in terms of traffic signals, other traffic, and rules of the road and, therefore, have much more variability in potential paths.</p><p>Much work on the prediction of VRU actions has relied on fundamentally physics-based models: if you know the location and trajectory of the pedestrian, how well can you extrapolate his or her future trajectory? Elaborations have included the use of cues, such as the presence of relevant context like crosswalks, and the integration of information regarding the pose of the person. These approaches have proven to be relatively robust on very short time scales, but they have not been able to provide useful predictions outside of a time window of about 1.5 s. At normal urban driving speeds, that's not enough. One proposed solution is to model the dynamics of all of the actors at an intersection, which critically relies on being able to accurately model every agent in the scene.</p><p>Almost all of the current approaches have another shortcoming: the drivers with which VRUs are most comfortable interacting (humans) do nothing like either of these approaches. Humans have a finely tuned and remarkably high-functioning facility called theory of mind, which allows them to make behaviorally useful assumptions about the internal mental state of another human. A human driver isn't trying to guess the trajectory of a pedestrian; instead, he or she is making sophisticated inferential judgments about what that pedestrian's goals are and how that pedestrian might interact in a social process with the vehicle. Approaches that model this concept look promising.</p></div>
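<div xmlns="http://www.tei-c.org/ns/1.0"><p>The simplest physics-based predictor of the kind described in this section is constant-velocity extrapolation, sketched below together with the distance a vehicle covers during the roughly 1.5-s window in which such predictions remain useful. The function names are hypothetical, the pedestrian speed of 1.4 m/s is an assumed typical walking pace, and using 25 mi/h as a stand-in for urban driving speed is likewise an assumption borrowed from the vehicle example above.</p>
<p><![CDATA[
```python
MPH_TO_MPS = 0.44704  # 1 mi/h expressed in m/s (exact by definition)

def extrapolate(position, velocity, horizon_s):
    """Constant-velocity prediction: p(t) = p0 + v * t, per coordinate."""
    return tuple(p + v * horizon_s for p, v in zip(position, velocity))

def distance_covered_m(speed_mph, horizon_s):
    """Distance in meters travelled at a steady speed_mph over horizon_s."""
    return speed_mph * MPH_TO_MPS * horizon_s

if __name__ == "__main__":
    # Pedestrian at the origin walking along +x at an assumed 1.4 m/s.
    print(extrapolate((0.0, 0.0), (1.4, 0.0), 1.5))
    # How far a car at 25 mi/h moves within the same 1.5-s window.
    print(distance_covered_m(25.0, 1.5))
```
]]></p>
<p>Under these assumptions the car travels roughly 17 m in 1.5 s, which gives a sense of why such a short prediction horizon leaves little room to react. The model also cannot anticipate a pedestrian who stops, turns, or steps off the curb, which is the gap that intent-based approaches aim to fill.</p></div>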
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Communicating car intent to VRUs</head><p>The interaction between VRUs and human-driven vehicles begins when</p><p>Fig. 6. Reliably detecting and recognizing VRUs, such as cyclists, is difficult enough under normal conditions but is compounded in poor visual conditions and when the VRU (indicated by the red box) is partially obscured by other objects in the environment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Current technical issues</head><p>With the field maturing over the past 15 years since the DARPA Grand Challenge AV competition in 2004 (which is widely credited as catalyzing the modern AV technology race), it has become relatively clear that some key technical issues remain unsolved, and these are generally widely acknowledged by both industry and researchers working in this area. One of the most significant topics is interactions with VRUs, which we have already covered. Here, we briefly highlight some of the other challenges.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The problem of corner cases</head><p>Corner cases, as they have become known, are situations that rarely occur and, as a result, are hard to predict, anticipate, and react to appropriately. A person dressed in a chicken suit is one example of a corner case. For self-driving cars, the problem is particularly difficult because the current artificial intelligence techniques behind these systems do not generalize as well as a human driver and have difficulty coping with these highly unusual situations. Consequently, much effort is being invested in coming up with ways to deal more effectively with these corner cases by gathering ever-larger amounts of data from the real world, simulating billions of miles of driving, and repeatedly testing pathologically difficult scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Simulation versus real-world testing</head><p>A key issue for AV developers is that cars are already quite safe: there is approximately one fatality for every 100 million miles of driving. Consequently, it is very difficult under normal conditions to obtain sufficient mileage on a limited number of development vehicles to prove the safety of a system. Therefore, developers have turned to simulation as a critical tool in their autonomy arsenal. High-fidelity simulation environments enable researchers to target specific weather conditions and pedestrian configurations and run much-higher-throughput simulation and evaluation than is possible in the real world.</p><p>A key challenge in using simulation arises from the transferability problem: how do you show and prove that the system you have developed in simulation will work as well in the real world? Simulation is never a perfect replication of reality. Many resources and much effort have consequently been invested in improving the utilization and transferability of development in simulation environments.</p></div>
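<div xmlns="http://www.tei-c.org/ns/1.0"><p>A back-of-the-envelope calculation shows why real-world mileage alone is impractical. The sketch below assumes fatalities follow a Poisson process, so the chance of seeing none in m miles at rate r is exp(-r m); it then asks how many fatality-free miles are needed before one can claim, at a chosen confidence, that a system is at least as safe as the figure of one fatality per 100 million miles quoted above. The Poisson assumption, the 95% confidence level, and the function name are illustrative choices, not part of the original article.</p>
<p><![CDATA[
```python
import math

# From the text: roughly one fatality per 100 million miles of driving.
BASELINE_FATALITY_RATE = 1 / 100_000_000  # fatalities per mile

def fatality_free_miles_required(rate_per_mile, confidence):
    """Miles with zero fatalities needed to bound the true rate.

    Solves exp(-rate * miles) = 1 - confidence for miles, assuming a
    Poisson process (an illustrative modelling assumption).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) / rate_per_mile

if __name__ == "__main__":
    miles = fatality_free_miles_required(BASELINE_FATALITY_RATE, 0.95)
    print(f"{miles:.3e} fatality-free miles needed")
```
]]></p>
<p>Under these assumptions the answer is on the order of 300 million miles, far beyond what a small development fleet can accumulate, which is consistent with the turn to simulation described above.</p></div>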
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sometimes versus anytime: Weather and other environmental conditions</head><p>The real world is a constantly changing environment, which presents major challenges for AVs. First and foremost, the environment can change in both appearance and physical structure due to day-night cycles; seasonal change; and weather conditions, such as rain, snow, and fog.</p><p>Figure <ref type="figure">3</ref> illustrates some of the key challenges that a changing world can cause. The same place [shown in the left column of Fig. <ref type="figure">3(a)</ref>] can appear completely different at night during a tropical storm versus clear weather in the daytime. The problem is further complicated by the natural environmental aliasing that can also be encountered, shown in the left column of Fig. <ref type="figure">3(b)</ref>: these are two places that are completely different locations but look highly similar.</p><p>These problems can be partly solved by using advanced methods or sensors that are not as sensitive to appearance change, such as lidar. However, visual sensing is critical for the rich, nuanced understanding of the world around an AV, and, consequently, the problem of operating in challenging visual conditions remains relevant and unsolved.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Provability, explainability, and self-characterization</head><p>A significant shortcoming of the present generation of self-driving vehicles (and deep learning in general) is the difficulty in describing the properties of their underlying deep-learning models in a rigorous manner. In essence, the learning problem during training is one of function approximation, where the approximated function cannot be recovered in an exact manner afterward. (This is why neural networks have a reputation of being black boxes.) We would like to be able to enforce explainability for any output of a deep-learning model, but since we cannot examine any learned functions directly, we can only turn to the observable output of the system, the same situation psychologists find themselves in when studying the human brain. One possibility, then, is to test the deep-learning models in a manner similar to how psychologists test the brain.</p><p>For some applications, pausing and handing off control to a human operator is feasible, but only if the system is able to assess its own performance reliably. To do this, probabilistic outputs reflecting uncertainty are required.</p><p>For deep-learning-based systems, this can be accomplished with strategies such as making small perturbations to the weights of the network, dropping out units of a trained network at test time, using a probabilistically calibrated readout layer, or examining statistical distributions of the data sampled by the sensors. The choice of distribution is important: underestimating the occurrence of rare events can be dangerous, but overestimating them may be problematic for usability.</p><p>The pedestrian wants to know that the car recognizes that he or she is there, the car seeks to know what the pedestrian wants, and so on.</p></div>
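<div xmlns="http://www.tei-c.org/ns/1.0"><p>One of the strategies listed in this section, dropping out units of a trained network at test time, can be sketched in a few lines. The toy below runs many stochastic forward passes through a tiny one-hidden-layer network and reports the mean and spread of the outputs; a large spread can be read as a signal of low confidence. The network, its weights, the dropout probability, and the function names are all hypothetical and chosen only to keep the example self-contained; a production system would apply the same idea inside a deep-learning framework.</p>
<p><![CDATA[
```python
import random
import statistics

def forward(x, w_hidden, w_out, drop_prob, rng):
    """One stochastic pass through a one-hidden-layer ReLU network."""
    keep = 1.0 - drop_prob
    total = 0.0
    for w_h, w_o in zip(w_hidden, w_out):
        h = max(0.0, w_h * x)         # ReLU hidden unit
        if rng.random() < drop_prob:  # drop this unit for this pass
            continue
        total += w_o * h / keep       # inverted-dropout scaling
    return total

def mc_dropout(x, w_hidden, w_out, drop_prob=0.5, passes=200, seed=0):
    """Return (mean, standard deviation) over repeated dropout passes."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outputs = [forward(x, w_hidden, w_out, drop_prob, rng)
               for _ in range(passes)]
    return statistics.mean(outputs), statistics.pstdev(outputs)

if __name__ == "__main__":
    # Hypothetical weights for illustration only.
    mean, spread = mc_dropout(2.0, [1.0, 2.0], [0.5, 0.25])
    print(mean, spread)
```
]]></p>
<p>A handover policy could then compare the spread against a threshold and request human assistance when it is exceeded; choosing that threshold is exactly the trade-off between missing rare events and harming usability noted above.</p></div>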
<div xmlns="http://www.tei-c.org/ns/1.0"><head>AVs beyond the road</head><p>Beyond the road, AVs have been or could be deployed in a range of other domains, including mining, logistics, agriculture, and defense. Here, we briefly cover the key deployment domains and their unique problems and opportunities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Mining</head><p>Mining, in general, has several of the key characteristics that facilitated its early adoption of AVs: it is large enough to support the capital-intensive development of AV-related technology, its existing remote operation workflows are more easily automated, and there are fewer latency-critical scenarios, meaning that occasional handover to a remote operator is feasible. One example milestone in AVs in mining is Rio Tinto's autonomous haulage system, which recently hauled its one billionth ton autonomously.</p><p>Mining is a challenging environment (Fig. <ref type="figure">7</ref>). Underground, there is no access to satellite-based GPS, so alternative technological solutions are required: some involve installation of additional infrastructure, local Wi-Fi networks, or on-vehicle camera- and laser-based localization solutions. Onboard camera-based solutions encounter a range of challenging perceptual conditions: dust, smoke, water, and highly varied lighting conditions. Range-sensor-based solutions encounter a different set of challenges, including the highly aliased geometry of many underground tunnel systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Logistics</head><p>It is possible to design an entire logistics center to facilitate higher levels of automation. Amazon's fulfillment centers, built on top of its acquisition of Kiva Systems, are a prime example of this: the autonomous robots move shelving around rather than attempt to pick things off static shelves. Other approaches, such as Ocado's, involve a rigid square lattice on which robots move around, picking up and dropping off grocery loads. In both cases, humans are restricted to certain areas of the environment, so human safety issues are significantly reduced as a technological concern.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Agriculture</head><p>Farms generally have relatively controlled access and minimal to no human presence in the operational zone of an AV. In addition, it can sometimes be hard to find people to fill some labor roles, further motivating the case for developing AVs. Autonomous farming vehicles can perform a range of activities, including sowing and planting crops, killing weeds, and the long-term holy grail: harvesting crops. Progress has been slow: although there have been dozens of AV trials, there are few long-term commercial deployments (Fig. <ref type="figure">8</ref>).</p><p>Most of the more capable platform demonstrations have been announced only in the past 2 to 3 years.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Defense</head><p>In defense, as in mining, the cost per unit of many vehicle types is typically far larger than that of a normal consumer car, enabling the use of more capable sensing and computing. Much modern defense theory assumes that there will be a complete blackout on both communications and GPS-based positioning technologies (similar to the conditions imposed on underground autonomous mining trucks), meaning that on-vehicle autonomy will have to shoulder the bulk of the decision making rather than relying on outsourcing to a human at a remote command post.</p><p>The environments that these vehicles might deploy into, such as ruined, dusty, or smoking urban landscapes and thickly vegetated forests, pose a range of mobility, perception, planning, and control challenges. Finally, there are also the ethical considerations around autonomy in any defense application, which are receiving significant sustained attention.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Other fields</head><p>There are almost 40 marine ports that are at least partly automated globally, and some of those autonomous components involve AVs, for example, shifting shipping containers around. Other areas of AV deployment include sidewalk-based delivery vehicles, such as Amazon's Scout program and Starship Technologies. These vehicles are typically relatively small and inexpensive, and they move at relatively low speeds, radically reducing their danger profile compared to larger on-road vehicles moving at higher speeds.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion</head><p>AV-enabling technology has matured and advanced significantly over the past decade in a range of domains, including on-road passenger-carrying or delivery vehicles, mining, and logistics. In some application areas, such as logistics and mining, these vehicles already form a commercially critical part of the companies that operate them, whereas in others, most notably on-road AVs, widespread commercial deployment has not yet occurred.</p><p>Much of the core technology is likely to continue benefitting from steady progress in sensing and computing capabilities (along with a corresponding decrease in price) and the associated progress in vital technical capabilities, such as general scene understanding and VRU interaction. In fields where safety is not directly involved, such as those where humans are physically absent from the operating environment of AVs, future progress will likely be determined by simple commercial calculations based on the cost and efficiency of AV systems. However, there remain key technical hurdles to overcome with respect to safety for widespread on-road deployment, which will make for interesting years ahead.</p></div>
		</body>
		</text>
</TEI>
