Title: Soaring like a bird via reinforcement learning in the field
Soaring birds often rely on ascending thermal plumes in the atmosphere as they search for prey or migrate across large distances. The landscape of convective currents is turbulent and rapidly shifts on timescales of a few minutes as thermals constantly form, disintegrate, or are transported away by the wind. How soaring birds find and navigate thermals within this complex landscape is unknown. Reinforcement learning can be used to find an effective navigational strategy as a sequence of decisions taken in response to environmental cues. Reinforcement learning was applied to train gliders in the field to autonomously navigate atmospheric thermals. Gliders of two-meter wingspan were equipped with a flight controller that enabled an on-board implementation of autonomous flight policies via precise control over their bank angle and pitch. Learning is severely challenged by a multitude of physical effects and the unpredictability of the natural environment. A navigational strategy was determined solely from the experiences collected over several days in the field using exploratory behavioral policies. Bird-like performance was achieved and several viable biological mechanosensory cues were identified for soaring birds, which are also directly applicable to the development of autonomous soaring vehicles.
Award ID(s):
1735004
NSF-PAR ID:
10078614
Author(s) / Creator(s):
Date Published:
Journal Name:
Nature
Volume:
562
ISSN:
0028-0836
Page Range / eLocation ID:
236–239
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A swarm of unmanned aerial vehicles (UAVs) can be used for many applications, including disaster relief, search and rescue, and establishing communication networks, due to its mobility, scalability, and robustness to failure. However, a UAV swarm’s performance is typically limited by each agent’s stored energy. Recent works have considered the usage of thermals, or vertical updrafts of warm air, to address this issue. One challenge lies in a swarm of UAVs detecting and taking advantage of these thermals. Inspired by hawks, a swarm could take advantage of thermals better than individuals due to the swarm’s distributed sensing abilities. To determine which emergent behaviors increase survival time, simulation software was created to test the behavioral models of UAV gliders around thermals. For simplicity and robustness, agents operate with limited information about other agents. The UAVs’ motion was implemented as a Boids model, replicating the behavior of flocking birds through cohesion, separation, and alignment forces. Agents equipped with a modified behavioral model exhibit dynamic flocking behavior, including relative ascension-based cohesion and relative height-based separation and alignment. The simulation results show the agents flocking to thermals and improving swarm survival. These findings present a promising method to extend the flight time of autonomous UAV swarms. 
  2. Faisal, Aldo A (Ed.)
    Animals display characteristic behavioural patterns when performing a task, such as the spiraling of a soaring bird or the surge-and-cast of a male moth searching for a female. Identifying such recurring sequences occurring rarely in noisy behavioural data is key to understanding the behavioural response to a distributed stimulus in unrestrained animals. Existing models seek to describe the dynamics of behaviour or segment individual locomotor episodes rather than to identify the rare and transient sequences of locomotor episodes that make up the behavioural response. To fill this gap, we develop a lexical, hierarchical model of behaviour. We designed an unsupervised algorithm called “BASS” to efficiently identify and segment recurring behavioural action sequences transiently occurring in long behavioural recordings. When applied to navigating larval zebrafish, BASS extracts a dictionary of remarkably long, non-Markovian sequences consisting of repeats and mixtures of slow forward and turn bouts. Applied to a novel chemotaxis assay, BASS uncovers chemotactic strategies deployed by zebrafish to avoid aversive cues consisting of sequences of fast large-angle turns and burst swims. In a simulated dataset of soaring gliders climbing thermals, BASS finds the spiraling patterns characteristic of soaring behaviour. In both cases, BASS succeeds in identifying rare action sequences in the behaviour deployed by freely moving animals. BASS can be easily incorporated into the pipelines of existing behavioural analyses across diverse species, and even more broadly used as a generic algorithm for pattern recognition in low-dimensional sequential data. 
  3. Uncrewed aerial vehicles are integral to a smart city framework, but the dynamic environments above and within urban settings are dangerous for autonomous flight. Wind gusts caused by the uneven landscape jeopardize safe and effective aircraft operation. Birds rapidly reject gusts by changing their wing shape, but current gust alleviation methods for aircraft still use discrete control surfaces. Additionally, modern gust alleviation controllers challenge small uncrewed aerial vehicle power constraints by relying on extensive sensing networks and computationally expensive modeling. Here we show end-to-end deep reinforcement learning forgoing state inference to efficiently alleviate gusts on a smart material camber-morphing wing. In a series of wind tunnel gust experiments at the University of Michigan, trained controllers reduced gust impact by 84% from on-board pressure signals. Notably, gust alleviation using signals from only three pressure taps was statistically indistinguishable from using six pressure tap signals. By efficiently rejecting environmental perturbations, reduced-sensor fly-by-feel controllers open the door to small uncrewed aerial vehicle missions in cities.
  4. Task and motion planning subject to linear temporal logic (LTL) specifications in complex, dynamic environments requires efficient exploration of many possible future worlds. Model-free reinforcement learning has proven successful in a number of challenging tasks, but shows poor performance on tasks that require long-term planning. In this work, we integrate Monte Carlo tree search with hierarchical neural net policies trained on expressive LTL specifications. We use reinforcement learning to find deep neural networks representing both low-level control policies and task-level "option policies" that achieve high-level goals. Our combined architecture generates safe and responsive motion plans that respect the LTL constraints. We demonstrate our approach in a simulated autonomous driving setting, where a vehicle must drive down a road in traffic, avoid collisions, and navigate an intersection, all while obeying rules of the road.
  5. Background: Drivers gather most of the information they need to drive by looking at the world around them and at visual displays within the vehicle. Navigation systems automate the way drivers navigate. In using these systems, drivers offload both tactical (route following) and strategic aspects (route planning) of navigational tasks to the automated SatNav system, freeing up cognitive and attentional resources that can be used in other tasks (Burnett, 2009). Despite the potential benefits and opportunities that navigation systems provide, their use can also be problematic. For example, research suggests that drivers using SatNav do not develop as much environmental spatial knowledge as drivers using paper maps (Waters & Winter, 2011; Parush, Ahuvia, & Erev, 2007). With recent growth and advances of augmented reality (AR) head-up displays (HUDs), there are new opportunities to display navigation information directly within a driver’s forward field of view, allowing them to gather information needed to navigate without looking away from the road. While the technology is promising, the nuances of interface design and its impacts on drivers must be further understood before AR can be widely and safely incorporated into vehicles. Specifically, an impact that warrants investigation is the role of AR HUDs in spatial knowledge acquisition while driving. Acquiring high levels of spatial knowledge is crucial for navigation tasks because individuals who have greater levels of spatial knowledge acquisition are more capable of navigating based on their own internal knowledge (Bolton, Burnett, & Large, 2015). Moreover, the ability to develop an accurate and comprehensive cognitive map serves a social function, in that individuals are able to navigate for others, provide verbal directions, and sketch direction maps (Hill, 1987).
Given these points, the relationship between spatial knowledge acquisition and novel technologies such as AR HUDs in driving is a relevant topic for investigation. Objectives: This work explored whether providing conformal AR navigational cues improves spatial knowledge acquisition (as compared to traditional HUD visual cues) to assess the plausibility and justification for investment in generating larger FOV AR HUDs with potentially multiple focal planes. Methods: This study employed a 2x2 between-subjects design in which twenty-four participants were counterbalanced by gender. We used a fixed-base, medium-fidelity driving simulator in which participants drove while navigating with one of two possible HUD interface designs: a world-relative arrow post sign and a screen-relative traditional arrow. During the 10-15 minute drive, participants drove the route and were encouraged to verbally share feedback as they proceeded. After the drive, participants completed a NASA-TLX questionnaire to record their perceived workload. We measured spatial knowledge at two levels: landmark and route knowledge. Landmark knowledge was assessed using an iconic recognition task, while route knowledge was assessed using a scene ordering task. After completion of the study, individuals signed a post-trial consent form and were compensated $10 for their time. Results: NASA-TLX performance subscale ratings revealed that participants felt that they performed better during the world-relative condition but at a higher rate of perceived workload. However, in terms of perceived workload, results suggest there is no significant difference between interface design conditions. Landmark knowledge results suggest that the mean number of remembered scenes among both conditions is statistically similar, indicating participants using both interface designs remembered the same proportion of on-route scenes.
Deviance analysis shows that only maneuver direction had an influence on landmark knowledge testing performance. Route knowledge results suggest that the proportion of scenes on-route which were correctly sequenced by participants is similar under both conditions. Finally, participants exhibited poorer performance in the route knowledge task as compared to the landmark knowledge task (independent of HUD interface design). Conclusions: This study described a driving simulator study which evaluated the head-up provision of two types of AR navigation interface designs. The world-relative condition placed an artificial post sign at the corner of an approaching intersection containing a real landmark. The screen-relative condition displayed turn directions using a screen-fixed traditional arrow located directly ahead of the participant on the right or left side on the HUD. Overall results of this initial study provide evidence that screen-relative and world-relative AR head-up display interfaces have a similar impact on spatial knowledge acquisition and perceived workload while driving. These results contrast with a common perspective in the AR community that conformal, world-relative graphics are inherently more effective. This study instead suggests that simple, screen-fixed designs may indeed be effective in certain contexts.