Title: Navigating mobile robots to target in near shortest time using reinforcement learning with spiking neural networks
The autonomous navigation of mobile robots in unknown environments is of great interest in mobile robotics. This article discusses a new strategy to navigate to a known target location in an unknown environment using a combination of the “go-to-goal” approach and reinforcement learning with biologically realistic spiking neural networks. While the “go-to-goal” approach itself might lead to a solution for most environments, the added neural reinforcement learning in this work results in a strategy that takes the robot from a starting position to a target location in near-shortest time. To achieve this, we propose a reinforcement learning approach based on spiking neural networks. The presented biologically motivated delayed reward mechanism using eligibility traces results in a greedy approach that leads the robot to the target in close to the shortest possible time.
Award ID(s):
1639995
NSF-PAR ID:
10026440
Journal Name:
Proceedings of ... International Joint Conference on Neural Networks
ISSN:
2161-4393
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Collision avoidance is a key technology enabling applications such as autonomous vehicles and robots. Various reinforcement learning techniques such as the popular Q-learning algorithms have emerged as a promising solution for collision avoidance in robotics. While spiking neural networks (SNNs), the third generation model of neural networks, have gained increased interest due to their closer resemblance to biological neural circuits in the brain, the application of SNNs to mobile robot navigation has not been well studied. In the context of reinforcement learning, this paper aims to investigate the potential of biologically motivated spiking neural networks for goal-directed collision avoidance in reasonably complex environments. Unlike the existing additive reward-modulated spike-timing-dependent plasticity learning rule (A-RM-STDP), we explore, for the first time, a new multiplicative RM-STDP scheme (M-RM-STDP) for the targeted application. Furthermore, we propose a more biologically plausible feed-forward spiking neural network architecture with fine-grained global rewards. Finally, by combining the above two techniques we demonstrate a further improved solution to collision avoidance. Our proposed approaches not only completely outperform Q-learning for cases where Q-learning can hardly reach the target without collision, but also significantly outperform a baseline SNN with A-RM-STDP in terms of both success rate and the quality of navigation trajectories.
  2. Intelligent mobile robots have recently become able to operate autonomously in large-scale indoor environments for extended periods of time. In this process, mobile robots need the capabilities of both task and motion planning. Task planning in such environments involves sequencing the robot’s high-level goals and subgoals, and typically requires reasoning about the locations of people, rooms, and objects in the environment, and their interactions to achieve a goal. One of the prerequisites for optimal task planning that is often overlooked is having an accurate estimate of the actual distance (or time) a robot needs to navigate from one location to another. State-of-the-art motion planning algorithms, though often computationally complex, are designed exactly for this purpose of finding routes through constrained spaces. In this article, we focus on integrating task and motion planning (TMP) to achieve task-level-optimal planning for robot navigation while maintaining manageable computational efficiency. To this end, we introduce the TMP algorithm PETLON (Planning Efficiently for Task-Level-Optimal Navigation), including two configurations with different trade-offs over computational expenses between task and motion planning, for everyday service tasks using a mobile robot. Experiments have been conducted both in simulation and on a mobile robot using object delivery tasks in an indoor office environment. The key observation from the results is that PETLON is more efficient than a baseline approach that pre-computes motion costs of all possible navigation actions, while still producing plans that are optimal at the task level. We provide results with two different task planning paradigms in the implementation of PETLON, and offer TMP practitioners guidelines for the selection of task planners from an engineering perspective.
  3. We present a closed-loop multi-arm motion planner that is scalable and flexible with team size. Traditional multi-arm robotic systems have relied on centralized motion planners, whose run times often scale exponentially with team size, and thus fail to handle dynamic environments with open-loop control. In this paper, we tackle this problem with multi-agent reinforcement learning, where a shared policy network is trained to control each individual robot arm to reach its target end-effector pose given observations of its workspace state and target end-effector pose. The policy is trained using Soft Actor-Critic with expert demonstrations from a sampling-based motion planning algorithm (i.e., BiRRT). By leveraging classical planning algorithms, we can improve the learning efficiency of the reinforcement learning algorithm while retaining the fast inference time of neural networks. The resulting policy scales sub-linearly and can be deployed on multi-arm systems with variable team sizes. Thanks to the closed-loop and decentralized formulation, our approach generalizes to 5-10 multi-arm systems and dynamic moving targets (>90% success rate for a 10-arm system), despite being trained on only 1-4 arm planning tasks with static targets.
  4. Mobile devices such as drones and autonomous vehicles increasingly rely on object detection (OD) through deep neural networks (DNNs) to perform critical tasks such as navigation, target tracking, and surveillance. Due to their high complexity, the execution of these DNNs requires excessive time and energy. Low-complexity object tracking (OT) is thus used along with OD, where the latter is periodically applied to generate "fresh" references for tracking. However, the frames processed with OD incur large delays, which do not comply with the requirements of real-time applications. Offloading OD to edge servers can mitigate this issue, but existing work focuses on the optimization of the offloading process in systems where the wireless channel has a very large capacity. Herein, we consider systems with constrained and erratic channel capacity, and establish parallel OT (at the mobile device) and OD (at the edge server) processes that are resilient to large OD latency. We propose Katch-Up, a novel tracking mechanism that improves the system resilience to excessive OD delay. We show that this technique greatly improves the quality of the reference available to tracking, and boosts performance by up to 33%. However, while Katch-Up significantly improves performance, it also increases the computing load of the mobile device. Hence, we design SmartDet, a low-complexity controller based on deep reinforcement learning (DRL) that learns to achieve the right trade-off between resource utilization and OD performance. SmartDet takes as input highly heterogeneous context information about the current video content and the current network conditions to optimize the frequency and type of OD offloading, as well as Katch-Up utilization.
    We extensively evaluate SmartDet on a real-world testbed composed of a Jetson Nano as the mobile device and a GTX 980 Ti as the edge server, connected through a Wi-Fi link, to collect several network-related traces, as well as energy measurements. We consider a state-of-the-art video dataset (ILSVRC 2015 - VID) and state-of-the-art OD models (EfficientDet 0, 2 and 4). Experimental results show that SmartDet achieves an optimal balance between tracking performance, measured as mean Average Recall (mAR), and resource usage. With respect to a baseline with full Katch-Up usage and maximum channel usage, we still increase mAR by 4% while using 50% less of the channel and 30% power resources associated with Katch-Up. With respect to a fixed strategy using minimal resources, we increase mAR by 20% while using Katch-Up on 1/3 of the frames.
  5. Advances in deep reinforcement learning have recently revived interest in data-driven, learning-based approaches to navigation. In this paper we propose to learn viewpoint-invariant and target-invariant visual servoing for local mobile robot navigation; given an initial view and the goal view or an image of a target, we train a deep convolutional network controller to reach the desired goal. We present a new architecture for this task which rests on the ability to establish correspondences between the initial and goal views, and a novel reward structure motivated by the traditional feedback control error. The advantage of the proposed model is that it does not require calibration or depth information and achieves robust visual servoing in a variety of environments and targets without any parameter fine-tuning. We present a comprehensive evaluation of the approach and a comparison with other deep learning architectures as well as classical visual servoing methods in a visually realistic simulation environment [1]. The presented model overcomes the brittleness of classical visual servoing methods and achieves significantly higher generalization capability compared to previous learning approaches.