

Search for: All records

Award ID contains: 1837515

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

  1. Abstract

    Intelligence involves processing sensory experiences into representations useful for prediction. Understanding sensory experiences and building these contextual representations without prior knowledge of sensor models and environment is a challenging unsupervised learning problem. Current machine learning methods process new sensory data using prior knowledge defined by either domain knowledge or datasets. When datasets are not available, data acquisition is needed, though automating exploration in support of learning is still an unsolved problem. Here we develop a method that enables agents to efficiently collect data for learning a predictive sensor model—without requiring domain knowledge, human input, or previously existing data—using ergodicity to specify the data acquisition process. This approach is based entirely on data-driven sensor characteristics rather than predefined knowledge of the sensor model and its physical characteristics. We learn higher quality models with lower energy expenditure during exploration for data acquisition compared to competing approaches, including both random sampling and information maximization. In addition to applications in autonomy, our approach provides a potential model of how animals use their motor control to develop high quality models of their sensors (sight, sound, touch) before having knowledge of their sensor capabilities or their surrounding environment.

     
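The abstract above uses ergodicity to specify data acquisition: informally, a trajectory is ergodic with respect to a target distribution when its time-averaged statistics match the distribution's spatial statistics. As a minimal sketch (not the paper's implementation), a 1-D ergodic metric can be written by comparing Fourier coefficients of the trajectory's time-average against those of the target; the basis, weights, and discretization below are illustrative choices:

```python
import numpy as np

def ergodic_metric(trajectory, target_pdf, n_coeffs=8):
    """Weighted squared difference between Fourier coefficients of a
    trajectory's time-average distribution and a target spatial
    distribution on the 1-D domain [0, 1]. Illustrative sketch only."""
    ks = np.arange(n_coeffs)
    xs = np.linspace(0.0, 1.0, 400)
    px = target_pdf(xs)
    px = px / np.mean(px)  # normalize: integral over [0, 1] ~ sample mean
    # target coefficients in the cosine basis f_k(x) = cos(k * pi * x)
    phi = np.array([np.mean(px * np.cos(k * np.pi * xs)) for k in ks])
    # time-averaged trajectory coefficients
    c = np.array([np.mean(np.cos(k * np.pi * trajectory)) for k in ks])
    lam = 1.0 / (1.0 + ks.astype(float) ** 2)  # de-emphasize high frequencies
    return float(np.sum(lam * (c - phi) ** 2))
```

A trajectory that spends time where the target density is high scores near zero; one that lingers in a small region scores poorly, which is what makes the metric usable as an exploration objective.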
  2. We consider the problem of distributed pose graph optimization (PGO) that has important applications in multi-robot simultaneous localization and mapping (SLAM). We propose the majorization minimization (MM) method for distributed PGO (MM−PGO) that applies to a broad class of robust loss kernels. The MM−PGO method is guaranteed to converge to first-order critical points under mild conditions. Furthermore, noting that the MM−PGO method is reminiscent of proximal methods, we leverage Nesterov’s method and adopt adaptive restarts to accelerate convergence. The resulting accelerated MM methods for distributed PGO—both with a master node in the network (AMM−PGO∗) and without (AMM−PGO#)—have faster convergence in contrast to the MM−PGO method without sacrificing theoretical guarantees. In particular, the AMM−PGO# method, which needs no master node and is fully decentralized, features a novel adaptive restart scheme and has a rate of convergence comparable to that of the AMM−PGO∗ method using a master node to aggregate information from all the nodes. The efficacy of this work is validated through extensive applications to 2D and 3D SLAM benchmark datasets and comprehensive comparisons against existing state-of-the-art methods, indicating that our MM methods converge faster and result in better solutions to distributed PGO. The code is available at https://github.com/MurpheyLab/DPGO.
    Free, publicly-accessible full text available October 16, 2024
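The MM-with-acceleration pattern described above can be illustrated on a toy problem. The sketch below is not the paper's distributed PGO solver; it shows the generic recipe on a 1-D robust (Huber) location estimate: minimize a quadratic majorizer of the loss at each step, add Nesterov-style momentum, and restart (drop momentum) whenever the objective increases. All names and constants are illustrative:

```python
import numpy as np

def huber(r, delta=1.0):
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

def irls_step(x, data, delta):
    # Minimize the quadratic majorizer of the Huber loss built at x
    # (iteratively reweighted least squares): a weighted mean of the data.
    a = np.abs(x - data)
    w = np.where(a <= delta, 1.0, delta / np.maximum(a, 1e-12))
    return np.sum(w * data) / np.sum(w)

def accelerated_mm(data, x0=0.0, iters=60, delta=1.0):
    """Toy accelerated MM with a function-value adaptive restart."""
    x_prev, x, t = x0, x0, 1.0
    f_prev = np.inf
    for _ in range(iters):
        y = x + ((t - 1.0) / (t + 2.0)) * (x - x_prev)  # Nesterov extrapolation
        x_next = irls_step(y, data, delta)
        f_next = float(np.sum(huber(x_next - data, delta)))
        if f_next > f_prev:
            # Adaptive restart: discard momentum, take a plain MM step from x.
            t = 1.0
            x_next = irls_step(x, data, delta)
            f_next = float(np.sum(huber(x_next - data, delta)))
        x_prev, x, t, f_prev = x, x_next, t + 1.0, f_next
    return x
```

The restart preserves the plain MM method's descent guarantee while letting momentum speed up the well-behaved iterations, which mirrors the role of the adaptive restart schemes in the AMM−PGO variants.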
  3. During a natural disaster such as a hurricane, earthquake, or fire, robots have the potential to explore vast areas and provide valuable aid in search & rescue efforts. These scenarios are often high-pressure and time-critical with dynamically-changing task goals. One limitation to these large scale deployments is effective human-robot interaction. Prior work shows that collaboration between one human and one robot benefits from shared control. Here we evaluate the efficacy of shared control for human-swarm teaming in an immersive virtual reality environment. Although there are many human-swarm interaction paradigms, few are evaluated in high-pressure settings representative of their intended end use. We have developed an open-source virtual reality testbed for realistic evaluation of human-swarm teaming performance under pressure. We conduct a user study (n=16) comparing four human-swarm paradigms to a baseline condition with no robotic assistance. Shared control significantly reduces the number of instructions needed to operate the robots. While shared control leads to marginally improved team performance in experienced participants, novices perform best when the robots are fully autonomous. Our experimental results suggest that in immersive, high-pressure settings, the benefits of robotic assistance may depend on how the human and robots interact and the human operator’s expertise.
    Free, publicly-accessible full text available October 2, 2024
  4. Early research on physical human–robot interaction (pHRI) has necessarily focused on device design—the creation of compliant and sensorized hardware, such as exoskeletons, prostheses, and robot arms, that enables people to safely come in contact with robotic systems and to communicate about their collaborative intent. As hardware capabilities have become sufficient for many applications, and as computing has become more powerful, algorithms that support fluent and expressive use of pHRI systems have begun to play a prominent role in determining the systems’ usefulness. In this review, we describe a selection of representative algorithmic approaches that regulate and interpret pHRI, describing the progression from algorithms based on physical analogies, such as admittance control, to computational methods based on higher-level reasoning, which take advantage of multimodal communication channels. Existing algorithmic approaches largely enable task-specific pHRI, but they do not generalize to versatile human–robot collaboration. Throughout the review and in our discussion of next steps, we therefore argue that emergent embodied dialogue—bidirectional, multimodal communication that can be learned through continuous interaction—is one of the next frontiers of pHRI. 
    Free, publicly-accessible full text available May 3, 2024
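Admittance control, named in the review above as a representative physical-analogy approach, maps a sensed external force to compliant motion through a virtual mass-damper: M·a + D·v = f_ext. A minimal 1-DoF sketch, with illustrative parameters M and D and a simple Euler integration:

```python
def admittance_step(x, v, f_ext, M=2.0, D=8.0, dt=0.01):
    """One Euler step of the 1-DoF admittance law  M*a + D*v = f_ext:
    the robot renders compliant motion in response to a sensed force.
    M (virtual mass) and D (virtual damping) are illustrative values."""
    a = (f_ext - D * v) / M
    v_next = v + a * dt
    x_next = x + v_next * dt
    return x_next, v_next

# A person pushes with a constant 4 N force: the rendered velocity
# settles toward f_ext / D = 0.5 m/s, so the robot "gives way" smoothly.
x, v = 0.0, 0.0
for _ in range(2000):
    x, v = admittance_step(x, v, 4.0)
```

Tuning M and D trades responsiveness against stability, which is one reason the review describes the field moving from such fixed physical analogies toward higher-level, multimodal reasoning.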
  5. This paper develops the method of Continuous Pontryagin Differentiable Programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, labeled with some time stamps, are the desired task-space outputs, which a robot is expected to follow sequentially. The time stamps of the keyframes can be different from the time of the robot’s actual execution. The method jointly finds an objective function and a time-warping function such that the robot’s resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. The Continuous PDP minimizes the discrepancy loss using projected gradient descent, by efficiently solving the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of objective learning into unseen motion conditions. 
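The joint objective/time-warp idea above can be pictured with a stripped-down example: a linear warp t = beta * tau fitted by projected gradient descent on the keyframe discrepancy loss. This toy (finite-difference gradients, a single scalar warp parameter, hypothetical function names) is not the Continuous PDP machinery, but it shows how time misalignment between keyframe stamps and execution time can be absorbed into the optimization:

```python
import numpy as np

def fit_time_warp(stamps, keyframes, traj, beta0=1.0, lr=0.05, iters=300):
    """Fit a linear time-warp t = beta * tau by projected gradient descent
    on the keyframe discrepancy loss  L(beta) = sum_i (traj(beta*tau_i) - k_i)^2."""
    beta, eps = beta0, 1e-5
    loss = lambda b: float(np.sum((traj(b * stamps) - keyframes) ** 2))
    for _ in range(iters):
        # central-difference gradient of the discrepancy loss w.r.t. beta
        g = (loss(beta + eps) - loss(beta - eps)) / (2 * eps)
        beta = max(beta - lr * g, 1e-6)  # project: warp must run forward in time
    return beta
```

In the full method the warp is a function (not a scalar) and the gradient of the robot trajectory with respect to the unknowns is computed efficiently rather than by finite differences.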
  6. This paper proposes a novel approach that enables a robot to learn an objective function incrementally from human directional corrections. Existing methods learn from human magnitude corrections; since a human needs to carefully choose the magnitude of each correction, those methods can easily lead to over-corrections and learning inefficiency. The proposed method only requires human directional corrections — corrections that only indicate the direction of an input change without indicating its magnitude. We only assume that each correction, regardless of its magnitude, points in a direction that improves the robot’s current motion relative to an unknown objective function. The allowable corrections satisfying this assumption account for half of the input space, as opposed to the magnitude corrections which have to lie in a shrinking level set. For each directional correction, the proposed method updates the estimate of the objective function based on a cutting plane method, which has a geometric interpretation. We have established theoretical results to show the convergence of the learning process. The proposed method has been tested in numerical examples, a user study on two human-robot games, and a real-world quadrotor experiment. The results confirm the convergence of the proposed method and further show that the method is significantly more effective (higher success rate), efficient/effortless (fewer human corrections needed), and potentially more accessible (fewer early wasted trials) than the state-of-the-art robot learning frameworks.
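The cutting-plane geometry mentioned above has a simple picture: when the objective is linear in unknown parameters theta, each directional correction yields a halfspace {theta : h · theta >= 0} that must contain the true parameters, and intersecting the cuts shrinks the consistent set. The sampling-based sketch below is illustrative only (it is not the paper's update rule, and the halfspace construction is assumed, not derived):

```python
import numpy as np

def cutting_plane_estimate(halfspaces, n_grid=500, seed=0):
    """Each correction contributes a halfspace {theta : h . theta >= 0}.
    Intersect the cuts over candidate unit vectors and return the
    (normalized) average surviving direction as the parameter estimate."""
    rng = np.random.default_rng(seed)
    ang = rng.uniform(0.0, 2.0 * np.pi, n_grid)
    cands = np.stack([np.cos(ang), np.sin(ang)], axis=1)
    for h in halfspaces:
        cands = cands[cands @ h >= 0.0]  # cut: discard inconsistent candidates
    est = cands.mean(axis=0)             # assumes some candidates survive
    return est / np.linalg.norm(est)
```

Each new cut can only remove candidates, which is the geometric sense in which every correction, however small, carries useful information about the unknown objective.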
  7. Enabling efficient communication in artificial agents brings us closer to machines that can cooperate with each other and with human partners. Hand-engineered approaches have substantial limitations, leading to increased interest in methods for communication to emerge autonomously between artificial agents. Most of the research in the field explores unsituated communication in one-step referential tasks. The tasks are not temporally interactive and lack time pressures typically present in natural communication and language learning. In these settings, agents can successfully learn what to communicate but not when or whether to communicate. Here, we extend the literature by assessing emergence of communication between reinforcement learning agents in a temporally interactive, cooperative task of navigating a gridworld environment. We show that, through multi-step interactions, agents develop just-in-time messaging protocols that enable them to successfully solve the task. With memory—which provides flexibility around message timing—agent pairs converge to a look-ahead communication protocol, finding an optimal solution to the task more quickly than without memory. Lastly, we explore situated communication, enabling the acting agent to choose when and whether to communicate. With the opportunity cost of forgoing an action to communicate, the acting agent learns to solicit information sparingly, in line with the Gricean Maxim of quantity. Our results point towards the importance of studying language emergence through situated communication in multi-step interactions. 
  8. We present a game benchmark for testing human-swarm control algorithms and interfaces in a real-time, high-cadence scenario. Our benchmark consists of a swarm vs. swarm game in a virtual ROS environment in which the goal of the game is to “capture” all agents from the opposing swarm; the game’s high cadence is a result of the capture rules, which cause agent team sizes to fluctuate rapidly. These rules require players to consider both the number of agents currently at their disposal and the behavior of their opponent’s swarm when they plan actions. We demonstrate our game benchmark with a default human-swarm control system that enables a player to interact with their swarm through a high-level touchscreen interface. The touchscreen interface transforms player gestures into swarm control commands via a low-level decentralized ergodic control framework. We compare our default human-swarm control system to a flocking-based control system, and discuss traits that are crucial for swarm control algorithms and interfaces operating in real-time, high-cadence scenarios like our game benchmark. Our game benchmark code is available on Github; more information can be found at https://sites.google.com/view/swarm-game-benchmark.
  9. We develop an approach to improve the learning capabilities of robotic systems by combining learned predictive models with experience-based state-action policy mappings. Predictive models provide an understanding of the task and the dynamics, while experience-based (model-free) policy mappings encode favorable actions that override planned actions. We refer to our approach of systematically combining model-based and model-free learning methods as hybrid learning. Our approach efficiently learns motor skills and improves the performance of predictive models and experience-based policies. Moreover, our approach enables policies (both model-based and model-free) to be updated using any off-policy reinforcement learning method. We derive a deterministic method of hybrid learning by optimally switching between learning modalities. We adapt our method to a stochastic variation that relaxes some of the key assumptions in the original derivation. Our deterministic and stochastic variations are tested on a variety of robot control benchmark tasks in simulation as well as a hardware manipulation task. We extend our approach for use with imitation learning methods, where experience is provided through demonstrations, and we test the expanded capability with a real-world pick-and-place task. The results show that our method is capable of improving the performance and sample efficiency of learning motor skills in a variety of experimental domains. 
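The model-based/model-free combination described above can be caricatured with a single blending rule: trust the model-based planner while the predictive model is accurate, and weight toward the experience-based policy as model error grows. This hypothetical sketch does not reproduce the paper's optimal switching derivation; the blending function, names, and err_scale constant are all assumptions:

```python
def hybrid_action(x, planner, policy, model_err, err_scale=0.5):
    """Blend a model-based planned action with a model-free policy action.
    alpha -> 1 when the predictive model is accurate (trust the plan);
    alpha -> 0 as model error grows (trust learned experience instead)."""
    alpha = 1.0 / (1.0 + model_err / err_scale)
    return alpha * planner(x) + (1.0 - alpha) * policy(x)
```

The appeal of this structure is that the two components improve each other: the model supplies sample-efficient planning where it is trustworthy, while accumulated experience overrides the plan exactly where the model fails.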