

Title: Hammers for Robots: Designing Tools for Reinforcement Learning Agents
In this paper we explore what role humans might play in designing tools for reinforcement learning (RL) agents to interact with the world. Recent work has explored RL methods that optimize a robot’s morphology while learning to control it, effectively dividing an RL agent’s environment into the external world and the agent’s interface with the world. Taking a user-centered design (UCD) approach, we explore the potential of a human, instead of an algorithm, redesigning the agent’s tool. Using UCD to design for a machine learning agent brings up several research questions, including what it means to understand an RL agent’s experience, beliefs, tendencies, and goals. After discussing these questions, we then present a system we developed to study humans designing a 2D racecar for an RL autonomous driver. We conclude with findings and insights from exploratory pilots with twelve users using this system.  more » « less
Award ID(s):
1907542 1635253
NSF-PAR ID:
10292491
Author(s) / Creator(s):
Date Published:
Journal Name:
DIS '21: Designing Interactive Systems Conference 2021
Page Range / eLocation ID:
1638 to 1653
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reinforcement learning (RL) presents numerous benefits compared to rule-based approaches in various applications. Privacy concerns have grown with the widespread use of RL trained with privacy-sensitive data in IoT devices, especially for human-in-the-loop systems. On the one hand, RL methods enhance the user experience by trying to adapt to the highly dynamic nature of humans. On the other hand, trained policies can leak the user’s private information. Recent attention has been drawn to designing privacy-aware RL algorithms that maintain an acceptable system utility. A central challenge in designing privacy-aware RL, especially for human-in-the-loop systems, is that humans have intrinsic variability, and their preferences and behavior evolve. The effect of one privacy leak mitigation can differ for the same human or across different humans over time. Hence, we cannot design one fixed model for privacy-aware RL that fits all. To that end, we propose adaPARL, an adaptive approach for privacy-aware RL, especially for human-in-the-loop IoT systems. adaPARL provides a personalized privacy-utility trade-off depending on human behavior and preference. We validate the proposed adaPARL on two IoT applications, namely (i) Human-in-the-Loop Smart Home and (ii) Human-in-the-Loop Virtual Reality (VR) Smart Classroom. Results obtained on these two applications validate the generality of adaPARL and its ability to provide a personalized privacy-utility trade-off. On average, adaPARL improves the utility by 57% while reducing the privacy leak by 23%.
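One common way to express a personalized privacy-utility trade-off in RL is to shape the reward as utility minus a weighted privacy-leak penalty, and to adapt the weight per user as their behavior evolves. The sketch below illustrates that idea only; the function names, the linear penalty, and the simple update rule are assumptions for illustration, not adaPARL's actual mechanism.

```python
def shaped_reward(utility, privacy_leak, lam):
    """Trade task utility off against an estimated privacy leak.

    lam is a per-user privacy weight (hypothetical; the paper's
    actual trade-off mechanism may differ).
    """
    return utility - lam * privacy_leak


def adapt_lambda(lam, observed_leak, target_leak, step=0.1):
    """Raise the privacy weight when observed leakage exceeds the
    user's target level, and lower it otherwise (a simple
    gradient-style update, clipped at zero)."""
    return max(0.0, lam + step * (observed_leak - target_leak))
```

A policy trained against `shaped_reward` with a periodically re-adapted `lam` would, under these assumptions, converge to a different utility/privacy operating point for each user.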
  3. We study adaptive video streaming for multiple users in wireless access edge networks with unreliable channels. The key challenge is to jointly optimize the video bitrate adaptation and resource allocation such that the users' cumulative quality of experience is maximized. This problem is a finite-horizon restless multi-armed multi-action bandit problem and is provably hard to solve. To overcome this challenge, we propose a computationally appealing index policy entitled Quality Index Policy, which is well-defined without the Whittle indexability condition and is provably asymptotically optimal without the global attractor condition. These two conditions are widely needed in the design of most existing index policies and are difficult to establish in general. Since the wireless access edge network environment is highly dynamic, with system parameters unknown and time-varying, we further develop an index-aware reinforcement learning (RL) algorithm dubbed QA-UCB. We show that QA-UCB achieves a sub-linear regret with low complexity, since it fully exploits the structure of the Quality Index Policy for making decisions. Extensive simulations using real-world traces demonstrate significant gains of the proposed policies over conventional approaches. We note that the proposed framework for designing the index policy and the index-aware RL algorithm is of independent interest and could be useful for other large-scale multi-user problems.
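The appeal of an index policy is computational: each user is summarized by a single scalar index, and the scheduler simply serves the users with the largest indices. The sketch below shows that decision structure under assumed names; the index here (expected QoE gain weighted by channel reliability) is a hypothetical stand-in, not the paper's Quality Index.

```python
def quality_index(qoe_gain, success_prob):
    """Hypothetical per-user index: the QoE gain a user would get
    if scheduled, weighted by the probability the unreliable
    channel delivers the transmission."""
    return qoe_gain * success_prob


def schedule(users, capacity):
    """Serve the `capacity` users with the largest index values.

    `users` maps a user id to a (qoe_gain, success_prob) pair;
    returns the set of scheduled user ids.
    """
    ranked = sorted(users, key=lambda u: quality_index(*users[u]),
                    reverse=True)
    return set(ranked[:capacity])
```

Because the per-slot decision reduces to a sort over scalar indices, an RL algorithm that only has to learn the index inputs (rather than a full joint policy) can exploit this structure, which is the intuition behind pairing the index policy with an index-aware learner.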
  5. BACKGROUND: Natureculture (Haraway, 2003; Fuentes, 2010) constructs offer a powerful framework for science education to explore learners’ interactions with and understanding of the natural world. Technologies such as Augmented Reality (AR), designed to reveal pets’ sensory worlds and companionship with pets, can facilitate learners’ harmonious relationships with significant others in naturecultures. METHODS: At a two-week virtual summer camp, we engaged teens in inquiring into dogs’ and cats’ senses using selective color filters, investigations, and experience design projects, and in understanding how the umwelt (von Uexküll, 2001) of pets impacts their lives with humans. We qualitatively analyzed participants’ talk, extensive notes, and projects completed at the workshop. FINDINGS: We found that teens engaged in the science and engineering practices of planning and carrying out investigations, constructing explanations and designing solutions, and questioning while investigating specific aspects of their pets’ lives. Further, we found that checking on pets and taking their perspectives while caring for them shaped teens’ productive engagement in these practices. The relationship between pets and humans facilitated an ecological and relational approach to science learning. CONTRIBUTION: Our findings suggest that relational practices of caring and perspective-taking coexist with scientific practices and enrich scientific inquiry.