Title: Personalized augmented reality via fog-based imitation learning
Augmented reality (AR) technologies are rapidly gaining momentum in society and are expected to play a critical role in the future of cities and transportation. In such dynamic settings with a heterogeneous population of AR users, it is important for holograms to be placed in the surrounding environment in accordance with each user's preferences. However, the area of AR personalization remains largely unexplored. This paper proposes to use behavioral cloning, an imitation-learning algorithm, as a means of automatically generating policies that capture user preferences for hologram positioning. We argue in favor of employing the fog computing paradigm to minimize the volume of data sent to the cloud, thereby preserving user privacy and increasing both communication efficiency and learning efficiency. Through preliminary results obtained with a custom, Unity-based AR simulator, we demonstrate that user-specific policies can be learned quickly and accurately.
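As a rough sketch of the behavioral-cloning step described above, the snippet below fits a small regression policy to logged hologram placements. The context features, pose encoding, network size, and all names are illustrative assumptions, not the paper's implementation; running the training loop on a fog node rather than in the cloud is what keeps raw demonstrations local.

```python
# Hypothetical behavioral-cloning sketch: learn a mapping from scene/user
# context features to a preferred hologram pose from logged demonstrations.
import torch
import torch.nn as nn

class PlacementPolicy(nn.Module):
    """Maps a context vector (e.g., gaze direction, nearby surfaces)
    to a hologram pose (x, y, z position plus yaw); shapes are assumed."""
    def __init__(self, ctx_dim=16, pose_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, pose_dim),
        )

    def forward(self, ctx):
        return self.net(ctx)

def clone_behavior(demo_ctx, demo_pose, epochs=200, lr=1e-3):
    """Supervised regression onto the user's demonstrated placements;
    training locally keeps the raw demonstrations off the cloud."""
    policy = PlacementPolicy(demo_ctx.shape[1], demo_pose.shape[1])
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(policy(demo_ctx), demo_pose).backward()
        opt.step()
    return policy

# Usage with synthetic stand-in data:
ctx = torch.randn(512, 16)    # logged context features
pose = torch.randn(512, 4)    # the user's chosen hologram poses
policy = clone_behavior(ctx, pose)
```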
Award ID(s):
1903136
PAR ID:
10119304
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the Workshop on Fog Computing and the IoT
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning foundation models to human values and preferences. However, current RLHF techniques cannot account for the naturally occurring differences in individual human preferences across a diverse population. When these differences arise, traditional RLHF frameworks simply average over them, leading to inaccurate rewards and poor performance for individual subgroups. To address the need for pluralistic alignment, we develop a class of multimodal RLHF methods. Our proposed techniques are based on a latent variable formulation: inferring a novel user-specific latent and learning reward models and policies conditioned on this latent without additional user-specific data. While conceptually simple, we show that in practice this reward modeling requires careful algorithmic consideration of model architecture and reward scaling. To empirically validate our proposed technique, we first show that it can provide a way to combat under-specification in simulated control problems, inferring and optimizing user-specific reward functions. Next, we conduct experiments on pluralistic language datasets representing diverse user preferences and demonstrate improved reward function accuracy. We additionally show the benefits of this probabilistic framework in terms of measuring uncertainty and actively learning user preferences. This work enables learning from diverse populations of users with divergent preferences, an important challenge that naturally occurs in problems from robot learning to foundation model alignment.
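As a loose illustration of the latent-variable idea in this abstract, the sketch below conditions a Bradley-Terry reward model on a per-user latent. The learned embedding table stands in for the paper's latent-inference machinery, and every name and dimension here is an assumption.

```python
# Illustrative latent-conditioned reward model; the embedding table is a
# stand-in for proper latent inference, and all shapes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentConditionedReward(nn.Module):
    """Reward r(x, z): item features x plus a per-user latent z, so the
    same comparison can receive different rewards for different users."""
    def __init__(self, n_users=100, feat_dim=32, z_dim=8):
        super().__init__()
        self.user_latent = nn.Embedding(n_users, z_dim)  # per-user z
        self.net = nn.Sequential(
            nn.Linear(feat_dim + z_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x, user_ids):
        z = self.user_latent(user_ids)
        return self.net(torch.cat([x, z], dim=-1)).squeeze(-1)

def preference_loss(model, x_chosen, x_rejected, user_ids):
    """Bradley-Terry negative log-likelihood of the chosen item
    beating the rejected one under each user's latent."""
    margin = model(x_chosen, user_ids) - model(x_rejected, user_ids)
    return -F.logsigmoid(margin).mean()

# Usage with synthetic stand-in data:
model = LatentConditionedReward()
x_a, x_b = torch.randn(64, 32), torch.randn(64, 32)
users = torch.randint(0, 100, (64,))
preference_loss(model, x_a, x_b, users).backward()
```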
  2. Recent investigations have shown that cache-aided device-to-device (D2D) networks can be improved by properly exploiting the individual preferences of users. Since in practice it might be difficult to make centralized decisions about the caching distributions, this paper investigates individual-preference-aware caching policies that users can implement in a distributed manner, without coordination. The proposed policy categorizes users into reference groups, each associated with a different caching policy, according to their preferences. Reference groups are constructed using learning-based approaches. To design caching policies that maximize throughput and hit-rate, optimization problems are formulated and solved. Numerical results based on measured individual preferences show that our design is effective and that exploiting individual preferences is beneficial.
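One minimal way to picture the reference-group construction is to cluster users by their request distributions and cache the most popular files per group, as in the toy sketch below; the Dirichlet-sampled preferences, cache size, and group count are all invented for illustration.

```python
# Toy reference-group construction: cluster users by request distribution,
# then share one caching decision per group. Data and sizes are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_users, n_files, n_groups, cache_size = 100, 50, 4, 5

# Synthetic per-user preference distributions over files.
prefs = rng.dirichlet(np.full(n_files, 0.3), size=n_users)

# Learning-based grouping: k-means over the preference vectors.
groups = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(prefs)

# A simple group-level rule: cache each group's most popular files.
for g in range(n_groups):
    group_pop = prefs[groups == g].mean(axis=0)
    cached = np.argsort(group_pop)[-cache_size:]
    print(f"group {g}: cache files {sorted(cached.tolist())}")
```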
  3. Robot social navigation is influenced by human preferences and environment-specific scenarios such as elevators and doors, thus necessitating end-user adaptability. State-of-the-art approaches to social navigation fall into two categories: model-based social constraints and learning-based approaches. While effective, these approaches have fundamental limitations: model-based approaches require constraint and parameter tuning to adapt to preferences and new scenarios, while learning-based approaches require reward functions and significant training data, and are hard to adapt to new social scenarios or new domains with limited demonstrations. In this work, we propose Iterative Dimension Informed Program Synthesis (IDIPS) to address these limitations by learning and adapting social navigation in the form of human-readable symbolic programs. IDIPS works by combining program synthesis, parameter optimization, predicate repair, and iterative human demonstration to learn and adapt model-free action selection policies from orders of magnitude less data than learning-based approaches. We introduce a novel predicate repair technique that can accommodate previously unseen social scenarios or preferences by growing existing policies. We present experimental results showing that IDIPS: 1) synthesizes effective policies that model user preference, 2) can adapt existing policies to changing preferences, 3) can extend policies to handle novel social scenarios such as locked doors, and 4) generates policies that can be transferred from simulation to real-world robots with minimal effort.
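To make the "human-readable symbolic program" idea concrete, the toy below hand-writes a tiny threshold-based policy and fits its parameters to a few demonstrations by exhaustive search. This is a drastic simplification of the synthesis and predicate-repair machinery IDIPS actually uses; every predicate, action, demonstration, and threshold is made up.

```python
# Toy stand-in for a synthesized policy: a readable threshold program whose
# two parameters are fit to demonstrations by brute-force search.
import itertools

def policy(dist_to_person, door_open, p1, p2):
    """GO when the way is clear, PASS slowly through an open door when a
    person is moderately close, otherwise HALT."""
    if dist_to_person > p1:
        return "GO"
    if door_open and dist_to_person > p2:
        return "PASS"
    return "HALT"

# Demonstrations: (dist_to_person, door_open) -> expert action.
demos = [((3.0, False), "GO"), ((0.5, True), "HALT"),
         ((1.5, True), "PASS"), ((0.8, False), "HALT")]

grid = [0.25 * k for k in range(1, 17)]  # candidate threshold values
best = max(itertools.product(grid, grid),
           key=lambda ps: sum(policy(*x, *ps) == a for x, a in demos))
print("fitted thresholds (p1, p2):", best)
```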
  4. Cache-aided wireless device-to-device (D2D) networks allow significant throughput increases, depending on the concentration of the popularity distribution of files. Many studies assume that all users have the same preference distribution; however, this may not be true in practice. This work investigates whether and how information about individual preferences can benefit cache-aided D2D networks. We examine a clustered network and derive a network utility that incorporates both the user distribution and channel fading effects into the analysis. We also formulate a utility maximization problem for designing caching policies. This maximization problem can be applied to optimize several important quantities, including throughput, energy efficiency (EE), cost, and hit-rate, and to solve different tradeoff problems. We provide a general approach that can solve the proposed problem under the assumption that users coordinate, and then prove that the proposed approach attains a stationary point under a mild assumption. Using simulations of practical setups, we show that performance can improve significantly with proper exploitation of individual preferences. We also show that different types of tradeoffs exist between different performance metrics and that they can be managed through caching policy and cooperation distance designs.
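As a schematic of optimizing a caching distribution against a network utility, the sketch below runs projected gradient ascent on a toy concave surrogate (a log-hit-rate). The surrogate utility, step size, and data are stand-ins, not the utility derived in this paper.

```python
# Schematic: projected gradient ascent of a toy concave utility over a
# caching distribution c; everything numeric here is invented.
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {c >= 0, sum(c) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

rng = np.random.default_rng(0)
n_users, n_files, eps = 20, 30, 1e-3
prefs = rng.dirichlet(np.full(n_files, 0.5), size=n_users)  # request probs

# Toy surrogate utility: sum over users and files of prefs * log(eps + c).
c = np.full(n_files, 1.0 / n_files)  # start from uniform caching
for _ in range(2000):
    grad = np.sum(prefs / (eps + c), axis=0)  # gradient of the surrogate
    c = project_to_simplex(c + 1e-3 * grad)

print("most heavily cached files:", np.argsort(c)[-5:])
```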
  5. Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
    Planning in real-world settings often entails addressing partial observability while aligning with users' requirements. We present a novel framework for expressing users' constraints and preferences about agent behavior in a partially observable setting using parameterized belief-state query (BSQ) policies for goal-oriented partially observable Markov decision processes (gPOMDPs). We present the first formal analysis of such constraints and prove that while the expected cost function of a parameterized BSQ policy is not convex with respect to its parameters, it is piecewise constant and yields an implicit discrete parameter search space that is finite for finite horizons. This theoretical result leads to novel algorithms that optimize gPOMDP agent behavior with guaranteed user alignment. Our analysis proves that these algorithms converge to the optimal user-aligned behavior in the limit. Empirical results show that parameterized BSQ policies provide a computationally feasible approach for user-aligned planning in partially observable settings.
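The piecewise-constant observation suggests a simple picture: since the expected cost only changes at finitely many parameter values, it suffices to evaluate a finite candidate set. The toy below does that for a one-parameter "declare done when belief >= theta" query; the belief dynamics, costs, and grid are all invented, with the fixed grid standing in for the breakpoint enumeration.

```python
# Toy discrete search for a one-parameter belief-state-query policy:
# stop when belief >= theta; evaluate finitely many thetas by Monte Carlo.
import numpy as np

rng = np.random.default_rng(1)

def rollout_cost(theta, n_steps=30, true_done_at=20, miss_penalty=50):
    """Belief drifts upward noisily; declaring done before the goal is
    truly reached pays miss_penalty, while waiting longer costs time."""
    belief = 0.0
    for t in range(n_steps):
        belief = min(1.0, belief + rng.uniform(0.0, 0.1))
        if belief >= theta:
            return t + (miss_penalty if t < true_done_at else 0)
    return n_steps + miss_penalty

thetas = np.linspace(0.05, 1.0, 20)  # finite candidate set
costs = [np.mean([rollout_cost(th) for _ in range(200)]) for th in thetas]
best = thetas[int(np.argmin(costs))]
print(f"best stopping threshold: {best:.2f}")
```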