skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: ScalAR: Authoring Semantically Adaptive Augmented Reality Experiences in Virtual Reality
Augmented Reality (AR) experiences tightly associate virtual contents with environmental entities. However, the dissimilarity of different environments limits the adaptive AR content behaviors under large-scale deployment. We propose ScalAR, an integrated workflow enabling designers to author semantically adaptive AR experiences in Virtual Reality (VR). First, potential AR consumers collect local scenes with a semantic understanding technique. ScalAR then synthesizes numerous similar scenes. In VR, a designer authors the AR contents’ semantic associations and validates the design while being immersed in the provided scenes. We adopt a decision-tree-based algorithm to fit the designer’s demonstrations as a semantic adaptation model to deploy the authored AR experience in a physical scene. We further showcase two application scenarios authored by ScalAR and conduct a two-session user study where the quantitative results prove the accuracy of the AR content rendering and the qualitative results show the usability of ScalAR.  more » « less
Award ID(s):
1839971
PAR ID:
10396710
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
CHI Conference on Human Factors in Computing Systems
Page Range / eLocation ID:
1 to 18
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mobile Augmented Reality (AR), which overlays digital content on the real-world scenes surrounding a user, is bringing immersive interactive experiences where the real and virtual worlds are tightly coupled. To enable seamless and precise AR experiences, an image recognition system that can accurately recognize the object in the camera view with low system latency is required. However, due to the pervasiveness and severity of image distortions, an effective and robust image recognition solution for mobile AR is still elusive. In this paper, we present CollabAR, an edge-assisted system that provides distortion-tolerant image recognition for mobile AR with imperceptible system latency. CollabAR incorporates both distortion-tolerant and collaborative image recognition modules in its design. The former enables distortion-adaptive image recognition to improve the robustness against image distortions, while the latter exploits the `spatial-temporal' correlation among mobile AR users to improve recognition accuracy. We implement CollabAR on four different commodity devices, and evaluate its performance on two multi-view image datasets. Our evaluation demonstrates that CollabAR achieves over 96% recognition accuracy for images with severe distortions, while reducing the end-to-end system latency to as low as 17.8ms for commodity mobile devices. 
    more » « less
  2. Augmented Reality (AR) enhances the real world by integrating virtual content, yet ensuring the quality, usability, and safety of AR experiences presents significant challenges. Could Vision-Language Models (VLMs) offer a solution for the automated evaluation of AR-generated scenes? Could Vision-Language Models (VLMs) offer a solution for the automated evaluation of AR-generated scenes? In this study, we evaluate the capabilities of three state-of-the-art commercial VLMs -- GPT, Gemini, and Claude -- in identifying and describing AR scenes. For this purpose, we use DiverseAR, the first AR dataset specifically designed to assess VLMs' ability to analyze virtual content across a wide range of AR scene complexities. Our findings demonstrate that VLMs are generally capable of perceiving and describing AR scenes, achieving a True Positive Rate (TPR) of up to 93% for perception and 71% for description. While they excel at identifying obvious virtual objects, such as a glowing apple, they struggle when faced with seamlessly integrated content, such as a virtual pot with realistic shadows. Our results highlight both the strengths and the limitations of VLMs in understanding AR scenarios. We identify key factors affecting VLM performance, including virtual content placement, rendering quality, and physical plausibility. This study underscores the potential of VLMs as tools for evaluating the quality of AR experiences. 
    more » « less
  3. In Augmented Reality (AR), improper virtual content placement can obstruct real-world elements, causing confusion and degrading the experience. To address this, we present LOBSTAR (Language model-based OBSTruction detection for Augmented Reality), the first system leveraging a vision language model (VLM) to detect key objects and prevent obstructions in AR. We evaluated LOBSTAR using both real-world and virtual-scene images and developed a mobile app for AR content obstruction detection. Our results demonstrate that LOBSTAR effectively understands scenes and detects obstructive content with well-designed VLM prompts, achieving up to 96% accuracy and a detection latency of 580ms on a mobile app. 
    more » « less
  4. Immersive Learning Environments (ILEs) developed in Virtual and Augmented Reality (VR/AR) are a novel pro- fessional training platform. An ILE can facilitate an Adaptive Learning System (ALS), which has proven beneficial to the learning process. However, there is no existing AI-ready ILE that facilitates collecting multimedia multimodal data from the environment and users for training AI models, nor allows for the learning contents and complex learning process to be dynamically adapted by an ALS. This paper proposes a novel multimedia system in VR/AR to dynamically build ILEs for a wide range of use-cases, based on a description language for the generalizable ILE structure. It will detail users’ paths and conditions for completing learning activities, and a content adaptation algorithm to update the ILE at runtime. Human and AI systems can customize the environment based on user learning metrics. Results show that this framework is efficient and low- overhead, suggesting a path to simplifying and democratizing the ILE development without introducing bloat. Index Terms—virtual reality, augmented reality, content generation, immersive learning, 3D environments 
    more » « less
  5. As applications for virtual reality (VR) and augmented reality (AR) technology increase, it will be important to understand how users perceive their action capabilities in virtual environments. Feedback about actions may help to calibrate perception for action opportunities (affordances) so that action judgments in VR and AR mirror actors’ real abilities. Previous work indicates that walking through a virtual doorway while wielding an object can calibrate the perception of one’s passability through feedback from collisions. In the current study, we aimed to replicate this calibration through feedback using a different paradigm in VR while also testing whether this calibration transfers to AR. Participants held a pole at 45°and made passability judgments in AR (pretest phase). Then, they made passability judgments in VR and received feedback on those judgments by walking through a virtual doorway while holding the pole (calibration phase). Participants then returned to AR to make posttest passability judgments. Results indicate that feedback calibrated participants’ judgments in VR. Moreover, this calibration transferred to the AR environment. In other words, after experiencing feedback in VR, passability judgments in VR and in AR became closer to an actor’s actual ability, which could make training applications in these technologies more effective. 
    more » « less