By allowing people to manipulate digital content placed in the real world, Augmented Reality (AR) provides immersive and enriched experiences in a variety of domains. Despite its increasing popularity, providing a seamless AR experience under bandwidth fluctuations is still a challenge, since delivering these experiences at photorealistic quality with minimal latency requires high bandwidth. Streaming approaches have already been proposed to solve this problem, but they require accurate prediction of the Field-Of-View of the user to only stream those regions of scene that are most likely to be watched by the user. To solve this prediction problem, we study in this paper the watching behavior of users exploring different types of AR scenes via mobile devices. To this end, we introduce the ACE Dataset, the first dataset collecting movement data of 50 users exploring 5 different AR scenes. We also propose a four-feature taxonomy for AR scene design, which allows categorizing different types of AR scenes in a methodical way, and supporting further research in this domain. Motivated by the ACE dataset analysis results, we develop a novel user visual attention prediction algorithm that jointly utilizes information of users' historical movements and digital objects positions in the AR scene. Themore »
Predictive Caching for AR/VR Experiences in a Household Scenario
Augmented/virtual reality (AR/VR) technologies can be deployed in a household environment for applications such as checking the weather or traffic reports, watching a summary of news, or attending classes. Since AR/VR applications are highly delay sensitive, delivering these types of reports in maximum quality could be very challenging. In this paper, we consider that users go through a series of AR/VR experience units that can be delivered at different experience quality levels. In order to maximize the quality of the experience while minimizing the cost of delivering it, we aim to predict the users’ behavior in the home and the experiences they are interested in at specific moments in time. We describe a deep learning based technique to predict the users’ requests from AR/VR devices and optimize the local caching of experience units. We evaluate the performance of the proposed technique on two real-world datasets and compare our results with other baselines. Our results show that predicting users’ requests can improve the quality of experience and decrease the cost of delivery.
- Award ID(s):
- 1800961
- Publication Date:
- NSF-PAR ID:
- 10159366
- Journal Name:
- 2020 International Conference on Computing, Networking and Communications (ICNC)
- Page Range or eLocation-ID:
- 591 to 595
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Virtual Reality (VR), together with the network infrastructure, can provide an interactive and immersive experience for multiple users simultaneously and thus enables collaborative VR applications (e.g., VR-based classroom). However, the satisfactory user experience requires not only high-resolution panoramic image rendering but also extremely low latency and seamless user experience. Besides, the competition for limited network resources (e.g., multiple users share the total limited bandwidth) poses a significant challenge to collaborative user experience, in particular under the wireless network with time-varying capacities. While existing works have tackled some of these challenges, a principled design considering all those factors is still missing. In this paper, we formulate a combinatorial optimization problem to maximize the Quality of Experience (QoE), defined as the linear combination of the quality, the average VR content delivery delay, and variance of the quality over a finite time horizon. In particular, we incorporate the influence of imperfect motion prediction when considering the quality of the perceived contents. However, the optimal solution to this problem can not be implemented in real-time since it relies on future decisions. Then, we decompose the optimization problem into a series of combinatorial optimization in each time slot and develop a low-complexity algorithm that canmore »
-
As augmented and virtual reality (AR/VR) technology matures, a method is desired to represent real-world persons visually and aurally in a virtual scene with high fidelity to craft an immersive and realistic user experience. Current technologies leverage camera and depth sensors to render visual representations of subjects through avatars, and microphone arrays are employed to localize and separate high-quality subject audio through beamforming. However, challenges remain in both realms. In the visual domain, avatars can only map key features (e.g., pose, expression) to a predetermined model, rendering them incapable of capturing the subjects’ full details. Alternatively, high-resolution point clouds can be utilized to represent human subjects. However, such three-dimensional data is computationally expensive to process. In the realm of audio, sound source separation requires prior knowledge of the subjects’ locations. However, it may take unacceptably long for sound source localization algorithms to provide this knowledge, which can still be error-prone, especially with moving objects. These challenges make it difficult for AR systems to produce real-time, high-fidelity representations of human subjects for applications such as AR/VR conferencing that mandate negligible system latency. We present Acuity, a real-time system capable of creating high-fidelity representations of human subjects in a virtual scene bothmore »
-
Extended reality (XR) technologies, such as virtual reality (VR) and augmented reality (AR), provide users, their avatars, and embodied agents a shared platform to collaborate in a spatial context. Although traditional face-to-face communication is limited by users’ proximity, meaning that another human’s non-verbal embodied cues become more difficult to perceive the farther one is away from that person, researchers and practitioners have started to look into ways to accentuate or amplify such embodied cues and signals to counteract the effects of distance with XR technologies. In this article, we describe and evaluate the Big Head technique, in which a human’s head in VR/AR is scaled up relative to their distance from the observer as a mechanism for enhancing the visibility of non-verbal facial cues, such as facial expressions or eye gaze. To better understand and explore this technique, we present two complimentary human-subject experiments in this article. In our first experiment, we conducted a VR study with a head-mounted display to understand the impact of increased or decreased head scales on participants’ ability to perceive facial expressions as well as their sense of comfort and feeling of “uncannniness” over distances of up to 10 m. We explored two different scalingmore »
-
Eye tracking has become an essential human-machine interaction modality for providing immersive experience in numerous virtual and augmented reality (VR/AR) applications desiring high throughput (e.g., 240 FPS), small-form, and enhanced visual privacy. However, existing eye tracking systems are still limited by their: (1) large form-factor largely due to the adopted bulky lens-based cameras; (2) high communication cost required between the camera and backend processor; and (3) potentially concerned low visual privacy, thus prohibiting their more extensive applications. To this end, we propose, develop, and validate a lensless FlatCambased eye tracking algorithm and accelerator co-design framework dubbed EyeCoD to enable eye tracking systems with a much reduced form-factor and boosted system efficiency without sacrificing the tracking accuracy, paving the way for next-generation eye tracking solutions. On the system level, we advocate the use of lensless FlatCams instead of lens-based cameras to facilitate the small form-factor need in mobile eye tracking systems, which also leaves rooms for a dedicated sensing-processor co-design to reduce the required camera-processor communication latency. On the algorithm level, EyeCoD integrates a predict-then-focus pipeline that first predicts the region-of-interest (ROI) via segmentation and then only focuses on the ROI parts to estimate gaze directions, greatly reducing redundant computations andmore »