This content will become publicly available on July 9, 2026

Title: GSAC: Leveraging Gaussian Splatting for Photorealistic Avatar Creation with Unity Integration
Photorealistic avatars have become essential for immersive applications in virtual reality (VR) and augmented reality (AR), enabling lifelike interactions in areas such as training simulations, telemedicine, and virtual collaboration. These avatars bridge the gap between the physical and digital worlds, improving the user experience through realistic human representation. However, existing avatar creation techniques face significant challenges, including high costs, long creation times, and limited utility in virtual applications. Manual methods, such as MetaHuman, require extensive time and expertise, while automatic approaches, such as NeRF-based pipelines, often lack efficiency and detailed facial expression fidelity and cannot be rendered at speeds sufficient for real-time applications. By integrating several cutting-edge techniques, we introduce an end-to-end 3D Gaussian Splatting (3DGS) avatar creation pipeline that leverages monocular video input to create a scalable, efficient, photorealistic avatar directly compatible with the Unity game engine. Our pipeline incorporates a novel Gaussian splatting technique with customized preprocessing that enables the use of "in the wild" monocular video capture, detailed facial expression reconstruction, and embedding within a fully rigged avatar model. Additionally, we present a Unity-integrated Gaussian Splatting Avatar Editor, offering a user-friendly environment for VR/AR application development. Experimental results validate the effectiveness of our preprocessing pipeline in standardizing custom data for 3DGS training and demonstrate the versatility of Gaussian avatars in Unity, highlighting the scalability and practicality of our approach.
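The abstract describes the pipeline only at a high level. The sketch below restates that flow as a hypothetical Python skeleton: every name in it (preprocess, train_gaussians, export_to_unity, the file names, and the AvatarArtifacts fields) is an illustrative placeholder rather than the authors' actual code or the Unity plugin's API, and each stub only marks roughly where the corresponding published stage (preprocessing of "in the wild" video, rigged 3DGS training, and Unity export) would sit.

```python
"""Hypothetical outline of the GSAC-style avatar creation flow.

All names below are illustrative placeholders, not the authors' code
or the Unity plugin's API; each stub stands in for one pipeline stage.
"""
from dataclasses import dataclass
from pathlib import Path


@dataclass
class AvatarArtifacts:
    frames_dir: Path    # standardized frames extracted from the monocular video
    body_params: Path   # per-frame pose/expression parameters for the rigged model
    splat_model: Path   # optimized 3D Gaussian parameters bound to the rig
    unity_asset: Path   # package importable by a Unity-side avatar editor


def preprocess(video: Path, workdir: Path) -> tuple[Path, Path]:
    """Standardize 'in the wild' footage: sample frames, segment the subject,
    and fit per-frame body/face parameters. (Stub: real logic omitted.)"""
    return workdir / "frames", workdir / "body_params.json"


def train_gaussians(frames_dir: Path, body_params: Path, workdir: Path) -> Path:
    """Optimize 3D Gaussians rigged to the body model, including facial
    expressions. (Stub: real optimization loop omitted.)"""
    return workdir / "avatar_splats.ply"


def export_to_unity(splat_model: Path, workdir: Path) -> Path:
    """Package the trained splats for a Unity editor plugin to load, pose,
    and render. (Stub: real serialization omitted.)"""
    return workdir / "avatar.unitypackage"


def build_avatar(video: Path, workdir: Path) -> AvatarArtifacts:
    """Run the three stages end to end on a single monocular video."""
    frames_dir, body_params = preprocess(video, workdir)
    splat_model = train_gaussians(frames_dir, body_params, workdir)
    unity_asset = export_to_unity(splat_model, workdir)
    return AvatarArtifacts(frames_dir, body_params, splat_model, unity_asset)
```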
Award ID(s):
2225890
PAR ID:
10613606
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE; 2025 11th International Conference on Virtual Reality
Date Published:
ISBN:
979-8-3503-9273-9
Format(s):
Medium: X
Location:
Wageningen, Netherlands
Sponsoring Org:
National Science Foundation
More Like this
  1. Efthimiou, E.; Fotinea, S-E.; Hanke, T.; McDonald, J.; Shterionov, D.; Wolfe, R. (Ed.)
    With improved and more easily accessible technology, immersive virtual reality (VR) head-mounted devices have become more ubiquitous. As signing avatar technology improves, virtual reality presents a new and relatively unexplored application for signing avatars. This paper discusses two primary ways that signed language can be represented in immersive virtual spaces: 1) Third-person, in which the VR user sees a character who communicates in signed language; and 2) First-person, in which the VR user produces signed content themselves, tracked by the head-mounted device and visible to the user herself (and/or to other users) in the virtual environment. We will discuss the unique affordances granted by virtual reality and how signing avatars might bring accessibility and new opportunities to virtual spaces. We will then discuss the limitations of signed content in virtual reality concerning virtual signers shown from both third- and first-person perspectives. 
  2. We propose a framework for active next-best-view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it can represent scenes in a manner that is both photorealistic and geometrically accurate. However, in real-world, online robotic scenes where efficiency requirements limit the number of views, random view selection for 3DGS becomes impractical, as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2), which we supplement with a Pearson depth and surface normal loss (a generic form of the Pearson depth term is sketched after this list) to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes, extend depth-based FisherRF to them, and demonstrate both qualitative and quantitative improvements on challenging robot scenes. For more information, please see our project page at arm.stanford.edu/next-best-sense.
  3. This paper explores avatar identification in creative storytelling applications where users create their own story and environment. We present a study that investigated the effects of avatar facial similarity to the user on the quality of the story product they create. The children told a story using a digital puppet-based storytelling system by interacting with a physical puppet box that was augmented with a real-time video feed of the puppet enactment. We used a facial morphing technique to manipulate avatar facial similarity to the user. The resulting morphed image was applied to each participant's puppet character, thus creating a custom avatar for each child to use in story creation. We hypothesized that the more familiar avatars appeared to participants, the stronger the sense of character identification would be, resulting in higher story quality. The proposed rationale is that visual familiarity may lead participants to draw richer story details from their past real-life experiences. Qualitative analysis of the stories supported our hypothesis. Our results contribute to avatar design in children's creative storytelling applications.
  4. Although Augmented Reality (AR) can be easily implemented with most smartphones and tablets today, the investigation of distance perception with these types of devices has been limited. In this paper, we question whether the distance of a virtual human, e.g., avatar, seen through a smartphone or tablet display is perceived accurately. We also investigate, due to the Covid-19 pandemic and increased sensitivity to distances to others, whether a coughing avatar that either does or does not have a mask on affects distance estimates compared to a static avatar. We performed an experiment in which all participants estimated the distances to avatars that were either static or coughing, with and without masks on. Avatars were placed at a range of distances that would be typical for interaction, i.e., action space. Data on judgments of distance to the varying avatars was collected in a distributed manner by deploying an app for smartphones. Results showed that participants were fairly accurate in estimating the distance to all avatars, regardless of coughing condition or mask condition. Such findings suggest that mobile AR applications can be used to obtain accurate estimations of distances to virtual others "in the wild," which is promising for using AR for simulations and training applications that require precise distance estimates. 
  5. The expression of human emotion is integral to social interaction, and in virtual reality it is increasingly common to develop virtual avatars that attempt to convey emotions by mimicking these visual and aural cues, i.e., the facial and vocal expressions. However, errors in (or the absence of) facial tracking can result in the rendering of incorrect facial expressions on these virtual avatars. For example, a virtual avatar may speak with a happy or unhappy vocal inflection while their facial expression remains otherwise neutral. In circumstances where there is conflict between the avatar's facial and vocal expressions, it is possible that users will incorrectly interpret the avatar's emotion, which may have unintended consequences in terms of social influence or the outcome of the interaction. In this paper, we present a human-subjects study (N = 22) aimed at understanding the impact of conflicting facial and vocal emotional expressions. Specifically, we explored three levels of emotional valence (unhappy, neutral, and happy) expressed in both visual (facial) and aural (vocal) forms. We also investigated three levels of head scale (down-scaled, accurate, and up-scaled) to evaluate whether head scale affects user interpretation of the conveyed emotion. We find significant effects of different multimodal expressions on happiness and trust perception, while no significant effect was observed for head scale. Evidence from our results suggests that facial expressions have a stronger impact than vocal expressions. Additionally, as the difference between the two expressions increases, the multimodal expression becomes less predictable. For example, for the happy-looking and happy-sounding multimodal expression, we expect and observe high happiness ratings and high trust; however, if one of the two expressions changes, the mismatch makes the expression less predictable. We discuss the relationships, implications, and guidelines for social applications that aim to leverage multimodal social cues.
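Item 2 above names a Pearson depth loss as one of the additions that make few-shot 3DGS viable. The snippet below is a minimal, generic formulation of such a loss: scale-invariant agreement between a rendered depth map and a monocular depth prior. It is an assumption of how this term is commonly written in few-shot splatting work, not that paper's exact implementation, and pearson_depth_loss is a hypothetical name.

```python
import torch


def pearson_depth_loss(rendered_depth: torch.Tensor,
                       mono_depth: torch.Tensor,
                       eps: float = 1e-8) -> torch.Tensor:
    """Generic Pearson-correlation depth loss (hypothetical formulation).

    Both inputs are (H, W) depth maps. Because Pearson correlation is
    invariant to shift and scale, the loss tolerates the unknown scale of
    a monocular depth prior: it is 0 when the two maps agree up to an
    affine transform and approaches 2 when they are anti-correlated.
    """
    r = rendered_depth.flatten().float()
    m = mono_depth.flatten().float()
    r = r - r.mean()                                     # center both maps
    m = m - m.mean()
    corr = (r * m).sum() / (r.norm() * m.norm() + eps)   # Pearson correlation
    return 1.0 - corr                                    # minimize to maximize agreement
```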