Title: Double Doodles: Sketching Animation in Immersive Environment With 3+6 DOFs Motion Gestures
We present “Double Doodles” to make full use of two sequential inputs of a VR controller with 9 DOFs in total: 3 DOFs from the first input sequence for generating motion paths and 6 DOFs from the second input sequence for motion gestures. While engineering our system, we take ergonomics into consideration and design a set of user-defined motion gestures to describe character motions. We employ a real-time deep-learning-based approach for highly accurate motion gesture classification. We then integrate our approach into a prototype system that allows users to create character animations directly in VR using motion gestures with a VR controller, followed by animation preview and interactive editing. Finally, we evaluate the feasibility and effectiveness of our system through a user study, demonstrating its usefulness for visual storytelling by amateurs as well as for fast drafting by artists.
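The abstract describes two sequential controller inputs, a 3-DOF sequence that defines the motion path and a 6-DOF sequence that is classified into a motion gesture by a real-time deep learning model. The Python sketch below shows one plausible representation of those inputs and a toy classifier; the class name, array shapes, and the nearest-centroid stand-in for the paper's deep classifier are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the two sequential inputs named in
# the abstract: a 3-DOF position sequence for the motion path and a 6-DOF pose
# sequence for the motion gesture. Shapes and names are assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class DoodleInput:
    path_xyz: np.ndarray      # (T1, 3) controller positions -> motion path
    gesture_pose: np.ndarray  # (T2, 6) position + orientation -> motion gesture

def resample(seq: np.ndarray, n: int) -> np.ndarray:
    """Resample a variable-length sequence to a fixed length before classification."""
    idx = np.linspace(0, len(seq) - 1, n)
    lo, hi = np.floor(idx).astype(int), np.ceil(idx).astype(int)
    frac = (idx - lo)[:, None]
    return (1 - frac) * seq[lo] + frac * seq[hi]

def classify_gesture(pose_seq: np.ndarray, centroids: dict) -> str:
    """Toy nearest-centroid stand-in for the paper's deep gesture classifier."""
    feat = resample(pose_seq, 32).ravel()   # fixed-size feature vector (32 x 6)
    return min(centroids, key=lambda k: np.linalg.norm(feat - centroids[k]))
```

In the paper the classification step is a deep network rather than a centroid lookup; the sketch only fixes the data flow from raw controller samples to a gesture label.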
Award ID(s):
2005430
PAR ID:
10532344
Author(s) / Creator(s):
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400701085
Page Range / eLocation ID:
6998 to 7006
Subject(s) / Keyword(s):
character animation, 3D sketching, computer puppetry, interaction techniques, gesture classification, virtual reality
Format(s):
Medium: X
Location:
Ottawa ON Canada
Sponsoring Org:
National Science Foundation
More Like this
  1. We present an end-to-end method for capturing the dynamics of 3D human characters and translating them for synthesizing new, visually-realistic motion sequences. Conventional methods employ sophisticated, but generic, control approaches for driving the joints of articulated characters, paying little attention to the distinct dynamics of human joint movements. In contrast, our approach attempts to synthesize human-like joint movements by exploiting a biologically-plausible, compact network of spiking neurons that drive joint control in primates and rodents. We adapt the controller architecture by introducing learnable components and propose an evolutionary algorithm for training the spiking neural network architectures and capturing diverse joint dynamics. Our method requires only a few samples for capturing the dynamic properties of a joint's motion and exploits the biologically-inspired, trained controller for its reconstruction. More importantly, it can transfer the captured dynamics to new visually-plausible motion sequences. To enable user-dependent tailoring of the resulting motion sequences, we develop an interactive framework that allows for editing and real-time visualization of the controlled 3D character. We also demonstrate the applicability of our method to real human motion capture data by learning the hand joint dynamics from a gesture dataset and using our framework to reconstruct the gestures with our 3D animated character. The compact architecture of our joint controller emerging from its biologically-realistic design, and the inherent capacity of our evolutionary learning algorithm for parallelization, suggest that our approach could provide an efficient and scalable alternative for synthesizing 3D character animations with diverse and visually-realistic motion dynamics.
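As a rough illustration of the training scheme described above, the sketch below runs a generic evolutionary loop that tunes a few parameters of a small spiking-style controller until its output matches a recorded joint-motion sample. The controller model, its parameterization, and the mutation scheme are assumptions made for the example and are not the paper's architecture or algorithm.

```python
# Illustrative-only evolutionary fitting of a tiny spiking-style controller
# to a target joint trajectory; not the paper's network or training method.
import numpy as np

def rollout(params: np.ndarray, steps: int) -> np.ndarray:
    """Placeholder controller: a leaky integrator that emits a spike above threshold."""
    gain, thresh, leak = params
    v, out = 0.0, []
    for t in range(steps):
        v = leak * v + gain * np.sin(0.1 * t)   # simple periodic drive stands in for input
        spike = float(v > thresh)
        v *= 1.0 - spike                        # reset the membrane after a spike
        out.append(spike)
    return np.asarray(out)

def evolve(target: np.ndarray, pop: int = 32, gens: int = 200, sigma: float = 0.1) -> np.ndarray:
    """(1 + pop) evolution strategy: keep the best parameter vector each generation."""
    rng = np.random.default_rng(0)
    best = rng.normal(size=3)
    for _ in range(gens):
        cand = np.vstack([best, best + sigma * rng.normal(size=(pop, 3))])
        errs = [np.mean((rollout(c, len(target)) - target) ** 2) for c in cand]
        best = cand[int(np.argmin(errs))]
    return best
```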

     
  2. We compare the perceived naturalness of character animations generated using three interpolation methods: linear Euler, spherical linear quaternion, and spherical spline quaternion. While previous work focused on the mathematical description of these interpolation types, our work studies the perceptual evaluation of animated upper body character gestures generated using these interpolations. Ninety-seven participants watched 12 animation clips of a character performing four different upper body motions: a beat gesture, a deictic gesture, an iconic gesture, and a metaphoric gesture. Three animation clips were generated for each gesture using the three interpolation methods. The participants rated their naturalness on a 5-point Likert scale. The results showed that animations generated using spherical spline quaternion interpolation were perceived as significantly more natural than those generated using the other two interpolation methods. The findings held true for all subjects regardless of gender and animation experience and across all four gestures. 
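For reference, the snippet below is a standard, self-contained implementation of spherical linear quaternion interpolation (slerp), one of the three methods compared in the study. It is a textbook formulation rather than code from the paper, and the (w, x, y, z) component order is an assumption.

```python
# Standard slerp between two unit quaternions q0 and q1 at parameter t in [0, 1].
import numpy as np

def slerp(q0: np.ndarray, q1: np.ndarray, t: float) -> np.ndarray:
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:                      # flip one quaternion to take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:                   # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)             # angle between the two rotations
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)
```

Spherical spline quaternion interpolation chains slerps through intermediate control quaternions to obtain smooth transitions across keyframes, which is consistent with it being rated as the most natural of the three methods.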
  3. Redirected and amplified head movements have the potential to provide more natural interaction with virtual environments (VEs) than controller-based input, which causes large discrepancies between visual and vestibular self-motion cues and leads to increased VR sickness. However, such amplified head movements may also exacerbate VR sickness symptoms compared with no amplification. Several general methods have been introduced to reduce VR sickness for controller-based input inside a VE, including a popular vignetting method that gradually reduces the field of view. In this paper, we investigate the use of vignetting to reduce VR sickness when using amplified head rotations instead of controller-based input. We also investigate whether the induced VR sickness is a result of the user’s head acceleration or velocity by introducing two modes of vignetting, one triggered by acceleration and the other by velocity. Our dependent measures were pre- and post-exposure VR sickness questionnaires as well as discomfort levels estimated each minute of the experiment. Our results compare a baseline condition without vignetting against the two vignetting methods and generally indicate that the vignetting methods did not reduce VR sickness for most participants and instead led to a significant increase. We discuss the results and potential explanations of our findings.
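As a concrete reading of the two vignetting modes above, the sketch below narrows the field of view as a linear function of a trigger signal, head angular velocity in one mode and angular acceleration in the other. The thresholds, the linear mapping, and the function name are assumptions for illustration, not the parameters used in the experiment.

```python
# Illustrative vignetting trigger: map head angular velocity (deg/s) or angular
# acceleration (deg/s^2) to a reduced field of view. All thresholds are assumptions.
import numpy as np

def vignette_fov(base_fov_deg: float, signal: float, on_at: float, full_at: float,
                 min_fov_deg: float = 40.0) -> float:
    """Return the narrowed field of view for the current trigger signal."""
    strength = float(np.clip((signal - on_at) / (full_at - on_at), 0.0, 1.0))
    return base_fov_deg - strength * (base_fov_deg - min_fov_deg)

# Velocity-triggered mode during a 90 deg/s head turn (all numbers are assumed):
print(vignette_fov(base_fov_deg=110.0, signal=90.0, on_at=30.0, full_at=120.0))
```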
  4. Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co‐speech gestures is a long‐standing problem in computer animation and is considered an enabling technology for creating believable characters in film, games, and virtual social spaces, as well as for interaction with social robots. The problem is made challenging by the idiosyncratic and non‐periodic nature of human co‐speech gesture motion, and by the great diversity of communicative functions that gestures encompass. The field of gesture generation has seen surging interest in the last few years, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep‐learning‐based generative models that benefit from the growing availability of data. This review article summarizes co‐speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule‐based and classical statistical gesture synthesis, before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text and non‐linguistic input. Concurrent with the exposition of deep learning approaches, we chronicle the evolution of the related training data sets in terms of size, diversity, motion quality, and collection method (e.g., optical motion capture or pose estimation from video). Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human‐like motion; grounding the gesture in the co‐occurring speech in interaction with other speakers, and in the environment; performing gesture evaluation; and integration of gesture synthesis into applications. We highlight recent approaches to tackling the various key challenges, as well as the limitations of these approaches, and point toward areas of future development.

     
  5. In this paper we propose a novel conditional generative adversarial network (cGAN) architecture, called S2M-Net, to holistically synthesize realistic three-party conversational animations from acoustic speech input together with speaker marking (i.e., the speaking time of each interlocutor). Specifically, based on a pre-collected three-party conversational motion dataset, we design and train the S2M-Net for three-party conversational animation synthesis. In the architecture, the generator contains an LSTM encoder that encodes a sequence of acoustic speech features into a latent vector, which is then fed into a transform unit that maps the latent vector into a gesture-kinematics space. The output of this transform unit is fed into an LSTM decoder to generate the corresponding three-party conversational gesture kinematics. Meanwhile, a discriminator checks whether an input sequence of three-party conversational gesture kinematics is real or fake. To evaluate our method, besides quantitative and qualitative evaluations, we also conducted paired-comparison user studies against the state of the art.
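The generator structure described above, an LSTM encoder over acoustic features, a transform unit into gesture-kinematics space, and an LSTM decoder, can be sketched in a few lines of PyTorch. The feature dimensions, layer sizes, and the way the latent vector is broadcast over time are assumptions for illustration; this is not the S2M-Net authors' code, and the discriminator is omitted.

```python
# Sketch (assumed shapes) of a speech-to-gesture generator with the structure
# described in the abstract: LSTM encoder -> transform unit -> LSTM decoder.
import torch
import torch.nn as nn

class GeneratorSketch(nn.Module):
    def __init__(self, speech_dim: int = 26, latent_dim: int = 128, pose_dim: int = 3 * 45):
        super().__init__()
        self.encoder = nn.LSTM(speech_dim, latent_dim, batch_first=True)
        self.transform = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.ReLU())
        self.decoder = nn.LSTM(latent_dim, pose_dim, batch_first=True)

    def forward(self, speech: torch.Tensor) -> torch.Tensor:
        # speech: (batch, frames, speech_dim) acoustic features with speaker marking
        _, (h, _) = self.encoder(speech)            # final hidden state as the latent vector
        z = self.transform(h[-1])                   # map latent into gesture-kinematics space
        z_seq = z.unsqueeze(1).repeat(1, speech.size(1), 1)
        gestures, _ = self.decoder(z_seq)           # (batch, frames, pose_dim) for all three parties
        return gestures

fake = GeneratorSketch()(torch.randn(8, 120, 26))   # 8 clips, 120 frames of 26-dim features
```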