Title: An Interactive Framework for Visually Realistic 3D Motion Synthesis using Evolutionarily-trained Spiking Neural Networks
We present an end-to-end method for capturing the dynamics of 3D human characters and translating them for synthesizing new, visually-realistic motion sequences. Conventional methods employ sophisticated, but generic, control approaches for driving the joints of articulated characters, paying little attention to the distinct dynamics of human joint movements. In contrast, our approach attempts to synthesize human-like joint movements by exploiting a biologically-plausible, compact network of spiking neurons that drive joint control in primates and rodents. We adapt the controller architecture by introducing learnable components and propose an evolutionary algorithm for training the spiking neural network architectures and capturing diverse joint dynamics. Our method requires only a few samples for capturing the dynamic properties of a joint's motion and exploits the biologically-inspired, trained controller for its reconstruction. More importantly, it can transfer the captured dynamics to new visually-plausible motion sequences. To enable user-dependent tailoring of the resulting motion sequences, we develop an interactive framework that allows for editing and real-time visualization of the controlled 3D character. We also demonstrate the applicability of our method to real human motion capture data by learning the hand joint dynamics from a gesture dataset and using our framework to reconstruct the gestures with our 3D animated character. The compact architecture of our joint controller emerging from its biologically-realistic design, and the inherent capacity of our evolutionary learning algorithm for parallelization, suggest that our approach could provide an efficient and scalable alternative for synthesizing 3D character animations with diverse and visually-realistic motion dynamics.
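As a concrete illustration of the training scheme the abstract describes, here is a minimal, hypothetical sketch of evolving the weights of a small leaky integrate-and-fire (LIF) population so that its decoded output tracks a target joint trajectory. The network size, the (mu + lambda) evolution strategy, and all parameters are illustrative assumptions, not the paper's actual controller architecture or learning rule.

```python
# Minimal sketch (not the authors' code): evolving the weights of a tiny
# leaky integrate-and-fire (LIF) controller so its decoded output tracks a
# target joint-angle trajectory. All names and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
T, N = 200, 16                                        # timesteps, neurons
target = 0.5 * np.sin(np.linspace(0, 2 * np.pi, T))   # toy joint trajectory

def simulate(w_in, w_out, drive):
    """Run the LIF population and decode a joint angle from its spikes."""
    v = np.zeros(N)                           # membrane potentials
    angle = np.zeros(T)
    for t in range(T):
        v = 0.9 * v + w_in * drive[t]         # leaky integration of input
        spikes = (v > 1.0).astype(float)      # threshold crossing
        v[spikes > 0] = 0.0                   # reset after a spike
        angle[t] = w_out @ spikes             # linear spike-to-angle decoder
    return angle

def fitness(params):
    w_in, w_out = params[:N], params[N:]
    drive = np.ones(T)                        # constant excitatory drive
    return -np.mean((simulate(w_in, w_out, drive) - target) ** 2)

# (mu + lambda) evolution strategy over the controller weights; each
# fitness evaluation is independent, so this loop parallelizes trivially.
pop = rng.normal(0, 0.1, size=(32, 2 * N))
for gen in range(100):
    scores = np.array([fitness(p) for p in pop])
    parents = pop[np.argsort(scores)[-8:]]    # keep the best 8
    children = parents[rng.integers(0, 8, 24)] + rng.normal(0, 0.05, (24, 2 * N))
    pop = np.vstack([parents, children])
print("best MSE:", -max(fitness(p) for p in pop))
```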
Award ID(s):
2132972 2238955
PAR ID:
10493915
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the ACM on Computer Graphics and Interactive Techniques
Volume:
6
Issue:
1
ISSN:
2577-6193
Page Range / eLocation ID:
1 to 19
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. It takes less than half a second for a person to fall [8]. Capturing the essence of a fall from video or motion capture is difficult. More generally, generating realistic 3D human body motions from motion capture (MoCap) data is a significant challenge, with potential applications in animation, gaming, and robotics. Current motion datasets contain single-labeled activities, which lack fine-grained control over the motion, particularly for actions as sparse, dynamic, and complex as falling. This work introduces a novel human falling dataset and a learned multi-branch, Attribute-Conditioned Variational Autoencoder model to generate novel falls. Our dataset introduces a new ontology that divides the motion into three phases: Impact, Glitch, and Fall. Each branch of the model learns one phase separately, and a fusion layer learns to combine the per-phase latent spaces. Furthermore, we present data augmentation techniques and an inter-phase smoothness loss for natural, plausible motion generation. We successfully generated high-quality motions, validating the efficacy of our model in producing high-fidelity, attribute-conditioned human movements.
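To make the architecture concrete, here is a hedged PyTorch sketch of a multi-branch, attribute-conditioned VAE with one branch per phase (Impact, Glitch, Fall) and a fusion layer over the per-phase latents. All dimensions and layer choices are assumptions for illustration; the paper's actual model will differ.

```python
# Hedged sketch (not the paper's released code): one encoder branch per fall
# phase, a fusion layer merging the per-phase latents, and a shared decoder.
import torch
import torch.nn as nn

class PhaseBranch(nn.Module):
    def __init__(self, pose_dim=63, attr_dim=8, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(pose_dim + attr_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * latent))  # mean and log-var
    def forward(self, pose, attr):
        mu, logvar = self.enc(torch.cat([pose, attr], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return z, mu, logvar

class MultiBranchCVAE(nn.Module):
    def __init__(self, pose_dim=63, attr_dim=8, latent=32):
        super().__init__()
        self.branches = nn.ModuleList(PhaseBranch(pose_dim, attr_dim, latent)
                                      for _ in range(3))      # Impact/Glitch/Fall
        self.fusion = nn.Linear(3 * latent, latent)           # fuse phase latents
        self.dec = nn.Sequential(nn.Linear(latent + attr_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 3 * pose_dim))  # a pose per phase

    def forward(self, phases, attr):                          # phases: list of 3
        zs, stats = [], []
        for branch, pose in zip(self.branches, phases):
            z, mu, logvar = branch(pose, attr)
            zs.append(z)
            stats.append((mu, logvar))                        # for the KL term
        fused = self.fusion(torch.cat(zs, -1))
        recon = self.dec(torch.cat([fused, attr], -1))
        return recon, stats
```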
  2. Sign words are the building blocks of any sign language. In this work, we present wSignGen, a word-conditioned 3D American Sign Language (ASL) generation model dedicated to synthesizing realistic and grammatically accurate motion sequences for sign words. Our approach leverages a transformer-based diffusion model, trained on a curated dataset of 3D motion meshes from word-level ASL videos. By integrating CLIP, wSignGen offers two advantages: image-based generation, which is particularly useful for children learning sign language but not yet able to read, and the ability to generalize to unseen synonyms. Experiments demonstrate that wSignGen significantly outperforms the baseline model in the task of sign word generation. Moreover, human evaluation experiments show that wSignGen can generate high-quality, grammatically correct ASL signs effectively conveyed through 3D avatars. 
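A hedged sketch of the core component described here: a transformer denoiser for motion diffusion that conditions on a CLIP embedding of the sign word (or an image of it). Every dimension and design detail below is an assumption, not wSignGen's actual architecture.

```python
# Illustrative sketch only: a transformer denoiser for motion diffusion,
# conditioned on a 512-d CLIP embedding of the sign word (or an image).
# All names and dimensions here are assumed, not taken from wSignGen.
import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    def __init__(self, pose_dim=135, d_model=256, clip_dim=512):
        super().__init__()
        self.in_proj = nn.Linear(pose_dim, d_model)
        self.cond_proj = nn.Linear(clip_dim, d_model)  # CLIP word/image embedding
        self.time_emb = nn.Embedding(1000, d_model)    # diffusion timestep index
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.out_proj = nn.Linear(d_model, pose_dim)

    def forward(self, noisy_motion, t, clip_emb):
        # Prepend two conditioning tokens (timestep + CLIP embedding) to the
        # motion sequence, attend over everything, then drop them on output.
        cond = torch.stack([self.time_emb(t), self.cond_proj(clip_emb)], dim=1)
        h = torch.cat([cond, self.in_proj(noisy_motion)], dim=1)
        h = self.encoder(h)
        return self.out_proj(h[:, 2:])     # predicted clean motion frames
```

Because the condition enters only through a CLIP embedding, swapping the text encoder's output for the image encoder's (or for a synonym's embedding) needs no architectural change, which is consistent with the two advantages the abstract claims.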
  3. We introduce HuMoR: a 3D Human Motion Model for Robust Estimation of temporal pose and shape. Though substantial progress has been made in estimating 3D human motion and shape from dynamic observations, recovering plausible pose sequences in the presence of noise and occlusions remains a challenge. For this purpose, we propose an expressive generative model in the form of a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence. Furthermore, we introduce a flexible optimization-based approach that leverages HuMoR as a motion prior to robustly estimate plausible pose and shape from ambiguous observations. Through extensive evaluations, we demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset, and enables motion reconstruction from multiple input modalities including 3D keypoints and RGB(-D) videos. See the project page at geometry.stanford.edu/projects/humor.
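The abstract's key component, a conditional VAE over per-step pose changes, can be sketched roughly as follows. Dimensions and layer sizes are assumed for illustration; this is not HuMoR's released code. The learned conditional prior p(z | x_{t-1}) is what lets the model score candidate transitions when it serves as a motion prior during optimization-based fitting.

```python
# Rough sketch, not HuMoR's actual code: a conditional VAE that models the
# change in pose between consecutive frames, i.e. p(x_t | x_{t-1}).
import torch
import torch.nn as nn

class TransitionCVAE(nn.Module):
    def __init__(self, pose_dim=69, latent=48):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(2 * pose_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * latent))   # q(z | x_{t-1}, x_t)
        self.prior = nn.Sequential(nn.Linear(pose_dim, 256), nn.ReLU(),
                                   nn.Linear(256, 2 * latent)) # p(z | x_{t-1})
        self.dec = nn.Sequential(nn.Linear(latent + pose_dim, 256), nn.ReLU(),
                                 nn.Linear(256, pose_dim))     # delta-pose decoder

    def forward(self, x_prev, x_curr):
        mu, logvar = self.enc(torch.cat([x_prev, x_curr], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        delta = self.dec(torch.cat([z, x_prev], -1))           # predicted change
        # Training would match (mu, logvar) against the conditional prior via a
        # KL term; at test time the prior's log-density regularizes fitting.
        return x_prev + delta, (mu, logvar), self.prior(x_prev).chunk(2, -1)
```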
  4. Humans have an astonishing ability to extract hidden information from the movements of others. For example, even with limited kinematic information, humans can distinguish between biological and nonbiological motion, identify the age and gender of a human demonstrator, and recognize what action a human demonstrator is performing. It is unknown, however, whether they can also estimate hidden mechanical properties of another's limbs simply by observing their motions. Strictly speaking, identifying an object's mechanical properties, such as stiffness, requires contact. With only motion information, unambiguous measurements of stiffness are fundamentally impossible, since the same limb motion can be generated with an infinite number of stiffness values. However, we show that humans can readily estimate the stiffness of a simulated limb from its motion. In three experiments, we found that participants linearly increased their rating of arm stiffness as joint stiffness parameters in the arm controller increased. This was remarkable since there was no physical contact with the simulated limb. Moreover, participants had no explicit knowledge of how the simulated arm was controlled. To successfully map nontrivial changes in multijoint motion to changes in arm stiffness, participants likely drew on prior knowledge of human neuromotor control. Having an internal representation consistent with the behavior of the controller used to drive the simulated arm implies that this control policy competently captures key features of veridical biological control. Finding that humans can extract latent features of neuromotor control from kinematics also provides new insight into how humans interpret the motor actions of others. NEW & NOTEWORTHY Humans can visually perceive another's overt motion, but it is unknown whether they can also perceive the hidden dynamic properties of another's limbs from their motions. Here, we show that humans can correctly infer changes in limb stiffness from nontrivial changes in multijoint limb motion without force information or explicit knowledge of the underlying limb controller. Our findings suggest that humans presume others control motor behavior in such a way that limb stiffness influences motion.
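The central claim, that the same limb motion can be generated with infinitely many stiffness values, is easy to demonstrate numerically. The toy simulation below (not the study's actual setup; all constants are illustrative) drives a one-joint limb with an impedance controller and shows that two very different stiffness settings reproduce essentially the same trajectory once the equilibrium trajectory is re-solved.

```python
# Toy illustration (not the study's simulation): a one-joint limb under an
# impedance controller, tau = K*(theta_eq - theta) - B*dtheta. For any K > 0,
# inverse dynamics yields an equilibrium trajectory theta_eq that reproduces
# the same desired motion, so stiffness is unobservable from kinematics alone.
import numpy as np

I, B, dt = 0.05, 0.3, 0.001                     # inertia, damping, timestep
t = np.arange(0, 1, dt)
theta_des = 0.4 * (1 - np.cos(np.pi * t))       # desired joint trajectory
dth = np.gradient(theta_des, dt)
ddth = np.gradient(dth, dt)

def simulate(K):
    # Inverse dynamics: choose the equilibrium trajectory that makes this
    # particular stiffness reproduce the desired motion.
    theta_eq = theta_des + (I * ddth + B * dth) / K
    theta, dtheta = theta_des[0], 0.0
    out = []
    for k in range(len(t)):
        tau = K * (theta_eq[k] - theta) - B * dtheta
        dtheta += (tau / I) * dt                # explicit Euler integration
        theta += dtheta * dt
        out.append(theta)
    return np.array(out)

# Two very different stiffness values produce nearly identical motions:
print(np.max(np.abs(simulate(5.0) - simulate(50.0))))   # small (~0)
```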