Title: An Interactive Framework for Visually Realistic 3D Motion Synthesis using Evolutionarily-trained Spiking Neural Networks

We present an end-to-end method for capturing the dynamics of 3D human characters and using them to synthesize new, visually realistic motion sequences. Conventional methods employ sophisticated but generic control approaches to drive the joints of articulated characters, paying little attention to the distinct dynamics of human joint movements. In contrast, our approach attempts to synthesize human-like joint movements by exploiting a compact, biologically plausible network of spiking neurons of the kind that drives joint control in primates and rodents. We adapt the controller architecture by introducing learnable components and propose an evolutionary algorithm for training the spiking neural networks to capture diverse joint dynamics. Our method requires only a few samples to capture the dynamic properties of a joint's motion, and it exploits the trained, biologically inspired controller to reconstruct that motion. More importantly, it can transfer the captured dynamics to new, visually plausible motion sequences. To enable user-dependent tailoring of the resulting motion sequences, we develop an interactive framework that allows editing and real-time visualization of the controlled 3D character. We also demonstrate the applicability of our method to real human motion-capture data by learning hand-joint dynamics from a gesture dataset and using our framework to reconstruct the gestures with our animated 3D character. The compact architecture of our joint controller, which emerges from its biologically realistic design, and the inherent parallelizability of our evolutionary learning algorithm suggest that our approach could provide an efficient and scalable alternative for synthesizing 3D character animations with diverse, visually realistic motion dynamics.
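The abstract does not include code, but the training loop it describes, an evolutionary search over the parameters of a spiking controller that must reproduce sample joint trajectories, can be sketched in miniature. Everything below (the leaky integrate-and-fire dynamics, network size, and the (mu, lambda) evolution strategy) is an illustrative assumption, not the authors' implementation:

```python
# Minimal sketch: evolve the weights of a tiny leaky integrate-and-fire
# (LIF) population so its decoded output reproduces a target joint angle
# trajectory. All sizes and constants here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, N = 200, 16                                        # timesteps, neurons
target = 0.5 * np.sin(np.linspace(0, 2 * np.pi, T))   # toy joint trajectory
clock = np.sin(np.linspace(0, 2 * np.pi, T))          # generic periodic input

def rollout(params):
    """Simulate the LIF population and decode a joint-angle trajectory."""
    w_in, w_out = params[:N], params[N:]
    v = np.zeros(N)                        # membrane potentials
    angle, out = 0.0, np.zeros(T)
    for t in range(T):
        v = 0.9 * v + w_in * clock[t]      # leaky integration of input drive
        spikes = (v > 1.0).astype(float)   # threshold crossing -> spike
        v[spikes > 0] = 0.0                # reset after spiking
        angle = 0.95 * angle + w_out @ spikes  # low-pass decode of spikes
        out[t] = angle
    return out

def fitness(params):
    return -np.mean((rollout(params) - target) ** 2)

# (mu, lambda) evolution strategy over the controller's weights.
mu, lam, sigma = 8, 32, 0.1
pop = rng.normal(0.0, 0.5, size=(lam, 2 * N))
for gen in range(200):
    scores = np.array([fitness(p) for p in pop])
    elite = pop[np.argsort(scores)[-mu:]]              # keep the best mu
    parents = elite[rng.integers(0, mu, size=lam)]     # resample parents
    pop = parents + sigma * rng.normal(size=(lam, 2 * N))

best = max(pop, key=fitness)
print("final MSE:", -fitness(best))
```

Because each candidate's fitness evaluation is independent, the inner loop over the population parallelizes trivially, which is the scalability property the abstract points to.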

 
Award ID(s): 2132972, 2238955
NSF-PAR ID: 10493915
Author(s) / Creator(s): ; ; ;
Publisher / Repository: ACM
Date Published:
Journal Name: Proceedings of the ACM on Computer Graphics and Interactive Techniques
Volume: 6
Issue: 1
ISSN: 2577-6193
Page Range / eLocation ID: 1 to 19
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We introduce HuMoR: a 3D Human Motion Model for Robust Estimation of temporal pose and shape. Though substantial progress has been made in estimating 3D human motion and shape from dynamic observations, recovering plausible pose sequences in the presence of noise and occlusions remains a challenge. For this purpose, we propose an expressive generative model in the form of a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence. Furthermore, we introduce a flexible optimization-based approach that leverages HuMoR as a motion prior to robustly estimate plausible pose and shape from ambiguous observations. Through extensive evaluations, we demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset, and enables motion reconstruction from multiple input modalities including 3D keypoints and RGB(-D) videos. See the project page at geometry.stanford.edu/projects/humor.
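As a rough illustration of the modeling idea in this abstract, the sketch below is a conditional VAE over the per-step pose change, with a learned conditional prior so the model can roll motion forward at test time. The state representation, dimensions, and layers are simplified assumptions; HuMoR's actual state and losses are richer:

```python
# Hedged sketch, not the authors' code: a conditional VAE modeling the
# distribution of the per-step pose change delta_x given the previous
# pose x_prev. Dimensions and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

POSE_DIM, LATENT_DIM, HIDDEN = 69, 48, 256   # e.g., an SMPL-like pose vector

class MotionPriorCVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder q(z | x_prev, delta_x)
        self.enc = nn.Sequential(
            nn.Linear(2 * POSE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 2 * LATENT_DIM))
        # Conditional prior p(z | x_prev)
        self.prior = nn.Sequential(
            nn.Linear(POSE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 2 * LATENT_DIM))
        # Decoder p(delta_x | z, x_prev)
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM + POSE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, POSE_DIM))

    def forward(self, x_prev, delta_x):
        mu_q, logvar_q = self.enc(torch.cat([x_prev, delta_x], -1)).chunk(2, -1)
        mu_p, logvar_p = self.prior(x_prev).chunk(2, -1)
        # Reparameterized sample from the approximate posterior.
        z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()
        recon = self.dec(torch.cat([z, x_prev], -1))
        # ELBO: reconstruction + KL(q || conditional prior).
        rec_loss = ((recon - delta_x) ** 2).sum(-1).mean()
        kl = 0.5 * (logvar_p - logvar_q
                    + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                    - 1).sum(-1).mean()
        return rec_loss + kl

# At test time the conditional prior rolls motion forward: sample z from
# p(z | x_prev), decode delta_x, and set x_next = x_prev + delta_x.
```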
  2. In this paper, we present a new strategy, a joint deep learning architecture, for two classic tasks in computer graphics: water surface reconstruction and water image synthesis. Modeling water surfaces from single images can be regarded as the inverse of image rendering, which converts surface geometries into photorealistic images. On the basis of this fact, we consider the two problems as a cycle of image-to-image translations and propose to tackle them together using a pair of neural networks, with the three-dimensional surface geometries represented as two-dimensional surface normal maps. Furthermore, we estimate the imaging parameters of existing water images with a subnetwork so that their lighting conditions can be reused when synthesizing new images. Experiments demonstrate that our method efficiently achieves accurate reconstruction of surfaces from monocular images and produces visually plausible new images under variable lighting conditions.

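A minimal sketch of the cycle formulation described above, under the assumption of paired normal-map/image data; the two tiny convolutional networks stand in for the paper's full architectures, and the imaging-parameter subnetwork is omitted:

```python
# Illustrative sketch only: two image-to-image networks trained as a
# cycle, one rendering surface normal maps into water images and one
# recovering normal maps from images, tied by a cycle-consistency loss.
import torch
import torch.nn as nn

def conv_net(in_ch, out_ch):
    # A deliberately tiny stand-in for a full encoder-decoder network.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_ch, 3, padding=1))

render = conv_net(3, 3)   # normal map (3 channels) -> water image (RGB)
recover = conv_net(3, 3)  # water image -> normal map

def cycle_loss(normals, images):
    # Supervised terms in both directions, plus cycle consistency.
    l_render = ((render(normals) - images) ** 2).mean()
    l_recover = ((recover(images) - normals) ** 2).mean()
    l_cycle = (((recover(render(normals)) - normals) ** 2).mean()
               + ((render(recover(images)) - images) ** 2).mean())
    return l_render + l_recover + l_cycle
```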
  3. Humans have an astonishing ability to extract hidden information from the movements of others. For example, even with limited kinematic information, humans can distinguish between biological and nonbiological motion, identify the age and gender of a human demonstrator, and recognize what action a demonstrator is performing. It is unknown, however, whether they can also estimate hidden mechanical properties of another's limbs simply by observing their motions. Strictly speaking, identifying an object's mechanical properties, such as stiffness, requires contact. With only motion information, unambiguous measurements of stiffness are fundamentally impossible, since the same limb motion can be generated with an infinite number of stiffness values. However, we show that humans can readily estimate the stiffness of a simulated limb from its motion. In three experiments, we found that participants linearly increased their rating of arm stiffness as the joint stiffness parameters in the arm controller increased. This was remarkable since there was no physical contact with the simulated limb. Moreover, participants had no explicit knowledge of how the simulated arm was controlled. To successfully map nontrivial changes in multijoint motion to changes in arm stiffness, participants likely drew on prior knowledge of human neuromotor control. Having an internal representation consistent with the behavior of the controller used to drive the simulated arm implies that this control policy competently captures key features of veridical biological control. Finding that humans can extract latent features of neuromotor control from kinematics also provides new insight into how humans interpret the motor actions of others.

     NEW & NOTEWORTHY: Humans can visually perceive another's overt motion, but it is unknown whether they can also perceive the hidden dynamic properties of another's limbs from their motions. Here, we show that humans can correctly infer changes in limb stiffness from nontrivial changes in multijoint limb motion, without force information or explicit knowledge of the underlying limb controller. Our findings suggest that humans presume that others control motor behavior in such a way that limb stiffness influences motion.
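The abstract's point that motion alone cannot pin down stiffness has a simple mathematical core: for a controller of the form tau = K * (theta_eq - theta), any torque profile, and hence any trajectory, is consistent with any positive stiffness K, provided the equilibrium trajectory theta_eq shifts to compensate. A small numerical check of that identity (the controller form and numbers are illustrative, not the study's simulation):

```python
# Why stiffness is not identifiable from motion alone: the same observed
# trajectory is reproduced exactly for very different stiffness values K
# by shifting the equilibrium trajectory theta_eq.
import numpy as np

t = np.linspace(0, 1, 100)
theta = np.sin(2 * np.pi * t)           # observed joint trajectory
I = 0.05                                # link inertia (arbitrary units)
tau_des = I * np.gradient(np.gradient(theta, t), t)  # torque producing it

for K in (5.0, 50.0, 500.0):            # very different joint stiffnesses
    theta_eq = theta + tau_des / K      # compensating equilibrium trajectory
    tau = K * (theta_eq - theta)        # controller output
    assert np.allclose(tau, tau_des)    # identical torque -> identical motion

print("the same motion is consistent with K = 5, 50, and 500")
```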
  4. Language-guided human motion synthesis has been a challenging task due to the inherent complexity and diversity of human behaviors. Previous methods face limitations in generalizing to novel actions, often producing unrealistic or incoherent motion sequences. In this paper, we propose ATOM (ATomic mOtion Modeling) to mitigate this problem by decomposing actions into atomic actions and employing a curriculum learning strategy to learn atomic action composition. First, we disentangle complex human motions into a set of atomic actions during learning, and then assemble novel actions from the learned atomic actions, which offers better adaptability to new actions. Moreover, we introduce a curriculum training strategy that leverages masked motion modeling with a gradually increasing mask ratio, thus facilitating atomic action assembly. This approach mitigates the overfitting problem commonly encountered in previous methods while pushing the model to learn better motion representations. We demonstrate the effectiveness of ATOM through extensive experiments, including text-to-motion and action-to-motion synthesis tasks, and further illustrate its superiority in synthesizing plausible and coherent text-guided human motion sequences.
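The curriculum described above, masked motion modeling with a gradually increasing mask ratio, can be sketched as follows; the linear schedule, frame-level masking scheme, and tensor shapes are assumptions rather than ATOM's actual recipe:

```python
# Hedged sketch of curriculum masked motion modeling: the fraction of
# masked frames grows over training, so the model first fills small gaps
# and later reconstructs longer spans of motion.
import torch

def mask_ratio(step, total_steps, lo=0.15, hi=0.75):
    """Linearly increase the mask ratio over the course of training."""
    return lo + (hi - lo) * min(step / total_steps, 1.0)

def masked_motion_loss(model, motion, step, total_steps):
    # motion: (batch, frames, pose_dim)
    B, T, D = motion.shape
    ratio = mask_ratio(step, total_steps)
    mask = torch.rand(B, T, device=motion.device) < ratio  # frames to hide
    inp = motion.clone()
    inp[mask] = 0.0                      # zero out the masked frames
    pred = model(inp)                    # model reconstructs the sequence
    # Only the masked frames contribute to the reconstruction loss.
    return ((pred - motion) ** 2).mean(-1)[mask].mean()
```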