Modeling human behaviors in contextual environments has a wide range of applications in character animation, embodied AI, VR/AR, and robotics. In real-world scenarios, humans frequently interact with the environment and manipulate various objects to complete daily tasks. In this work, we study the problem of full-body human motion synthesis for the manipulation of large-sized objects. We propose Object MOtion guided human MOtion synthesis (OMOMO), a conditional diffusion framework that can generate full-body manipulation behaviors from only the object motion. Since naively applying diffusion models fails to precisely enforce contact constraints between the hands and the object, OMOMO learns two separate denoising processes to first predict hand positions from object motion and subsequently synthesize full-body poses based on the predicted hand positions. By employing the hand positions as an intermediate representation between the two denoising processes, we can explicitly enforce contact constraints, resulting in more physically plausible manipulation motions. With the learned model, we develop a novel system that captures full-body human manipulation motions by simply attaching a smartphone to the object being manipulated. Through extensive experiments, we demonstrate the effectiveness of our proposed pipeline and its ability to generalize to unseen objects. Additionally, as high-quality human-object interaction datasets are scarce, we collect a large-scale dataset consisting of 3D object geometry, object motion, and human motion. Our dataset contains human-object interaction motion for 15 objects, with a total duration of approximately 10 hours.
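The abstract outlines OMOMO's architecture but not its code, so the following is only a schematic sketch of the cascaded inference flow it describes: object motion → predicted hand positions → contact projection onto the object surface → full-body pose. The two denoising networks are stood in by placeholder MLPs and the diffusion sampling loops are omitted; every module name, dimension, and the nearest-point contact projection are assumptions for illustration, not the paper's implementation.

```python
# Schematic sketch (not the paper's code) of OMOMO's two-stage idea.
import torch
import torch.nn as nn

class PlaceholderDenoiser(nn.Module):
    """Stand-in for one of the two conditional denoising models."""
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.SiLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, cond):
        return self.net(cond)

def project_to_surface(hands, obj_points):
    """Enforce contact by snapping predicted hand positions to the nearest surface points."""
    d = torch.cdist(hands, obj_points.unsqueeze(0).expand(hands.shape[0], -1, -1))  # (T, 2, P)
    return obj_points[d.argmin(dim=-1)]                                             # (T, 2, 3)

T, P = 120, 1024                         # frames, sampled object surface points (assumed)
obj_motion = torch.randn(T, 9)           # e.g., per-frame object rotation + translation
obj_points = torch.randn(P, 3)           # points sampled on the object surface

stage1 = PlaceholderDenoiser(in_dim=9, out_dim=2 * 3)        # object motion -> 2 hand positions
stage2 = PlaceholderDenoiser(in_dim=2 * 3, out_dim=22 * 3)   # hand positions -> 22 body joints

hands = stage1(obj_motion).view(T, 2, 3)        # stage 1: predict hand trajectories
hands = project_to_surface(hands, obj_points)   # explicit contact constraint on the intermediate
body = stage2(hands.view(T, 2 * 3)).view(T, 22, 3)  # stage 2: synthesize full-body pose
print(body.shape)  # torch.Size([120, 22, 3])
```

Placing the contact projection between the two stages is what makes the constraint explicit: the second model only ever sees hand positions that already lie on the object surface.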
Volumetric Motion Magnification: Subtle Motion Extraction from 4D Data
- Award ID(s): 1762809
- PAR ID: 10290040
- Date Published:
- Journal Name: Measurement
- Volume: 176
- Issue: C
- ISSN: 0263-2241
- Page Range / eLocation ID: 109211
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Kinematic motion analysis is widely used in fields such as healthcare, sports medicine, robotics, biomechanics, and sports science, and motion capture systems are essential for it. There are three types of motion capture systems: marker-based capture, vision-based capture, and volumetric capture. Marker-based motion capture systems can achieve fairly accurate results, but attaching markers to the body is inconvenient and time-consuming. Vision-based, marker-less motion capture systems are more desirable because of their non-intrusiveness and flexibility. Volumetric capture is a newer and more advanced marker-less motion capture system that can reconstruct realistic, full-body, animated 3D character models. However, volumetric capture has rarely been used for motion analysis because volumetric motion data presents new challenges. We propose a new method for conducting kinematic motion analysis using volumetric capture data. This method consists of a three-stage pipeline. First, the motion is captured by a volumetric capture system. Second, the volumetric capture data is processed using the Iterative Closest Point (ICP) algorithm to generate virtual markers that track the motion. Third, the motion tracking data is imported into the biomechanical analysis tool OpenSim for kinematic motion analysis. Our motion analysis method enables users to apply numerical motion analysis to the skeleton model in OpenSim while also studying the full-body, animated 3D model from different angles. It has the potential to provide more detailed and in-depth motion analysis for areas such as healthcare, sports science, and biomechanics.
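The abstract names ICP but does not spell out the marker-generation step, so below is a minimal, self-contained sketch of the core ICP ingredient it references: point-to-point ICP solved with the Kabsch algorithm, used to transfer hypothetical virtual marker positions from one frame's point cloud to the next. The toy data and all names are assumptions; in practice a non-rigid human would be handled per body segment rather than with one global rigid fit.

```python
# Minimal point-to-point ICP sketch (Kabsch solution) for transferring virtual markers.
import numpy as np
from scipy.spatial import cKDTree

def icp_rigid(source, target, iters=20):
    """Estimate a rotation R and translation t aligning source points to target points."""
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    for _ in range(iters):
        # 1. Nearest-neighbor correspondences in the target cloud.
        _, idx = tree.query(src)
        tgt = target[idx]
        # 2. Best rigid transform for these correspondences via SVD (Kabsch).
        src_c, tgt_c = src.mean(0), tgt.mean(0)
        H = (src - src_c).T @ (tgt - tgt_c)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:   # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = tgt_c - R @ src_c
        # 3. Apply and accumulate the transform.
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

# Hypothetical usage: propagate virtual markers from frame k to frame k+1.
rng = np.random.default_rng(0)
frame_prev = rng.normal(size=(500, 3))                  # stand-in point cloud, frame k
frame_next = frame_prev + np.array([0.01, 0.0, 0.0])    # stand-in point cloud, frame k+1
markers_prev = frame_prev[:5]                           # virtual markers placed on frame k
R, t = icp_rigid(frame_prev, frame_next)
markers_next = markers_prev @ R.T + t                   # markers transferred to frame k+1
```

The transferred marker trajectories would then play the role of conventional motion-capture markers when imported into OpenSim.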
While large vision-language models can generate motion graphics animations from text prompts, they regularly fail to include all spatio-temporal properties described in the prompt. We introduce MoVer, a motion verification DSL based on first-order logic that can check spatio-temporal properties of a motion graphics animation. We identify a general set of such properties that people commonly use to describe animations (e.g., the direction and timing of motions, the relative positioning of objects, etc.). We implement these properties as predicates in MoVer and provide an execution engine that can apply a MoVer program to any input SVG-based motion graphics animation. We then demonstrate how MoVer can be used in an LLM-based synthesis and verification pipeline for iteratively refining motion graphics animations. Given a text prompt, our pipeline synthesizes a motion graphics animation and a corresponding MoVer program. Executing the verification program on the animation yields a report of the predicates that failed, and the report can be automatically fed back to the LLM to iteratively correct the animation. To evaluate our pipeline, we build a synthetic dataset of 5600 text prompts paired with ground truth MoVer verification programs. We find that while our LLM-based pipeline is able to automatically generate a correct motion graphics animation for 58.8% of the test prompts without any iteration, this number rises to 93.6% with up to 50 correction iterations. Our code and dataset are at https://mover-dsl.github.io.
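The abstract does not show MoVer's actual syntax, so the snippet below is only a hedged illustration, in plain Python rather than the DSL, of the kind of spatio-temporal predicate it describes: a "moves right between t0 and t1" check over a keyframed track. The Keyframe representation and all names are assumptions for illustration.

```python
# Toy spatio-temporal predicate check (illustrative; not the MoVer DSL).
from dataclasses import dataclass

@dataclass
class Keyframe:
    t: float  # time in seconds
    x: float  # horizontal position
    y: float  # vertical position

def position_at(track, t):
    """Linearly interpolate a track's position at time t."""
    track = sorted(track, key=lambda k: k.t)
    for a, b in zip(track, track[1:]):
        if a.t <= t <= b.t:
            w = (t - a.t) / (b.t - a.t)
            return a.x + w * (b.x - a.x), a.y + w * (b.y - a.y)
    return track[-1].x, track[-1].y

def moves_right(track, t0, t1):
    """Predicate: net horizontal displacement over [t0, t1] is positive."""
    return position_at(track, t1)[0] > position_at(track, t0)[0]

# Example: an object animated to move left violates the property.
circle = [Keyframe(0.0, 100, 50), Keyframe(1.0, 40, 50)]
print(moves_right(circle, 0.0, 1.0))  # False -> property violated
```

In the pipeline the abstract describes, a failed predicate like this one would be collected into the verification report that is fed back to the LLM for another correction pass.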