How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions

Prakash, Aditya; Lundell, Benjamin; Andreychuk, Dmitry; Forsyth, David; Gupta, Saurabh; Sawhney, Harpreet

doi:10.1109/CVPR52734.2025.00659

This content will become publicly available on June 10, 2026

We tackle the novel problem of predicting 3D hand motion and contact maps (or Interaction Trajectories) given a single RGB view, action text, and a 3D contact point on the object as input. Our approach consists of (1) Interaction Codebook: a VQVAE model that learns a latent codebook of hand poses and contact points, effectively tokenizing interaction trajectories, and (2) Interaction Predictor: a transformer-decoder module that predicts the interaction trajectory from test-time inputs by using an indexer module to retrieve a latent affordance from the learned codebook. To train our model, we develop a data engine that extracts 3D hand poses and contact trajectories from the diverse HoloAssist dataset. We evaluate our model on a benchmark that is 2.5-10x larger than existing works in terms of diversity of objects and interactions observed, and test for generalization of the model across object categories, action categories, tasks, and scenes. Experimental results show the effectiveness of our approach over transformer and diffusion baselines across all settings.
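The tokenization idea behind a VQVAE codebook can be illustrated with a minimal sketch of the nearest-code lookup. This is not the authors' implementation: the function names, the 2-D toy vectors, and the three-entry codebook are all hypothetical, and a real model would learn the codebook jointly with an encoder and decoder over hand poses and contact points.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each latent vector to the index of its nearest codebook entry.

    Returns the discrete token indices and the corresponding codebook
    vectors (the quantized latents).
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for z in latents:
        # Pick the codebook entry with the smallest squared distance to z.
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(z, codebook[k]),
        )
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Hypothetical toy data: a 3-entry codebook and a 4-step latent trajectory.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7], [0.4, 0.4]]

tokens, quantized = quantize(latents, codebook)
print(tokens)  # one discrete token per time step
```

In this sketch, a trajectory of continuous latents becomes a short sequence of integer tokens, which is the sense in which a learned codebook "tokenizes" a trajectory; how the paper's indexer module selects codes at test time is not specified here.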

Award ID(s): 2143873; 2007035

PAR ID: 10649444

Author(s) / Creator(s): Prakash, Aditya; Lundell, Benjamin; Andreychuk, Dmitry; Forsyth, David; Gupta, Saurabh; Sawhney, Harpreet

Publisher / Repository: IEEE

Date Published: 2025-06-10

Page Range / eLocation ID: 7026 to 7036

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper:
https://doi.org/10.1109/CVPR52734.2025.00659