This content will become publicly available on January 3, 2025

Title: Context in Human Action Through Motion Complementarity
Motivated by Goldman's Theory of Human Action, a framework in which action decomposes into (1) base physical movements and (2) the context in which they occur, we propose a novel learning formulation for motion and context in which context is derived as the complement to motion. More specifically, we model physical movement by adopting Therbligs, a set of elemental physical motions centered on object manipulation. Context is modeled with a contrastive mutual-information loss that formulates context information as the action information not contained in movement information. We empirically demonstrate the utility of this separation of representations, showing sizable improvements in action-recognition and action-anticipation accuracy for a variety of models. We present results on two object-manipulation datasets: EPIC Kitchens 100 and 50 Salads.
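The complement formulation above can be illustrated with a small NumPy sketch (all names here are hypothetical illustrations, not the paper's code): an InfoNCE-style contrastive bound estimates mutual information from paired embeddings, and the context encoder is rewarded for predicting the action while penalized for sharing information with motion.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE contrastive loss between paired embeddings.

    Matching rows of z_a and z_b are positive pairs; all other rows in
    the batch serve as negatives. A lower loss corresponds to a higher
    estimated mutual information between the two representations.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal

def complement_loss(context, motion, action, alpha=1.0):
    """Hypothetical complement objective: minimizing this encourages the
    context embedding to predict the action (maximize MI(context, action))
    while carrying little of what motion already encodes
    (minimize MI(context, motion))."""
    return info_nce(context, action) - alpha * info_nce(context, motion)
```

The subtraction is what makes context the *complement*: information already available in the motion stream is actively discouraged from reappearing in the context stream.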
Award ID(s):
2020624
NSF-PAR ID:
10484643
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE/CVF
Date Published:
Journal Name:
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Page Range / eLocation ID:
pp. 6531-6540
Format(s):
Medium: X
Location:
College Park, MD
Sponsoring Org:
National Science Foundation
More Like this
  1.
    We present a framework for deformable object manipulation that interleaves planning and control, enabling complex manipulation tasks without relying on high-fidelity modeling or simulation. The key question we address is when should we use planning and when should we use control to achieve the task? Planners are designed to find paths through complex configuration spaces, but for highly underactuated systems, such as deformable objects, achieving a specific configuration is very difficult even with high-fidelity models. Conversely, controllers can be designed to achieve specific configurations, but they can be trapped in undesirable local minima owing to obstacles. Our approach consists of three components: (1) a global motion planner to generate gross motion of the deformable object; (2) a local controller for refinement of the configuration of the deformable object; and (3) a novel deadlock prediction algorithm to determine when to use planning versus control. By separating planning from control we are able to use different representations of the deformable object, reducing overall complexity and enabling efficient computation of motion. We provide a detailed proof of probabilistic completeness for our planner, which is valid despite the fact that our system is underactuated and we do not have a steering function. We then demonstrate that our framework is able to successfully perform several manipulation tasks with rope and cloth in simulation, which cannot be performed using either our controller or planner alone. These experiments suggest that our planner can generate paths efficiently, taking under a second on average to find a feasible path in three out of four scenarios. We also show that our framework is effective on a 16-degree-of-freedom physical robot, where reachability and dual-arm constraints make the planning more difficult. 
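The interleaving logic described above can be sketched as a toy loop (all functions here are illustrative stand-ins, not the authors' implementation): a greedy local controller runs until a deadlock predictor forecasts a trap, at which point a global planner supplies gross motion past the obstacle.

```python
def at_goal(state, goal, tol=1e-3):
    return abs(state - goal) < tol

def manipulate(start, goal, planner, controller, predicts_deadlock,
               max_iters=1000):
    """Interleave local control with global planning: control by default,
    plan only when deadlock is predicted."""
    state = start
    for _ in range(max_iters):
        if at_goal(state, goal):
            return state
        if predicts_deadlock(state, goal):
            # global planner generates gross motion past the obstacle
            for waypoint in planner(state, goal):
                state = waypoint
        else:
            # local controller refines the configuration toward the goal
            state = controller(state, goal)
    return state

# Toy 1-D instance: an "obstacle" at x = 3 traps the greedy controller.
OBSTACLE = 3.0

def controller(state, goal):
    step = 0.1 if goal > state else -0.1
    nxt = state + step
    return state if abs(nxt - OBSTACLE) < 0.05 else nxt  # stuck at obstacle

def predicts_deadlock(state, goal):
    # deadlock predicted when the controller's next step makes no progress
    return controller(state, goal) == state

def planner(state, goal):
    # gross motion: coarse waypoints that hop over the obstacle
    return [OBSTACLE + 0.5, goal]
```

Running `manipulate(0.0, 5.0, planner, controller, predicts_deadlock)` reaches the goal, whereas the controller alone stalls at the obstacle, mirroring the paper's point that neither component suffices by itself.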
  2. While the study of unconstrained movements has revealed important features of neural control, generalizing those insights to more sophisticated object manipulation is challenging. Humans excel at physical interaction with objects, even when those objects introduce complex dynamics and kinematic constraints. This study examined humans turning a horizontal planar crank (radius 10.29 cm) at their preferred and three instructed speeds (with visual feedback), both in clockwise and counterclockwise directions. To explore the role of neuromechanical dynamics, the instructed speeds covered a wide range: fast (near the limits of performance), medium (near preferred speed), and very slow (rendering dynamic effects negligible). Because kinematically constrained movements involve significant physical interaction, disentangling neural control from the influences of biomechanics presents a challenge. To address it, we modeled the interactive dynamics to "subtract off" peripheral biomechanics from observed force and kinematic data, thereby estimating aspects of underlying neural action that may be expressed in terms of motion. We demonstrate the value of this method: remarkably, an approximately elliptical path emerged, and speed minima coincided with curvature maxima, similar to what is seen in unconstrained movements, even though the hand moved at nearly constant speed along a constant-curvature path. These findings suggest that the neural controller takes advantage of peripheral biomechanics to simplify physical interaction. As a result, patterns seen in unconstrained movements persist even when physical interaction prevents their expression in hand kinematics. The reemergence of a speed-curvature relation indicates that it is due, at least in part, to neural processes that emphasize smoothness and predictability.

     NEW & NOTEWORTHY: Physically interacting with kinematic constraints is commonplace in everyday actions. We report a study of humans turning a crank, a circular constraint that imposes constant hand path curvature and hence should suppress variations of hand speed due to the power-law speed-curvature relation widely reported for unconstrained motions. Remarkably, we found that, when peripheral biomechanical factors are removed, a speed-curvature relation reemerges, indicating that it is, at least in part, of neural origin.
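The speed-curvature relation invoked here can be illustrated with a short NumPy sketch (a minimal stand-in, not the study's analysis pipeline): for an ellipse traced at constant angular rate, finite-difference speed and curvature obey v ∝ κ^(−1/3), the exponent form of the two-thirds power law seen in unconstrained movements.

```python
import numpy as np

def speed_and_curvature(x, y, dt):
    """Finite-difference speed and curvature along a planar path
    sampled at uniform time step dt."""
    dx, dy = np.gradient(x, dt), np.gradient(y, dt)
    ddx, ddy = np.gradient(dx, dt), np.gradient(dy, dt)
    v = np.hypot(dx, dy)
    kappa = np.abs(dx * ddy - dy * ddx) / np.maximum(v**3, 1e-12)
    return v, kappa

def power_law_exponent(v, kappa):
    """Slope of log(v) versus log(kappa); about -1/3 for movements
    obeying the classic speed-curvature power law."""
    return np.polyfit(np.log(kappa), np.log(v), 1)[0]
```

For an ellipse x = a·cos(t), y = b·sin(t) traced at constant angular rate, κ = ab/v³ holds exactly, so v = (ab)^(1/3)·κ^(−1/3) and the fitted exponent comes out at −1/3.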
  3. Promising results have been achieved recently in category-level manipulation that generalizes across object instances. Nevertheless, it often requires expensive real-world data collection and manual specification of semantic keypoints for each object category and task. Additionally, coarse keypoint predictions and ignoring intermediate action sequences hinder adoption in complex manipulation tasks beyond pick-and-place. This work proposes a novel, category-level manipulation framework that leverages an object-centric, category-level representation and model-free 6-DoF motion tracking. The canonical object representation is learned solely in simulation and then used to parse a category-level task trajectory from a single demonstration video. The demonstration is reprojected to a target trajectory tailored to a novel object via the canonical representation. During execution, the manipulation horizon is decomposed into long-range, collision-free motion and last-inch manipulation. For the latter part, a category-level behavior cloning (CatBC) method leverages motion tracking to perform closed-loop control. CatBC follows the target trajectory, projected from the demonstration and anchored to a dynamically selected category-level coordinate frame. The frame is automatically selected along the manipulation horizon by a local attention mechanism. This framework makes it possible to teach different manipulation strategies by providing a single demonstration, without complicated manual programming. Extensive experiments demonstrate its efficacy in a range of challenging industrial tasks in high-precision assembly that involve learning complex, long-horizon policies. The process exhibits robustness against uncertainty due to dynamics as well as generalization across object instances and scene configurations.
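The reprojection step described above can be sketched as frame re-anchoring (a hypothetical simplification using known rigid transforms; in the paper the canonical representation is learned): express each demonstrated pose in the demo object's frame, then map it through the novel object's pose.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous transform for a pure translation."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def reproject_trajectory(demo_traj, T_demo, T_novel):
    """Re-anchor a demonstrated end-effector trajectory to a novel object.

    demo_traj: list of 4x4 end-effector poses recorded in the world frame.
    T_demo:    pose of the demo object when the demo was recorded.
    T_novel:   pose of the novel object at execution time.
    """
    # express each pose in the demo object's (canonical) frame ...
    canonical = [np.linalg.inv(T_demo) @ T for T in demo_traj]
    # ... then anchor it to the novel object's frame
    return [T_novel @ T for T in canonical]
```

For example, a demo pose 0.5 m in front of a demo object at the origin maps to 0.5 m in front of a novel object placed at (1, 0, 0).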
  4. Spatial Augmented Reality (SAR), e.g., based on monoscopic projected imagery on physical three-dimensional (3D) surfaces, can be particularly well-suited for ad hoc group or multi-user augmented reality experiences since it does not encumber users with head-worn or carried devices. However, conveying a notion of realistic 3D shapes and movements on SAR surfaces using monoscopic imagery is a difficult challenge. While previous work focused on physical actuation of such surfaces to achieve geometrically dynamic content, we introduce a different concept, which we call "Synthetic Animatronics," i.e., conveying geometric movement or deformation purely through manipulation of the imagery being shown on a static display surface. We present a model for the distribution of the viewpoint-dependent distortion that occurs when there are discrepancies between the physical display surface and the virtual object being represented, and describe a real-time implementation of a method for adaptively filtering the imagery based on an approximation of expected potential error. Finally, we describe an existing physical SAR setup well-suited for synthetic animatronics and a corresponding Unity-based SAR simulator allowing for flexible exploration and validation of the technique and various parameters.
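The viewpoint-dependent distortion such a model must capture can be quantified with a simple geometric sketch (hypothetical, not the paper's formulation): the angular error between the ray to the physical surface point carrying the imagery and the ray to the virtual point it is meant to represent.

```python
import numpy as np

def angular_distortion(viewpoint, surface_point, virtual_point):
    """Angular error (radians) a viewer at `viewpoint` sees between the
    display-surface point showing the imagery and the virtual point it
    represents. Zero when the two are collinear with the eye; it grows
    with the surface/geometry mismatch and varies with viewpoint."""
    a = surface_point - viewpoint
    b = virtual_point - viewpoint
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))
```

A viewer looking straight through both points sees zero distortion; displacing the virtual point sideways by the same distance as its depth yields a 45-degree error, which is why discrepant regions benefit from the adaptive filtering the abstract describes.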
  5. Tactile sensing has been increasingly utilized in robot control of unknown objects to infer physical properties and optimize manipulation. However, there is limited understanding about the contribution of different sensory modalities during interactive perception in complex interaction, both in robots and in humans. This study investigated the effect of visual and haptic information on humans' exploratory interactions with a 'cup of coffee', an object with nonlinear internal dynamics. Subjects were instructed to rhythmically transport a virtual cup with a rolling ball inside between two targets at a specified frequency, using a robotic interface. The cup and targets were displayed on a screen, and force feedback from the cup-and-ball dynamics was provided via the robotic manipulandum. Subjects were encouraged to explore and prepare the dynamics by "shaking" the cup-and-ball system to find the best initial conditions prior to the task. Two groups of subjects received full haptic feedback about the cup-and-ball movement during the task; however, for one group the ball movement was visually occluded. Visual information about the ball movement had two distinctive effects on performance: it reduced the preparation time needed to understand the dynamics and, importantly, it led to simpler, more linear input-output interactions between hand and object. The results highlight how visual and haptic information regarding nonlinear internal dynamics have distinct roles in the interactive perception of complex objects.