skip to main content


Title: COHESIV: Contrastive Object and Hand Embeddings for Segmentation In Video
In this paper we learn to segment hands and hand-held objects from motion. Our system takes a single RGB image and hand location as input to segment the hand and hand-held object. For learning, we generate responsibility maps that show how well a hand’s motion explains other pixels’ motion in video. We use these responsibility maps as pseudo-labels to train a weakly-supervised neural network using an attention-based similarity loss and contrastive loss. Our system outperforms alternate methods, achieving good performance on the 100DOH, EPIC-KITCHENS, and HO3D datasets.  more » « less
Award ID(s):
2006619
NSF-PAR ID:
10349042
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Advances in neural information processing systems
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Principles from human-human physical interaction may be necessary to design more intuitive and seamless robotic devices to aid human movement. Previous studies have shown that light touch can aid balance and that haptic communication can improve performance of physical tasks, but the effects of touch between two humans on walking balance has not been previously characterized. This study examines physical interaction between two persons when one person aids another in performing a beam-walking task. 12 pairs of healthy young adults held a force sensor with one hand while one person walked on a narrow balance beam (2 cm wide x 3.7 m long) and the other person walked overground by their side. We compare balance performance during partnered vs. solo beam-walking to examine the effects of haptic interaction, and we compare hand interaction mechanics during partnered beam-walking vs. overground walking to examine how the interaction aided balance. While holding the hand of a partner, participants were able to walk further on the beam without falling, reduce lateral sway, and decrease angular momentum in the frontal plane. We measured small hand force magnitudes (mean of 2.2 N laterally and 3.4 N vertically) that created opposing torque components about the beam axis and calculated the interaction torque, the overlapping opposing torque that does not contribute to motion of the beam-walker’s body. We found higher interaction torque magnitudes during partnered beam-walking vs . partnered overground walking, and correlation between interaction torque magnitude and reductions in lateral sway. To gain insight into feasible controller designs to emulate human-human physical interactions for aiding walking balance, we modeled the relationship between each torque component and motion of the beam-walker’s body as a mass-spring-damper system. Our model results show opposite types of mechanical elements (active vs . passive) for the two torque components. Our results demonstrate that hand interactions aid balance during partnered beam-walking by creating opposing torques that primarily serve haptic communication, and our model of the torques suggest control parameters for implementing human-human balance aid in human-robot interactions. 
    more » « less
  2. Motion tracking interfaces are intuitive for free-form teleoperation tasks. However, efficient manipulation control can be difficult with such interfaces because of issues like the interference of unintended motions and the limited precision of human motion control. The limitation in control efficiency reduces the operator's performance and increases their workload and frustration during robot teleoperation. To improve the efficiency, we proposed separating controlled degrees of freedom (DoFs) and adjusting the motion scaling ratio of a motion tracking interface. The motion tracking of handheld controllers from a Virtual Reality system was used for the interface. We separated the translation and rotational control into: 1) two controllers held in the dominant and non-dominant hands and 2) hand pose tracking and trackpad inputs of a controller. We scaled the control mapping ratio based on 1) the environmental constraints and 2) the teleoperator's control speed. We further conducted a user study to investigate the effectiveness of the proposed methods in increasing efficiency. Our results show that the separation of position and orientation control into two controllers and the environment-based scaling methods perform better than their alternatives. 
    more » « less
  3. Abstract Objectives

    The main objective was to test the hypothesis of a neuromechanical link in humans between the head and forearm during running mediated by the biceps brachii and superior trapezius muscles. We hypothesized that this linkage helps stabilize the head and combats rapid forward pitching during running which may interfere with gaze stability.

    Materials and methods

    Thirteen human participants walked and ran on a treadmill while motion capture recorded body segment kinematics and electromyographic sensors recorded muscle activation. To test perturbations to the linkage system we compared participants running normally as well as with added mass to the face and the hand.

    Results

    The results confirm the presence of a neuromechanical linkage between the head and forearm mediated by the biceps and superior trapezius during running but not during walking. In running, the biceps and superior trapezius activations were temporally linked during the stride cycle, and adding mass to either the head or hand increased activation in both muscles, consistent with our hypothesis. During walking the forces acting on the body segments and muscle activation levels were much smaller than during running, indicating no need for a linkage to keep the head and gaze stable.

    Discussion

    The results suggest that the evolution of long distance running in earlyHomomay have favored selection for reduced rotational inertia of both the head and forearm through synergistic muscle activation, contributing to the transition from australopith head and forelimb morphology to the more human‐like form ofHomo erectus. Selective pressures from the evolution of bipedal walking were likely much smaller, but may explain in part the intermediate form of the australopith scapula between that of extant apes and humans.

     
    more » « less
  4. This paper proposes SquiggleMilli, a system that approximates traditional Synthetic Aperture Radar (SAR) imaging on mobile millimeter-wave (mmWave) devices. The system is capable of imaging through obstructions, such as clothing, and under low visibility conditions. Unlike traditional SAR that relies on mechanical controllers or rigid bodies, SquiggleMilli is based on the hand-held, fluidic motion of the mmWave device. It enables mmWave imaging in hand-held settings by re-thinking existing motion compensation, compressed sensing, and voxel segmentation. Since mmWave imaging suffers from poor resolution due to specularity and weak reflectivity, the reconstructed shapes could be imperceptible by machines and humans. To this end, SquiggleMilli designs a machine learning model to recover the high spatial frequencies in the object to reconstruct an accurate 2D shape and predict its 3D features and category. We have customized SquiggleMilli for security applications, but the model is adaptable to other applications with limited training samples. We implement SquiggleMilli on off-the-shelf components and demonstrate its performance improvement over the traditional SAR qualitatively and quantitatively. 
    more » « less
  5. Electron microscopy images of carbon nanotube (CNT) forests are difficult to segment due to the long and thin nature of the CNTs; density of the CNT forests resulting in CNTs touching, crossing, and occluding each other; and low signal-to-noise ratio electron microscopy imagery. In addition, due to image complexity, it is not feasible to prepare training segmentation masks. In this paper, we propose CNTSegNet, a dual loss, orientation-guided, self-supervised, deep learning network for CNT forest segmentation in scanning electron microscopy (SEM) images. Our training labels consist of weak segmentation labels produced by intensity thresholding of the raw SEM images and self labels produced by estimating orientation distribution of CNTs in these raw images. The proposed network extends a U-net-like encoder-decoder architecture with a novel two-component loss function. The first component is dice loss computed between the predicted segmentation maps and the weak segmentation labels. The second component is mean squared error (MSE) loss measuring the difference between the orientation histogram of the predicted segmentation map and the original raw image. Weighted sum of these two loss functions is used to train the proposed CNTSegNet network. The dice loss forces the network to perform background-foreground segmentation using local intensity features. The MSE loss guides the network with global orientation features and leads to refined segmentation results. The proposed system needs only a few-shot dataset for training. Thanks to it’s self-supervised nature, it can easily be adapted to new datasets. 
    more » « less