COHESIV: Contrastive Object and Hand Embeddings for Segmentation In Video

Shan, Dandan; Higgins, Richard E.L.; Fouhey, David F.

Citation Details

In this paper we learn to segment hands and hand-held objects from motion. Our system takes a single RGB image and hand location as input to segment the hand and hand-held object. For learning, we generate responsibility maps that show how well a hand’s motion explains other pixels’ motion in video. We use these responsibility maps as pseudo-labels to train a weakly-supervised neural network using an attention-based similarity loss and contrastive loss. Our system outperforms alternate methods, achieving good performance on the 100DOH, EPIC-KITCHENS, and HO3D datasets.
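
The abstract describes responsibility maps only at a high level, so the following is a minimal sketch of the general idea rather than the paper's formulation. It assumes a per-pixel optical-flow field, a single translation vector for the hand, and a single translation vector for the background; the names `responsibility_map`, `pseudo_labels`, `hand_motion`, `background_motion`, and `sigma` are all hypothetical. Each pixel receives a soft score in [0, 1] that is high when the hand's motion explains the pixel's flow better than the background's motion does.

```python
import math


def responsibility_map(flow, hand_motion, background_motion, sigma=1.0):
    """Soft per-pixel score: how much better the hand's motion explains a
    pixel's flow than the background's motion does.

    Illustrative assumption, not the paper's method: both motions are plain
    2D translations and residuals are squared Euclidean distances.
    """
    result = []
    for row in flow:
        out_row = []
        for (u, v) in row:
            err_hand = (u - hand_motion[0]) ** 2 + (v - hand_motion[1]) ** 2
            err_bg = (u - background_motion[0]) ** 2 + (v - background_motion[1]) ** 2
            # Two-way softmax over -err / (2 sigma^2), written as a logistic
            # of the residual gap and evaluated in a numerically stable form.
            gap = (err_bg - err_hand) / (2.0 * sigma ** 2)
            if gap >= 0:
                score = 1.0 / (1.0 + math.exp(-gap))
            else:
                e = math.exp(gap)
                score = e / (1.0 + e)
            out_row.append(score)
        result.append(out_row)
    return result


def pseudo_labels(rmap, threshold=0.5):
    """Binarize a responsibility map into 0/1 pseudo-labels (assumed step)."""
    return [[1 if score > threshold else 0 for score in row] for row in rmap]


# Toy 2x2 flow field: the left column moves with the hand, the right column
# is nearly static like the background.
flow = [[(2.0, 0.0), (0.0, 0.0)],
        [(1.9, 0.1), (0.1, 0.0)]]
rmap = responsibility_map(flow, hand_motion=(2.0, 0.0), background_motion=(0.0, 0.0))
labels = pseudo_labels(rmap)
```

In this toy case the left-column pixels score above 0.5 and the right-column pixels below it, so thresholding marks only the left column as moving with the hand. A real pipeline would estimate flow and motion models from video instead of hard-coding them.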

Award ID(s): 2006619

NSF-PAR ID: 10349042

Author(s) / Creator(s): Shan, Dandan; Higgins, Richard E.L.; Fouhey, David F.

Date Published: 2021-01-01

Journal Name: Advances in Neural Information Processing Systems

ISSN: 1049-5258

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.