MetaReVision: Meta-Learning with Retrieval for Visually Grounded Compositional Concept Acquisition

Xu, Guangyue; Kordjamshidi, Parisa; Chai, Joyce

doi:10.18653/v1/2023.findings-emnlp.818

Humans have the ability to learn novel compositional concepts by recalling and generalizing primitive concepts acquired from past experiences. Inspired by this observation, in this paper, we propose MetaReVision, a retrieval-enhanced meta-learning model to address the visually grounded compositional concept learning problem. The proposed MetaReVision consists of a retrieval module and a meta-learning module, which are designed to incorporate retrieved primitive concepts as a supporting set to meta-train vision-language models for grounded compositional concept recognition. Through meta-learning from episodes constructed by the retriever, MetaReVision learns a generic compositional representation that can be quickly updated to recognize novel compositional concepts. We create CompCOCO and CompFlickr to benchmark grounded compositional concept learning. Our experimental results show that MetaReVision outperforms other competitive baselines and that the retrieval module plays an important role in this compositional learning process.

Award ID(s): 2028626

PAR ID: 10547200

Author(s) / Creator(s): Xu, Guangyue; Kordjamshidi, Parisa; Chai, Joyce

Publisher / Repository: Association for Computational Linguistics

Date Published: 2023-01-01

Page Range / eLocation ID: 12224 to 12236

Format(s): Medium: X

Location: Singapore

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.18653/v1/2023.findings-emnlp.818