Title: Exploring CLIP for Real World, Text-based Image Retrieval
Abstract—We consider the ability of CLIP features to support text-driven image retrieval. Traditional image-based queries sometimes misalign with user intentions because they emphasize irrelevant image components. To overcome this, we explore the potential of text-based image retrieval, specifically using Contrastive Language-Image Pretraining (CLIP) models. CLIP models, trained on large datasets of image-caption pairs, offer a promising approach by allowing natural language descriptions to serve as more targeted queries. We explore the effectiveness of text-driven image retrieval based on CLIP features by evaluating image similarity for progressively more detailed queries. We find that there is a sweet spot of textual detail that gives the best results, and that words describing the “tone” of a scene (such as “messy” or “dingy”) are quite important in maximizing text-image similarity.
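The retrieval mechanism the abstract describes reduces to ranking gallery images by the similarity of their CLIP embeddings to a text query embedding. The sketch below illustrates just that ranking step in NumPy, assuming the embeddings have already been produced by a CLIP encoder; the toy 2-D vectors are hypothetical stand-ins, not real CLIP outputs.

```python
import numpy as np

def rank_images(text_emb, image_embs):
    """Rank gallery images by cosine similarity to a text query embedding.

    text_emb: (d,) query embedding; image_embs: (n, d) gallery embeddings.
    Returns (indices sorted most-to-least similar, similarity scores).
    """
    t = text_emb / np.linalg.norm(text_emb)
    g = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = g @ t                      # cosine similarity per image
    return np.argsort(-sims), sims

# Toy 2-D "embeddings": the query points closest to gallery item 1.
gallery = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.5, 0.9])
order, sims = rank_images(query, gallery)
```

In a real pipeline, each progressively more detailed query would be encoded and ranked this way, and the resulting similarity scores compared across levels of detail.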
Award ID(s):
2125677
PAR ID:
10521628
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-5952-7
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Location:
St. Louis, MO, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. We present PAPERCLIP (Proposal Abstracts Provide an Effective Representation for Contrastive Language-Image Pre-training), a method that associates astronomical observations imaged by telescopes with natural language using a neural network model. The model is fine-tuned from a pre-trained Contrastive Language–Image Pre-training (CLIP) model using successful observing proposal abstracts and corresponding downstream observations, with the abstracts optionally summarized via guided generation using large language models (LLMs). Using observations from the Hubble Space Telescope (HST) as an example, we show that the fine-tuned model embodies a meaningful joint representation between observations and natural language through quantitative evaluation as well as tests targeting image retrieval (i.e., finding the most relevant observations using natural language queries) and description retrieval (i.e., querying for the astrophysical object classes and use cases most relevant to a given observation). Our study demonstrates the potential for using generalist foundation models rather than task-specific models to interact with astronomical data by leveraging text as an interface.
  2. Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Ed.)
    We propose CLIP-Lite, an information-efficient method for visual representation learning by feature alignment with textual annotations. Compared to the previously proposed CLIP model, CLIP-Lite requires only one negative image-text sample pair for every positive image-text sample during the optimization of its contrastive learning objective. We accomplish this by taking advantage of an information-efficient lower bound to maximize the mutual information between the two input modalities. This allows CLIP-Lite to be trained with significantly reduced amounts of data and smaller batch sizes while obtaining better performance than CLIP at the same scale. We evaluate CLIP-Lite by pretraining on the COCO-Captions dataset and testing transfer learning to other datasets. CLIP-Lite obtains a +14.0 mAP absolute gain on Pascal VOC classification and a +22.1 top-1 accuracy gain on ImageNet, while being comparable or superior to other, more complex, text-supervised models. CLIP-Lite is also superior to CLIP on image and text retrieval, zero-shot classification, and visual grounding. Finally, we show that CLIP-Lite can leverage language semantics to encourage bias-free visual representations for use in downstream tasks. Implementation: https://github.com/4m4n5/CLIP-Lite
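The one-negative-per-positive objective described above can be illustrated with a Jensen-Shannon-style lower bound on mutual information. The NumPy sketch below is a deliberate simplification under stated assumptions: a plain dot-product critic on toy embeddings and scalar scores, not the paper's actual architecture or training loop.

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)); fine for the small toy scores used here
    return np.log1p(np.exp(x))

def one_negative_jsd_loss(pos_score, neg_score):
    """Negative Jensen-Shannon MI lower bound for a single positive
    and a single negative image-text pair (scalar critic scores)."""
    bound = -softplus(-pos_score) - softplus(neg_score)
    return -bound  # minimizing this tightens the bound

# Toy embeddings scored with a dot-product critic.
img = np.array([0.2, 0.9])
txt_pos = np.array([0.3, 0.8])   # matching caption
txt_neg = np.array([-0.7, 0.1])  # mismatched caption
loss = one_negative_jsd_loss(img @ txt_pos, img @ txt_neg)
```

The loss falls as the positive score rises and the negative score falls, which is what makes a single negative pair sufficient for this bound, in contrast to the large negative batches CLIP's InfoNCE objective relies on.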
  3. Recent studies have shown promising results from using BERT for Information Retrieval, owing to its advantages in understanding the text content of documents and queries. Compared to short keyword queries, higher accuracy was observed for BERT on long, natural language queries, demonstrating BERT’s ability to extract rich information from complex queries. These results show the potential of using query expansion to generate better queries for BERT-based rankers. In this work, we explore BERT’s sensitivity to the addition of structure and concepts. We find that traditional word-based query expansion is not entirely applicable, and we provide insight into methods that produce better experimental results.
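Traditional word-based query expansion, of the kind the abstract says transfers imperfectly to BERT rankers, can be sketched as appending frequent terms from top-ranked documents to the original query. This is a hypothetical bag-of-words illustration, not the method evaluated in the paper.

```python
from collections import Counter

def expand_query(query, top_docs, k=3):
    """Naive word-based expansion: append the k most frequent terms
    from top-ranked documents that are not already in the query."""
    query_terms = query.lower().split()
    counts = Counter(
        w for doc in top_docs for w in doc.lower().split()
        if w not in query_terms
    )
    extra = [w for w, _ in counts.most_common(k)]
    return " ".join(query_terms + extra)

expanded = expand_query(
    "solar power",
    ["solar panels store energy", "panels convert energy"],
    k=2,
)
```

For a keyword matcher, such extra terms add recall; for BERT, the same unstructured word soup can distort the natural language reading of the query, which is one plausible reading of why word-based expansion is "not entirely applicable" here.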
  4. Multimodal learning has recently gained significant popularity, demonstrating impressive performance across various zero-shot classification tasks and a range of perceptive and generative applications. Models such as Contrastive Language–Image Pretraining (CLIP) are designed to bridge different modalities, such as images and text, by learning a shared representation space through contrastive learning. Despite their success, the working mechanisms of multimodal learning remain poorly understood. Notably, these models often exhibit a “modality gap”, where different modalities occupy distinct regions within the shared representation space. In this work, we conduct an in-depth analysis of the emergence of the modality gap by characterizing the gradient flow learning dynamics. Specifically, we identify the critical roles of mismatched data pairs and a learnable temperature parameter in causing and perpetuating the modality gap during training. Furthermore, our theoretical insights are validated through experiments on practical CLIP models. These findings provide principled guidance for mitigating the modality gap, including strategies such as appropriate temperature scheduling and modality swapping. Additionally, we demonstrate that closing the modality gap leads to improved performance on tasks such as image-text retrieval.
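One common way to quantify the modality gap discussed above is the distance between the centroids of the L2-normalized image and text embeddings. The NumPy sketch below implements that statistic; it is a standard diagnostic rather than necessarily the exact measure used in this work, and the toy embeddings are illustrative.

```python
import numpy as np

def modality_gap(image_embs, text_embs):
    """Distance between the centroids of L2-normalized image and text
    embeddings: a simple measure of how far apart the two modalities
    sit in the shared representation space."""
    def centroid(x):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        return x.mean(axis=0)
    return np.linalg.norm(centroid(image_embs) - centroid(text_embs))

# Toy embeddings: images cluster near (1, 0), texts near (0, 1),
# so the two modalities occupy distinct regions of the space.
imgs = np.array([[1.0, 0.1], [1.0, -0.1]])
txts = np.array([[0.1, 1.0], [-0.1, 1.0]])
gap = modality_gap(imgs, txts)
```

Tracking this scalar across training steps is one way to observe the gap's emergence and to check whether interventions like temperature scheduling actually shrink it.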
  5. Virtual reality is increasingly used to support embodied AI agents, such as robots, which frequently rely on ‘sim-to-real’ learning approaches. At the same time, tools such as large vision-and-language models offer new capabilities that tie into a wide variety of tasks. To understand how such agents can learn from simulated environments, we explore a language model’s ability to recover the type of object represented by a photorealistic 3D model as a function of the 3D perspective from which the model is viewed. We used photogrammetry to create 3D models of commonplace objects and rendered 2D images of these models from a fixed set of 420 virtual camera perspectives. A well-studied image and language model (CLIP) was used to generate text (i.e., prompts) corresponding to these images. Using multiple instances of various object classes, we studied which camera perspectives were most likely to return accurate text categorizations for each class of object.
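The per-perspective categorization step in the abstract above amounts to CLIP-style zero-shot classification: scoring an image embedding against one text-prompt embedding per class and taking a softmax. A minimal NumPy sketch, assuming precomputed embeddings; the temperature value and toy vectors are illustrative assumptions, not settings from the paper.

```python
import numpy as np

def zero_shot_classify(image_emb, class_text_embs, temperature=0.01):
    """Class probabilities via softmax over cosine similarities between
    an image embedding and one text-prompt embedding per class."""
    i = image_emb / np.linalg.norm(image_emb)
    t = class_text_embs / np.linalg.norm(class_text_embs, axis=1,
                                         keepdims=True)
    logits = (t @ i) / temperature
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Toy prompt embeddings for two classes; the rendered-view embedding
# lies near class 0, so that class should win.
prompts = np.array([[1.0, 0.0], [0.0, 1.0]])
probs = zero_shot_classify(np.array([0.9, 0.2]), prompts)
```

Running this over embeddings of renders from each of the 420 camera perspectives, and checking which perspectives yield the correct argmax, mirrors the per-perspective accuracy analysis the abstract describes.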