IS-GGT: Iterative Scene Graph Generation with Generative Transformers

Kundu, Sanjoy; Aakur, Sathyanarayanan N.

doi:10.1109/CVPR52729.2023.00609

Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format. This representation has proven useful in several tasks, such as question answering, captioning, and even object detection, to name a few. Current approaches take a generation-by-classification approach where the scene graph is generated through labeling of all possible edges between objects in a scene, which adds computational overhead to the approach. This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction. Using two transformer-based components, we first sample a possible scene graph structure from detected objects and their visual features. We then perform predicate classification on the sampled edges to generate the final scene graph. This approach allows us to efficiently generate scene graphs from images with minimal inference overhead. Extensive experiments on the Visual Genome dataset demonstrate the efficiency of the proposed approach. Without bells and whistles, we obtain, on average, 20.7% mean recall (mR@100) across different settings for scene graph generation (SGG), outperforming state-of-the-art SGG approaches while offering competitive performance to unbiased SGG approaches.

Award ID(s): 2348690; 2348689

PAR ID: 10491046

Author(s) / Creator(s): Kundu, Sanjoy; Aakur, Sathyanarayanan N.

Publisher / Repository: IEEE

Date Published: 2023-06-01

Journal Name: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

ISSN: 2575-7075

ISBN: 979-8-3503-0129-8

Page Range / eLocation ID: 6292 to 6301

Format(s): Medium: X

Location: Vancouver, BC, Canada

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/CVPR52729.2023.00609