Embodied Scene Understanding for Vision Language Models via MetaVQA | NSF Public Access Repository

Citation Details

This content will become publicly available on June 11, 2026

Embodied Scene Understanding for Vision Language Models via MetaVQA

Award ID(s): 2235012; 2339769

PAR ID: 10635725

Author(s) / Creator(s): Wang, Weizhen; Duan, Chenda; Peng, Zhenghao; Liu, Yuxin; Zhou, Bolei

Publisher / Repository: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

Date Published: 2025-06-11

Format(s): Medium: X

Location: Nashville, TN

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.