Title: A Large Model’s Ability to Identify 3D Objects as a Function of Viewing Angle
Abstract:
Virtual reality is increasingly used to support embodied AI agents, such as robots, which frequently engage in 'sim-to-real' learning approaches. At the same time, tools such as large vision-and-language models offer new capabilities that apply to a wide variety of tasks. To understand how such agents can learn from simulated environments, we explore a vision-and-language model's ability to recover the type of object represented by a photorealistic 3D model as a function of the 3D perspective from which the model is viewed. We used photogrammetry to create 3D models of commonplace objects and rendered 2D images of these models from a fixed set of 420 virtual camera perspectives. A well-studied image-and-language model (CLIP) was used to generate text categorizations (i.e., prompts) corresponding to these images. Using multiple instances of various object classes, we studied which camera perspectives were most likely to return accurate text categorizations for each class of object.
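To make the evaluation step concrete, the following is a minimal sketch of zero-shot CLIP classification for a single rendered view, using the Hugging Face transformers API. The checkpoint name, candidate labels, and image filename are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch: score one rendered view of a 3D model against candidate
# object classes with CLIP. Checkpoint, class list, and filename are
# illustrative assumptions only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical object classes and a render from one virtual camera pose.
labels = ["a photo of a mug", "a photo of a shoe", "a photo of a toy car"]
image = Image.open("renders/mug_azimuth030_elevation45.png")

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, num_labels)
probs = logits.softmax(dim=-1).squeeze(0)

for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.3f}")
```

Repeating this scoring over all 420 camera poses, and recording whether the top-scoring label matches the true class, would yield the kind of per-viewpoint accuracy pattern the study examines.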
Award ID(s):
2145642 2024878
PAR ID:
10511952
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
Proceedings of the IEEE Artificial Intelligence x Virtual Reality (AIxVR) Conference
ISBN:
979-8-3503-7202-1
Page Range / eLocation ID:
14 to 15
Format(s):
Medium: X
Location:
Los Angeles, CA, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Integrating multimodal data such as RGB and LiDAR from multiple views significantly increases computational and communication demands, which is challenging for resource-constrained autonomous agents that must meet the time-critical deadlines of mission-critical applications. To address this challenge, we propose CoOpTex, a collaborative task execution framework designed for cooperative perception in distributed autonomous systems (DAS). CoOpTex's contribution is twofold: (a) it fuses multiview RGB images to create a panoramic camera view for 2D object detection and utilizes 360° LiDAR for 3D object detection, improving accuracy with a lightweight Graph Neural Network (GNN) that integrates object coordinates from both perspectives; (b) to optimize task execution and meet deadlines, it dynamically offloads computationally intensive image stitching tasks to auxiliary devices when available and adjusts RGB frame capture rates based on device mobility and processing capabilities. We implement CoOpTex in real time on static and mobile heterogeneous autonomous agents, reducing deadline violations by 100% while improving 2D detection frame rates by 2.2 times in stationary and 2 times in mobile conditions, demonstrating its effectiveness for real-time cooperative perception.
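As a rough illustration of the adaptive scheduling in (b), the sketch below shows one way an agent could decide whether to offload image stitching and how to scale the RGB capture rate. The thresholds, fields, and policy are hypothetical and are not taken from CoOpTex.

```python
# Hedged sketch of a deadline-aware offloading and frame-rate policy in the
# spirit of CoOpTex. All thresholds and field names are illustrative.
from dataclasses import dataclass

@dataclass
class AgentState:
    local_stitch_ms: float   # estimated local image-stitching latency
    remote_stitch_ms: float  # estimated latency on an auxiliary device
    link_ms: float           # round-trip transfer latency to that device
    deadline_ms: float       # end-to-end perception deadline
    is_mobile: bool          # whether the agent is currently moving

def should_offload(state: AgentState, helper_available: bool) -> bool:
    """Offload stitching only if a helper exists and it beats local compute."""
    if not helper_available:
        return False
    return state.remote_stitch_ms + state.link_ms < min(state.local_stitch_ms,
                                                        state.deadline_ms)

def rgb_frame_rate(state: AgentState, base_fps: float = 10.0) -> float:
    """Scale capture rate down when compute is near the deadline or when mobile."""
    budget = state.deadline_ms / max(state.local_stitch_ms, 1e-3)
    fps = base_fps * min(1.0, budget)
    return fps * 0.5 if state.is_mobile else fps  # mobile scaling is an assumption

state = AgentState(local_stitch_ms=80, remote_stitch_ms=30, link_ms=15,
                   deadline_ms=100, is_mobile=True)
print(should_offload(state, helper_available=True), rgb_frame_rate(state))
```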
  2. We focus on addressing the object counting limitations of vision-language models, with a particular emphasis on Contrastive Language-Image Pre-training (CLIP) models. Centered on our hypothesis that counting knowledge can be abstracted into linear vectors within the text embedding space, we develop a parameter-efficient fine-tuning method and several zero-shot methods to improve CLIP's counting accuracy. Through comprehensive experiments, we demonstrate that our learning-based method not only outperforms full-model fine-tuning in counting accuracy but also retains the broad capabilities of pre-trained CLIP models. Our zero-shot text embedding editing techniques are also effective in situations where training data is scarce, and can be extended to improve Stable Diffusion's ability to generate images with precise object counts. We also contribute two specialized datasets to train and evaluate CLIP’s counting capabilities. Our code is available at https://github.com/UW-Madison-Lee-Lab/CLIP_Counting. 
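The linear-vector hypothesis can be illustrated with a small sketch: estimate a "one → two" direction from paired prompts in CLIP's text embedding space and add it to a held-out prompt's embedding. The prompt pairs and the unscaled addition are illustrative assumptions, not the paper's fine-tuning or editing procedure (see the linked repository for that).

```python
# Hedged sketch: estimate a linear "one -> two" direction in CLIP's text
# embedding space from paired prompts, then apply it to a new prompt.
# Prompt pairs and the direct addition are illustrative assumptions only.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def embed(texts):
    tokens = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_text_features(**tokens)
    return feats / feats.norm(dim=-1, keepdim=True)

pairs = [("one dog", "two dogs"), ("one apple", "two apples"),
         ("one car", "two cars")]
ones = embed([a for a, _ in pairs])
twos = embed([b for _, b in pairs])
count_dir = (twos - ones).mean(dim=0)  # averaged "one -> two" direction

# Edit a held-out prompt's embedding toward the "two" concept.
edited = embed(["one cat"]) + count_dir
edited = edited / edited.norm(dim=-1, keepdim=True)

# Sanity check: the edited embedding should sit closer to "two cats".
candidates = embed(["one cat", "two cats"])
print((edited @ candidates.T).squeeze(0))
```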
  3. The success of image generative models has enabled us to build methods that can edit images based on text or other user input. However, these methods are bespoke, imprecise, require additional information, or are limited to 2D image edits. We present GeoDiffuser, a zero-shot optimization-based method that unifies common 2D and 3D image-based object editing capabilities into a single method. Our key insight is to view image editing operations as geometric transformations. We show that these transformations can be directly incorporated into the attention layers of diffusion models to implicitly perform editing operations. Our training-free optimization method uses an objective function that seeks to preserve object style while generating plausible images, for instance with accurate lighting and shadows. It also inpaints disoccluded parts of the image where the object was originally located. Given a natural image and user input, we segment the foreground object using SAM and estimate a corresponding transform, which is used by our optimization approach for editing. GeoDiffuser can perform common 2D and 3D edits such as object translation, 3D rotation, and removal. We present quantitative results, including a perceptual study, showing that our approach outperforms existing methods.
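In highly simplified form, the idea of expressing an edit as a geometric transformation applied inside the network can be sketched by warping a feature map with a 2D affine transform. The rotation and translation values below are arbitrary, and this is not GeoDiffuser's actual attention-sharing mechanism.

```python
# Simplified sketch: apply a 2D affine transform (rotation + translation) to a
# feature map with grid_sample, the kind of warp a geometry-driven edit pushes
# into the network. All transform values are arbitrary illustrations.
import math
import torch
import torch.nn.functional as F

feat = torch.randn(1, 64, 32, 32)  # stand-in for a diffusion feature map

angle = math.radians(15.0)  # hypothetical 15-degree in-plane rotation
tx, ty = 0.2, 0.0           # hypothetical translation in normalized coordinates
theta = torch.tensor([[[math.cos(angle), -math.sin(angle), tx],
                       [math.sin(angle),  math.cos(angle), ty]]])

grid = F.affine_grid(theta, feat.shape, align_corners=False)
warped = F.grid_sample(feat, grid, align_corners=False)
print(warped.shape)  # same spatial size, contents geometrically transformed
```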
  4. We provide an approach to reconstruct spatiotemporal 3D models of aging objects, such as fruit, with time-varying shape and appearance, using multi-view time-lapse videos captured by a microenvironment of Raspberry Pi cameras. Our approach represents the 3D structure of the object prior to aging using a static 3D mesh reconstructed from multiple photographs of the object captured with a rotating camera track. We manually align the 3D mesh to the images at the first time instant. Our approach then automatically deforms the aligned 3D mesh to match the object across the multi-viewpoint time-lapse videos. We texture-map the deformed 3D meshes with intensities from the frames at each time instant to create the spatiotemporal 3D model of the object. Our results reveal the time dependence of volume loss due to transpiration and of color transformation due to enzymatic browning on banana peels and in exposed parts of bitten fruit.
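One quantity this reconstruction enables, volume loss over time, can be computed directly from each deformed mesh. The sketch below uses the standard signed-tetrahedron formula on hypothetical vertex and face arrays rather than the authors' data.

```python
# Hedged sketch: track volume across a sequence of watertight deformed meshes
# using the signed-tetrahedron formula. Arrays here are placeholders.
import numpy as np

def mesh_volume(vertices: np.ndarray, faces: np.ndarray) -> float:
    """Volume of a closed triangle mesh (sum of signed tetrahedra to the origin)."""
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    return float(np.abs(np.einsum("ij,ij->i", v0, np.cross(v1, v2)).sum()) / 6.0)

# Hypothetical unit tetrahedron standing in for one time instant's mesh.
verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])

# Shrinking the mesh slightly at each "time instant" mimics volume loss.
volumes = [mesh_volume(verts * (1.0 - 0.02 * t), faces) for t in range(5)]
print([round(v, 4) for v in volumes])
```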
  5. Contemporary developments in computer vision and artificial intelligence show promise to greatly improve the lives of those with disabilities. In this paper, we propose one such development: a wearable object recognition device in the form of eyewear. Our device is specialized to recognize items from the produce section of a grocery store, but it serves as a proof of concept for any similar object recognition wearable. It is user-friendly, featuring buttons that are pressed to capture images with the built-in camera. A convolutional neural network (CNN) is used to train the object recognition system. After an object is recognized, a text-to-speech system informs the user which object they are holding, along with the price of the product. With an accuracy of 99.35%, our device identifies objects more accurately than existing models.
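The capture-classify-announce loop of such a wearable might look roughly like the sketch below, where a pretrained CNN backbone classifies the captured frame and the resulting sentence would be handed to a text-to-speech engine. The class list, price table, image path, and MobileNet backbone are placeholders, not the authors' trained system.

```python
# Hedged sketch of the capture -> CNN classify -> speak loop described above.
# Classes, prices, and the image path are placeholders; a real device would
# load a CNN fine-tuned on produce images instead of raw ImageNet weights.
import torch
from PIL import Image
from torchvision import models, transforms

classes = ["apple", "banana", "orange"]            # hypothetical produce classes
prices = {"apple": 0.79, "banana": 0.25, "orange": 0.99}

model = models.mobilenet_v2(weights="IMAGENET1K_V1")
model.classifier[1] = torch.nn.Linear(model.last_channel, len(classes))
model.eval()  # in practice, load fine-tuned produce weights here

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def announce(image_path: str) -> str:
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        idx = model(image).argmax(dim=-1).item()
    name = classes[idx]
    return f"This is a {name}. Price: ${prices[name]:.2f}."

print(announce("captured_frame.jpg"))  # this string would be sent to the TTS engine
```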