skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Russakovsky, O."

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Does language help make sense of the visual world? How important is it to actually see the world rather than having it described with words? These basic questions about the na- ture of intelligence have been difficult to answer because we only had one example of an intelligent system – humans – and limited access to cases that isolated language or vision. How- ever, the development of sophisticated Vision-Language Mod- els (VLMs) by artificial intelligence researchers offers us new opportunities to explore the contributions that language and vi- sion make to learning about the world. We ablate components from the cognitive architecture of these models to identify their contributions to learning new tasks from limited data. We find that a language model leveraging all components recovers a majority of a VLM’s performance, despite its lack of visual in- put, and that language seems to allow this by providing access to prior knowledge and reasoning. 
    more » « less
    Free, publicly-accessible full text available July 24, 2025
  2. For human children as well as machine learning systems, a key challenge in learning a word is linking the word to the visual phenomena it describes. We explore this aspect of word learn- ing by using the performance of computer vision systems as a proxy for the difficulty of learning a word from visual cues. We show that the age at which children acquire different categories of words is correlated with the performance of visual classifi- cation and captioning systems, over and above the expected effects of word frequency. The performance of the computer vision systems is correlated with human judgments of the con- creteness of words, which are in turn a predictor of children’s word learning, suggesting that these models are capturing the relationship between words and visual phenomena. 
    more » « less