Title: Predicting Word Learning in Children from the Performance of Computer Vision Systems
For human children as well as machine learning systems, a key challenge in learning a word is linking the word to the visual phenomena it describes. We explore this aspect of word learning by using the performance of computer vision systems as a proxy for the difficulty of learning a word from visual cues. We show that the age at which children acquire different categories of words is correlated with the performance of visual classification and captioning systems, over and above the expected effects of word frequency. The performance of the computer vision systems is correlated with human judgments of the concreteness of words, which are in turn a predictor of children's word learning, suggesting that these models are capturing the relationship between words and visual phenomena.
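The abstract's key analysis — relating classifier performance to age of acquisition "over and above" word frequency — can be sketched as a partial correlation: regress frequency out of both variables, then correlate the residuals. The sketch below uses only NumPy; all data values are hypothetical placeholders, not the paper's data.

```python
import numpy as np

# Hypothetical per-word measurements (placeholder values, not real data):
# age of acquisition in months, classifier accuracy, and log word frequency.
aoa = np.array([16.0, 20.0, 24.0, 28.0, 30.0])
clf_acc = np.array([0.90, 0.80, 0.70, 0.55, 0.50])
log_freq = np.array([4.2, 3.8, 3.5, 3.0, 2.8])

def residualize(y, x):
    """Remove the linear effect of x from y via least squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Partial correlation: correlate what remains of each variable
# after the shared frequency effect is regressed out.
r = np.corrcoef(residualize(aoa, log_freq),
                residualize(clf_acc, log_freq))[0, 1]
print(round(r, 3))
```

A regression with frequency as a covariate would give an equivalent test; the residualization form just makes the "over and above frequency" logic explicit.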
Award ID(s):
2107048
PAR ID:
10542039
Author(s) / Creator(s):
Publisher / Repository:
Proceedings of the Annual Meeting of the Cognitive Science Society
Date Published:
ISSN:
1069-7977
Format(s):
Medium: X
Location:
Sydney, Australia
Sponsoring Org:
National Science Foundation
More Like this
  1. We investigate the roles of linguistic and sensory experience in the early-produced visual, auditory, and abstract words of congenitally-blind toddlers, deaf toddlers, and typically-sighted/hearing peers. We also assess the role of language access by comparing early word production in children learning English or American Sign Language (ASL) from birth, versus at a delay. Using parental report data on child word production from the MacArthur-Bates Communicative Development Inventory, we found evidence that while children produced words referring to imperceptible referents before age 2, such words were less likely to be produced relative to words with perceptible referents. For instance, blind (vs. sighted) children said fewer highly visual words like "blue" or "see"; deaf signing (vs. hearing) children produced fewer auditory signs like HEAR. Additionally, in spoken English and ASL, children who received delayed language access were less likely to produce words overall. These results demonstrate and begin to quantify how linguistic and sensory access may influence which words young children produce.
  2. Geospatial analysis lacks methods like the word vector representations and pre-trained networks that significantly boost performance across a wide range of natural language and computer vision tasks. To fill this gap, we introduce Tile2Vec, an unsupervised representation learning algorithm that extends the distributional hypothesis from natural language — words appearing in similar contexts tend to have similar meanings — to spatially distributed data. We demonstrate empirically that Tile2Vec learns semantically meaningful representations for both image and non-image datasets. Our learned representations significantly improve performance in downstream classification tasks and, similarly to word vectors, allow visual analogies to be obtained via simple arithmetic in the latent space. 
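Tile2Vec's extension of the distributional hypothesis is typically realized as a triplet objective: a tile's embedding should lie closer to a spatial neighbor's embedding than to a distant tile's, up to a margin. The sketch below illustrates that objective with a toy linear encoder; the encoder, tile sizes, and margin are illustrative stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(tile, W):
    """Toy linear encoder: flatten the tile and project it.
    A stand-in for the convolutional encoder a real Tile2Vec
    model would train."""
    return W @ tile.ravel()

def triplet_loss(anchor, neighbor, distant, margin=1.0):
    """Hinge on the embedding distances: penalize the anchor being
    farther from its spatial neighbor than from a distant tile."""
    d_pos = np.linalg.norm(anchor - neighbor)
    d_neg = np.linalg.norm(anchor - distant)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 8x8 single-band tiles.
W = rng.normal(size=(16, 64)) / 8.0
tile_a = rng.normal(size=(8, 8))
tile_near = tile_a + 0.1 * rng.normal(size=(8, 8))  # spatial neighbor
tile_far = rng.normal(size=(8, 8))                  # distant tile

loss = triplet_loss(embed(tile_a, W), embed(tile_near, W), embed(tile_far, W))
print(loss >= 0.0)
```

Minimizing this loss over many sampled (anchor, neighbor, distant) triples is what pushes tiles from similar spatial contexts toward similar embeddings, mirroring how word vectors are trained from co-occurrence contexts.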
  3. The ability to process social information is a critical component of children’s early language and cognitive development. However, as children reach their first birthday, they begin to locomote themselves, dramatically affecting their visual access to this information. How do these postural and locomotor changes affect children’s access to the social information relevant for word-learning? Here, we explore this question by using head-mounted cameras to record 36 infants’ (8-16 months of age) egocentric visual perspective and use computer vision algorithms to estimate the proportion of faces and hands in infants’ environments. We find that infants’ posture and orientation to their caregiver modulates their access to social information, confirming previous work that suggests motoric developments play a significant role in the emergence of children’s linguistic and social capacities. We suggest that the combined use of head-mounted cameras and the application of new computer vision techniques is a promising avenue for understanding the statistics of infants’ visual and linguistic experience. 
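The core measurement in the study above — the proportion of egocentric frames containing faces or hands — reduces to running a detector over frames and counting hits. A minimal sketch, with a toy stand-in detector rather than the computer vision pipeline the authors actually used:

```python
def proportion_with_faces(frames, detect_faces):
    """Fraction of egocentric video frames with at least one detected face.
    `detect_faces` is any detector returning a list of bounding boxes
    per frame (here a toy callable, not a real model)."""
    if not frames:
        return 0.0
    hits = sum(1 for f in frames if len(detect_faces(f)) > 0)
    return hits / len(frames)

# Toy stand-in: each "frame" is a dict carrying precomputed boxes.
frames = [{"boxes": [(10, 10, 40, 40)]},
          {"boxes": []},
          {"boxes": [(5, 5, 20, 20)]}]
prop = proportion_with_faces(frames, lambda f: f["boxes"])
print(prop)  # 2 of 3 frames contain a face -> 0.666...
```

In practice the detector would be a trained face/hand model applied per frame, and the proportion would be computed separately by infant posture or orientation condition.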
  4. Holistic processing (HP) of faces refers to the obligatory, simultaneous processing of the parts and their relations, and it emerges over the course of development. HP is manifest in a decrement in the perception of inverted versus upright faces and a reduction in face processing ability when the relations between parts are perturbed. Here, adopting the HP framework for faces, we examined the developmental emergence of HP in another domain for which human adults have expertise, namely, visual word processing. Children, adolescents, and adults performed a lexical decision task and we used two established signatures of HP for faces: the advantage in perception of upright over inverted words and nonwords and the reduced sensitivity to increasing parts (word length). Relative to the other groups, children showed less of an advantage for upright versus inverted trials and lexical decision was more affected by increasing word length. Performance on these HP indices was strongly associated with age and with reading proficiency. Also, the emergence of HP for word perception was not simply a result of improved visual perception over the course of development as no group differences were observed on an object decision task. These results reveal the developmental emergence of HP for orthographic input, and reflect a further instance of experience-dependent tuning of visual perception. These results also add to existing findings on the commonalities of mechanisms of word and face recognition. 
  5. It is well-known that children rapidly learn words, following a range of heuristics. What is less well appreciated is that – because most words are polysemous and have multiple meanings (e.g., ‘glass’ can label a material and drinking vessel) – children will often be learning a new meaning for a known word, rather than an entirely new word. Across four experiments we show that children flexibly adapt a well-known heuristic – the shape bias – when learning polysemous words. Consistent with previous studies, we find that children and adults preferentially extend a new object label to other objects of the same shape. But we also find that when a new word for an object (‘a gup’) has previously been used to label the material composing that object (‘some gup’), children and adults override the shape bias, and are more likely to extend the object label by material (Experiments 1 and 3). Further, we find that, just as an older meaning of a polysemous word constrains interpretations of a new word meaning, encountering a new word meaning leads learners to update their interpretations of an older meaning (Experiment 2). Finally, we find that these effects only arise when learners can perceive that a word’s meanings are related, not when they are arbitrarily paired (Experiment 4). Together, these findings show that children can exploit cues from polysemy to infer how new word meanings should be extended, suggesting that polysemy may facilitate word learning and invite children to construe categories in new ways. 