
Title: Seeing the meaning: Vision meets semantics in solving pictorial analogy problems
We report a first effort to model the solution of meaningful four-term visual analogies, by combining a machine-vision model (ResNet50-A) that can classify pixel-level images into object categories with a cognitive model (BART) that takes semantic representations of words as input and identifies the semantic relations instantiated by a word pair. Each model achieves above-chance performance in selecting the best analogical option from a set of four. However, combining the visual and semantic models increases analogical performance above the level achieved by either model alone. The contribution of vision to reasoning may thus extend beyond simply generating verbal representations from images. These findings provide a proof of concept that a comprehensive model can solve semantically rich analogies from pixel-level inputs.
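As a rough illustration of the two-route architecture described in the abstract, the sketch below scores each candidate image of an A:B :: C:? analogy by combining a visual classifier's category distribution with a BART-style relation model over word embeddings. The function and variable names (visual_model, relation_model, word_vectors) and the 50/50 weighting are placeholders, not the published model.

```python
# Hypothetical sketch (not the authors' implementation) of combining a visual
# classifier with a semantic relation model to answer an A:B :: C:? analogy.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def score_candidate(img_a, img_b, img_c, img_d,
                    visual_model, relation_model, word_vectors):
    """Combined visual + semantic score for candidate image img_d."""
    # Visual route: each image becomes a distribution over object categories.
    p_a, p_b, p_c, p_d = (visual_model(img) for img in (img_a, img_b, img_c, img_d))

    # Semantic route: take the top category label per image and look up its
    # word embedding (the relation model operates on such lexical vectors).
    v_a, v_b, v_c, v_d = (word_vectors[int(np.argmax(p))] for p in (p_a, p_b, p_c, p_d))

    # Relation model: a vector of probabilities over learned semantic relations
    # for each word pair; analogous pairs should instantiate similar relations.
    semantic_score = cosine(relation_model(v_a, v_b), relation_model(v_c, v_d))

    # Purely visual consistency: the category-space change from A to B should
    # resemble the change from C to the candidate D.
    visual_score = cosine(p_b - p_a, p_d - p_c)

    return 0.5 * semantic_score + 0.5 * visual_score  # illustrative weighting

# The chosen answer is the candidate with the highest combined score.
```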
Authors:
Award ID(s):
1827374
Publication Date:
NSF-PAR ID:
10093529
Journal Name:
Proceedings of the Annual Conference of the Cognitive Science Society
ISSN:
1069-7977
Sponsoring Org:
National Science Foundation
More Like this
  1. Fitch, T. ; Lamm, C. ; Leder, H. ; Teßmar-Raible, K. (Ed.)
    Is analogical reasoning a task that must be learned to solve from scratch by applying deep learning models to massive numbers of reasoning problems? Or are analogies solved by computing similarities between structured representations of analogs? We address this question by comparing human performance on visual analogies created using images of familiar three-dimensional objects (cars and their subregions) with the performance of alternative computational models. Human reasoners achieved above-chance accuracy for all problem types, but made more errors in several conditions (e.g., when relevant subregions were occluded). We compared human performance to that of two recent deep learning models (Siamese Network and Relation Network) directly trained to solve these analogy problems, as well as to that of a compositional model that assesses relational similarity between part-based representations. The compositional model based on part representations, but not the deep learning models, generated qualitative performance similar to that of human reasoners. (A minimal sketch of relational similarity over part-based representations appears after this list.)
  2. By middle childhood, humans are able to learn abstract semantic relations (e.g., antonym, synonym, category membership) and use them to reason by analogy. A deep theoretical challenge is to show how such abstract relations can arise from nonrelational inputs, thereby providing key elements of a protosymbolic representation system. We have developed a computational model that exploits the potential synergy between deep learning from “big data” (to create semantic features for individual words) and supervised learning from “small data” (to create representations of semantic relations between words). Given as inputs labeled pairs of lexical representations extracted by deep learning, the model creates augmented representations by remapping features according to the rank of differences between values for the two words in each pair. These augmented representations aid in coping with the feature alignment problem (e.g., matching those features that make “love-hate” an antonym with the different features that make “rich-poor” an antonym). The model extracts weight distributions that are used to estimate the probabilities that new word pairs instantiate each relation, capturing the pattern of human typicality judgments for a broad range of abstract semantic relations. A measure of relational similarity can be derived and used to solve simple verbal analogies with human-level accuracy. Because each acquired relation has a modular representation, basic symbolic operations are enabled (notably, the converse of any learned relation can be formed without additional training). Abstract semantic relations can be induced by bootstrapping from nonrelational inputs, thereby enabling relational generalization and analogical reasoning. (A schematic sketch of the rank-based augmentation and relation-probability steps appears after this list.)
  3. Analogical reasoning fundamentally involves exploiting redundancy in a given task, but there are many different ways an intelligent agent can choose to define and exploit redundancy, often resulting in very different levels of task performance. We explore such variations in analogical reasoning within the domain of geometric matrix reasoning tasks, namely on the Raven’s Standard Progressive Matrices intelligence test. We show how different analogical constructions used by the same basic visual-imagery-based computational model—varying only in how they “slice” a matrix problem into parts and do search and optimization within/across these parts—achieve very different levels of test performance, ranging from 13/60 correct all the way up to 57/60 correct. Our findings suggest that the ability to select or build effective high-level analogical constructions can be as important as an agent’s competencies in low-level reasoning skills, which raises interesting open questions about the extent to which building the “right” analogies might contribute to individual differences in human matrix reasoning performance, and how intelligent agents might learn to build or select from among different analogical constructions in the first place. (A toy illustration of row- versus column-based constructions appears after this list.)
  4. We see the external world as consisting not only of objects and their parts, but also of relations that hold between them. Visual analogy, which depends on similarities between relations, provides a clear example of how perception supports reasoning. Here we report an experiment in which we quantitatively measured the human ability to find analogical mappings between parts of different objects, where the objects to be compared were drawn either from the same category (e.g., images of two mammals, such as a dog and a horse), or from two dissimilar categories (e.g., a chair image mapped to a cat image). Humans showed systematic mapping patterns, but with greater variability in mapping responses when objects were drawn from dissimilar categories. We simulated the human response of analogical mapping using a computational model of mapping between 3D objects, visiPAM (visual Probabilistic Analogical Mapping). VisiPAM takes point-cloud representations of two 3D objects as inputs, and outputs the mapping between analogous parts of the two objects. VisiPAM consists of a visual module that constructs structural representations of individual objects, and a reasoning module that identifies a probabilistic mapping between parts of the two 3D objects. Model simulations not only capture the qualitative pattern of human mapping performance across conditions, but also approach human-level reliability in solving visual analogy problems. (A simplified sketch of probabilistic part mapping appears after this list.)
  5. Domain adaptation is critical for success in new, unseen environments. Adversarial adaptation models have shown tremendous progress towards adapting to new environments by focusing either on discovering domain invariant representations or by mapping between unpaired image domains. While feature space methods are difficult to interpret and sometimes fail to capture pixel-level and low-level domain shifts, image space methods sometimes fail to incorporate high level semantic knowledge relevant for the end task. We propose a model which adapts between domains using both generative image space alignment and latent representation space alignment. Our approach, Cycle-Consistent Adversarial Domain Adaptation (CyCADA), guides transfer between domains according to a specific discriminatively trained task and avoids divergence by enforcing consistency of the relevant semantics before and after adaptation. We evaluate our method on a variety of visual recognition and prediction settings, including digit classification and semantic segmentation of road scenes, advancing state-of-the-art performance for unsupervised adaptation from synthetic to real world driving domains. (A schematic of the combined CyCADA objective appears after this list.)
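For item 1 above, the following is a minimal sketch of what relational similarity between part-based representations could look like in code: each relation between two parts is encoded as a feature-difference vector, and analogs are compared by the similarity of corresponding relation vectors. The encoding and the one-to-one pairing of relations are assumptions for illustration, not the compositional model evaluated in that study.

```python
# Illustrative sketch of relational similarity over part-based representations
# (assumed encoding; not the compositional model from item 1).
import numpy as np

def relation_vector(part_x, part_y):
    # The relation between two parts is encoded as their feature difference.
    return part_y - part_x

def relational_similarity(source_part_pairs, target_part_pairs):
    """Mean cosine similarity between corresponding source and target relations.

    Each argument is a list of (part_i, part_j) feature-vector pairs, with the
    target pairs assumed to correspond one-to-one with the source pairs.
    """
    sims = []
    for (sx, sy), (tx, ty) in zip(source_part_pairs, target_part_pairs):
        r_s, r_t = relation_vector(sx, sy), relation_vector(tx, ty)
        denom = np.linalg.norm(r_s) * np.linalg.norm(r_t) + 1e-9
        sims.append(float(np.dot(r_s, r_t) / denom))
    return float(np.mean(sims))
```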
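Item 2 describes two concrete steps that translate naturally into code: augmenting each word pair by reordering features according to the rank of the between-word differences, and learning per-relation weights that yield a probability that a new pair instantiates the relation. The sketch below uses a plain logistic-regression classifier as a stand-in for the model's Bayesian weight estimation; treat it as an approximation of the idea, not the published BART implementation.

```python
# Rank-based pair augmentation plus a per-relation classifier (simplified;
# logistic regression here stands in for BART's Bayesian weight learning).
import numpy as np
from sklearn.linear_model import LogisticRegression

def augment_pair(w1, w2):
    """Reorder the two word vectors by the rank of |w1 - w2| per feature."""
    order = np.argsort(np.abs(w1 - w2))            # rank features by difference
    return np.concatenate([w1[order], w2[order]])  # augmented pair representation

def train_relation(positive_pairs, negative_pairs):
    """Fit one relation (e.g., antonym) from labeled word-vector pairs."""
    X = np.array([augment_pair(a, b) for a, b in positive_pairs + negative_pairs])
    y = np.array([1] * len(positive_pairs) + [0] * len(negative_pairs))
    return LogisticRegression(max_iter=1000).fit(X, y)

def relation_probability(relation_model, w1, w2):
    """Estimated probability that the pair (w1, w2) instantiates the relation."""
    x = augment_pair(w1, w2).reshape(1, -1)
    return float(relation_model.predict_proba(x)[0, 1])
```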
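For item 3, here is a toy illustration (not the paper's imagery-based model) of how two different analogical constructions over a 3x3 matrix problem can yield different answers: one slices the problem by rows and the other by columns, and each scores candidates by how well the candidate completes the analogous transformation.

```python
# Toy row- vs. column-based analogical constructions for a 3x3 matrix problem.
# Images are assumed to be NumPy arrays of identical shape; matrix[2][2] is the
# missing cell. This is an illustration only, not the model from item 3.
import numpy as np

def transform_match(x, y, u, v):
    """How closely the change x -> y matches the change u -> v (higher is better)."""
    return -float(np.linalg.norm((y - x) - (v - u)))

def solve(matrix, candidates, construction="row"):
    scores = []
    for cand in candidates:
        if construction == "row":
            # Analogy: first row's left-to-right change vs. last row's change.
            s = transform_match(matrix[0][1], matrix[0][2], matrix[2][1], cand)
        else:
            # Analogy: first column's top-to-bottom change vs. last column's change.
            s = transform_match(matrix[1][0], matrix[2][0], matrix[1][2], cand)
        scores.append(s)
    return int(np.argmax(scores))  # index of the best-scoring candidate
```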
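For item 4, the sketch below shows one simple way a probabilistic mapping between parts could be computed: cosine similarities between part feature vectors are turned into a soft assignment by a temperature-scaled softmax. The feature encoding and the softmax step are assumptions for illustration; visiPAM's actual reasoning module operates over structural (relational) representations built from point clouds.

```python
# Simplified probabilistic part mapping: similarity -> softmax assignment.
# (Illustrative only; visiPAM maps parts using structural representations.)
import numpy as np

def soft_part_mapping(source_parts, target_parts, temperature=0.1):
    """source_parts: (n, d) array; target_parts: (m, d) array.
    Returns an (n, m) matrix whose rows are mapping probabilities."""
    s = source_parts / (np.linalg.norm(source_parts, axis=1, keepdims=True) + 1e-9)
    t = target_parts / (np.linalg.norm(target_parts, axis=1, keepdims=True) + 1e-9)
    sim = s @ t.T                                  # cosine similarity per part pair
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```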
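For item 5, CyCADA's training objective combines several loss terms: a task loss on source images translated to the target style, pixel-level adversarial (GAN) losses in both translation directions, a cycle-consistency loss, a feature-level adversarial loss, and a semantic-consistency loss that keeps the source task model's labels stable under translation. The helper below only assembles precomputed terms with illustrative weights; it is a schematic, not the authors' training code.

```python
# Schematic combination of the CyCADA loss terms (weights are illustrative).
def cycada_objective(task_loss, gan_loss_src_to_tgt, gan_loss_tgt_to_src,
                     cycle_loss, feature_gan_loss, semantic_consistency_loss,
                     weights=(1.0, 1.0, 1.0, 10.0, 1.0, 1.0)):
    """Weighted sum of the six losses described above; each argument is a scalar
    (or an autograd tensor) already computed for the current minibatch."""
    terms = (task_loss, gan_loss_src_to_tgt, gan_loss_tgt_to_src,
             cycle_loss, feature_gan_loss, semantic_consistency_loss)
    return sum(w * t for w, t in zip(weights, terms))
```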