This content will become publicly available on July 1, 2022

Title: Visual analogy: Deep learning versus compositional models
Is analogical reasoning a task that must be learned to solve from scratch by applying deep learning models to massive numbers of reasoning problems? Or are analogies solved by computing similarities between structured representations of analogs? We address this question by comparing human performance on visual analogies created using images of familiar three-dimensional objects (cars and their subregions) with the performance of alternative computational models. Human reasoners achieved above-chance accuracy for all problem types, but made more errors in several conditions (e.g., when relevant subregions were occluded). We compared human performance to that of two recent deep learning models (Siamese Network and Relation Network) directly trained to solve these analogy problems, as well as to that of a compositional model that assesses relational similarity between part-based representations. The compositional model based on part representations, but not the deep learning models, generated qualitative performance similar to that of human reasoners.
Fitch, T.; Lamm, C.; Leder, H.; Teßmar-Raible, K.
Journal Name: Proceedings of the 43rd Annual Meeting of the Cognitive Science Society
Sponsoring Org: National Science Foundation
More Like this
  1. By middle childhood, humans are able to learn abstract semantic relations (e.g., antonym, synonym, category membership) and use them to reason by analogy. A deep theoretical challenge is to show how such abstract relations can arise from nonrelational inputs, thereby providing key elements of a protosymbolic representation system. We have developed a computational model that exploits the potential synergy between deep learning from “big data” (to create semantic features for individual words) and supervised learning from “small data” (to create representations of semantic relations between words). Given as inputs labeled pairs of lexical representations extracted by deep learning, the model creates augmented representations by remapping features according to the rank of differences between values for the two words in each pair. These augmented representations aid in coping with the feature alignment problem (e.g., matching those features that make “love-hate” an antonym with the different features that make “rich-poor” an antonym). The model extracts weight distributions that are used to estimate the probabilities that new word pairs instantiate each relation, capturing the pattern of human typicality judgments for a broad range of abstract semantic relations. A measure of relational similarity can be derived and used to solve simple verbal analogies with human-level accuracy. Because each acquired relation has a modular representation, basic symbolic operations are enabled (notably, the converse of any learned relation can be formed without additional training). Abstract semantic relations can be induced by bootstrapping from nonrelational inputs, thereby enabling relational generalization and analogical reasoning.
  2. We report a first effort to model the solution of meaningful four-term visual analogies, by combining a machine-vision model (ResNet50-A) that can classify pixel-level images into object categories, with a cognitive model (BART) that takes semantic representations of words as input and identifies semantic relations instantiated by a word pair. Each model achieves above-chance performance in selecting the best analogical option from a set of four. However, combining the visual and the semantic models increases analogical performance above the level achieved by either model alone. The contribution of vision to reasoning thus may extend beyond simply generating verbal representations from images. These findings provide a proof of concept that a comprehensive model can solve semantically-rich analogies from pixel-level inputs.
  3. There is a large gap between the ability of experts and students in grasping spatial concepts and representations. Engineering and the geosciences require the highest expertise in spatial thinking, and weak spatial skills are a significant barrier to success for many students [1]. Spatial skills are also highly malleable [2]; therefore, a current challenge is to identify how to promote students’ spatial thinking. Interdisciplinary research on how students think about spatially-demanding problems in the geosciences has identified several major barriers for students and interventions to help scaffold learning at a variety of levels from high school through upper level undergraduate majors. The Geoscience Education Transdisciplinary Spatial Learning Network (GET-Spatial) is an NSF-funded collaboration between geoscientists, cognitive psychologists, and education researchers. Our goal is to help students overcome initial hurdles in reasoning about spatial problems in an effort to diversify the geoscience workforce. Examples of spatial problems in the fields of geochemistry include scaling, both in size and time; penetrative thinking to make inferences about internal structures from surface properties; and graph-reading, especially ternary diagrams. Understanding scales outside of direct human experience, both very large (e.g. cosmochemistry, deep time) and very small (e.g. mineralogy, nanoparticles) can be acutely difficult for students. However, interventions have successfully improved both scale estimation and exam performance [3]. We will discuss best practices for developing effective interdisciplinary teams, and how to overcome challenges of working across disciplines and across grade levels. We will provide examples of spatial interventions in scaling and penetrative thinking. [1] Hegarty et al. (2010) in Spatial Cognition VII 6222, 85-94. [2] Uttal et al. (2012) Psychology of Learning and Motivation 57, 147-181. [3] Resnick et al. (2016) Educational Psychology Review, 1-15.
  4. Compositional models represent patterns with hierarchies of meaningful parts and subparts. Their ability to characterize high-order relationships among body parts helps resolve low-level ambiguities in human pose estimation (HPE). However, prior compositional models make unrealistic assumptions on subpart-part relationships, making them incapable of characterizing complex compositional patterns. Moreover, state spaces of their higher-level parts can be exponentially large, complicating both inference and learning. To address these issues, this paper introduces a novel framework, termed Deeply Learned Compositional Model (DLCM), for HPE. It exploits deep neural networks to learn the compositionality of human bodies. This results in a novel network with a hierarchical compositional architecture and bottom-up/top-down inference stages. In addition, we propose a novel bone-based part representation. It not only compactly encodes orientations, scales and shapes of parts, but also avoids their potentially large state spaces. With significantly lower complexities, our approach outperforms state-of-the-art methods on three benchmark datasets.
  5. Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the three following problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.