Title: Automating creativity assessment with SemDis: An open platform for computing semantic distance
Abstract: Creativity research requires assessing the quality of ideas and products. In practice, conducting creativity research often involves asking several human raters to judge participants’ responses to creativity tasks, such as judging the novelty of ideas from the alternate uses task (AUT). Although such subjective scoring methods have proved useful, they have two inherent limitations—labor cost (raters typically code thousands of responses) and subjectivity (raters vary on their perceptions and preferences)—raising classic psychometric threats to reliability and validity. We sought to address the limitations of subjective scoring by capitalizing on recent developments in automated scoring of verbal creativity via semantic distance, a computational method that uses natural language processing to quantify the semantic relatedness of texts. In five studies, we compare the top-performing semantic models (e.g., GloVe, continuous bag of words) previously shown to have the highest correspondence to human relatedness judgments. We assessed these semantic models in relation to human creativity ratings from a canonical verbal creativity task (AUT; Studies 1–3) and novelty/creativity ratings from two word association tasks (Studies 4–5). We find that a latent semantic distance factor—comprising the common variance from five semantic models—reliably and strongly predicts human creativity and novelty ratings across a range of creativity tasks. We also replicate an established experimental effect in the creativity literature (i.e., the serial order effect) and show that semantic distance correlates with other creativity measures, demonstrating convergent validity. We provide an open platform to efficiently compute semantic distance, including tutorials and documentation (https://osf.io/gz4fc/).
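To make the scoring concrete, the sketch below shows the core computation in Python: an AUT response is scored as the cosine distance between the prompt word's vector and the averaged vectors of the response words. This is a minimal illustration of the general method, not the SemDis platform's own code; the toy three-dimensional vectors and the `response_distance` helper are invented for the example, and real use would load pretrained embeddings such as GloVe.

```python
import numpy as np

def semantic_distance(vec_a, vec_b):
    """Cosine distance between two embedding vectors (1 - cosine similarity)."""
    sim = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
    return 1.0 - sim

def response_distance(prompt, response_words, embeddings):
    """Score an AUT response as the distance between the prompt word's vector
    and the average of the response words' vectors (additive composition)."""
    known = [embeddings[w] for w in response_words if w in embeddings]
    if not known or prompt not in embeddings:
        return None  # cannot score without vectors for both sides
    return semantic_distance(embeddings[prompt], np.mean(known, axis=0))

# Toy usage with made-up 3-d vectors; real use would load, e.g., GloVe.
emb = {"brick": np.array([0.9, 0.1, 0.0]),
       "build": np.array([0.8, 0.2, 0.1]),
       "paperweight": np.array([0.1, 0.9, 0.3])}
print(response_distance("brick", ["build"], emb))        # small distance: common use
print(response_distance("brick", ["paperweight"], emb))  # larger distance: novel use
```

The latent factor reported in the abstract would then be formed from such scores computed under several different semantic models, for example by extracting their common variance.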
Award ID(s): 1920682
NSF-PAR ID: 10285441
Author(s) / Creator(s): Beaty; Johnson
Date Published: 2021
Journal Name: Behavior Research Methods
Volume: 53
Issue: 2
ISSN: 1554-3528
Page Range / eLocation ID: 757 to 780
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    The visual modality is central to both reception and expression of human creativity. Creativity assessment paradigms, such as structured drawing tasks (Barbot, 2018), seek to characterize this key modality of creative ideation. However, visual creativity assessment paradigms often rely on cohorts of expert or naïve raters to gauge the level of creativity of the outputs. This comes at the cost of substantial human investment in both time and labor. To address these issues, recent work has leveraged the power of machine learning techniques to automatically extract creativity scores in the verbal domain (e.g., SemDis; Beaty & Johnson, 2021). Yet a comparably well-vetted solution for the assessment of visual creativity is missing. Here, we introduce AuDrA, an Automated Drawing Assessment platform to extract visual creativity scores from simple drawing productions. Using a collection of line drawings and human creativity ratings, we trained AuDrA and tested its generalizability to untrained drawing sets, raters, and tasks. Across four datasets, nearly 60 raters, and over 13,000 drawings, we found AuDrA scores to be highly correlated with human creativity ratings for new drawings on the same drawing task (r = .65 to .81; mean = .76). Importantly, correlations between AuDrA scores and human creativity ratings surpassed those between drawings’ elaboration (i.e., ink on the page) and human creativity ratings, suggesting that AuDrA is sensitive to features of drawings beyond the simple degree of complexity. We discuss future directions and limitations, and link the trained AuDrA model and a tutorial (https://osf.io/kqn9v/) to enable researchers to efficiently assess new drawings.
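    As a rough sketch of the approach described above (not the actual AuDrA architecture or training code, which is linked at https://osf.io/kqn9v/), an image-based creativity regressor can be framed as a CNN backbone with a single continuous output trained against mean human ratings; the ResNet-18 backbone and the hyperparameters below are placeholder choices:

```python
import torch
import torch.nn as nn
from torchvision import models

# Placeholder stand-in for an AuDrA-like regressor: a CNN backbone whose
# classification head is replaced by a single-output rating head.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 1)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, ratings):
    """One gradient step: images (N, 3, H, W), ratings (N,) from human judges."""
    optimizer.zero_grad()
    preds = model(images).squeeze(1)   # predicted creativity scores
    loss = criterion(preds, ratings)   # match mean human ratings
    loss.backward()
    optimizer.step()
    return loss.item()
```

    Generalization of the kind reported above would then be checked by correlating the frozen model's predictions with human ratings on held-out drawing sets, raters, and tasks.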

     
  2. Assessing similarity between design ideas is an inherent part of many design evaluations that measure novelty. In such evaluation tasks, humans excel at making mental connections among diverse knowledge sets and scoring ideas on their uniqueness. However, their decisions on novelty are often subjective and difficult to explain. In this paper, we demonstrate a way to uncover human judgment of design idea similarity using two-dimensional idea maps. We derive these maps by asking humans for simple similarity comparisons of the form “Is idea A more similar to idea B or to idea C?” We show that these maps give insight into the relationships between ideas and help understand the domain. We also propose that the novelty of ideas can be estimated by measuring how far apart items lie on these maps. We demonstrate our methodology through experimental evaluations on two datasets of sketches: colored polygons (known answer) and milk frothers (unknown answer). We show that these maps shed light on factors considered by raters in judging idea similarity. We also show how the maps change when less data is available or false/noisy ratings are provided. This method provides a new direction of research into deriving ground-truth novelty metrics by combining human judgments and computational methods.
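    A toy sketch of this pipeline, under simplifying assumptions: triplet judgments of the form “a is more similar to b than to c” are embedded into a 2-D map by plain gradient descent on a hinge loss (a stand-in for the ordinal-embedding solvers typically used for this, such as t-STE), and novelty is then estimated as each idea's mean distance to the other ideas on the map. The embed_triplets and novelty helpers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_triplets(n_items, triplets, dim=2, lr=0.05, epochs=500, margin=0.1):
    """Place items in `dim`-D so that each triplet (a, b, c), meaning
    'a is more similar to b than to c', puts a closer to b than to c."""
    X = rng.normal(scale=0.1, size=(n_items, dim))
    for _ in range(epochs):
        for a, b, c in triplets:
            d_ab, d_ac = X[a] - X[b], X[a] - X[c]
            if margin + d_ab @ d_ab - d_ac @ d_ac > 0:   # constraint violated
                X[a] -= lr * 2 * (d_ab - d_ac)           # hinge-loss gradients
                X[b] += lr * 2 * d_ab
                X[c] -= lr * 2 * d_ac
    return X

def novelty(X):
    """Novelty of each idea = mean distance to the other ideas on the map."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return D.sum(axis=1) / (len(X) - 1)

# Toy usage: item 3 is the odd one out in every comparison,
# so it should end up far from the rest and score as most novel.
X = embed_triplets(4, [(0, 1, 3), (1, 2, 3), (2, 0, 3)])
print(novelty(X))
```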
  3.
    Recent studies of creative cognition have revealed interactions between functional brain networks involved in the generation of novel ideas; however, the neural basis of creativity is highly complex and presents a great challenge in the field of cognitive neuroscience, partly because of ambiguity around how to assess creativity. We applied a novel computational method of verbal creativity assessment—semantic distance—and performed weighted degree functional connectivity analyses to explore how individual differences in the assembly of resting-state networks are associated with this objective creativity assessment. To measure creative performance, a sample of healthy adults (n = 175) completed a battery of divergent thinking (DT) tasks, in which they were asked to think of unusual uses for everyday objects. Computational semantic models were applied to calculate the semantic distance between objects and responses to obtain an objective measure of DT performance. All participants underwent resting-state imaging, from which we computed voxel-wise connectivity matrices between all gray matter voxels. A linear regression analysis was applied between DT and the weighted degree of the connectivity matrices. Our analysis revealed a significant connectivity decrease in visual-temporal and parietal regions in relation to increased levels of DT. Link-level analyses showed that higher local connectivity within visual regions was associated with lower DT, whereas projections from the precuneus to the right inferior occipital and temporal cortex were positively associated with DT. Our results demonstrate differential patterns of resting-state connectivity associated with individual creative thinking ability, extending past work with a new application to automatically assess creativity via semantic distance.
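    In outline, the weighted-degree analysis described above can be sketched as follows; the positive-edge thresholding and the toy data dimensions are illustrative assumptions, not the study's exact preprocessing:

```python
import numpy as np
from scipy import stats

def weighted_degree(timeseries):
    """timeseries: (n_voxels, n_timepoints) resting-state signal. The weighted
    degree of a voxel is the sum of its correlations with all other voxels."""
    C = np.corrcoef(timeseries)
    np.fill_diagonal(C, 0.0)
    C[C < 0] = 0.0            # assumption: keep positive edges only
    return C.sum(axis=1)

# Toy group analysis: regress each voxel's weighted degree on participants'
# divergent-thinking (DT) scores (here random stand-ins for real data).
rng = np.random.default_rng(1)
n_subj, n_vox, n_tp = 20, 100, 150
degrees = np.stack([weighted_degree(rng.normal(size=(n_vox, n_tp)))
                    for _ in range(n_subj)])            # (n_subj, n_vox)
dt_scores = rng.normal(size=n_subj)                     # semantic-distance DT

betas = np.array([stats.linregress(dt_scores, degrees[:, v]).slope
                  for v in range(n_vox)])               # voxel-wise slopes
```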
  4. Semantic distance scoring provides an attractive alternative to other approaches for scoring responses in creative thinking tasks, and evidence in its support has grown over the last few years. One recent approach proposes combining multiple semantic spaces to better balance the idiosyncratic influences of each space, so that the final semantic distance score for each response is a composite or factor score. However, semantic spaces are not necessarily equally weighted in mean scores, and the use of factor scores requires high levels of factor determinacy (i.e., the correlation between estimated and true factor scores). Hence, in this work, we examined the weightings underlying mean scores, mean scores of standardized variables, factor loadings, weights that maximize reliability, and equally effective weights on common verbal creative thinking tasks. Both empirical and simulated factor determinacy, as well as Gilmer-Feldt’s composite reliability, were mostly good to excellent (i.e., > .80) across two task types (Alternate Uses and Creative Word Association), eight samples of data, and all weighting approaches. Person-level validity findings were furthermore highly comparable across weighting approaches. We discuss in detail the nuances and challenges of the different weightings and the question of using composites versus factor scores.
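    A small sketch of three of the weighting schemes compared above, applied to a response-by-space matrix of semantic distance scores; the first principal component below stands in for the factor-analytic loadings examined in the paper:

```python
import numpy as np

def composite_scores(S):
    """S: (n_responses, n_spaces), one column of distance scores per semantic
    space. Returns raw-mean, standardized-mean, and loading-weighted scores."""
    raw_mean = S.mean(axis=1)          # implicitly weights spaces by variance

    Z = (S - S.mean(axis=0)) / S.std(axis=0, ddof=1)
    z_mean = Z.mean(axis=1)            # equal weights after standardization

    R = np.corrcoef(Z, rowvar=False)
    _, eigvecs = np.linalg.eigh(R)
    w = np.abs(eigvecs[:, -1])         # loadings on the dominant component
    loading_weighted = Z @ (w / w.sum())
    return raw_mean, z_mean, loading_weighted
```

    As the abstract notes, the raw mean does not weight spaces equally when their score variances differ, which is precisely the imbalance that standardization or loading-based weights are meant to correct.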
  5.
    Creativity is the driver of innovation in engineering. Hence, assessing the effectiveness of a curriculum, a method, or a technique in enhancing the creativity of engineering students is undoubtedly important. In this paper, the process involved in quantifying creativity when measured through the alternative uses task (AUT) is explained in detail. The AUT is a commonly used test of divergent thinking ability, a key aspect of creativity. Although the task is widely used, the processes used to score it are far from standardized and tend to differ across studies. In this paper, we introduce these problems and move toward a standardized process by providing a detailed account of our quantification process, which takes into consideration four commonly used dimensions of creativity: originality, flexibility, fluency, and elaboration. AUT data from a preliminary case study are used to illustrate how the AUT and the quantification process can be applied. The study was performed to understand the effect of stereotype threat on the creativity of 25 female engineering students. The results indicate that after the stereotype threat intervention, participants generated more diverse and original ideas.
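    For concreteness, here is one simplified way to compute the four dimensions for a single participant; the category labels, the 5% originality threshold, and the word-count proxy for elaboration are illustrative assumptions rather than the paper's exact quantification process:

```python
from collections import Counter

def score_aut(responses, categories, pool_counts, n_total):
    """Toy AUT scoring for one participant.
    responses: idea strings; categories: one category label per idea;
    pool_counts: Counter of each idea across the whole sample of size n_total."""
    fluency = len(responses)                    # how many ideas
    flexibility = len(set(categories))          # how many distinct categories
    originality = sum(pool_counts[r] / n_total <= 0.05   # rare in the sample
                      for r in responses)
    elaboration = sum(len(r.split()) - 1 for r in responses)  # extra detail
    return dict(fluency=fluency, flexibility=flexibility,
                originality=originality, elaboration=elaboration)

pool = Counter({"doorstop": 20, "grind into dust for paint": 1})
print(score_aut(["doorstop", "grind into dust for paint"],
                ["weight", "art"], pool, n_total=25))
# -> {'fluency': 2, 'flexibility': 2, 'originality': 1, 'elaboration': 4}
```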