Title: A MAD method to assess idea novelty: Improving validity of automatic scoring using maximum associative distance (MAD)
Creativity research often relies on human raters to judge the novelty of participants’ responses on open-ended tasks, such as the Alternate Uses Task (AUT). Although useful, manual ratings are subjective and labor intensive. To address these limitations, researchers increasingly use automatic scoring methods based on a natural language processing technique for quantifying the semantic distance between words. However, many methodological choices remain open regarding how to obtain semantic distance scores for ideas, and these choices can significantly impact reliability and validity. In this project, we propose a new semantic distance-based method, maximum associative distance (MAD), for assessing response novelty in the AUT. Within a response, MAD uses the semantic distance of the word that is maximally remote from the prompt word to reflect response novelty. We compare the results from MAD with those of other competing semantic distance-based methods, including element-wise multiplication, a commonly used compositional model, across three published datasets comprising a total of 447 participants. We found MAD to be more strongly correlated with human creativity ratings than the competing methods. In addition, MAD scores reliably predicted external measures such as openness to experience. We further explored how idea elaboration affects the performance of various scoring methods and found that MAD is closely aligned with human raters in processing multi-word responses. The MAD method thus improves the psychometrics of semantic distance for automatic creativity assessment, and it provides clues about what human raters find creative about ideas.
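
To make the scoring rule concrete, the following is a minimal Python sketch of MAD as described in the abstract. It is a sketch under assumptions, not the authors' released implementation: it presumes word vectors already loaded into a dict (e.g., GloVe), uses cosine distance as the semantic distance measure, and all names (mad_score, glove_vectors) are illustrative.

import numpy as np

def cosine_distance(u, v):
    # Semantic distance as 1 minus the cosine similarity of two embeddings.
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def mad_score(prompt, response_words, embeddings):
    # MAD sketch: score a multi-word AUT response by the distance of its
    # most remote word from the prompt word. `embeddings` maps each word
    # to a NumPy vector (e.g., GloVe); out-of-vocabulary words are skipped.
    prompt_vec = embeddings[prompt]
    distances = [
        cosine_distance(prompt_vec, embeddings[w])
        for w in response_words
        if w in embeddings
    ]
    return max(distances) if distances else None

# Hypothetical usage, scoring the response "prop open a door" for the
# prompt "brick":
#   mad_score("brick", ["prop", "open", "door"], glove_vectors)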
Award ID(s):
1920653
PAR ID:
10525784
Author(s) / Creator(s):
Publisher / Repository:
American Psychological Association
Date Published:
Journal Name:
Psychology of Aesthetics, Creativity, and the Arts
ISSN:
1931-3896
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1.
    Creativity research requires assessing the quality of ideas and products. In practice, conducting creativity research often involves asking several human raters to judge participants’ responses to creativity tasks, such as judging the novelty of ideas from the alternate uses task (AUT). Although such subjective scoring methods have proved useful, they have two inherent limitations: labor cost (raters typically code thousands of responses) and subjectivity (raters vary in their perceptions and preferences), raising classic psychometric threats to reliability and validity. We sought to address the limitations of subjective scoring by capitalizing on recent developments in automated scoring of verbal creativity via semantic distance, a computational method that uses natural language processing to quantify the semantic relatedness of texts. In five studies, we compare the top-performing semantic models (e.g., GloVe, continuous bag of words) previously shown to have the highest correspondence to human relatedness judgments. We assessed these semantic models in relation to human creativity ratings from a canonical verbal creativity task (AUT; Studies 1–3) and novelty/creativity ratings from two word association tasks (Studies 4–5). We find that a latent semantic distance factor, composed of the common variance from five semantic models, reliably and strongly predicts human creativity and novelty ratings across a range of creativity tasks. We also replicate an established experimental effect in the creativity literature (i.e., the serial order effect) and show that semantic distance correlates with other creativity measures, demonstrating convergent validity. We provide an open platform to efficiently compute semantic distance, including tutorials and documentation (https://osf.io/gz4fc/).
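
    As a rough illustration of how a composite across models might behave, the sketch below (plain NumPy; not the platform's actual code) z-scores each model's distance scores across responses and averages them. The published analysis fits a proper latent factor model, so this average is only a crude proxy, and the model names are placeholders.

    import numpy as np

    def composite_semantic_distance(scores_by_model):
        # Rough proxy for a latent semantic distance factor: z-score each
        # model's scores across responses, then average the z-scores.
        # scores_by_model maps a model name (e.g., "glove", "cbow") to an
        # array of distance scores, one per response.
        zscored = [
            (np.asarray(s, dtype=float) - np.mean(s)) / np.std(s)
            for s in scores_by_model.values()
        ]
        return np.mean(zscored, axis=0)  # one composite score per response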
  2. Metaphor is crucial in human cognition and creativity, facilitating abstract thinking, analogical reasoning, and idea generation. Typically, human raters manually score the originality of responses to creative thinking tasks – a laborious and error-prone process. Previous research sought to remedy these risks by scoring creativity tasks automatically using semantic distance and large language models (LLMs). Here, we extend research on automatic creativity scoring to metaphor generation – the ability to creatively describe episodes and concepts using nonliteral language. Metaphor is arguably more abstract and naturalistic than prior targets of automated creativity assessment. We collected 4,589 responses from 1,546 participants to various metaphor prompts and corresponding human creativity ratings. We fine-tuned two open-source LLMs (RoBERTa and GPT-2) – effectively “teaching” them to score metaphors like humans – before testing their ability to accurately assess the creativity of new metaphors. Results showed both models reliably predicted new human creativity ratings (RoBERTa r = .72, GPT-2 r = .70), significantly more strongly than semantic distance (r = .42). Importantly, the fine-tuned models generalized accurately to metaphor prompts they had not been trained on (RoBERTa r = .68, GPT-2 r = .63). We provide open access to the fine-tuned models, allowing researchers to assess metaphor creativity in a reproducible and timely manner. 
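
    For readers curious what such fine-tuning looks like in practice, here is a minimal, hypothetical sketch using the Hugging Face transformers library. The dataset class, hyperparameters, and names are illustrative assumptions, not the released models; the key idea is a single regression output trained against mean human creativity ratings.

    import torch
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    class MetaphorRatings(torch.utils.data.Dataset):
        # Pairs each metaphor response with its mean human creativity rating.
        def __init__(self, texts, ratings, tokenizer):
            self.enc = tokenizer(texts, truncation=True, padding=True)
            self.ratings = ratings
        def __len__(self):
            return len(self.ratings)
        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.ratings[i], dtype=torch.float)
            return item

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    # num_labels=1 turns the classification head into a single regression
    # output trained with mean-squared error against the human ratings.
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=1)

    # train_texts and train_ratings stand in for the collected data:
    # trainer = Trainer(
    #     model=model,
    #     args=TrainingArguments(output_dir="metaphor-scorer", num_train_epochs=3),
    #     train_dataset=MetaphorRatings(train_texts, train_ratings, tokenizer),
    # )
    # trainer.train()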
  3. Scoring divergent thinking tasks opens multiple avenues and possibilities, and with them decisions researchers have to make. While some scholars postulate that scoring should focus on the best ideas provided, measuring the best responses (e.g., “top scoring”) comes with challenges. More specifically, compared to the average quality across all responses, top scoring uses less information (the “bad” ideas are thrown away), which decreases reliability. To resolve this issue, this article introduces a multidimensional top-scoring approach, analogous to linear growth modeling, which retains the information provided by all responses (best ideas and “bad” ideas). Across two studies, using both subjective human ratings and semantic distance originality scoring of responses to over a dozen divergent thinking tasks, we demonstrated that Maximum (the best idea) and Top2 scoring (the two best ideas) could surpass the typically applied average scoring in measurement precision when the “bad” ideas’ originality is used as auxiliary information (i.e., additional information in the analysis). We thus recommend retaining all ideas when scoring divergent thinking tasks, and we discuss the potential this new approach holds for creativity research and practice.
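
    A minimal Python sketch of the basic Maximum and Top2 indices follows; it covers only the simple top-scoring indices, not the multidimensional model that additionally uses the remaining ideas as auxiliary information, and the function name is illustrative.

    import numpy as np

    def top_scores(idea_scores, k=2):
        # For one participant's task: given originality scores for all of
        # their ideas, return the best idea's score (Maximum) and the mean
        # of the k best ideas (Top-k).
        ranked = np.sort(np.asarray(idea_scores, dtype=float))[::-1]
        return ranked[0], ranked[:k].mean()

    # e.g., top_scores([0.31, 0.62, 0.18, 0.55]) -> (0.62, 0.585)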
  4. Creative divergent thinking involves the generation of unique ideas by pulling from semantic memory stores and exercising cognitive flexibility to shape these memories into something new. Although cognitive abilities such as episodic memory decline with age, semantic memory tends to remain intact. The current study aims to take advantage of older adults’ strength in semantic memory to investigate the effectiveness of a brief cognitive training to improve creative divergent thinking. Specifically, older adults were trained using a semantic retrieval strategy known as the disassembly strategy in order to improve creativity in the Alternate Uses Task (AUT), which involves generating original uses for objects. We also investigated whether this strategy would transfer to other creativity tasks, specifically the Divergent Association Task (DAT). Participants were tested on the AUT and DAT across three time points in a single session: before the disassembly strategy was introduced (T0 and T1) and afterwards (T2). Results showed that the disassembly strategy enhances idea novelty in the AUT, though this enhancement did not transfer to DAT performance. Additionally, participants who initially scored lowest on the AUT at T0 showed the greatest increase in AUT performance at T2. This finding provides evidence that older adults can effectively use a semantic retrieval strategy to engage and enhance elements of creative divergent thinking.
    more » « less
  5.
    Design researchers have long sought to understand the mechanisms that support creative idea development. However, one of the key challenges faced by the design community is how to effectively measure the nebulous construct of creativity. The social science and engineering communities have adopted two vastly different approaches to solving this problem, both of which have been deployed throughout engineering design research. The goal of this paper was to compare and contrast these two approaches using design ratings of nearly 1000 engineering design ideas paired with a qualitative study of expert raters. The results of this study show that while these two methods provide similar ratings of idea quality, there was a statistically significant negative relationship between these methods for ratings of idea novelty. Qualitative analysis of recordings from expert raters’ think-aloud concept mapping points to potential sources of disagreement. In addition, the results show that while quasi-expert and expert raters provided similar ratings of design novelty, there was no significant agreement between these groups for ratings of design quality. The results of this study provide guidance for the deployment of idea ratings in engineering design research and evidence for the development and potential modification of engineering design creativity metrics.