Title: Semantic Spaces Are Not Created Equal – How Should We Weigh Them in the Sequel?: On Composites in Automated Creativity Scoring
Semantic distance scoring provides an attractive alternative to other scoring approaches for responses in creative thinking tasks. In addition, evidence in support of semantic distance scoring has increased over the last few years. In one recent approach, it has been proposed to combine multiple semantic spaces to better balance the idiosyncratic influences of each space. Thereby, final semantic distance scores for each response are represented by a composite or factor score. However, semantic spaces are not necessarily equally weighted in mean scores, and the usage of factor scores requires high levels of factor determinacy (i.e., the correlation between estimates and true factor scores). Hence, in this work, we examined the weighting underlying mean scores, mean scores of standardized variables, factor loadings, weights that maximize reliability, and equally effective weights on common verbal creative thinking tasks. Both empirical and simulated factor determinacy, as well as Gilmer-Feldt’s composite reliability, were mostly good to excellent (i.e., > .80) across two task types (Alternate Uses and Creative Word Association), eight samples of data, and all weighting approaches. Person-level validity findings were further highly comparable across weighting approaches. Observed nuances and challenges of different weightings and the question of using composites vs. factor scores are thoroughly discussed.
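As a rough illustration of why semantic spaces are not equally weighted in a plain mean score, the sketch below computes semantic distance as one minus cosine similarity and then builds two composites: a mean of raw distances and a mean of standardized distances. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the three hypothetical spaces, and the distance values are invented for illustration, and the factor-score and reliability-maximizing weightings examined in the paper are not reproduced.

```python
import numpy as np

def semantic_distance(u, v):
    """Semantic distance as 1 - cosine similarity of two word vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def composite(scores, weights=None, standardize=False):
    """Weighted composite across semantic spaces.

    scores: array of shape (n_responses, n_spaces) holding one semantic
    distance per response and space. With weights=None, every space gets
    the nominal weight 1 / n_spaces.
    """
    scores = np.asarray(scores, dtype=float)
    if standardize:
        # z-standardize each space so that all spaces have unit variance
        scores = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    n_spaces = scores.shape[1]
    if weights is None:
        w = np.full(n_spaces, 1.0 / n_spaces)
    else:
        w = np.asarray(weights, dtype=float)
    return scores @ w

# Hypothetical distances for 4 responses in 3 semantic spaces (made-up data).
scores = np.array([
    [0.70, 0.90, 0.40],
    [0.80, 0.95, 0.70],
    [0.60, 0.85, 0.10],
    [0.90, 0.92, 0.55],
])

raw_mean = composite(scores)                  # mean of raw distances
z_mean = composite(scores, standardize=True)  # mean of standardized distances

# Spaces with a larger standard deviation contribute more variance to the
# raw mean, even though the nominal weights are equal.
space_sd = scores.std(axis=0, ddof=1)
```

In this made-up data the third space has the largest spread, so it dominates the raw mean; standardizing first removes that scale effect, although spaces can still contribute unequally through their intercorrelations.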
Award ID(s):
1920653
NSF-PAR ID:
10352010
Journal Name:
European Journal of Psychological Assessment
ISSN:
1015-5759
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Creativity research requires assessing the quality of ideas and products. In practice, conducting creativity research often involves asking several human raters to judge participants’ responses to creativity tasks, such as judging the novelty of ideas from the alternate uses task (AUT). Although such subjective scoring methods have proved useful, they have two inherent limitations, labor cost (raters typically code thousands of responses) and subjectivity (raters vary in their perceptions and preferences), which raise classic psychometric threats to reliability and validity. We sought to address the limitations of subjective scoring by capitalizing on recent developments in automated scoring of verbal creativity via semantic distance, a computational method that uses natural language processing to quantify the semantic relatedness of texts. In five studies, we compare the top performing semantic models (e.g., GloVe, continuous bag of words) previously shown to have the highest correspondence to human relatedness judgements. We assessed these semantic models in relation to human creativity ratings from a canonical verbal creativity task (AUT; Studies 1–3) and novelty/creativity ratings from two word association tasks (Studies 4–5). We find that a latent semantic distance factor, composed of the common variance from five semantic models, reliably and strongly predicts human creativity and novelty ratings across a range of creativity tasks. We also replicate an established experimental effect in the creativity literature (i.e., the serial order effect) and show that semantic distance correlates with other creativity measures, demonstrating convergent validity. We provide an open platform to efficiently compute semantic distance, including tutorials and documentation (https://osf.io/gz4fc/).
  2. Abstract

    The current study addresses gaps in our understanding of the relationship between creative cognition, intelligence (IQ), and executive functioning (EF). Undergraduate students completed an IQ test, verbal and figural divergent thinking (DT) tests, and a self-assessment of EF, across four study sessions. Participant data (N = 199) were analyzed using linear regression and PROCESS moderation models. Results demonstrated that EF interacts with IQ to predict figural and verbal DT in distinct ways, with different patterns emerging from different methods of scoring DT. Using traditional DT scoring, Gf (but not Gc) significantly moderated the relationship between EF and scores on both verbal and figural DT tasks. Low EF was associated with diminished DT scores for those with low Gf scores, unrelated for those with relatively higher Gf, and enhanced scores for those with the highest Gf. Using originality ratio scores, low EF was associated with diminished originality in verbal DT responses for those with low IQ (both Gf and Gc), unrelated for those with relatively higher IQ, and enhanced originality for those with the highest Gc (but not Gf) scores. Thus, there are several nuances in the way that EF interacts with IQ to predict DT.
  3. Abstract

    Neuroimaging and transcranial direct current stimulation (tDCS) research has revealed that generating novel ideas is associated with both reductions and increases in prefrontal cortex (PFC) activity, and engagement of posterior occipital cortex, among other regions. However, there is substantial variability in the robustness of these tDCS-induced effects due to heterogeneous sample sizes, different creativity measures, and methodological diversity in the application of tDCS across laboratories. To address these shortcomings, we used twelve different montages within a standardized tDCS protocol to investigate how altering activity in frontotemporal and occipital cortex impacts creative thinking. Across four experiments, 246 participants generated either the common or an uncommon use for 60 object pictures while undergoing tDCS. Participants also completed a control short-term memory task. We applied active tDCS for 20 min at 1.5 mA through two 5 cm × 5 cm electrodes over left or right ventrolateral prefrontal (areas F7, F8) or occipital (areas O1, O2) cortex, concurrent bilateral stimulation of these regions across polarities, or sham stimulation. Cathodal stimulation of the left, but not right, ventrolateral PFC improved fluency in creative idea generation, but had no effects on originality, as approximated by measures of semantic distance. No effects were obtained for the control tasks. Concurrent bilateral stimulation of the ventrolateral PFC regardless of polarity direction, and excitatory stimulation of occipital cortex did not alter task performance. Highlighting the importance of cross-experimental methodological consistency, these results extend our past findings and contribute to our understanding of the role of left PFC in creative thinking.
  4. Abstract

    Recent studies of creative cognition have revealed interactions between functional brain networks involved in the generation of novel ideas; however, the neural basis of creativity is highly complex and presents a great challenge in the field of cognitive neuroscience, partly because of ambiguity around how to assess creativity. We applied a novel computational method of verbal creativity assessment, semantic distance, and performed weighted degree functional connectivity analyses to explore how individual differences in assembly of resting-state networks are associated with this objective creativity assessment. To measure creative performance, a sample of healthy adults (n = 175) completed a battery of divergent thinking (DT) tasks, in which they were asked to think of unusual uses for everyday objects. Computational semantic models were applied to calculate the semantic distance between objects and responses to obtain an objective measure of DT performance. All participants underwent resting-state imaging, from which we computed voxel-wise connectivity matrices between all gray matter voxels. A linear regression analysis was applied between DT and weighted degree of the connectivity matrices. Our analysis revealed a significant connectivity decrease in the visual-temporal and parietal regions, in relation to increased levels of DT. Link-level analyses showed higher local connectivity within visual regions was associated with lower DT, whereas projections from the precuneus to the right inferior occipital and temporal cortex were positively associated with DT. Our results demonstrate differential patterns of resting-state connectivity associated with individual creative thinking ability, extending past work using a new application to automatically assess creativity via semantic distance.
  5. Abstract

    Argumentation, a key scientific practice presented in the Framework for K-12 Science Education, requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging machine learning (ML) and artificial intelligence (AI) to aid the scoring of written arguments in complex assessments. Moreover, research has emphasized that the features (i.e., complexity, diversity, and structure) of the assessment construct are critical to ML scoring accuracy, yet how the assessment construct may be associated with machine scoring accuracy remains unknown. This study investigated how the features associated with the assessment construct of a scientific argumentation assessment item affected machine scoring performance. Specifically, we conceptualized the construct in three dimensions: complexity, diversity, and structure. We employed human experts to code characteristics of the assessment tasks and score middle school student responses to 17 argumentation tasks aligned to three levels of a validated learning progression of scientific argumentation. We randomly selected 361 responses to use as training sets to build machine-learning scoring models for each item. The scoring models yielded a range of agreements with human consensus scores, measured by Cohen’s kappa (mean = 0.60; range 0.38 to 0.89), indicating good to almost perfect performance. We found that higher levels of Complexity and Diversity of the assessment task were associated with decreased model performance; similarly, the relationship between levels of Structure and model performance showed a somewhat negative linear trend. These findings highlight the importance of considering these construct characteristics when developing ML models for scoring assessments, particularly for higher complexity items and multidimensional assessments.
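    For readers unfamiliar with the agreement statistic reported in the last abstract, Cohen's kappa compares observed agreement between two sets of labels with the agreement expected by chance from their marginal frequencies. The following is a minimal sketch of the unweighted statistic; the function name and the example scores are hypothetical and are not taken from the study, which may have used a different variant or software.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed proportion of exact agreement
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the product of marginal category frequencies
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical human consensus scores and machine scores on a 0-2 scale.
human = [0, 1, 2, 1, 0, 2, 1, 1]
machine = [0, 1, 2, 0, 0, 2, 1, 2]
kappa = cohens_kappa(human, machine)
```

    In this made-up example the two score vectors agree on 6 of 8 responses (observed agreement 0.75), chance agreement is 0.3125, and kappa is 7/11, or roughly 0.64.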