Title: Examining the Validity of Adaptive Comparative Judgment for Peer Evaluation in a Design Thinking Course
Adaptive comparative judgment (ACJ) is a holistic judgment approach used to evaluate the quality of something (e.g., student work) in which individuals are presented with pairs of work and select the better item from each pair. This approach has demonstrated high levels of reliability with less bias than other approaches, making it useful for summative and formative assessment in educational settings. Though ACJ itself has demonstrated consistently high reliability, relatively few studies have investigated the validity of peer-evaluated ACJ in the context of design thinking. This study explored peer evaluation, facilitated through ACJ, in terms of construct validity and criterion validity (concurrent validity and predictive validity) in the context of a design thinking course. Using ACJ, undergraduate students (n = 597) who took a design thinking course during Spring 2019 were invited to evaluate design point-of-view (POV) statements written by their peers. As a result of this ACJ exercise, each POV statement attained a specific parameter value reflecting its quality. To examine construct validity, the researchers conducted a content analysis comparing the contents of the 10 POV statements with the highest parameter values and the 10 POV statements with the lowest parameter values, as derived from the ACJ session. For criterion validity, we studied the relationship between peer-evaluated ACJ and instructors' rubric-based grading. To study concurrent validity, we investigated the correlation between peer-evaluated ACJ parameter values and the grades assigned by course instructors for the same POV writing task. Predictive validity was then studied by exploring whether peer-evaluated ACJ of POV statements was predictive of students' grades on the final project.
Results showed that the contents of the statements with the highest parameter values were of better quality compared to the statements with the lowest parameter values. Therefore, peer-evaluated ACJ showed construct validity. Also, though peer-evaluated ACJ did not show concurrent validity, it did show moderate predictive validity.
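The abstract describes ACJ as converting many pairwise "which is better?" decisions into a parameter value per item. Such parameters are commonly estimated with a Bradley–Terry-style model; the sketch below is a minimal illustration of that general technique, not the authors' implementation (the item names, the fixed iteration count, and the assumption that every item wins at least once are all assumptions of this example):

```python
import math

def fit_bradley_terry(items, judgments, iters=200):
    """Estimate log-quality parameters from pairwise judgments.

    judgments: list of (winner, loser) pairs. Assumes every item
    appears in at least one comparison and wins at least once.
    Uses the standard minorization-maximization update
    p_i = wins_i / sum_j n_ij / (p_i + p_j).
    """
    wins = {i: 0 for i in items}
    comps = {i: {} for i in items}  # comps[i][j] = times i met j
    for w, l in judgments:
        wins[w] += 1
        comps[w][l] = comps[w].get(l, 0) + 1
        comps[l][w] = comps[l].get(w, 0) + 1
    p = {i: 1.0 for i in items}
    for _ in range(iters):
        new = {}
        for i in items:
            denom = sum(n / (p[i] + p[j]) for j, n in comps[i].items())
            new[i] = wins[i] / denom if denom else p[i]
        mean = sum(new.values()) / len(new)  # fix the scale
        p = {i: v / mean for i, v in new.items()}
    return {i: math.log(v) for i, v in p.items()}

# Toy session: A beats B twice and C once; B beats C twice; C beats B once.
params = fit_bradley_terry(
    ["A", "B", "C"],
    [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("B", "C"), ("C", "B")],
)
```

The resulting parameters rank A above B above C, mirroring how each POV statement in the study attained a value reflecting its relative quality.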
Award ID(s): 2101235
PAR ID: 10340842
Publisher / Repository: Frontiers
Journal Name: Frontiers in Education
Volume: 6
ISSN: 2504-284X
Sponsoring Org: National Science Foundation
More Like this
  1. This research investigates students’ argumentation quality in engineering design thinking. We implemented Learning by Evaluating (LbE) using Adaptive Comparative Judgment (ACJ), where students assess pairs of items to determine the superior one. In ACJ, students provided rationales for their critiques, explaining their selections. Fifteen students participated in an LbE exercise before starting their backpack design projects, critically evaluating multiple backpack designs and producing 145 comments. Writing comments required students to discern and justify the superior design, fostering informed judgment and articulation of their reasoning. The study used the Claim, Evidence, and Reasoning (CER) framework, adapted for engineering design thinking, to analyse these critiques. The framework emphasized three aspects: Empathy (understanding user needs), Ideation (deriving design inspiration), and Insight (gaining valuable understanding from evaluated designs). We employed both deductive and inductive content analysis to evaluate the argumentation quality in students’ critiques. High-quality argumentation was identified based on six codes: user-focused empathy, design inspirations, logical rationalizations, multi-criteria evaluations, aesthetic considerations, and cultural awareness. Poor-quality argumentation lacked these elements and was characterized by vagueness, uncertainty, brevity, inappropriateness, irrelevance, gender bias, and cultural stereotyping. By identifying critical elements of effective argumentation and common challenges students may face, this study aims to enhance argumentation skills in engineering design thinking at the secondary education level. These insights are intended to help educators prepare students for insightful and successful argumentation in engineering design projects. 
  2. Adaptive comparative judgment (ACJ) has been widely used to evaluate classroom artifacts with reliability and validity. In the ACJ experience we examined, students were provided a pair of images related to backpack design. For each pair, students were required to select which image could help them ideate better. Then, they were prompted to provide a justification for their decision. Data were from 15 high school students taking engineering design courses. The current study investigated how students’ reasoning differed based on selection. Researchers analyzed the comments in two ways: (1) computer-aided quantitative content analysis and (2) qualitative content analysis. In the first analysis, we performed sentiment analysis and word frequency analysis using natural language processing. Based on the findings, we explored how the design thinking process was embedded in student reasoning, and whether the reasoning varied depending on the claim. Results from sentiment analysis showed that students tended to express stronger positive sentiment in short comments when providing reasoning for the selected design. In contrast, when providing reasoning for the items not chosen, results showed weaker negative sentiment with more detailed reasons. Findings from word frequency analysis showed that students valued the function of the design as well as the user perspective, specifically convenience. Additionally, students took aesthetic features of each design into consideration when identifying the better of the two designs. Within the engineering design thinking context, we found students empathize by identifying themselves as users, define users’ needs, and ideate products from provided examples.
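The sentiment and word-frequency analysis described above can be illustrated with a minimal lexicon-based sketch. This is not the authors' NLP pipeline; the tiny sentiment lexicons and stopword list here are placeholder assumptions chosen only to make the example self-contained:

```python
import re
from collections import Counter

# Hypothetical mini-lexicons; a real analysis would use a full sentiment lexicon.
POSITIVE = {"good", "great", "love", "useful", "convenient", "nice"}
NEGATIVE = {"bad", "poor", "uncomfortable", "ugly", "weak"}

def sentiment_score(comment):
    """Crude lexicon score in [-1, 1]: (positive - negative) / word count."""
    words = re.findall(r"[a-z']+", comment.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(len(words), 1)

def word_frequencies(comments, stopwords=frozenset({"the", "a", "is", "it", "and"})):
    """Count content words across all comments, skipping stopwords."""
    counts = Counter()
    for c in comments:
        counts.update(
            w for w in re.findall(r"[a-z']+", c.lower()) if w not in stopwords
        )
    return counts

score = sentiment_score("a great and convenient design")
counts = word_frequencies(["the strap is good", "good strap"])
```

In this toy run the positive comment scores above zero and "strap" surfaces as a frequent content word, mirroring how frequency analysis can surface what students value (e.g., convenience) across many critiques.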
  3. This Complete Research paper investigates the holistic assessment of creativity in design solutions in engineering education. Design is a key element in contemporary engineering education, given the emphasis on its development through the ABET criteria. As such, design projects play a central role in many first-year engineering courses. Creativity is a vital component of design capability which can influence design performance; however, it is difficult to measure through traditional assessment rubrics, and holistic assessment approaches may be more suitable for assessing the creativity of design solutions. One such holistic assessment approach is Adaptive Comparative Judgement (ACJ). In this system, student designs are presented to judges in pairs, and judges are asked to select the item of work that they deem to have demonstrated the greatest level of a specific criterion or set of criteria. Each judge makes multiple judgements in which the work they are presented with is adaptively paired in order to create a ranked order of all items in the sample. The use of this assessment approach in technology education has demonstrated high levels of reliability among judges (~0.9) irrespective of whether the judges are students or faculty. This research aimed to investigate the use of ACJ to holistically assess the creativity of first-year engineering students’ design solutions. The research also sought to explore the differences, if any, between the rank order produced by first-year engineering students and that produced by the faculty who regularly teach first-year students. Forty-six first-year engineering students and 23 faculty participated in this research. A separate ACJ session was carried out with each of these groups; however, both groups were asked to assess the same items of work.
Participants were instructed to assess the creativity of 101 solutions to a design task, a “Ping Pong problem,” in which undergraduate engineering students had been asked to design a ping pong ball launcher to meet specific criteria. In both ACJ sessions each item of work was included in at least 11 pairwise comparisons, with the maximum number of comparisons for a single item being 29 in the faculty ACJ session and 50 in the student ACJ session. The data from the ACJ sessions were analyzed to determine the reliability of using ACJ to assess the creativity of design solutions in first-year engineering education, and to explore whether the rankings produced from the first-year engineering students’ ACJ session differed significantly from those of the faculty. The results indicate a reasonably high level of reliability in both sessions as measured by the Scale Separation Reliability (SSR) coefficient, SSRfaculty = 0.65 ± 0.02, SSRstudents = 0.71 ± 0.02. Further, a strong correlation was observed between the ACJ ranks produced by the students and faculty, both in terms of the relative differences between items of work, r = .533, p < .001, and their absolute rank position, ρ = .553, p < .001. These findings indicate that ACJ is a promising tool for holistically assessing design solutions in engineering education. Additionally, given the strong correlation between student and faculty ranks, ACJ could be used to include students in their own assessment to reduce the faculty grading burden, or to develop a shared construct of capability which could increase the alignment of teaching and learning.
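The comparison of student and faculty rank orders rests on correlating ranks. A minimal Spearman's rho sketch is shown below, assuming no tied ranks; this is an illustration of the general statistic, not the study's analysis code:

```python
def spearman_rho(x, y):
    """Spearman's rank correlation for equal-length lists of scores.

    Assumes no ties (tied values would need average ranks).
    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Perfectly agreeing and perfectly reversed rank orders:
same = spearman_rho([1, 2, 3, 4], [1, 2, 3, 4])      # rho = 1.0
reversed_ = spearman_rho([1, 2, 3, 4], [4, 3, 2, 1])  # rho = -1.0
```

A value around .55, as reported for the student-versus-faculty rank positions, sits between these extremes and indicates substantial but imperfect agreement.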
  4. Three studies developed and validated a linguistic dictionary to measure negative affective polarization in English and Spanish political texts. It captures three dimensions: negative affect, delegitimization, and political context. In the first study, two independent judges evaluated the candidate words, and reliability indicators were calculated, showing acceptable values for short texts (.572 in English, .541 in Spanish) and higher values for larger corpora (.964 in English, .957 in Spanish). The second study tested discriminant validity by comparing negative affective polarization scores in social media comments on politics and entertainment. Results showed significantly higher polarization scores in political content, confirming the dictionary's validity. The third study compared the dictionary to an existing online polarization measure, finding greater coverage and alignment with the construct. Additionally, polarization scores were higher in texts containing hate speech than in those where it was absent. The findings suggest that the dictionary in both languages has strong psychometric properties, making it a valuable tool for analyzing online content, particularly social media comments. It can be used as an independent measure or as input for machine and deep learning models.
  5. Lam, Hon-Ming (Ed.)
    Peer review, commonly used in grant funding decisions, relies on scientists’ ability to evaluate the quality of research proposals. Such judgments are sometimes beyond reviewers’ discriminatory power and can lead to reliance on subjective biases, including preferences for lower-risk, incremental projects. However, peer reviewers’ risk tolerance has not been well studied. We conducted a cross-sectional experiment of peer reviewers’ evaluations of mock primary reviewers’ comments in which the level and sources of risks and weaknesses were manipulated. In these mock proposal evaluations, proposal risks predicted reviewers’ scores more strongly than proposal strengths did. Risk tolerance was not predictive of scores, but reviewer scoring leniency was predictive of overall and criteria scores. The evaluation of risks dominates reviewers’ evaluation of research proposals and is a source of inter-reviewer variability. These results suggest that reviewer scoring variability may be attributed to the interpretation of proposal risks and could benefit from intervention to improve the reliability of reviews. Additionally, the valuation of risk drives proposal evaluations and may reduce the chances that risky but highly impactful science is supported.