

Title: Examining the Validity of Adaptive Comparative Judgment for Peer Evaluation in a Design Thinking Course
Adaptive comparative judgment (ACJ) is a holistic judgment approach for evaluating the quality of work (e.g., student work) in which individuals are presented with pairs of items and select the better item from each pair. The approach has demonstrated high reliability with less bias than other approaches, making it useful for both summative and formative assessment in educational settings. Although ACJ itself has demonstrated high reliability, relatively few studies have investigated the validity of peer-evaluated ACJ in the context of design thinking. This study examined peer evaluation, facilitated through ACJ, in terms of construct validity and criterion validity (concurrent validity and predictive validity) in a design thinking course. Undergraduate students (n = 597) who took a design thinking course during Spring 2019 were invited to use ACJ to evaluate design point-of-view (POV) statements written by their peers. Through this ACJ exercise, each POV statement received a parameter value reflecting its quality. To examine construct validity, the researchers conducted a content analysis comparing the 10 POV statements with the highest parameter values and the 10 POV statements with the lowest parameter values from the ACJ session. For criterion validity, we studied the relationship between peer-evaluated ACJ and rubric-based grading. To study concurrent validity, we investigated the correlation between peer-evaluated ACJ parameter values and the grades assigned by course instructors for the same POV writing task. Predictive validity was then studied by exploring whether the peer-evaluated ACJ results for the POV statements predicted students' grades on the final project. Results showed that the statements with the highest parameter values were of better quality than the statements with the lowest parameter values, so peer-evaluated ACJ demonstrated construct validity. Although peer-evaluated ACJ did not show concurrent validity, it did show moderate predictive validity.
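The parameter values come from the pairwise judgments themselves: ACJ systems typically fit a Bradley-Terry (Rasch-family) model to the comparison outcomes. The following is a minimal sketch with hypothetical judgment data and a simple minorization-maximization fit; the study's actual ACJ software may estimate these values differently.

```python
# Minimal Bradley-Terry fit from pairwise judgments (hypothetical data).
# ACJ tools typically use a Rasch/Bradley-Terry formulation like this,
# though the estimation routine in the study's software may differ.
import numpy as np

def bradley_terry(n_items, pairs, iters=500):
    """pairs: list of (winner, loser) index tuples.
    Returns log-strength parameters (higher = judged better)."""
    wins = np.zeros(n_items)
    for winner, _ in pairs:
        wins[winner] += 1
    w = np.ones(n_items)
    for _ in range(iters):  # minorization-maximization updates (Hunter, 2004)
        denom = np.zeros(n_items)
        for a, b in pairs:
            denom[a] += 1.0 / (w[a] + w[b])
            denom[b] += 1.0 / (w[a] + w[b])
        w = wins / denom
        w *= n_items / w.sum()  # fix the scale; only relative values matter
    return np.log(w)

# Four hypothetical POV statements, six peer judgments (winner listed first)
judgments = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0), (1, 3)]
print(bradley_terry(4, judgments))  # one parameter value per statement
```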
Award ID(s):
2101235
NSF-PAR ID:
10340842
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Frontiers in Education
Volume:
6
ISSN:
2504-284X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Adoption of new instructional standards in science demands high-quality information about classroom practice. Teacher portfolios can be used to assess instructional practice and support teacher self-reflection anchored in authentic evidence from classrooms. This study investigated a new type of electronic portfolio tool that allows efficient capture of classroom artifacts in multimedia formats using mobile devices. We assess the psychometric properties of measures of quality instruction in middle school science classrooms derived from the contents of portfolios collected using this novel tool—with instruction operationalized through dimensions aligned to the Next Generation Science Standards. Results reflect low rater error and adequate reliability for several dimensions, a dominant underlying factor, and significant relations to some relevant concurrent indicators. Although no relation was found to student standardized test scores or course grades, portfolio ratings did relate to student self-efficacy perceptions and enjoyment of science. We examine factors influencing measurement error, and consider the broader implications of the results for assessing the validity of portfolio score interpretations, and the feasibility and potential value of this type of tool for summative and formative uses, in the context of large-scale instructional improvement efforts.
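    As a rough, invented illustration of the concurrent-indicator check described above, one could correlate classroom-level portfolio ratings with an indicator such as student self-efficacy; every value and scale below is simulated.

```python
# Simulated illustration of relating portfolio ratings to a concurrent
# indicator (e.g., student self-efficacy). Not the study's data.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_classrooms = 40
portfolio_rating = rng.normal(3.0, 0.5, n_classrooms)  # e.g., 1-5 rubric scale
self_efficacy = 0.6 * portfolio_rating + rng.normal(0, 0.4, n_classrooms)

r, p = pearsonr(portfolio_rating, self_efficacy)
print(f"r = {r:.2f}, p = {p:.4f}")
```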

     
  2.
    The purpose of this study is to re-examine the validity evidence of the engineering design self-efficacy (EDSE) scale scores by Carberry et al. (2010) within the context of secondary education. Self-efficacy refers to individuals' belief in their capabilities to perform a domain-specific task. In engineering education, significant efforts have been made to understand the role of self-efficacy for students, given its positive impact on student outcomes such as performance and persistence. These studies have investigated and developed measures for different domains of engineering self-efficacy (e.g., general academic, domain-general, and task-specific self-efficacy). The EDSE scale is a frequently cited measure of task-specific self-efficacy within the domain of engineering design. The original scale contains nine items intended to represent the engineering design process. Initial score validity evidence was collected using a sample of 202 respondents with varying degrees of engineering experience, including undergraduate and graduate students and faculty members. The scale has primarily been used by researchers and practitioners with engineering undergraduate students to assess changes in engineering design self-efficacy resulting from active learning interventions, such as project-based learning. Our work has begun to experiment with using the scale in a secondary education context, in conjunction with the increased introduction of engineering in K-12 education. Yet there is still a need to examine the score validity and reliability of this scale in non-undergraduate populations such as secondary school students. This study fills that gap by testing the construct validity of the original nine items of the EDSE scale, supporting proper use of the scale by researchers and practitioners. This study was conducted as part of a larger e4usa project investigating the development and implementation of a yearlong project-based engineering design course for secondary school students. Evidence of construct validity and reliability was collected using a multi-step process. First, a survey that includes the EDSE scale was administered to participating students at nine associated secondary schools across the US at the beginning of Spring 2020. Analysis of the collected data is in progress and includes an exploratory factor analysis (EFA) of the 137 responses. Evidence of score reliability will be obtained by computing the internal consistency of each resulting factor. The resulting factor structure and items will be analyzed by comparison with the original EDSE scale. The full paper will provide details about the psychometric evaluation of the EDSE scale. The findings will provide insights into the future use of the EDSE scale in the context of secondary engineering education.
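    A minimal sketch of the planned analysis, assuming a simulated nine-item, single-factor response matrix in place of the e4usa data: an EFA followed by Cronbach's alpha for internal consistency.

```python
# Sketch of an EFA plus internal-consistency check on simulated data;
# the single-factor structure is an assumption for illustration only.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n_respondents, n_items = 137, 9
latent = rng.normal(size=(n_respondents, 1))          # one underlying factor
X = latent @ rng.uniform(0.6, 0.9, (1, n_items))      # item loadings
X += rng.normal(0, 0.5, (n_respondents, n_items))     # item-level noise

fa = FactorAnalysis(n_components=1).fit(X)
print("loadings:", np.round(fa.components_.ravel(), 2))

def cronbach_alpha(items):
    """items: (respondents, items) matrix for one factor/scale."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

print("alpha:", round(cronbach_alpha(X), 2))
```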
    The purpose of this study is to develop an instrument to measure student perceptions of the learning experiences in their online undergraduate engineering courses. Online education continues to grow broadly in higher education, but the movement toward acceptance and comprehensive utilization of online learning has generally been slower in engineering. Recently, however, there have been indications that this could be changing. For example, ABET has accredited online undergraduate engineering degrees at Stony Brook University and Arizona State University (ASU), and an increasing number of other undergraduate engineering programs also offer online courses. During this period of transition in engineering education, further investigation of the online modality in the context of engineering education is needed, and survey instrumentation can support such investigations. The instrument presented in this paper is grounded in a Model for Online Course-level Persistence in Engineering (MOCPE), which our research team developed by combining two motivational frameworks used to study student persistence: the Expectancy x Value Theory of Achievement Motivation (EVT) and the ARCS model of motivational design. The initial MOCPE instrument contained 79 items related to students' perceptions of the characteristics of their courses (i.e., the online learning management system, instructor practices, and peer support), expectancies of course success, course task values, perceived course difficulties, and intention to persist in the course. Evidence of validity and reliability was collected using a three-step process. First, we tested the face and content validity of the instrument with experts in online engineering education and online undergraduate engineering students. Next, the survey was administered to the online undergraduate engineering student population at a large, Southwestern public university, and an exploratory factor analysis (EFA) was conducted on the responses. Lastly, evidence of reliability was obtained by computing the internal consistency of each resulting scale. The final instrument has seven scales with 67 items across 10 factors. The Cronbach alpha values for these scales range from 0.85 to 0.97. The full paper will provide complete details about the development and psychometric evaluation of the instrument, including evidence of validity and reliability. The instrument described in this paper will ultimately be used as part of a larger, National Science Foundation-funded project investigating the factors influencing online undergraduate engineering student persistence. It is currently being used within this project in a longitudinal study intended to understand the relationships between the experiences of online undergraduate engineering students in their courses and their intentions to persist in the course. We anticipate that the instrument will also be of use to other engineering education researchers studying online student populations.
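    One common intermediate step in such an EFA is choosing how many factors to retain from the eigenvalues of the item correlation matrix (Kaiser's eigenvalue-greater-than-1 rule). The snippet below simulates data with three known factors purely to show the mechanics; it is not the MOCPE analysis itself.

```python
# Factor-retention check on simulated data with three true factors.
import numpy as np

rng = np.random.default_rng(2)
n_resp, n_items = 300, 12
latent = rng.normal(size=(n_resp, 3))                  # three true factors
loadings = 0.8 * np.kron(np.eye(3), np.ones((1, 4)))   # 4 items per factor
X = latent @ loadings + rng.normal(0, 0.5, (n_resp, n_items))

eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
print("eigenvalues:", np.round(eigvals, 2))
print("factors to retain:", int((eigvals > 1).sum()))
```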
  4. Abstract
    Background: Individuals with hemiparesis post-stroke often have difficulty with tasks requiring upper extremity (UE) intra- and interlimb use, yet methods to quantify both are limited.
    Objective: To develop a quantitative yet sensitive method to identify distinct features of UE intra- and interlimb use during task performance.
    Methods: Twenty adults post-stroke and 20 controls wore five inertial sensors (wrists, upper arms, sternum) during 12 seated UE tasks. Three sensor modalities (acceleration, angular rate of change, orientation) were examined for three metrics (peak-to-peak amplitude, time, and frequency). To allow comparison between sensor data, the resultant values were combined into one motion parameter per sensor pair using a novel algorithm. This motion parameter was compared in a group × task analysis of variance as a similarity score (0–1) between key sensor pairs: sternum to wrist, wrist to wrist, and wrist to upper arm. A use ratio (paretic/non-paretic arm) was calculated for persons post-stroke from wrist sensor data for each modality and compared to scores from the Adult Assisting Hand Assessment (Ad-AHA Stroke) and the UE Fugl-Meyer (UEFM).
    Results: A significant group × task interaction in the similarity score was found for all key sensor pairs. Post-hoc tests between task types revealed significant differences in similarity for sensor pairs in 8/9 comparisons for controls and 3/9 comparisons for persons post-stroke. The use ratio was significantly predictive of the Ad-AHA Stroke and UEFM scores for each modality.
    Conclusions: Our algorithm and sensor data analyses distinguished task type within and between groups and were predictive of clinical scores. Future work will assess the reliability and validity of this novel metric to allow development of an easy-to-use app for clinicians.
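    The use-ratio idea can be sketched from accelerometer magnitudes as below; the gravity removal, activity threshold, and simulated signals are assumptions, not the authors' exact algorithm.

```python
# Hedged sketch of a wrist-activity use ratio (paretic / non-paretic).
import numpy as np

def activity(acc):
    """acc: (samples, 3) wrist accelerometer data in g.
    Returns summed above-threshold movement magnitude."""
    mag = np.linalg.norm(acc, axis=1)
    movement = np.abs(mag - 1.0)            # crude removal of 1 g gravity
    return movement[movement > 0.05].sum()  # ignore near-rest samples

rng = np.random.default_rng(3)
gravity = np.array([0.0, 0.0, 1.0])
paretic = gravity + rng.normal(0, 0.10, (1000, 3))      # less movement
non_paretic = gravity + rng.normal(0, 0.25, (1000, 3))  # more movement

print(f"use ratio: {activity(paretic) / activity(non_paretic):.2f}")
```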
  5. Abstract

    We present a multiple‐choice test, the Montana State University Formal Reasoning Test (FORT), to assess college students' scientific reasoning ability. The test defines scientific reasoning to be equivalent to formal operational reasoning. It contains 20 questions divided evenly among five types of problems: control of variables, hypothesis testing, correlational reasoning, proportional reasoning, and probability. The test development process included the drafting and psychometric analysis of 23 instruments related to formal operational reasoning. These instruments were administered to almost 10,000 students enrolled in introductory science courses at American universities. Questions with high discrimination were identified and assembled into an instrument that was intended to measure the reasoning ability of students across the entire spectrum of abilities in college science courses. We present four types of validity evidence for the FORT. (a) The test has a one‐dimensional psychometric structure consistent with its design. (b) Test scores in an introductory biology course had an empirical reliability of 0.82. (c) Student interviews confirmed responses to the FORT were accurate indications of student thinking. (d) A regression analysis of student learning in an introductory biology course showed that scores on the FORT predicted how well students learned one of the most challenging concepts in biology, natural selection.
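    Validity evidence (d) amounts to a predictive regression; a toy version with simulated scores follows, and none of the numbers reflect the study's data.

```python
# Toy regression of a natural-selection concept score on FORT scores.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(4)
n_students = 200
fort = rng.integers(0, 21, n_students)             # FORT score, 0-20
concept = 20 + 2.0 * fort + rng.normal(0, 8, n_students)

fit = linregress(fort, concept)
print(f"slope = {fit.slope:.2f}, R^2 = {fit.rvalue**2:.2f}, p = {fit.pvalue:.2e}")
```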

     