Title: Validity of a Content Agnostic Game Based Stealth Assessment
In an attempt to predict a player's learning during a content-agnostic educational video game session, this study continuously recorded participants' game-play interactions. The captured actions were used to make real-time inferences about learning performance with a dynamic Bayesian network. The predicted learning was then correlated with post-test scores to establish the validity of the assessment. The assessment was moderately positively correlated with the post-test scores, demonstrating support for its validity.
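
For concreteness, the following is a minimal sketch, in Python, of the kind of inference the abstract describes: a dynamic Bayesian network with a single two-state hidden "learning" variable that is updated after every observed game-play action, followed by a correlation of the final inferred learning with post-test scores. The transition, slip, and guess probabilities and all of the data are illustrative assumptions, not values or code from the study.

import numpy as np
from scipy.stats import pearsonr

P_TRANSIT = 0.10   # assumed chance of moving from "not learned" to "learned" after an action
P_SLIP = 0.15      # assumed chance a learned player still acts incorrectly
P_GUESS = 0.25     # assumed chance an unlearned player acts correctly

def filter_learning(actions, prior=0.3):
    """Forward (filtering) pass of a two-state dynamic Bayesian network.

    actions: iterable of 0/1 flags, one per observed game-play action.
    Returns the running posterior P(learned) after each observation.
    """
    p = prior
    trace = []
    for correct in actions:
        # Evidence update from the observed action
        if correct:
            num = p * (1 - P_SLIP)
            den = num + (1 - p) * P_GUESS
        else:
            num = p * P_SLIP
            den = num + (1 - p) * (1 - P_GUESS)
        p = num / den
        # Time update: the player may have learned during this step
        p = p + (1 - p) * P_TRANSIT
        trace.append(p)
    return trace

# Hypothetical validity check: correlate the final inferred learning with post-test scores.
rng = np.random.default_rng(0)
post_test = rng.integers(0, 11, size=30)                        # simulated post-test scores (0-10)
action_logs = [rng.integers(0, 2, size=40) for _ in post_test]  # simulated per-player action streams
inferred = [filter_learning(log)[-1] for log in action_logs]

r, p_value = pearsonr(inferred, post_test)
print(f"validity correlation r = {r:.2f} (p = {p_value:.3f})")
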
Award ID(s): 1828010
NSF-PAR ID: 10344413
Journal Name: International Conference on Games and Learning Alliance (GALA 2021)
Volume: 13134
Page Range / eLocation ID: 121–130
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    In this study, we investigated the validity of a stealth assessment of physics understanding in an educational game, as well as the effectiveness of different game-level delivery methods and various in-game supports on learning. Using a game called Physics Playground, we randomly assigned 263 ninth- to eleventh-grade students into four groups: adaptive, linear, free choice and no-treatment control. Each condition had access to the same in-game learning supports during gameplay. Results showed that: (a) the stealth assessment estimates of physics understanding were valid, correlating significantly with the external physics test scores; (b) there was no significant effect of game-level delivery method on students' learning; and (c) physics animations were the most effective (among eight supports tested) in predicting both learning outcome and in-game performance (e.g. number of game levels solved). We included student enjoyment, gender and ethnicity in our analyses as moderators to further investigate the research questions.
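
As an illustration of the two analyses this abstract reports, the sketch below computes a validity correlation between stealth-assessment estimates and an external test, and a one-way ANOVA for the game-level delivery effect. All data are simulated, and the effect sizes, group sizes, and variable names are assumptions rather than results from the Physics Playground study.

import numpy as np
from scipy.stats import pearsonr, f_oneway

rng = np.random.default_rng(1)

# Simulated stealth-assessment estimates and external physics test scores
stealth_estimates = rng.normal(0, 1, size=263)
external_test = 0.4 * stealth_estimates + rng.normal(0, 1, size=263)
r, p = pearsonr(stealth_estimates, external_test)
print(f"stealth estimates vs. external test: r = {r:.2f}, p = {p:.3f}")

# Simulated learning gains in the four conditions
gains = {cond: rng.normal(0.5, 1.0, size=65) for cond in
         ("adaptive", "linear", "free_choice", "control")}
F, p = f_oneway(*gains.values())
print(f"delivery-method effect: F = {F:.2f}, p = {p:.3f}")
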
  2. Literacy assessment is essential for effective literacy instruction and training. However, traditional paper-based literacy assessments are typically decontextualized and may cause stress and anxiety for test takers. In contrast, serious games and game environments allow for the assessment of literacy in more authentic and engaging ways, which has some potential to increase the assessment’s validity and reliability. The primary objective of this study is to examine the feasibility of a novel approach for stealthily assessing literacy skills using games in an intelligent tutoring system (ITS) designed for reading comprehension strategy training. We investigated the degree to which learners’ game performance and enjoyment predicted their scores on standardized reading tests. Amazon Mechanical Turk participants (n = 211) played three games in iSTART and self-reported their level of game enjoyment after each game. Participants also completed the Gates–MacGinitie Reading Test (GMRT), which includes vocabulary knowledge and reading comprehension measures. The results indicated that participants’ performance in each game as well as the combined performance across all three games predicted their literacy skills. However, the relations between game enjoyment and literacy skills varied across games. These findings suggest the potential of leveraging serious games to assess students’ literacy skills and improve the adaptivity of game-based learning environments. 
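
A hedged sketch of the prediction analysis described above: regressing a standardized reading score on per-game performance and self-reported enjoyment. The data frame is simulated, and its column names are assumptions, not fields from an actual iSTART export or the GMRT.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 211
df = pd.DataFrame({
    "game1_score": rng.normal(0, 1, n),
    "game2_score": rng.normal(0, 1, n),
    "game3_score": rng.normal(0, 1, n),
    "enjoyment": rng.integers(1, 6, n),   # self-reported enjoyment on a 1-5 scale
})
# Simulated reading score loosely driven by game performance
df["reading_score"] = (0.3 * df.game1_score + 0.3 * df.game2_score
                       + 0.2 * df.game3_score + rng.normal(0, 1, n))

model = smf.ols("reading_score ~ game1_score + game2_score + game3_score + enjoyment",
                data=df).fit()
print(model.summary().tables[1])   # coefficient estimates for each predictor
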
  3. Bayesian Knowledge Tracing (BKT) is a commonly used approach for student modeling, and Long Short-Term Memory (LSTM) is a versatile model that can be applied to a wide range of tasks, such as language translation. In this work, we directly compared three models: BKT, its variant Intervention-BKT (IBKT), and LSTM, on two types of student modeling tasks: post-test score prediction and learning gain prediction. Additionally, while previous work on student learning has often used skill/knowledge components identified by domain experts, we incorporated an automatic skill discovery method (SK), which includes a nonparametric prior over the exercise-skill assignments, into all three models. Thus, we explored a total of six models: BKT, BKT+SK, IBKT, IBKT+SK, LSTM, and LSTM+SK. Two training datasets were employed: one was collected from a natural-language physics intelligent tutoring system named Cordillera, and the other from a standard probability intelligent tutoring system named Pyrenees. Overall, our results showed that BKT and BKT+SK outperformed the others on predicting post-test scores, whereas LSTM and LSTM+SK achieved the highest accuracy, F1-measure, and area under the ROC curve (AUC) on predicting learning gains. Furthermore, we demonstrated that by combining SK with the BKT model, BKT+SK could reliably predict post-test scores using only the earliest 50% of the training sequences. For early prediction of learning gains, LSTM delivered predictions from the earliest 70% of the sequences that were comparable to those obtained from the full training sequences. These findings point toward a learning environment that can predict students' performance and learning gains early and adapt its pedagogical strategy accordingly.
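
To make the LSTM side of the comparison concrete, here is a minimal PyTorch sketch (one plausible framework; the abstract does not specify an implementation) of a sequence model that encodes a student's interaction stream and predicts a binary learning-gain label. The feature layout, tensor sizes, and labels are illustrative assumptions.

import torch
import torch.nn as nn

class GainPredictor(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, time, n_features)
        _, (h_n, _) = self.lstm(x)             # h_n: (1, batch, hidden)
        return self.head(h_n[-1]).squeeze(-1)  # one logit per student

# Hypothetical usage: 8 students, 50 time steps, 6 features per step
# (e.g. a one-hot skill id plus a correctness flag).
x = torch.randn(8, 50, 6)
y = torch.randint(0, 2, (8,)).float()          # simulated high/low learning-gain labels
model = GainPredictor(n_features=6)
loss = nn.BCEWithLogitsLoss()(model(x), y)
loss.backward()
print(f"illustrative training loss: {loss.item():.3f}")
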
  4. This evidence-based practices paper discusses the method employed in validating the use of a project-modified version of the PROCESS tool (Grigg, Van Dyken, Benson, & Morkos, 2013) for measuring student problem-solving skills. The PROCESS tool allows raters to score students' ability in the domains of Problem definition, Representing the problem, Organizing information, Calculations, Evaluating the solution, Solution communication, and Self-assessment. Specifically, this research compares student performance on solving traditional textbook problems with novel, student-generated learning activities (i.e., reverse-engineering videos in order to create their own homework problem and solution). The use of student-generated learning activities to assess student problem-solving skills has theoretical underpinning in Felder's (1987) work on "creating creative engineers," as well as the need to develop students' abilities to transfer learning and solve problems in a variety of real-world settings. In this study, four raters used the PROCESS tool to score the performance of 70 students randomly selected from two undergraduate chemical engineering cohorts at two Midwest universities. Students from both cohorts solved 12 traditional textbook-style problems, and students from the second cohort solved an additional nine student-generated video problems. Any large-scale assessment where multiple raters use a rating tool requires the investigation of several aspects of validity. The many-facets Rasch measurement model (MFRM; Linacre, 1989) has the psychometric properties to determine whether characteristics other than "student problem-solving skills" influence the scores assigned, such as rater bias, problem difficulty, or student demographics. Before implementing the full rating plan, MFRM was used to examine how raters interacted with the six items on the modified PROCESS tool to score a random selection of 20 students' performance in solving one problem. An external evaluator led "inter-rater reliability" meetings where raters deliberated the rationale for their ratings, and differences were resolved by recourse to Pretz et al.'s (2003) problem-solving cycle, which informed the development of the PROCESS tool. To test the new understandings of the PROCESS tool, raters were assigned to score one new problem from a different randomly selected group of six students. Those results were then analyzed in the same manner as before. This iterative process resulted in substantial increases in reliability, which can be attributed to increased confidence that raters were operating with common definitions of the items on the PROCESS tool and rating with consistent and comparable severity. This presentation will include examples of the student-generated problems and a discussion of common discrepancies and solutions to the raters' initial use of the PROCESS tool. Findings, as well as the adapted PROCESS tool used in this study, can be useful to engineering educators and engineering education researchers.
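
For readers unfamiliar with the many-facets Rasch measurement model (MFRM; Linacre, 1989) used above, the sketch below writes out its rating-scale form, in which the log-odds of receiving rubric category k rather than k-1 is modeled as (student ability) - (problem difficulty) - (rater severity) - (step threshold). The parameter values are invented for illustration, and parameter estimation (as performed by dedicated MFRM software) is not shown; the example simply shows how a more severe rater shifts the expected score downward, which is the kind of bias the analysis checks for.

import numpy as np

def mfrm_category_probs(ability, difficulty, severity, thresholds):
    """Return P(score = 0..K) for one student/problem/rater combination.

    thresholds: step parameters F_1..F_K (F_0 is fixed at 0 by convention).
    """
    steps = np.concatenate(([0.0], np.asarray(thresholds, dtype=float)))
    # Cumulative sum of (ability - difficulty - severity - F_h) over categories 0..k
    logits = np.cumsum(ability - difficulty - severity - steps)
    probs = np.exp(logits - logits.max())      # numerically stable softmax over categories
    return probs / probs.sum()

# Hypothetical example: a 0-3 rubric item scored by a lenient and a severe rater
thresholds = [-1.0, 0.0, 1.0]
for severity in (-0.5, 0.5):
    p = mfrm_category_probs(ability=0.8, difficulty=0.2, severity=severity,
                            thresholds=thresholds)
    print(f"rater severity {severity:+.1f}: P(score = 0..3) = {np.round(p, 2)}")
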