Title: Exploring Policies for Dynamically Teaming Up Students through Log Data Simulation
Constructing effective and well-balanced learning groups is important for collaborative learning. Past research has explored how group formation policies affect learners' behaviors and performance. Many group formation policies work in theory, yet their feasibility across different classroom contexts is rarely investigated in authentic class sessions. In the current work, we define feasibility as the proportion of students who are able to find available partners that satisfy a given group formation policy. Informed by user-centered research in K-12 classrooms, we simulated pairing policies on historical data from an intelligent tutoring system (ITS), a process we refer to as SimPairing. As part of the process of designing a pairing orchestration tool, this study contributes insights into the feasibility of four dynamic pairing policies and into how that feasibility varies with the parameters of the pairing policies and across classes. We found that, on average, dynamically pairing students based on their in-the-moment wheel-spinning status can pair most struggling students, even under moderate restrictions on which students may be paired. In addition, we found a trade-off between the required knowledge heterogeneity and policy feasibility. Furthermore, the feasibility of pairing policies can vary across classes, suggesting a need to customize pairing policies.
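To make the feasibility definition above concrete, here is a minimal Python sketch. It is not the authors' SimPairing procedure: the function name `simulate_pairing`, the greedy matching order, the student names, and the convention of reporting 1.0 when nobody is struggling are all assumptions made for illustration. It pairs each wheel-spinning student with a free non-struggling student unless the pairing is restricted, then reports the fraction of struggling students who found a partner.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def simulate_pairing(
    wheel_spinning: Dict[str, bool],
    restricted: Set[FrozenSet[str]],
) -> Tuple[List[Tuple[str, str]], float]:
    """Greedily pair each struggling student with a free non-struggling one.

    Hypothetical helper, not taken from the paper. Returns the pairs formed
    and the feasibility: the fraction of struggling students for whom an
    allowed partner was available.
    """
    strugglers = sorted(s for s, stuck in wheel_spinning.items() if stuck)
    helpers = sorted(s for s, stuck in wheel_spinning.items() if not stuck)
    taken: Set[str] = set()
    pairs: List[Tuple[str, str]] = []

    for student in strugglers:
        for helper in helpers:
            if helper in taken:
                continue  # already partnered with someone else
            if frozenset((student, helper)) in restricted:
                continue  # this pairing is not allowed
            pairs.append((student, helper))
            taken.add(helper)
            break

    # Assumed convention: with no struggling students, nothing is infeasible.
    feasibility = len(pairs) / len(strugglers) if strugglers else 1.0
    return pairs, feasibility


if __name__ == "__main__":
    # Made-up snapshot of one class at one moment in a session.
    status = {"ana": True, "ben": False, "cy": True, "dee": False, "eli": True}
    banned = {frozenset(("ana", "ben"))}
    pairs, feasibility = simulate_pairing(status, banned)
    print(pairs, round(feasibility, 2))
```

A greedy pass like this can miss pairings that a maximum matching would find, so it gives a lower bound on feasibility under the assumed policy. Adding entries to `restricted` shows how tighter constraints reduce the ratio.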
Award ID(s):
1822861
NSF-PAR ID:
10291167
Author(s) / Creator(s):
; ; ; ; ; ;
Editor(s):
Hsiao, I.; Sahebi, S.; Bouchet, F.; Vie, J. J.
Date Published:
Journal Name:
Fourteenth International Conference on Educational Data Mining (EDM 2021)
Page Range / eLocation ID:
183-194
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. For many forms of e-learning environments, the system's behaviors can be viewed as a sequential decision process wherein, at each discrete step, the system is responsible for deciding the next system action when multiple actions are available. Each of these system decisions affects the user's successive actions and performance, and some of them are more important than others. This raises an open question: how can we identify the critical system interactive decisions that are linked to student learning from a long trajectory of decisions? In this work, we proposed and evaluated Critical-Reinforcement Learning (Critical-RL), an adversarial deep reinforcement learning (ADRL) based framework to identify critical decisions and induce compact yet effective policies. Specifically, it induces a pair of adversarial policies based upon Deep Q-Network (DQN) with opposite goals: one is to improve student learning while the other is to hinder it; critical decisions are identified by comparing the two adversarial policies and using their corresponding Q-value differences; finally, a Critical policy is induced by giving the optimal action on critical decisions but random yet reasonable decisions on others. We evaluated the effectiveness of the Critical policy against a random yet reasonable (Random) policy. No significant difference was found between the two conditions, probably because of small sample sizes. Much to our surprise, we found that students often experience a so-called Critical phase: a consecutive sequence of critical decisions with the same action. Students were further divided into High vs. Low based on the number of Critical phases they experienced, and our results showed that while no significant difference was found between the two Low groups, the High Critical group learned significantly more than the High Random group.
  2. An important goal in the design and development of Intelligent Tutoring Systems (ITSs) is to have a system that adaptively reacts to students' behavior in the short term and effectively improves their learning performance in the long term. Inducing effective pedagogical strategies that accomplish this goal is an essential challenge. To address this challenge, we explore three aspects of a Markov Decision Process (MDP) framework through four experiments. The three aspects are: 1) reward function, detecting the impact of immediate and delayed reward on the effectiveness of the policies; 2) state representation, exploring ECR-based, correlation-based, and ensemble feature selection approaches for representing the MDP state space; and 3) policy execution, investigating the effectiveness of stochastic and deterministic policy executions on learning. The most important result of this work is that there exists an aptitude-treatment interaction (ATI) effect in our experiments: the policies have significantly different impacts on particular types of students as opposed to the entire population. We refer to the students who are sensitive to the policies as the Responsive group. All of the following results are based on the Responsive group. First, we find that an immediate reward can facilitate a more effective induced policy than a delayed reward. Second, the MDP policies induced based on low correlation-based and ensemble feature selection approaches are more effective than a Random yet reasonable policy. Third, no significant improvement was found using stochastic policy execution due to a ceiling effect.
  3. An important goal in the design and development of Intelligent Tutoring Systems (ITSs) is to have a system that adaptively reacts to students' behavior in the short term and effectively improves their learning performance in the long term. Inducing effective pedagogical strategies that accomplish this goal is an essential challenge. To address this challenge, we explore three aspects of a Markov Decision Process (MDP) framework through four experiments. The three aspects are: 1) reward function, detecting the impact of immediate and delayed reward on the effectiveness of the policies; 2) state representation, exploring ECR-based, correlation-based, and ensemble feature selection approaches for representing the MDP state space; and 3) policy execution, investigating the effectiveness of stochastic and deterministic policy executions on learning. The most important result of this work is that there exists an aptitude-treatment interaction (ATI) effect in our experiments: the policies have significantly different impacts on particular types of students as opposed to the entire population. We refer to the students who are sensitive to the policies as the Responsive group. All of the following results are based on the Responsive group. First, we find that an immediate reward can facilitate a more effective induced policy than a delayed reward. Second, the MDP policies induced based on low correlation-based and ensemble feature selection approaches are more effective than a Random yet reasonable policy. Third, no significant improvement was found using stochastic policy execution due to a ceiling effect.
  4. An important goal in the design and development of Intelligent Tutoring Systems (ITSs) is to have a system that adaptively reacts to students' behavior in the short term and effectively improves their learning performance in the long term. Inducing effective pedagogical strategies that accomplish this goal is an essential challenge. To address this challenge, we explore three aspects of a Markov Decision Process (MDP) framework through four experiments. The three aspects are: 1) reward function, detecting the impact of immediate and delayed reward on the effectiveness of the policies; 2) state representation, exploring ECR-based, correlation-based, and ensemble feature selection approaches for representing the MDP state space; and 3) policy execution, investigating the effectiveness of stochastic and deterministic policy executions on learning. The most important result of this work is that there exists an aptitude-treatment interaction (ATI) effect in our experiments: the policies have significantly different impacts on particular types of students as opposed to the entire population. We refer to the students who are sensitive to the policies as the Responsive group. All of the following results are based on the Responsive group. First, we find that an immediate reward can facilitate a more effective induced policy than a delayed reward. Second, the MDP policies induced based on low correlation-based and ensemble feature selection approaches are more effective than a Random yet reasonable policy. Third, no significant improvement was found using stochastic policy execution due to a ceiling effect.
  5. Studies have shown that the graduation rate for underrepresented minority (URM) students enrolled in engineering doctorates is significantly lower than that of their peers. In response, we created the "Rising Doctoral Institute (RDI)". This project aims to address issues that URM students encounter when transitioning into a Ph.D. in engineering and in their decision to persist in the program. To suggest institutional policies that increase the likelihood that URM students persist in their doctorate, we identify and analyze some factors in the academic system that reinforce or hinder the retention of URM students in doctoral education. Although the factors that influence persistence in URM students have largely been studied as direct causes of attrition or retention, there is a need for a system perspective that takes into account the complexity and dynamic interaction that exists between those factors. The academic system is a complex system that, by nature, is policy resistant. This means that a positive variation of a factor can incur unintended consequences that could lead to a negative variation in other factors and ultimately hinder the positive outcomes of that policy. In this work-in-progress article, we analyze the dynamics of the factors in the academic system that reinforce or hinder the retention of URM graduate students in engineering. The purpose is to build some of the causal loops that involve those factors, to improve the understanding of how the complex system works, and to prevent unintended consequences of institutional policies. We used Causal Loop Diagrams (CLDs) to model the feedback loops of the system based on initial hypotheses of causal relationships between the factors. We followed a process that started with establishing hypotheses from a previous literature review; then, using a different set of articles, we identified the factors related to the hypotheses and the causal links between them. Next, we did axial coding to group the concepts into smaller categories and established the causal relations between categories. With these categories and relations, we created the CLDs for each hypothesis. For the CLDs that were missing connections needed to close a loop, we sought additional literature to close them. Finally, we analyzed the implications of each CLD. In this article, we analyze and describe three major CLDs found in the literature. The first was built around the factor of having a positive relationship with the supervisor. The second centered on the student's experience. The third focused on factors that relate to university initiatives.