Creators/Authors contains: "Zhou, G."

  1. A reversible diffusion process is initialized at position x0 and run until it first hits any of several targets. What is the probability that it terminates at a particular target? We propose a computationally efficient approach for estimating this probability, focused on those situations in which it takes a long time to hit any target. In these cases, direct simulation of the hitting probabilities becomes prohibitively expensive. On the other hand, if the timescales are sufficiently long, then the system will essentially “forget” its initial condition before it encounters a target. In these cases the hitting probabilities can be accurately approximated using only local simulations around each target, obviating the need for direct simulations. In empirical tests, we find that these local estimates can be computed in the same time it would take to compute a single direct simulation, but that they achieve an accuracy that would require thousands of direct simulation runs.
  2. Abstract: Identifying critical decisions is one of the most challenging decision-making problems in real-world applications. In this work, we propose a novel Reinforcement Learning (RL) based Long-Short Term Rewards (LSTR) framework for identifying critical decisions. RL is an area of machine learning concerned with inducing effective decision-making policies that, when followed, yield the maximum cumulative "reward." Many RL algorithms find the optimal policy by estimating the optimal Q-values, which specify the maximum cumulative reward the agent can receive. In our LSTR framework, the "long term" rewards are defined as "Q-values" and the "short term" rewards are determined by the "reward function." Experiments on a synthetic GridWorld game and real-world Intelligent Tutoring System datasets show that the proposed LSTR framework indeed identifies the critical decisions in the sequences. Furthermore, our results show that carrying out the critical decisions alone is as effective as a fully-executed policy.
  3. Abstract: Motivated by recent advances in reinforcement learning and the well-established Self-Determination Theory (SDT), we explored the impact of hierarchical reinforcement learning (HRL) induced pedagogical policies and data-driven explanations of the HRL-induced policies on student experience in an Intelligent Tutoring System (ITS). We explored their impacts first independently and then jointly. Overall, our results showed that 1) the HRL-induced policies could significantly improve students' learning performance, and 2) explaining the tutor's decisions to students through data-driven explanations could improve the student-system interaction in terms of students' engagement and autonomy.
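
The second abstract above distinguishes "long term" rewards (Q-values) from "short term" rewards (the reward function). The following is a minimal sketch of that distinction only, not the LSTR framework itself, whose procedure the abstract does not spell out. It assumes a hypothetical five-state corridor GridWorld with a deterministic transition function; the state layout, reward values, discount factor, and all function names are illustrative assumptions. Optimal Q-values are estimated with standard Q-value iteration (repeated Bellman optimality backups).

```python
# Hypothetical 1-D corridor GridWorld: states 0..4, state 4 is the terminal goal.
# Every constant and name below is an illustrative assumption, not from the paper.
N_STATES = 5
GOAL = 4
ACTIONS = (-1, +1)      # move left, move right
GAMMA = 0.9             # discount factor
STEP_REWARD = -1.0      # immediate reward for any non-goal move
GOAL_REWARD = 10.0      # immediate reward for entering the goal


def step(state, action):
    """Deterministic transition: return (next_state, immediate_reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = GOAL_REWARD if next_state == GOAL else STEP_REWARD
    return next_state, reward


def q_value_iteration(n_sweeps=100):
    """Estimate optimal Q-values with repeated Bellman optimality backups."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(n_sweeps):
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state: Q stays at 0
            for a_idx, a in enumerate(ACTIONS):
                s2, r = step(s, a)
                q[s][a_idx] = r + GAMMA * max(q[s2])
    return q


q = q_value_iteration()
for s in range(N_STATES - 1):
    short_term = [step(s, a)[1] for a in ACTIONS]        # reward function
    long_term = [round(v, 2) for v in q[s]]              # Q-values
    print(f"state {s}: short-term {short_term}, long-term {long_term}")
```

In this toy setting, both actions in states 0 through 2 receive the same immediate reward, yet their Q-values differ, which is the kind of gap between short-term and long-term signals that the abstract refers to.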