


Creators/Authors contains: "Gupta, Piyush"



  1. Akaishi, Rei (Ed.)
    Cognitive rehabilitation, STEM (science, technology, engineering, and math) skill acquisition, and coaching in games such as chess often require tutoring in decision-making strategies. Advancing AI-driven tutoring systems that facilitate human learning requires understanding how evaluative feedback affects human decision-making and skill development. To this end, we conduct human experiments on Amazon Mechanical Turk to study the influence of evaluative feedback on human decision-making in sequential tasks. In these experiments, participants solve the Tower of Hanoi puzzle and receive AI-generated feedback while solving it. We examine how this feedback affects their learning and their skill transfer to related tasks. Additionally, treating humans as noisy optimal agents, we employ maximum entropy inverse reinforcement learning to analyze the effect of feedback on the implicit human reward structure that guides their decision-making. Lastly, we explore various computational models to understand how people incorporate evaluative feedback into their decision-making processes. Our findings indicate that humans perceive evaluative feedback as indicative of their long-term strategic success, which aids skill acquisition and transfer in sequential decision-making tasks. Moreover, we show that evaluative feedback fosters a more structured and organized learning experience than learning without feedback. Furthermore, our results indicate that providing intermediate goals alone does not significantly enhance human learning outcomes.

     
    Free, publicly-accessible full text available May 28, 2025
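The abstract models participants as noisy optimal agents and recovers their implicit reward with maximum entropy inverse reinforcement learning. Below is a minimal tabular sketch of that technique on a hypothetical 4-state chain, not the Tower of Hanoi. The chain dynamics, one-hot state rewards, learning rate, and iteration count are illustrative assumptions, not the authors' setup. A backward pass computes a time-indexed Boltzmann (soft-optimal) policy. A forward pass computes expected state visits. The reward weights then follow the gradient of the demonstration log-likelihood, which is empirical minus expected visits.

```python
import math

def step(s, a, n):
    """Hypothetical deterministic chain: a=0 moves left, a=1 moves right (clipped)."""
    return max(0, min(n - 1, s + (1 if a == 1 else -1)))

def soft_policy(theta, n, horizon):
    """Backward pass: Boltzmann policy pi[t][s][a] of a noisy optimal agent."""
    v_next = list(theta)  # V_T(s) = r(s) at the final step
    policy = [None] * horizon
    for t in range(horizon - 1, -1, -1):
        v_t, pi_t = [0.0] * n, []
        for s in range(n):
            q = [theta[s] + v_next[step(s, a, n)] for a in (0, 1)]
            m = max(q)  # log-sum-exp with max shift for numerical stability
            v_t[s] = m + math.log(sum(math.exp(x - m) for x in q))
            pi_t.append([math.exp(x - v_t[s]) for x in q])
        policy[t] = pi_t
        v_next = v_t
    return policy

def expected_visits(policy, n, horizon, start):
    """Forward pass: expected visits to each state over s_0..s_T."""
    d = [0.0] * n
    d[start] = 1.0
    total = list(d)
    for t in range(horizon):
        d_next = [0.0] * n
        for s in range(n):
            for a in (0, 1):
                d_next[step(s, a, n)] += d[s] * policy[t][s][a]
        d = d_next
        total = [x + y for x, y in zip(total, d)]
    return total

def maxent_irl(demos, n, lr=0.1, iters=200):
    """Assumes every demo has the same length and start state."""
    horizon = len(demos[0]) - 1
    empirical = [0.0] * n
    for traj in demos:
        for s in traj:
            empirical[s] += 1.0 / len(demos)
    theta = [0.0] * n  # one reward weight per state (one-hot features)
    for _ in range(iters):
        policy = soft_policy(theta, n, horizon)
        expected = expected_visits(policy, n, horizon, demos[0][0])
        # Log-likelihood gradient: empirical minus expected feature counts.
        theta = [w + lr * (e - x) for w, e, x in zip(theta, empirical, expected)]
    return theta

theta = maxent_irl([[0, 1, 2, 3, 3, 3]], n=4)
```

The single demonstration walks right and then stays at the end of the chain, so the learned reward is largest at state 3. With richer, non-one-hot features, the visit counts would be replaced by feature expectations.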
  2. We propose the Deterministic Sequencing of Exploration and Exploitation (DSEE) algorithm, which interleaves exploration and exploitation epochs, for model-based reinforcement learning (RL) problems that aim to simultaneously learn the system model, i.e., a Markov decision process (MDP), and the associated optimal policy. During exploration, DSEE explores the environment and updates its estimates of the expected rewards and transition probabilities. During exploitation, the latest estimates of the expected rewards and transition probabilities are used to obtain a robust policy with high probability. We design the lengths of the exploration and exploitation epochs such that the cumulative regret grows sub-linearly in time.
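The abstract says the epoch lengths are chosen so that cumulative regret is sub-linear, but it does not give the schedule. One schedule of that kind, assumed here purely for illustration, uses fixed-length exploration epochs and exploitation epochs that double in length. The number of epochs, and therefore the total exploration time, then grows only logarithmically with the horizon. The function names and parameters are hypothetical.

```python
def dsee_schedule(horizon, explore_len, base=2):
    """Hypothetical DSEE-style schedule as a list of (phase, length) epochs.

    Exploration epochs have constant length; exploitation epoch k lasts
    base**k steps, so exploitation epochs grow geometrically.
    """
    epochs, t, k = [], 0, 0
    while t < horizon:
        for phase, length in (("explore", explore_len), ("exploit", base ** k)):
            length = min(length, horizon - t)  # truncate at the horizon
            if length > 0:
                epochs.append((phase, length))
                t += length
        k += 1
    return epochs

def exploration_steps(epochs):
    """Total time spent exploring (the part that incurs exploration regret)."""
    return sum(length for phase, length in epochs if phase == "explore")
```

With `explore_len=5`, a horizon of 1,000 yields 50 exploration steps and a horizon of 10,000 yields 70. A tenfold longer horizon adds only four exploration epochs, which is the kind of growth that keeps regret sub-linear when exploitation uses good estimates.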
  3. Abstract

    Mammary morphogenesis is an orchestrated process involving differentiation, proliferation and organization of cells to form a bi-layered epithelial network of ducts and lobules embedded in stromal tissue. We have engineered a 3D biomimetic human breast that makes it possible to study how stem cell fate decisions translate to tissue-level structure and function. Using this advancement, we describe the mechanism by which breast epithelial cells build a complex three-dimensional, multi-lineage tissue by signaling through a collagen receptor. Discoidin domain receptor tyrosine kinase 1 induces stem cells to differentiate into basal cells, which in turn stimulate luminal progenitor cells via Notch signaling to differentiate and form lobules. These findings demonstrate how human breast tissue regeneration is triggered by transmission of signals from the extracellular matrix through an epithelial bilayer to coordinate structural changes that lead to formation of a complex ductal-lobular network.

     
  4. We consider a team of heterogeneous agents that is collectively responsible for servicing and subsequently reviewing a stream of homogeneous tasks. Each agent (autonomous system or human operator) has an associated mean service time and mean review time for servicing and reviewing the tasks, respectively, which are based on their expertise and skill-sets. The team objective is to collaboratively maximize the number of "serviced and reviewed" tasks. To this end, we formulate a Common-Pool Resource (CPR) game and design utility functions to incentivize collaboration among team-members. We show the existence and uniqueness of the Pure Nash Equilibrium (PNE) for the CPR game. Additionally, we characterize the structure of the PNE and study the effect of heterogeneity among the agents at the PNE. We show that the formulated CPR game is a best response potential game for which both sequential best response dynamics and simultaneous best reply dynamics converge to the Nash equilibrium. Finally, we numerically illustrate the price of anarchy for the PNE. 
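The abstract does not give the CPR utility functions. The sketch below therefore substitutes a hypothetical Cournot-style game that shares the properties being illustrated. It is an exact potential game with a unique pure Nash equilibrium, and heterogeneity enters through a per-agent cost `c_i`, an assumed stand-in for differing service and review abilities. Agent i chooses effort `x_i >= 0` with utility `x_i * (a - sum_j x_j) - c_i * x_i`, whose best response is `max(0, (a - c_i - sum_{j != i} x_j) / 2)`. Sequential best-response dynamics update one agent at a time until no agent changes.

```python
def best_response(i, x, a, costs):
    """Best response of agent i in the stand-in game (not the paper's CPR utility)."""
    others = sum(x) - x[i]
    return max(0.0, (a - costs[i] - others) / 2.0)

def sequential_best_response(a, costs, tol=1e-10, max_rounds=10_000):
    """Agents update one at a time; stop when a full round changes nothing."""
    x = [0.0] * len(costs)
    for _ in range(max_rounds):
        max_change = 0.0
        for i in range(len(costs)):
            new = best_response(i, x, a, costs)
            max_change = max(max_change, abs(new - x[i]))
            x[i] = new
        if max_change < tol:
            return x
    raise RuntimeError("best-response dynamics did not converge")
```

With `a = 10` and costs `[1, 2, 3]`, the dynamics settle at roughly `[3, 2, 1]`, so lower-cost agents contribute more at equilibrium.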
  5. As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to computing saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our proposed approach, SARFA (Specific and Relevant Feature Attribution), generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of a perturbation on the relative expected reward of the action to be explained. The second down-weights irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare SARFA with existing approaches on agents trained to play board games (Chess and Go) and Atari games (Breakout, Pong and Space Invaders). We show through illustrative examples (Chess, Atari, Go), human studies (Chess), and automated evaluation methods (Chess) that SARFA generates saliency maps that are more interpretable for humans than existing approaches.
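The abstract describes specificity and relevance only qualitatively. The sketch below makes several assumptions rather than reproducing the paper's exact definitions. It scores actions with a softmax over Q-values and measures specificity as the drop in the explained action's probability after perturbing a feature. It measures relevance as `1 / (1 + KL)` between the distributions over the remaining actions, and it combines the two with a harmonic mean.

```python
import math

def softmax(values):
    m = max(values)  # shift for numerical stability
    exps = [math.exp(v - m) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    """KL divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sarfa_saliency(q_orig, q_pert, action):
    """Saliency of one perturbed feature for the chosen action (needs >= 2 actions)."""
    # Specificity: how much the perturbation lowers the explained action's probability.
    delta_p = softmax(q_orig)[action] - softmax(q_pert)[action]
    if delta_p <= 0:
        return 0.0
    # Relevance: penalize perturbations that reshuffle the other actions' values.
    rest_orig = softmax([q for i, q in enumerate(q_orig) if i != action])
    rest_pert = softmax([q for i, q in enumerate(q_pert) if i != action])
    k = 1.0 / (1.0 + kl(rest_pert, rest_orig))
    # Harmonic mean: high only when both terms are high.
    return 2 * k * delta_p / (k + delta_p)
```

A perturbation that mostly reshuffles the other actions' relative values raises the KL term and lowers `k`. The harmonic mean then pulls the saliency down even when `delta_p` is unchanged, which matches the abstract's idea of down-weighting irrelevant features.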
  6. We study optimal fidelity selection for a human operator servicing a queue of homogeneous tasks. The service time distribution of the human operator depends on her cognitive dynamics and the level of fidelity selected for servicing the task. The operator's cognitive dynamics evolve as a Markov chain in which the cognitive state increases (decreases) with high probability whenever she is busy (resting). The tasks arrive according to a Poisson process, and each task waiting in the queue loses its value at a fixed rate. We address the trade-off between high-quality service of a task and the consequent loss in value of future tasks using a Semi-Markov Decision Process (SMDP) framework. We numerically determine an optimal policy and establish its structural properties.
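The abstract names the model's ingredients but not its parameters. The sketch below uses hypothetical names and an assumed rule: the state moves one level with probability `p_move` and otherwise stays. It builds the cognitive-state transition matrices for the busy and resting cases. It also computes the expected value lost by waiting tasks during one service of length τ. With loss rate c, n tasks already queued, and Poisson arrivals at rate λ, each queued task loses cτ. The λτ expected arrivals each wait τ/2 on average, which gives a total of cτ(n + λτ/2). Because this grows quadratically in τ, slower, higher-fidelity service is costly for future tasks.

```python
def cognitive_transition(num_levels, p_move, busy):
    """Hypothetical row-stochastic matrix P[s][s'] for the cognitive state.

    Busy: move up one level with probability p_move, else stay.
    Resting: move down one level with probability p_move, else stay.
    """
    P = [[0.0] * num_levels for _ in range(num_levels)]
    for s in range(num_levels):
        target = min(s + 1, num_levels - 1) if busy else max(s - 1, 0)
        P[s][target] += p_move
        P[s][s] += 1.0 - p_move  # at the boundary target == s, so the row is [.., 1.0]
    return P

def expected_value_loss(loss_rate, service_time, queue_len, arrival_rate):
    """Expected value lost by waiting tasks while one task is serviced."""
    # Tasks already in the queue wait the full service time.
    existing = queue_len * loss_rate * service_time
    # Poisson arrivals during service: arrival_rate * service_time tasks on
    # average, each waiting service_time / 2 on average.
    arrivals = arrival_rate * service_time * loss_rate * service_time / 2.0
    return existing + arrivals
```

For example, with c = 1, τ = 2, n = 3, and λ = 0.5, the expected loss is 2 × (3 + 0.5) = 7.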