Search for: All records

Creators/Authors contains: "Chen, Yuxin"


  1. Machine teaching has traditionally been constrained by the assumption of a fixed learner model. In this paper, we challenge this notion by proposing a novel black-box Markov learner model, drawing inspiration from decision psychology and neuroscience, where learners are often viewed as black boxes with adaptable parameters. We model the learner's dynamics as a Markov decision process (MDP) with unknown parameters, encompassing a wide range of learner types studied in the machine teaching literature. This approach reduces the teaching problem to finding an optimal policy for the underlying MDP. Building on this, we introduce an algorithm for teaching in this black-box setting and analyze its teaching cost under different scenarios. We further connect our model to two learner types from psychology and neuroscience, the epiphany learner and the non-epiphany learner, linking them with discounted and non-discounted black-box Markov learners respectively. This alignment gives our work a psychologically and neuroscientifically grounded perspective. Supported by numerical studies, this paper delivers a significant contribution to machine teaching: a robust, versatile learner model with a rigorous theoretical foundation.
    Free, publicly-accessible full text available July 28, 2024
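To make the reduction in item 1 concrete, here is a minimal sketch of the planning step: once the learner's MDP parameters have been estimated, a cost-minimizing teaching policy can be found by standard value iteration. The tensors `P` and `C`, the discount `gamma`, and the function name `plan_teaching_policy` are illustrative stand-ins, not the paper's actual algorithm or learner dynamics.

```python
import numpy as np

def plan_teaching_policy(P, C, gamma=0.95, tol=1e-8):
    """Value iteration on a tabular MDP (illustrative sketch).

    P: (S, A, S) transition probabilities over learner states.
    C: (S, A) per-step cost of showing teaching example a in state s.
    Returns a greedy (cost-minimizing) teaching policy and its values.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        # Q[s, a] = immediate cost + discounted expected cost-to-go.
        Q = C + gamma * P @ V          # shape (S, A)
        V_new = Q.min(axis=1)          # teacher picks the cheapest example
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmin(axis=1), V

# Toy example: 3 learner states, 2 teaching actions (random placeholders).
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))   # made-up learner dynamics
C = rng.uniform(0.0, 1.0, size=(3, 2))       # made-up teaching costs
policy, V = plan_teaching_policy(P, C)
print("teaching action per learner state:", policy)
```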
  2. Free, publicly-accessible full text available June 30, 2024
  3. Algorithmic case-based decision support provides examples to help humans make sense of predicted labels and aid them in decision-making tasks. Despite the promising performance of supervised learning, representations learned by supervised models may not align well with human intuitions: what models consider similar examples can be perceived as distinct by humans. As a result, such models have limited effectiveness in case-based decision support. In this work, we incorporate ideas from metric learning into supervised learning to examine the importance of alignment for effective decision support. In addition to instance-level labels, we use human-provided triplet judgments to learn human-compatible decision-focused representations. Using both synthetic data and human subject experiments in multiple classification tasks, we demonstrate that such representations are better aligned with human perception than representations optimized solely for classification. Human-compatible representations identify nearest neighbors that are perceived as more similar by humans and allow humans to make more accurate predictions, leading to substantial improvements in human decision accuracy (17.8% in butterfly vs. moth classification and 13.2% in pneumonia classification).
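As a rough illustration of the approach in item 3, the sketch below jointly trains on instance labels (cross-entropy) and human triplet judgments (a triplet margin loss). The architecture, margin, and loss weighting `alpha` are hypothetical choices, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecisionFocusedEncoder(nn.Module):
    """Encoder trained jointly on labels and human triplet judgments."""
    def __init__(self, in_dim, embed_dim, n_classes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
        self.classifier = nn.Linear(embed_dim, n_classes)

    def forward(self, x):
        z = self.encoder(x)
        return z, self.classifier(z)

def joint_loss(model, x, y, anchors, positives, negatives, alpha=1.0):
    # Supervised term: standard cross-entropy on instance-level labels.
    _, logits = model(x)
    ce = F.cross_entropy(logits, y)
    # Alignment term: the anchor should embed closer to the example
    # humans judged more similar (positive) than to the other (negative).
    za, _ = model(anchors)
    zp, _ = model(positives)
    zn, _ = model(negatives)
    triplet = F.triplet_margin_loss(za, zp, zn, margin=1.0)
    return ce + alpha * triplet
```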
  4. Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. For a finite-horizon episodic Markov decision process with $S$ states, $A$ actions and horizon length $H$, substantial progress has been made toward characterizing the minimax-optimal regret, which scales on the order of $\sqrt{H^2SAT}$ (modulo log factors) with $T$ the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient or fall short of optimality unless the sample size exceeds an enormous threshold (e.g. $S^6A^4 \,\mathrm{poly}(H)$ for existing model-free methods). To overcome this large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity $O(SAH)$, that achieves near-optimal regret as soon as the sample size exceeds the order of $SA\,\mathrm{poly}(H)$. In terms of this sample size requirement (also referred to as the initial burn-in cost), our method improves upon any prior memory-efficient algorithm that is asymptotically regret-optimal by at least a factor of $S^5A^3$. Leveraging the recently introduced variance reduction strategy (also called reference-advantage decomposition), the proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds. The design principle of our early-settled variance reduction method might be of independent interest to other RL settings that involve intricate exploration-exploitation trade-offs.
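The two ingredients item 4 combines, optimistic/pessimistic Q-learning sequences and an early-settled reference, can be caricatured as below. The step size, bonus term (log factors omitted), and settling test are illustrative guesses, not the paper's exact update rule.

```python
import numpy as np

def ucb_q_update(Q_up, Q_lo, Q_ref, N, s, a, r, s_next, h, H, c=1.0):
    """One stage-h step of optimistic/pessimistic Q-learning (sketch).

    Q_up/Q_lo: (H+1, S, A) upper/lower confidence Q estimates.
    Q_ref:     (H+1, S) reference values, initialized to np.inf
               (meaning "not yet settled") and frozen once settled.
    N:         (H, S, A) visit counts.
    """
    N[h, s, a] += 1
    n = N[h, s, a]
    eta = (H + 1) / (H + n)            # rescaled linear step size
    bonus = c * np.sqrt(H**3 / n)      # Hoeffding-style bonus, logs omitted
    V_up = Q_up[h + 1, s_next].max()
    V_lo = Q_lo[h + 1, s_next].max()
    Q_up[h, s, a] += eta * (r + V_up + bonus - Q_up[h, s, a])
    Q_lo[h, s, a] += eta * (r + V_lo - bonus - Q_lo[h, s, a])
    # Early settling: once the upper and lower values nearly agree,
    # freeze the reference value at the visited next state.
    if V_up - V_lo <= 1.0 and Q_ref[h + 1, s_next] == np.inf:
        Q_ref[h + 1, s_next] = V_up
```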
  5. The softmax policy gradient (PG) method, which performs gradient ascent under softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as the cardinality of the state space $\mathcal{S}$ and the effective horizon $\frac{1}{1-\gamma}$, both of which could be excessively large. In this paper, we deliver a pessimistic message regarding the iteration complexity of softmax PG methods, despite assuming access to exact gradient computation. Specifically, we demonstrate that the softmax PG method with stepsize $\eta$ can take
$$\frac{1}{\eta}\,|\mathcal{S}|^{2^{\Omega\left(\frac{1}{1-\gamma}\right)}} \text{ iterations}$$
to converge, even in the presence of a benign policy initialization and an initial state distribution amenable to exploration (so that the distribution mismatch coefficient is not exceedingly large). This is accomplished by characterizing the algorithmic dynamics over a carefully constructed MDP containing only three actions. Our exponential lower bound hints at the necessity of carefully adjusting update rules or enforcing proper regularization in accelerating PG methods.
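For reference, the update analyzed in item 5 is exact gradient ascent on the value function under softmax parameterization. Below is a small tabular sketch using the standard policy-gradient formula $\nabla_{\theta_{s,a}} V^{\pi}(\rho) = \frac{1}{1-\gamma}\, d^{\pi}_{\rho}(s)\, \pi(a|s)\, A^{\pi}(s,a)$; the MDP inputs and stepsize are placeholders, not the paper's construction.

```python
import numpy as np

def softmax_pg_step(theta, P, R, gamma, rho, eta):
    """One exact softmax policy-gradient step on a tabular MDP.

    theta: (S, A) logits; P: (S, A, S); R: (S, A); rho: initial dist.
    """
    S, A, _ = P.shape
    pi = np.exp(theta - theta.max(1, keepdims=True))
    pi /= pi.sum(1, keepdims=True)
    # Policy-induced transition matrix, then Q/V via the Bellman equations.
    P_pi = np.einsum('sa,sat->st', pi, P)
    r_pi = (pi * R).sum(1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R + gamma * P @ V
    adv = Q - V[:, None]
    # Discounted state visitation distribution starting from rho.
    d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
    grad = d[:, None] * pi * adv / (1 - gamma)
    return theta + eta * grad
```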
  6. We present Atos, a dynamic scheduling framework for multi-node GPU systems that supports PGAS-style lightweight one-sided memory operations within and between nodes. Atos's lightweight GPU-to-GPU communication enables latency hiding and can smooth interconnect usage for bisection-limited problems. These benefits are significant for dynamic, irregular applications, which often involve fine-grained communication at unpredictable times and without predetermined patterns. We identify three principles for high performance: (1) do not involve the CPU in the communication control path; (2) allow GPU communication within kernels, addressing memory consistency directly rather than relying on synchronization with the CPU; (3) perform dynamic communication aggregation when interconnects have limited bandwidth. By lowering the overhead of communication and allowing it within GPU kernels, we support large, high-utilization GPU kernels that communicate more frequently. We evaluate Atos on two irregular problems: breadth-first search and PageRank. Atos outperforms the state-of-the-art graph libraries Gunrock, Groute, and Galois in both single-node multi-GPU and multi-node GPU settings.
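Principle (3) above, dynamic communication aggregation, can be illustrated host-side with a toy buffer that coalesces fine-grained sends into batched messages. The class, the `transport` callable, and the threshold are invented for illustration and say nothing about Atos's actual GPU-side mechanism.

```python
from collections import defaultdict

class AggregatingChannel:
    """Toy model of dynamic aggregation: buffer fine-grained messages
    per destination and flush them as one batched send once a size
    threshold is reached (or at an explicit flush)."""

    def __init__(self, transport, threshold=64):
        self.transport = transport      # callable: (dest, batch) -> None
        self.threshold = threshold
        self.buffers = defaultdict(list)

    def send(self, dest, payload):
        buf = self.buffers[dest]
        buf.append(payload)
        if len(buf) >= self.threshold:  # amortize per-message overhead
            self.flush(dest)

    def flush(self, dest=None):
        targets = [dest] if dest is not None else list(self.buffers)
        for d in targets:
            if self.buffers[d]:
                self.transport(d, self.buffers.pop(d))

# Usage: many small updates become a few batched transport calls.
sent = []
chan = AggregatingChannel(lambda d, batch: sent.append((d, len(batch))),
                          threshold=4)
for i in range(10):
    chan.send(dest=0, payload=i)
chan.flush()
print(sent)   # [(0, 4), (0, 4), (0, 2)]
```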
  7. We present Atos, a task-parallel GPU dynamic scheduling framework that is especially well suited to dynamic, irregular applications. Compared to the dominant Bulk Synchronous Parallel (BSP) frameworks, Atos exposes additional concurrency by supporting task-parallel formulations of applications with relaxed dependencies, achieving higher GPU utilization; this is particularly significant for problems with concurrency bottlenecks. Atos also offers implicit task-parallel load balancing in addition to data-parallel load balancing, giving users the flexibility to balance between the two for optimal performance. Finally, Atos allows users to adapt to different use cases by controlling the kernel strategy and task-parallel granularity, and we demonstrate that each of these controls is important in practice. We evaluate and analyze the performance of Atos versus BSP on three applications: breadth-first search, PageRank, and graph coloring. Atos implementations achieve geomean speedups of 3.44x, 2.1x, and 2.77x, and peak speedups of 12.8x, 3.2x, and 9.08x, on these applications compared to a state-of-the-art BSP GPU implementation. Beyond simply quantifying the speedup, we extensively analyze the reasons behind each one. This deeper understanding allows us to derive general guidelines for selecting the optimal Atos configuration for different applications. Finally, our analysis provides insights for future dynamic scheduling framework designs.
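To contrast the two execution models in item 7, here is a single-threaded Python caricature: BSP-style BFS advances one frontier per global barrier, while a worklist formulation relaxes that dependency so any discovered vertex can be processed next. This is a schematic of the two formulations only, not of Atos's GPU scheduler.

```python
from collections import deque

def bfs_bsp(adj, src):
    """BSP-style BFS: one global barrier between frontiers (levels)."""
    dist = {src: 0}
    frontier = [src]
    while frontier:
        next_frontier = []                 # built during this "superstep"
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier           # implicit global barrier here
    return dist

def bfs_worklist(adj, src):
    """Task-parallel formulation: no barriers; any pending vertex may be
    processed next, with monotonic relaxation fixing out-of-order work."""
    dist = {src: 0}
    work = deque([src])
    while work:
        u = work.popleft()                 # any scheduling order is legal
        for v in adj[u]:
            if dist[u] + 1 < dist.get(v, float("inf")):
                dist[v] = dist[u] + 1      # relax; may re-enqueue v
                work.append(v)
    return dist

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
assert bfs_bsp(adj, 0) == bfs_worklist(adj, 0)
```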