Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.
- Estimating the unknown reward functions driving agents' behavior is a central challenge in inverse games and reinforcement learning. This paper introduces a unified framework for reward function recovery in two-player zero-sum matrix games and Markov games with entropy regularization. Given observed player strategies and actions, we aim to reconstruct the underlying reward functions. This task is challenging due to the inherent ambiguity of inverse problems, the non-uniqueness of feasible rewards, and limited observational data coverage. To address these challenges, we establish reward function identifiability using the quantal response equilibrium (QRE) under linear assumptions. Building on this theoretical foundation, we propose an algorithm that learns rewards from observed actions and is designed to capture all plausible reward parameters by constructing confidence sets. The algorithm works in both static and dynamic settings and can incorporate other methods, such as Maximum Likelihood Estimation (MLE). We provide strong theoretical guarantees for the reliability and sample efficiency of our algorithm. Empirical results demonstrate the framework's effectiveness in accurately recovering reward functions across various scenarios, offering new insights into decision-making in competitive environments. Free, publicly-accessible full text available August 15, 2026. (An illustrative reward-recovery sketch appears after this list.)
- Transformer models have achieved remarkable empirical successes, largely due to their in-context learning capabilities. Inspired by this, we explore training an autoregressive transformer for in-context reinforcement learning (ICRL). In this setting, we first train a transformer on an offline dataset of trajectories collected from various RL tasks, then fix it and use it to construct an action policy for new RL tasks. Notably, we consider the setting where the offline dataset contains trajectories sampled from suboptimal behavior policies. In this case, standard autoregressive training corresponds to imitation learning and yields suboptimal performance. To address this, we propose the Decision Importance Transformer (DIT) framework, which emulates the actor-critic algorithm in an in-context manner. In particular, we first train a transformer-based value function that estimates the advantage functions of the behavior policies that collected the suboptimal trajectories. We then train a transformer-based policy via a weighted maximum likelihood estimation loss, where the weights are constructed from the trained value function to steer the suboptimal policies toward optimal ones. We conduct extensive experiments to test the performance of DIT on both bandit and Markov decision process problems. Our results show that DIT achieves superior performance, particularly when the offline dataset contains suboptimal historical data. Free, publicly-accessible full text available August 15, 2026. (An advantage-weighted loss sketch appears after this list.)
- We introduce IRIS, a geometric and heuristic-based scoring system for evaluating mathematical conjectures and theorems expressed as linear inequalities over numerical invariants. The IRIS score reflects multiple dimensions of significance, including sharpness, diversity, difficulty, and novelty, and enables principled ranking of conjectures by their structural importance. As a tool for fully automated discovery, IRIS supports the generation and prioritization of high-value conjectures. We demonstrate its utility through case studies in convex geometry and graph theory, showing that IRIS can assist in both the rediscovery of known results and the proposal of novel, nontrivial conjectures. Free, publicly-accessible full text available August 15, 2026. (An illustrative inequality-scoring sketch appears after this list.)
- Augmented Lagrangian (AL) methods have proven remarkably useful in solving optimization problems with complicated constraints. The last decade has seen the development of overall complexity guarantees for inexact AL variants, yet a crucial gap persists in addressing nonsmooth convex constraints. To this end, we present a smoothed AL framework in which nonsmooth terms are progressively smoothed with a smoothing parameter $$\eta_k$$. The resulting AL subproblems are $$\eta_k$$-smooth, allowing accelerated schemes to be leveraged. By a careful selection of the inexactness level (for inexact subproblem resolution), the penalty parameter $$\rho_k$$, and the smoothing parameter $$\eta_k$$ at epoch $$k$$, we derive rate and complexity guarantees of $$\tilde{\mathcal{O}}(1/\epsilon^{3/2})$$ and $$\tilde{\mathcal{O}}(1/\epsilon)$$ in convex and strongly convex regimes for computing an $$\epsilon$$-optimal solution when $$\rho_k$$ increases at a geometric rate, a significant improvement over the best available guarantees for AL schemes for convex programs with nonsmooth constraints. Analogous guarantees are developed for settings with $$\rho_k=\rho$$ as well as $$\eta_k=\eta$$. Preliminary numerics on a fused Lasso problem display promise. Free, publicly-accessible full text available August 1, 2026. (An illustrative smoothed-AL sketch appears after this list.)