Title: Co-training for Policy Learning
We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.
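To make the meta-algorithm concrete, here is a minimal sketch of one plausible co-training loop. All interfaces (`rollout`, `update`, `imitate`, and the cross-view translation helpers) are hypothetical stand-ins, and the paper's actual exchange criterion may differ; this shows only the general shape of two policies on two views supervising one another.

```python
def co_train(env, policy_a, policy_b, n_rounds=100, n_rollouts=16):
    """Two policies, each on its own view of the same underlying problem,
    trade their best trajectories as extra supervision for one another."""
    for _ in range(n_rounds):
        # Roll out each policy on its own state-action representation.
        trajs_a = [policy_a.rollout(env.view_a()) for _ in range(n_rollouts)]
        trajs_b = [policy_b.rollout(env.view_b()) for _ in range(n_rollouts)]

        # Standard single-view update (RL or imitation learning).
        policy_a.update(trajs_a)
        policy_b.update(trajs_b)

        # Cross-view exchange: the higher-return trajectory is translated
        # into the other view and used as a demonstration there.
        best_a = max(trajs_a, key=lambda t: t.total_return)
        best_b = max(trajs_b, key=lambda t: t.total_return)
        if best_a.total_return >= best_b.total_return:
            policy_b.imitate(env.translate_a_to_b(best_a))
        else:
            policy_a.imitate(env.translate_b_to_a(best_b))
    return policy_a, policy_b
```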
Award ID(s):
1645832
PAR ID:
10207094
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
Volume:
115
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mathematical optimization is fundamental to decision-making across diverse domains, from operations research to healthcare. Yet, translating real-world problems into optimization models remains a difficult task, often demanding specialized expertise. This paper approaches the problem of autoformulation: the automated creation of solver-ready optimization models from natural language problem descriptions. We identify three core challenges of autoformulation: the vast, problem-dependent hypothesis space; efficient and diverse exploration of this space under uncertainty; and evaluation of formulation correctness against the problem description. To address these challenges, we present a novel method leveraging large language models (LLMs) with Monte Carlo Tree Search, exploiting the hierarchical nature of optimization modeling to generate and systematically explore possible formulations. To enhance search efficiency, we introduce symbolic pruning to eliminate trivially equivalent search paths (branches), and employ LLM-based evaluation of partial formulations to guide the search. Empirical analysis on linear and mixed-integer programming benchmarks demonstrates our method's effectiveness, with significant performance gains from both LLM-based value estimation and symbolic pruning. (A minimal sketch of such a search loop appears after this list.)
  2. Distributionally Favorable Optimization (DFO) is a framework for decision-making under uncertainty, with applications spanning various fields, including reinforcement learning, online learning, robust statistics, chance-constrained programming, and two-stage stochastic optimization without complete recourse. In contrast to the traditional Distributionally Robust Optimization (DRO) paradigm, DFO presents a unique challenge: the application of the inner infimum operator often fails to retain convexity. In light of this challenge, we study the tractability and complexity of DFO. We establish necessary and sufficient conditions for determining when DFO problems are tractable (i.e., solvable in polynomial time) or intractable (i.e., not solvable in polynomial time). Despite the typically nonconvex nature of DFO problems, our results show that they are mixed-integer convex programming representable (MICP-R), thereby enabling solutions via standard optimization solvers. Finally, we numerically validate the efficacy of our MICP-R formulations. (The DFO/DRO contrast is written out after this list.)
  3. In this paper, we propose a framework for achieving long-term fair sequential decision making. By conducting both hard and soft interventions, we propose taking path-specific effects on the time-lagged causal graph as a quantitative tool for measuring long-term fairness. The problem of fair sequential decision making is then formulated as a constrained optimization problem with utility as the objective and long-term and short-term fairness as constraints. We show that such an optimization problem can be converted to a performative risk optimization. Finally, repeated risk minimization (RRM) is used for model training, and the convergence of RRM is analyzed theoretically. The empirical evaluation shows the effectiveness of the proposed algorithm on synthetic and semi-synthetic temporal datasets. (A minimal RRM loop is sketched after this list.)
  4. In various applications of multi-robot systems in disaster response, warehouse management, and manufacturing, tasks that are known a priori and tasks added at run time must be assigned to robots in the team efficiently and without conflicts. This multi-robot task allocation (MRTA) process is a combinatorial optimization (CO) problem that is usually challenging to solve on meaningful timescales using typical (mixed-)integer (non)linear programming tools. Building on a growing body of work that uses graph reinforcement learning to learn search heuristics for such complex CO problems, this paper presents a new graph neural network architecture called the covariant attention mechanism (CAM). CAM can not only generalize but also scale to larger problems than those encountered in training, and it can handle dynamic tasks. The architecture combines covariant compositional networks, used here to embed the local structure of task graphs, with a context module that encodes the robots' states. The encoded information is passed to a decoder built on a multi-head attention mechanism. When applied to a class of MRTA problems with time deadlines, robot ferry-range constraints, and multi-trip settings, CAM surpasses a state-of-the-art graph learning approach based on the attention mechanism, as well as a feasible random-walk baseline, across various generalizability and scalability tests. CAM's performance is also on par with a high-performing non-learning baseline called BiG-MRTA, with up to a 70-fold improvement in decision-making efficiency over that baseline. (A skeletal encoder-decoder in this spirit is sketched after this list.)
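For item 1, a minimal sketch of what a tree search over partial formulations might look like. The helpers `expand_formulation`, `canonical_form`, and `llm_score` are hypothetical stubs standing in for the paper's LLM-driven components, and the UCB1-style selection rule is a standard choice, not necessarily the one used in the paper.

```python
import math
import random

# Hypothetical stubs: in the described system an LLM proposes the next
# modeling decisions and scores partial formulations against the problem text.
def expand_formulation(node):     # propose up to 3 child formulations
    return [node + (random.randrange(10),) for _ in range(3)]

def canonical_form(node):         # symbolic key; equal keys => equivalent models
    return tuple(sorted(node))

def llm_score(node):              # stand-in for LLM-based evaluation, in [0, 1]
    return random.random()

def search(root=(), n_iters=200, c=1.4):
    visits, value, children = {root: 0}, {root: 0.0}, {root: None}
    seen = {canonical_form(root)}  # symbolic pruning of equivalent branches
    for _ in range(n_iters):
        node, path = root, [root]
        # Selection: descend by a UCB1-style rule until an unexpanded node.
        while children[node]:
            parent = node
            node = max(children[parent],
                       key=lambda ch: value[ch] / (visits[ch] + 1e-9)
                       + c * math.sqrt(math.log(visits[parent] + 1) / (visits[ch] + 1e-9)))
            path.append(node)
        # Expansion: add children, discarding trivially equivalent formulations.
        fresh = []
        for child in expand_formulation(node):
            key = canonical_form(child)
            if key not in seen:
                seen.add(key)
                visits[child], value[child], children[child] = 0, 0.0, None
                fresh.append(child)
        children[node] = fresh or None
        # Evaluation of the partial formulation, then backpropagation.
        score = llm_score(node)
        for n in path:
            visits[n] += 1
            value[n] += score
    return max(children[root] or [root], key=lambda n: visits[n])

best = search()                    # most-visited first modeling decision
```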
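For item 2, the contrast the abstract draws can be made explicit. The notation below is standard for this literature (ambiguity set, decision variable, uncertain parameter) rather than copied from the paper:

```latex
% DRO hedges against the worst-case distribution; DFO optimizes under the
% most favorable one. Replacing the inner supremum with an infimum is what
% typically destroys convexity.
\min_{x \in X} \; \sup_{\mathbb{P} \in \mathcal{P}} \; \mathbb{E}_{\mathbb{P}}\!\left[f(x,\xi)\right] \quad \text{(DRO)}
\qquad \text{vs.} \qquad
\min_{x \in X} \; \inf_{\mathbb{P} \in \mathcal{P}} \; \mathbb{E}_{\mathbb{P}}\!\left[f(x,\xi)\right] \quad \text{(DFO)}
```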
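For item 3, a minimal sketch of repeated risk minimization as the abstract describes it: refit on the distribution the deployed model itself induces until a stable point is reached. The helpers `sample_induced` and `fit` are hypothetical stand-ins for the performative-risk machinery and the constrained training step.

```python
import numpy as np

def rrm(theta0, sample_induced, fit, tol=1e-6, max_iters=100):
    """theta_{t+1} = argmin_theta E_{z ~ D(theta_t)}[loss(z; theta)]."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iters):
        data = sample_induced(theta)   # deployment shifts the data (performativity)
        new_theta = fit(data)          # constrained risk minimization with fairness constraints
        if np.linalg.norm(new_theta - theta) < tol:
            return new_theta           # performatively stable point reached
        theta = new_theta
    return theta
```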
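For item 4, a skeletal encoder-decoder along the lines the abstract describes. A single round of adjacency-based message passing stands in for the covariant compositional embedding (which is more structured in the paper), and all dimensions and interfaces are hypothetical.

```python
import torch
import torch.nn as nn

class TaskAllocationPolicy(nn.Module):
    def __init__(self, task_dim=4, robot_dim=6, d_model=64, n_heads=4):
        super().__init__()
        self.task_embed = nn.Linear(task_dim, d_model)  # stand-in node encoder
        self.context = nn.Linear(robot_dim, d_model)    # robot-state context module
        self.decoder = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, task_feats, adj, robot_state):
        # Embed tasks, then mix neighbors through the task-graph adjacency
        # (one message-passing round as a stand-in for the CCN embedding).
        h = torch.relu(self.task_embed(task_feats))     # (B, n_tasks, d)
        h = h + torch.bmm(adj, h)                       # neighbor aggregation
        q = self.context(robot_state).unsqueeze(1)      # (B, 1, d) query
        _, attn = self.decoder(q, h, h)                 # attention over tasks
        return attn.squeeze(1)                          # per-task selection scores

policy = TaskAllocationPolicy()
scores = policy(torch.randn(2, 5, 4), torch.eye(5).repeat(2, 1, 1), torch.randn(2, 6))
```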