skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Co-training for Policy Learning
We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.  more » « less
Award ID(s):
1645832
PAR ID:
10207094
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
Volume:
115
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Adams, RP; Gogate V (Ed.)
    We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization. 
    more » « less
  2. Mathematical optimization is fundamental to decision-making across diverse domains, from operations research to healthcare. Yet, translating real-world problems into optimization models remains a difficult task, often demanding specialized expertise. This paper approaches the problem of : the automated creation of solver-ready optimization models from natural language problem descriptions. We identify three core challenges of autoformulation: the vast, problem-dependent hypothesis space, efficient and diverse exploration of this space under uncertainty, and evaluation of formulation correctness against problem description. To address these challenges, we present a novel method leveraging (LLMs) with , exploiting the hierarchical nature of optimization modeling to generate and systematically explore possible formulations. To enhance search efficiency, we introduce symbolic pruning to eliminate trivially equivalent search paths (branches), and employ LLM-based evaluation of partial formulations to guide search. Empirical analysis on linear and mixed-integer programming benchmarks demonstrates our method's effectiveness, with significant performance gains from both LLM-based value estimation and symbolic pruning techniques. 
    more » « less
  3. In this paper, we propose a framework for achieving long-term fair sequential decision making. By conducting both the hard and soft interventions, we propose to take path-specific effects on the time-lagged causal graph as a quantitative tool for measuring long-term fairness. The problem of fair sequential decision making is then formulated as a constrained optimization problem with the utility as the objective and the long-term and short-term fairness as constraints. We show that such an optimization problem can be converted to a performative risk optimization. Finally, repeated risk minimization (RRM) is used for model training, and the convergence of RRM is theoretically analyzed. The empirical evaluation shows the effectiveness of the proposed algorithm on synthetic and semi-synthetic temporal datasets. 
    more » « less
  4. Abstract In various applications of multi-robotics in disaster response, warehouse management, and manufacturing, tasks that are known a priori and tasks added during run time need to be assigned efficiently and without conflicts to robots in the team. This multi-robot task allocation (MRTA) process presents itself as a combinatorial optimization (CO) problem that is usually challenging to be solved in meaningful timescales using typical (mixed)integer (non)linear programming tools. Building on a growing body of work in using graph reinforcement learning to learn search heuristics for such complex CO problems, this paper presents a new graph neural network architecture called the covariant attention mechanism (CAM). CAM can not only generalize but also scale to larger problems than that encountered in training, and handle dynamic tasks. This architecture combines the concept of covariant compositional networks used here to embed the local structures in task graphs, with a context module that encodes the robots’ states. The encoded information is passed onto a decoder designed using multi-head attention mechanism. When applied to a class of MRTA problems with time deadlines, robot ferry range constraints, and multi-trip settings, CAM surpasses a state-of-the-art graph learning approach based on the attention mechanism, as well as a feasible random-walk baseline across various generalizability and scalability tests. Performance of CAM is also found to be at par with a high-performing non-learning baseline called BiG-MRTA, while noting up to a 70-fold improvement in decision-making efficiency over this baseline. 
    more » « less
  5. null (Ed.)
    Job shops are an important production environment for low-volume high-variety manufacturing. Its scheduling has recently been formulated as an Integer Linear Programming (ILP) problem to take advantages of popular Mixed-Integer Linear Programming (MILP) methods, e.g., branch-and-cut. When considering a large number of parts, MILP methods may combinatorial difficulties. To address this, a critical but much overlooked issue is formulation tightening. The idea is that if problem constraints can be transformed to directly delineate the problem convex hull in the data preprocessing stage, then a solution can be obtained by using linear programming methods without combinatorial difficulties. The tightening process, however, is fundamentally challenging because of the existence of integer variables. In this paper, an innovative and systematic approach is established for the first time to tighten the formulations of individual parts, each with multiple operations, in the data preprocessing stage. It is a major advancement of our previous work on problems with binary and continuous variables to integer variables. The idea is to first link integer variables to binary variables by innovatively combining constraints so that the integer variables are uniquely determined by the binary variables. With binary and continuous variables only, it is proved that the vertices of the convex hull can be obtained based on vertices of the linear problem after relaxing binary requirements. These vertices are then converted to tightened constraints for general use. This approach significantly improves our previous results on tightening individual operations. Numerical results demonstrate significant benefits on solution quality and computational efficiency. This approach also applies to other ILP problems with similar characteristics and fundamentally changes the way how such problems are formulated and solved. 
    more » « less