skip to main content


Title: Co-training for Policy Learning
We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.  more » « less
Award ID(s):
1645832
NSF-PAR ID:
10207094
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
Volume:
115
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Adams, RP ; Gogate V (Ed.)
    We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization. 
    more » « less
  2. The design of cyber-physical systems (CPSs) requires methods and tools that can efficiently reason about the interaction between discrete models, e.g., representing the behaviors of ``cyber'' components, and continuous models of physical processes. Boolean methods such as satisfiability (SAT) solving are successful in tackling large combinatorial search problems for the design and verification of hardware and software components. On the other hand, problems in control, communications, signal processing, and machine learning often rely on convex programming as a powerful solution engine. However, despite their strengths, neither approach would work in isolation for CPSs. In this paper, we present a new satisfiability modulo convex programming (SMC) framework that integrates SAT solving and convex optimization to efficiently reason about Boolean and convex constraints at the same time. We exploit the properties of a class of logic formulas over Boolean and nonlinear real predicates, termed monotone satisfiability modulo convex formulas, whose satisfiability can be checked via a finite number of convex programs. Following the lazy satisfiability modulo theory (SMT) paradigm, we develop a new decision procedure for monotone SMC formulas, which coordinates SAT solving and convex programming to provide a satisfying assignment or determine that the formula is unsatisfiable. A key step in our coordination scheme is the efficient generation of succinct infeasibility proofs for inconsistent constraints that can support conflict-driven learning and accelerate the search. We demonstrate our approach on different CPS design problems, including spacecraft docking mission control, robotic motion planning, and secure state estimation. We show that SMC can handle more complex problem instances than state-of-the-art alternative techniques based on SMT solving and mixed integer convex programming. 
    more » « less
  3. Urban Air Mobility, the scenario where hundreds of manned and Unmanned Aircraft Systems (UASs) carry out a wide variety of missions (e.g., moving humans and goods within the city), is gaining acceptance as a transportation solution of the future. One of the key requirements for this to happen is safely managing the air traffic in these urban airspaces. Due to the expected density of the airspace, this requires fast autonomous solutions that can be deployed online. We propose Learning-‘N-Flying (LNF), a multi-UAS Collision Avoidance (CA) framework. It is decentralized, works on the fly, and allows autonomous Unmanned Aircraft System (UAS)s managed by different operators to safely carry out complex missions, represented using Signal Temporal Logic, in a shared airspace. We initially formulate the problem of predictive collision avoidance for two UASs as a mixed-integer linear program, and show that it is intractable to solve online. Instead, we first develop Learning-to-Fly (L2F) by combining (1) learning-based decision-making and (2) decentralized convex optimization-based control. LNF extends L2F to cases where there are more than two UASs on a collision path. Through extensive simulations, we show that our method can run online (computation time in the order of milliseconds) and under certain assumptions has failure rates of less than 1% in the worst case, improving to near 0% in more relaxed operations. We show the applicability of our scheme to a wide variety of settings through multiple case studies. 
    more » « less
  4. null (Ed.)
    With increasing urban population, there is global interest in Urban Air Mobility (UAM), where hundreds of autonomous Unmanned Aircraft Systems (UAS) execute missions in the airspace above cities. Unlike traditional human-inthe-loop air traffic management, UAM requires decentralized autonomous approaches that scale for an order of magnitude higher aircraft densities and are applicable to urban settings. We present Learning-to-Fly (L2F), a decentralized on-demand airborne collision avoidance framework for multiple UAS that allows them to independently plan and safely execute missions with spatial, temporal and reactive objectives expressed using Signal Temporal Logic. We formulate the problem of predictively avoiding collisions between two UAS without violating mission objectives as a Mixed Integer Linear Program (MILP). This however is intractable to solve online. Instead, we develop L2F, a two-stage collision avoidance method that consists of: 1) a learning-based decision-making scheme and 2) a distributed, linear programming-based UAS control algorithm. Through extensive simulations, we show the real-time applicability of our method which is ≈6000× faster than the MILP approach and can resolve 100% of collisions when there is ample room to maneuver, and shows graceful degradation in performance otherwise. We also compare L2F to two other methods and demonstrate an implementation on quad-rotor robots. 
    more » « less
  5. null (Ed.)
    Job shops are an important production environment for low-volume high-variety manufacturing. Its scheduling has recently been formulated as an Integer Linear Programming (ILP) problem to take advantages of popular Mixed-Integer Linear Programming (MILP) methods, e.g., branch-and-cut. When considering a large number of parts, MILP methods may combinatorial difficulties. To address this, a critical but much overlooked issue is formulation tightening. The idea is that if problem constraints can be transformed to directly delineate the problem convex hull in the data preprocessing stage, then a solution can be obtained by using linear programming methods without combinatorial difficulties. The tightening process, however, is fundamentally challenging because of the existence of integer variables. In this paper, an innovative and systematic approach is established for the first time to tighten the formulations of individual parts, each with multiple operations, in the data preprocessing stage. It is a major advancement of our previous work on problems with binary and continuous variables to integer variables. The idea is to first link integer variables to binary variables by innovatively combining constraints so that the integer variables are uniquely determined by the binary variables. With binary and continuous variables only, it is proved that the vertices of the convex hull can be obtained based on vertices of the linear problem after relaxing binary requirements. These vertices are then converted to tightened constraints for general use. This approach significantly improves our previous results on tightening individual operations. Numerical results demonstrate significant benefits on solution quality and computational efficiency. This approach also applies to other ILP problems with similar characteristics and fundamentally changes the way how such problems are formulated and solved. 
    more » « less