Title: IMO^3: Interactive Multi-Objective Off-Policy Optimization
Most real-world optimization problems have multiple objectives. A system designer needs to find a policy that trades off these objectives to reach a desired operating point. This problem has been studied extensively in the setting of known objective functions. However, we consider the more practical but challenging setting of unknown objective functions. In industry, optimization in this setting is mostly approached with online A/B testing, which is often costly and inefficient. As an alternative, we propose Interactive Multi-Objective Off-policy Optimization (IMO^3). The key idea of IMO^3 is to interact with a system designer using policies evaluated in an off-policy fashion, in order to uncover which policy maximizes her unknown utility function. We theoretically show that IMO^3 identifies a near-optimal policy with high probability, with guarantees that depend on the amount of feedback from the designer and on the amount of training data available for off-policy estimation. We demonstrate its effectiveness empirically on several multi-objective optimization problems.
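The loop the abstract describes, off-policy evaluation of candidate policies combined with designer feedback, can be sketched minimally. Everything below is a hypothetical illustration: inverse propensity scoring (IPS) is one standard off-policy estimator, not necessarily the paper's, and ask_designer stands in for the designer's unknown utility feedback.

```python
import numpy as np

def ips_estimate(logged, target_probs):
    """Inverse propensity scoring (IPS): estimate a candidate policy's
    expected reward vector, one entry per objective, from data logged
    under a known behavior policy.

    logged: iterable of (action, behavior_prob, reward_vector) tuples.
    target_probs: target_probs[action] is the candidate policy's
    probability of taking that action.
    """
    weighted = [np.asarray(r, dtype=float) * target_probs[a] / p
                for a, p, r in logged]
    return np.mean(weighted, axis=0)

def interactive_search(candidates, logged, ask_designer, n_queries=20):
    """Show the designer pairs of off-policy estimates and keep whichever
    policy she prefers. ask_designer(v1, v2) is a hypothetical oracle
    returning True when the first objective vector is preferred."""
    rng = np.random.default_rng(0)
    estimates = [ips_estimate(logged, c) for c in candidates]
    best = 0
    for _ in range(n_queries):
        challenger = int(rng.integers(len(candidates)))
        if challenger != best and ask_designer(estimates[challenger],
                                               estimates[best]):
            best = challenger
    return candidates[best]
```

Consistent with the abstract's guarantee, the quality of the returned policy in this sketch is limited by both the number of designer queries and the accuracy of the off-policy estimates, which improves with more logged data.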
Award ID(s):
2128019 2007492 1553568
PAR ID:
10381231
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22
Page Range / eLocation ID:
3523 to 3529
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Analog circuit optimization and design present a unique set of challenges in the IC design process. Many applications require the designer to optimize for multiple competing objectives, which poses a crucial challenge. Motivated by these practical aspects, we propose a novel method to tackle multi-objective optimization for analog circuit design in continuous action spaces. In particular, we propose to: (i) extrapolate current techniques in Multi-Objective Reinforcement Learning (MORL) to continuous state and action spaces; (ii) provide a dynamically tunable trained model that can be queried with user-defined preferences for multi-objective optimization in the analog circuit design context.
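As a minimal illustration of how a user-defined preference can steer a multi-objective learner, here is a standard linear scalarization of a reward vector. This is a common MORL building block, not necessarily this paper's exact conditioning mechanism, and the amplifier numbers are made up:

```python
import numpy as np

def scalarize(reward_vec, preference):
    """Linear scalarization: collapse a multi-objective reward into a
    scalar using a user-supplied preference weight vector. One common
    MORL technique; the paper's conditioning scheme may differ."""
    w = np.asarray(preference, dtype=float)
    w = w / w.sum()                       # normalize onto the simplex
    return float(np.dot(w, np.asarray(reward_vec)))

# Hypothetical analog-design trade-off: gain (dB) vs. power (negated mW).
reward = np.array([62.0, -3.5])
print(scalarize(reward, [0.7, 0.3]))      # designer prioritizes gain
```

Sweeping the preference vector at query time is what makes such a trained model "dynamically tunable": moving along the trade-off curve requires no retraining.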
  2. In this paper, we study an unmanned-aerial-vehicle (UAV) based full-duplex (FD) multi-user communication network, where a UAV is deployed as a multiple-input multiple-output (MIMO) FD base station (BS) to serve multiple FD users on the ground. We propose a multi-objective optimization framework that considers two desirable objective functions, namely sum uplink (UL) rate maximization and sum downlink (DL) rate maximization, while providing quality of service to all the users in the communication network. A novel resource-allocation multi-objective optimization problem (MOOP) is designed, which optimizes the downlink beamformer, the beamwidth angle, and the 3D position of the UAV, as well as the UL power of the FD users. The formulated MOOP is non-convex and generally intractable. To handle it, a weighted Tchebycheff method is proposed, which converts the problem into a single-objective optimization problem (SOOP). Further, an alternating optimization approach is used, in which the SOOP is decomposed into multiple sub-problems and the optimization variables are updated alternately. The numerical results show a trade-off region between the sum UL and sum DL rates, and also validate that the considered FD system provides a substantial improvement over traditional half-duplex (HD) systems.
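The weighted Tchebycheff scalarization itself is standard and easy to state: with an ideal (utopia) point z* obtained by maximizing each rate alone, the MOOP becomes the SOOP of minimizing the worst weighted gap to z*. A minimal sketch, with rate numbers that are illustrative rather than from the paper:

```python
import numpy as np

def tchebycheff(objectives, weights, ideal):
    """Weighted Tchebycheff scalarization: the MOOP max [f1, f2] becomes
    the SOOP  minimize  max_i w_i * (z_i* - f_i),  where z* is the ideal
    (utopia) point attained by optimizing each objective alone."""
    f = np.asarray(objectives, dtype=float)
    return float(np.max(np.asarray(weights) * (np.asarray(ideal) - f)))

# Illustrative numbers only: sum-UL and sum-DL rates in bits/s/Hz.
ideal = [12.0, 15.0]          # best achievable UL and DL rates separately
candidate = [9.5, 13.0]       # rates achieved by one resource allocation
print(tchebycheff(candidate, [0.5, 0.5], ideal))  # lower is better
```

Sweeping the weight vector and re-solving the resulting SOOP traces out a trade-off region between the two rates, of the kind the numerical results report.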
  3. The design of machine learning systems often requires trading off different objectives, for example, prediction error and energy consumption for deep neural networks (DNNs). Typically, no single design performs well in all objectives; therefore, finding Pareto-optimal designs is of interest. The search for Pareto-optimal designs involves evaluating designs in an iterative process, and the measurements are used to evaluate an acquisition function that guides the search. However, measuring different objectives incurs different costs: for example, measuring the prediction error of a DNN is orders of magnitude more expensive than measuring the energy consumption of a pre-trained DNN, because the former requires re-training the DNN. Current state-of-the-art methods do not consider this difference in evaluation cost, potentially incurring expensive objective evaluations during optimization. In this paper, we develop a novel decoupled and cost-aware multi-objective optimization algorithm, which we call Flexible Multi-Objective Bayesian Optimization (FlexiBO), to address this issue. For each design, FlexiBO selects the objective with the higher relative gain by weighting the improvement of the hypervolume of the Pareto region against the measurement cost of each objective. This strategy balances the expense of collecting new information with the knowledge gained through objective evaluations, preventing FlexiBO from performing expensive measurements for little to no gain. We evaluate FlexiBO on seven state-of-the-art DNNs for image recognition, natural language processing (NLP), and speech-to-text translation. Our results indicate that, given the same total experimental budget, FlexiBO discovers designs with 4.8% to 12.4% lower hypervolume error than the best state-of-the-art multi-objective optimization method.
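The selection rule described above reduces to a simple ratio test. A hedged sketch follows; the function name, EHVI values, and costs are hypothetical placeholders, not taken from the paper's implementation:

```python
def select_objective(ehvi_per_objective, cost_per_objective):
    """Pick which objective to measure next: weight each objective's
    expected hypervolume improvement (EHVI) by its measurement cost and
    take the best ratio (a sketch of a decoupled, cost-aware rule in the
    spirit of FlexiBO; numbers below are hypothetical)."""
    gains = {name: ehvi / cost_per_objective[name]
             for name, ehvi in ehvi_per_objective.items()}
    return max(gains, key=gains.get)

# Measuring error requires re-training, so it is far more expensive.
ehvi = {"prediction_error": 0.08, "energy": 0.05}
cost = {"prediction_error": 3600.0, "energy": 5.0}   # seconds, say
print(select_objective(ehvi, cost))   # -> "energy": cheap, decent gain
```

Under this rule the expensive objective is only measured when its expected hypervolume improvement is large enough to justify the re-training cost.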
  4. Omega-regular properties, specified using linear time temporal logic or various forms of omega-automata, find increasing use in specifying the objectives of reinforcement learning (RL). The key problem that arises is that of faithful and effective translation of the objective into a scalar reward for model-free RL. A recent approach exploits Büchi automata with restricted nondeterminism to reduce the search for an optimal policy for an omega-regular property to that for a simple reachability objective. A possible drawback of this translation is that reachability rewards are sparse, being reaped only at the end of each episode. Another approach reduces the search for an optimal policy to an optimization problem with two interdependent discount parameters. While this approach provides denser rewards than the reduction to reachability, it is not easily mapped to off-the-shelf RL algorithms. We propose a reward scheme that reduces the search for an optimal policy to an optimization problem with a single discount parameter, produces dense rewards, and is compatible with off-the-shelf RL algorithms. Finally, we report an experimental comparison of these and other reward schemes for model-free RL with omega-regular objectives.
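The abstract does not give the reward construction, but the flavor of a dense, single-discount scheme can be sketched: pay a reward on every accepting transition of the product of the MDP with the automaton, so good runs are rewarded throughout an episode rather than only at its end. This is a hedged illustration under that assumption, not the paper's exact scheme:

```python
def discounted_return(accepting_flags, gamma=0.99, base_reward=1.0):
    """Discounted return under one plausible dense scheme: a reward is
    paid on every accepting transition of the product of the MDP and the
    Buechi automaton, with a single discount factor gamma (a hedged
    sketch; the paper's exact reward construction may differ)."""
    ret, discount = 0.0, 1.0
    for accepting in accepting_flags:
        if accepting:
            ret += discount * base_reward
        discount *= gamma
    return ret

# Runs that hit accepting transitions often and early score higher, so a
# standard RL algorithm maximizing this return is steered toward runs
# that keep visiting accepting transitions, i.e., the Buechi condition.
print(discounted_return([False, True, False, True, True]))
```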
  5. We consider the problem of multiagent optimization wherein an unknown subset of agents suffer Byzantine faults and thus behave adversarially. We assume that each agent i has a local cost function f_i, and the overarching goal of the good agents is to collaboratively minimize a global objective that properly aggregates these local cost functions. To the best of our knowledge, we are among the first to study Byzantine-resilient optimization where no central coordinating agent exists, and we are the first to characterize the structures of the convex coefficients of the achievable global objectives. Dealing with Byzantine faults is very challenging; for example, in contrast to fault-free networks, reaching Byzantine-resilient agreement even in the simplest setting is far from trivial. We take a step toward solving the proposed Byzantine-resilient multiagent optimization problem by focusing on scalar local cost functions. Our results might provide useful insights for general local cost functions.
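To make the failure model concrete: with scalar local costs, a classic resilient building block is to aggregate the agents' reports with a trimmed mean, which discards the extremes that Byzantine agents can inject. This is a standard technique shown for illustration, not necessarily the aggregation rule analyzed in the paper:

```python
import numpy as np

def trimmed_mean(values, f):
    """Aggregate scalar reports from n agents, up to f of them Byzantine,
    by dropping the f largest and f smallest values before averaging (a
    standard resilient aggregation rule, shown here for illustration)."""
    v = np.sort(np.asarray(values, dtype=float))
    if len(v) <= 2 * f:
        raise ValueError("need more than 2f reports to tolerate f faults")
    return float(v[f:len(v) - f].mean())

# Nine agents report local gradients of their scalar cost functions;
# two adversaries inject extreme values, which the trim discards.
reports = [0.9, 1.1, 1.0, 1.2, 0.8, 1.05, 0.95, 100.0, -100.0]
print(trimmed_mean(reports, f=2))   # close to the honest mean, ~1.0
```

With more than 2f reports, every value that survives the trim lies between the smallest and largest honest values, so the f adversaries cannot drag the average outside the honest range.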