In traditional reinforcement learning (RL), the learner aims to solve a single-objective optimization problem: find the policy that maximizes expected reward. However, in many real-world settings, it is important to optimize over multiple objectives simultaneously. For example, when we are interested in fairness, states might have feature annotations corresponding to multiple (intersecting) demographic groups to whom reward accrues, and our goal might be to maximize the reward of the group receiving the minimal reward. In this work, we consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. This generalizes the problem of maximizing the reward of the minimum-reward group. We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large, both for tabular MDPs and for large MDPs when the group functions have additional structure. Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP.
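One way to write the max-min objective sketched in this abstract is the following; the notation (group set \(\mathcal{G}\), weight functions \(w_g\), horizon \(T\)) is our own shorthand rather than the paper's:

```latex
% A sketch of the max-min group objective: each group g has a state-based
% weight function w_g that reweights the single scalar reward r.
\[
  \max_{\pi} \; \min_{g \in \mathcal{G}} \;
  \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{T} w_g(s_t)\, r(s_t, a_t) \right]
\]
```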
IMO^3: Interactive Multi-Objective Off-Policy Optimization
Most real-world optimization problems have multiple objectives. A system designer needs to find a policy that trades off these objectives to reach a desired operating point. This problem has been studied extensively in the setting of known objective functions. However, we consider a more practical but challenging setting of unknown objective functions. In industry, optimization under this setting is mostly approached with online A/B testing, which is often costly and inefficient. As an alternative, we propose Interactive Multi-Objective Off-policy Optimization (IMO^3). The key idea of IMO^3 is to interact with a system designer, using policies evaluated in an off-policy fashion, to uncover which policy maximizes her unknown utility function. We theoretically show that IMO^3 identifies a near-optimal policy with high probability, depending on the amount of the designer's feedback and the amount of training data available for off-policy estimation. We demonstrate its effectiveness empirically on several multi-objective optimization problems.
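The abstract does not give implementation details; as a rough illustration of the off-policy evaluation step it relies on, here is a minimal inverse-propensity-scoring sketch. The data layout, function names, and the assumption of a vector of per-objective rewards are ours, not the paper's API.

```python
import numpy as np

def ips_policy_values(logged_data, target_policy, n_objectives):
    """Inverse-propensity-scoring (IPS) estimate of a target policy's value
    under each objective, from data logged by a different policy.

    Assumed (hypothetical) record layout: (context, action, propensity, rewards),
    where `rewards` is a length-n_objectives vector and `propensity` is the
    logging policy's probability of the logged action.
    """
    totals = np.zeros(n_objectives)
    for context, action, propensity, rewards in logged_data:
        # Importance weight: target policy's action probability / logging propensity.
        weight = target_policy(context)[action] / propensity
        totals += weight * np.asarray(rewards, dtype=float)
    return totals / len(logged_data)
```

Estimates of this form for several candidate policies can then be presented to the designer, whose feedback on them carries information about her unknown utility function.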
- PAR ID:
- 10381231
- Date Published:
- Journal Name:
- Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22
- Page Range / eLocation ID:
- 3523 to 3529
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Analog circuit optimization and design present a unique set of challenges in the IC design process. Many applications require the designer to optimize for multiple competing objectives, which poses a crucial challenge. Motivated by these practical aspects, we propose a novel method to tackle multi-objective optimization for analog circuit design in continuous action spaces. In particular, we propose to: (i) extrapolate current techniques in Multi-Objective Reinforcement Learning (MORL) to continuous state and action spaces; (ii) provide a dynamically tunable trained model to query user-defined preferences in multi-objective optimization in the analog circuit design context.
- In this paper, we study an unmanned-aerial-vehicle (UAV) based full-duplex (FD) multi-user communication network, where a UAV is deployed as a multiple-input multiple-output (MIMO) FD base station (BS) to serve multiple FD users on the ground. We propose a multi-objective optimization framework which considers two desirable objective functions, namely sum uplink (UL) rate maximization and sum downlink (DL) rate maximization, while providing quality of service to all the users in the communication network. A novel resource-allocation multi-objective optimization problem (MOOP) is designed which optimizes the downlink beamformer, the beamwidth angle, the 3D position of the UAV, and the UL power of the FD users. The formulated MOOP is a non-convex problem which is generally intractable. To handle the MOOP, a weighted Tchebycheff method is proposed, which converts the problem into a single-objective optimization problem (SOOP). Further, an alternating optimization approach is used, where the SOOP is decomposed into multiple sub-problems and the optimization variables are updated alternately. The numerical results show a trade-off region between the sum UL and sum DL rates, and also validate that the considered FD system provides substantial improvement over traditional half-duplex (HD) systems.
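For reference, a weighted Tchebycheff scalarization of a two-objective maximization problem like the one above (sum UL rate and sum DL rate) is commonly written as follows; the symbols (weights \(\lambda_i\), objectives \(f_i\), utopia point \(z_i^*\)) are generic notation, not taken from that paper:

```latex
% Weighted Tchebycheff scalarization of a maximization MOOP:
% minimize the largest weighted gap to the utopia (ideal) point z^*.
\[
  \min_{x \in \mathcal{X}} \; \max_{i \in \{1, \dots, m\}} \;
  \lambda_i \bigl( z_i^{*} - f_i(x) \bigr),
  \qquad \lambda_i \ge 0, \quad \sum_{i=1}^{m} \lambda_i = 1
\]
```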
- The design of machine learning systems often requires trading off different objectives, for example, prediction error and energy consumption for deep neural networks (DNNs). Typically, no single design performs well in all objectives; therefore, finding Pareto-optimal designs is of interest. The search for Pareto-optimal designs involves evaluating designs in an iterative process, and the measurements are used to evaluate an acquisition function that guides the search. However, measuring different objectives incurs different costs. For example, the cost of measuring the prediction error of a DNN is orders of magnitude higher than that of measuring the energy consumption of a pre-trained DNN, as it requires re-training the DNN. Current state-of-the-art methods do not consider this difference in evaluation cost, potentially incurring expensive evaluations of objective functions in the optimization process. In this paper, we develop a novel decoupled and cost-aware multi-objective optimization algorithm, which we call Flexible Multi-Objective Bayesian Optimization (FlexiBO), to address this issue. For each design it evaluates, FlexiBO selects the objective with the higher relative gain by weighting the improvement of the hypervolume of the Pareto region with the measurement cost of each objective. This strategy balances the expense of collecting new information against the knowledge gained through objective evaluations, preventing FlexiBO from performing expensive measurements for little or no gain. We evaluate FlexiBO on seven state-of-the-art DNNs for image recognition, natural language processing (NLP), and speech-to-text translation. Our results indicate that, given the same total experimental budget, FlexiBO discovers designs with 4.8% to 12.4% lower hypervolume error than the best state-of-the-art multi-objective optimization method.
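As a rough illustration of the cost-aware selection rule described in that abstract, the sketch below picks the objective whose estimated hypervolume improvement per unit measurement cost is largest; the function name and the per-objective improvement estimates are placeholders of ours, not FlexiBO's actual API.

```python
def select_objective(hv_improvements, costs):
    """Pick the index of the objective with the largest estimated
    hypervolume improvement per unit measurement cost.

    hv_improvements: estimated hypervolume gain from measuring each objective
    costs: measurement cost of each objective (e.g., re-training vs. profiling)
    """
    gains = [hv / c for hv, c in zip(hv_improvements, costs)]
    return max(range(len(gains)), key=gains.__getitem__)

# Example: re-measuring prediction error (costly) vs. energy (cheap).
best = select_objective(hv_improvements=[0.30, 0.05], costs=[100.0, 1.0])
# best == 1: the cheap energy measurement offers more gain per unit cost here.
```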
- The multi-objective optimization problem is to optimize several objective functions over a common feasible set. Because the objectives usually do not share a common optimizer, people often consider (weakly) Pareto points. This paper studies multi-objective optimization problems that are given by polynomial functions. First, we study the geometry of (weakly) Pareto values and represent the Pareto front as the boundary of a convex set. Linear scalarization problems (LSPs) and Chebyshev scalarization problems (CSPs) are typical approaches for getting (weakly) Pareto points. For LSPs, we show how to use tight relaxations to solve them and how to detect existence or nonexistence of proper weights. For CSPs, we show how to solve them by moment relaxations. Moreover, we show how to check whether a given point is a (weakly) Pareto point or not and how to detect existence or nonexistence of (weakly) Pareto points. We also study how to detect unboundedness of polynomial optimization, which is used to detect nonexistence of proper weights or (weakly) Pareto points. Funding: J. Nie is partially supported by the National Science Foundation [Grant DMS-2110780].
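For reference, the two scalarizations named in that abstract are commonly written as follows (generic notation, ours rather than the paper's), for objectives \(f_1, \dots, f_m\) minimized over the feasible set \(K\):

```latex
% Linear scalarization problem (LSP) with weights w_i >= 0:
\[
  \min_{x \in K} \; \sum_{i=1}^{m} w_i \, f_i(x)
\]
% Chebyshev scalarization problem (CSP) with weights w_i >= 0
% and reference point z:
\[
  \min_{x \in K} \; \max_{1 \le i \le m} \; w_i \,\bigl| f_i(x) - z_i \bigr|
\]
```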