In traditional reinforcement learning (RL), the learner aims to solve a single-objective optimization problem: find the policy that maximizes expected reward. However, in many real-world settings, it is important to optimize over multiple objectives simultaneously. For example, when we are interested in fairness, states might have feature annotations corresponding to multiple (intersecting) demographic groups to whom reward accrues, and our goal might be to maximize the reward of the group receiving the minimal reward. In this work, we consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. This generalizes the problem of maximizing the reward of the minimum-reward group. We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large, both for tabular MDPs and for large MDPs in which the group functions have additional structure. Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP.
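To make the objective concrete, here is a minimal sketch (all names hypothetical, not the paper's implementation) of how the max-min group value of a fixed policy can be estimated from sampled trajectories: each group g induces an objective equal to the w_g-reweighted return, and the policy is scored by its worst-off group.

```python
import numpy as np

def group_returns(trajectories, group_weights, gamma=0.99):
    """Monte Carlo estimate of each group's reweighted return.

    trajectories : list of [(state, reward), ...] rollouts of a fixed policy
    group_weights: dict mapping group name -> w_g(state), a state-based weight
    Returns a dict of estimates of E[sum_t gamma^t * w_g(s_t) * r_t].
    """
    returns = {g: [] for g in group_weights}
    for traj in trajectories:
        for g, w in group_weights.items():
            ret = sum((gamma ** t) * w(s) * r for t, (s, r) in enumerate(traj))
            returns[g].append(ret)
    return {g: float(np.mean(v)) for g, v in returns.items()}

def min_group_value(trajectories, group_weights, gamma=0.99):
    """The max-min objective scores a policy by its worst-off group."""
    return min(group_returns(trajectories, group_weights, gamma).values())
```

Maximizing `min_group_value` over policies recovers the fairness objective above; the paper's oracle-efficient algorithms address the harder regime where the number of groups is exponentially large.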
IMO^3: Interactive Multi-Objective Off-Policy Optimization
Most real-world optimization problems have multiple objectives. A system designer needs to find a policy that trades off these objectives to reach a desired operating point. This problem has been studied extensively in the setting of known objective functions. However, we consider a more practical but challenging setting of unknown objective functions. In industry, optimization under this setting is mostly approached with online A/B testing, which is often costly and inefficient. As an alternative, we propose Interactive Multi-Objective Off-policy Optimization (IMO^3). The key idea of IMO^3 is to interact with a system designer using policies evaluated in an off-policy fashion to uncover which policy maximizes her unknown utility function. We theoretically show that IMO^3 identifies a near-optimal policy with high probability, with guarantees that depend on the amount of designer feedback and the amount of training data available for off-policy estimation. We demonstrate its effectiveness empirically on several multi-objective optimization problems.
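The abstract does not spell out the estimator or the feedback model; the sketch below is one plausible instantiation (all names hypothetical), assuming inverse-propensity scoring for the off-policy estimates and noisy scalar utility feedback from the designer.

```python
import numpy as np

def ips_objective_estimates(policy, logged_data, n_objectives):
    """Off-policy estimates of each objective via inverse-propensity scoring.

    logged_data: list of (context, action, propensity, objective_vector)
    tuples collected under a logging policy; policy(context, action) returns
    the candidate policy's probability of taking the logged action.
    """
    est = np.zeros(n_objectives)
    for x, a, p, obj in logged_data:
        est += (policy(x, a) / p) * np.asarray(obj, dtype=float)
    return est / len(logged_data)

def imo3_loop(candidates, logged_data, n_objectives,
              designer_feedback, n_queries=20):
    """Query the designer on off-policy estimates; return the apparent best.

    designer_feedback(estimates) -> noisy scalar utility of an objective
    vector; averaging repeated queries reduces the feedback noise.
    """
    scores = []
    for pi in candidates:
        est = ips_objective_estimates(pi, logged_data, n_objectives)
        scores.append(np.mean([designer_feedback(est)
                               for _ in range(n_queries)]))
    return candidates[int(np.argmax(scores))]
```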
- PAR ID: 10381231
- Date Published: 2022
- Journal Name: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22
- Page Range / eLocation ID: 3523 to 3529
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Analog circuit optimization and design presents a unique set of challenges in the IC design process. Many applications require the designer to optimize for multiple competing objectives, which poses a crucial challenge. Motivated by these practical aspects, we propose a novel method to tackle multi-objective optimization for analog circuit design in continuous action spaces. In particular, we propose to: (i) extrapolate current techniques in Multi-Objective Reinforcement Learning (MORL) to continuous state and action spaces; and (ii) provide a dynamically tunable trained model that queries user-defined preferences for multi-objective optimization in the analog circuit design context.
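One plausible reading of point (ii) is a model conditioned on a user-supplied preference vector over objectives; the sketch below (hypothetical names, linear scalarization assumed rather than the paper's actual mechanism) shows the standard reduction from a vector reward to a tunable scalar one.

```python
import numpy as np

def scalarize(reward_vec, preference):
    """Linear scalarization: collapse a vector of circuit objectives
    (e.g. gain, power, area) into one reward using user weights."""
    w = np.asarray(preference, dtype=float)
    w = w / w.sum()  # normalize the weights onto the simplex
    return float(w @ np.asarray(reward_vec, dtype=float))

def conditioned_input(state, preference):
    """A preference-conditioned policy sees the weights as part of its
    input, so one trained model can be queried at different trade-offs."""
    return np.concatenate([np.asarray(state), np.asarray(preference)])
```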
In this paper, we study an unmanned-aerial-vehicle (UAV) based full-duplex (FD) multi-user communication network, where a UAV is deployed as a multiple-input multiple-output (MIMO) FD base station (BS) to serve multiple FD users on the ground. We propose a multi-objective optimization framework that considers two desirable objective functions, namely sum uplink (UL) rate maximization and sum downlink (DL) rate maximization, while providing quality of service to all users in the communication network. A novel resource-allocation multi-objective optimization problem (MOOP) is designed that optimizes the downlink beamformer, the beamwidth angle, and the 3D position of the UAV, as well as the UL power of the FD users. The formulated MOOP is a non-convex problem that is generally intractable. To handle the MOOP, a weighted Tchebycheff method is proposed, which converts the problem into a single-objective optimization problem (SOOP). Further, an alternating optimization approach is used, in which the SOOP is decomposed into multiple sub-problems and the optimization variables are updated alternately. The numerical results show a trade-off region between the sum UL and sum DL rates, and also validate that the considered FD system provides substantial improvement over traditional half-duplex (HD) systems.
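The weighted Tchebycheff scalarization referenced here is a standard construction; a minimal sketch follows (hypothetical names, utopia point assumed known, e.g. from solving each single-objective problem separately).

```python
import numpy as np

def tchebycheff(objectives, weights, utopia):
    """Weighted Tchebycheff scalarization of a maximization MOOP.

    objectives: achieved values f_i(x), e.g. (sum UL rate, sum DL rate)
    utopia:     per-objective ideal values z*_i, each maximized alone
    Minimizing the returned scalar over x (for fixed weights) yields a
    weakly Pareto-optimal point; sweeping the weights traces the
    trade-off region between the objectives.
    """
    f = np.asarray(objectives, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = np.asarray(utopia, dtype=float)
    return float(np.max(w * (z - f)))
```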
The design of machine learning systems often requires trading off different objectives, for example, prediction error and energy consumption for deep neural networks (DNNs). Typically, no single design performs well in all objectives; therefore, finding Pareto-optimal designs is of interest. The search for Pareto-optimal designs involves evaluating designs in an iterative process, and the measurements are used to evaluate an acquisition function that guides the search. However, measuring different objectives incurs different costs. For example, measuring the prediction error of a DNN is orders of magnitude more expensive than measuring the energy consumption of a pre-trained DNN, because it requires retraining the DNN. Current state-of-the-art methods do not consider this difference in objective evaluation cost, potentially incurring expensive evaluations of objective functions in the optimization process. In this paper, we develop a novel decoupled and cost-aware multi-objective optimization algorithm, which we call Flexible Multi-Objective Bayesian Optimization (FlexiBO), to address this issue. For each design evaluation, FlexiBO selects the objective with the higher relative gain by weighting the improvement of the hypervolume of the Pareto region against the measurement cost of each objective. This strategy balances the expense of collecting new information with the knowledge gained through objective evaluations, preventing FlexiBO from performing expensive measurements for little to no gain. We evaluate FlexiBO on seven state-of-the-art DNNs for image recognition, natural language processing (NLP), and speech-to-text translation. Our results indicate that, given the same total experimental budget, FlexiBO discovers designs with 4.8% to 12.4% lower hypervolume error than the best state-of-the-art multi-objective optimization method.
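The selection rule described, choosing the objective whose hypervolume improvement is largest per unit of measurement cost, can be sketched as follows (hypothetical names; in FlexiBO proper the improvement estimates would come from the surrogate model, which is omitted here).

```python
def select_objective(hv_improvements, costs):
    """Pick the objective with the highest cost-weighted relative gain.

    hv_improvements: dict objective -> estimated hypervolume improvement
                     of the Pareto region if that objective is measured
    costs:           dict objective -> measurement cost (e.g. retraining
                     a DNN for error vs. profiling it for energy)
    """
    return max(hv_improvements, key=lambda o: hv_improvements[o] / costs[o])

# Example: the cheap energy probe wins unless the expensive error probe
# promises proportionally more hypervolume improvement.
best = select_objective({"error": 0.30, "energy": 0.05},
                        {"error": 100.0, "energy": 1.0})  # -> "energy"
```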
We consider the problem of multiagent optimization wherein an unknown subset of agents suffer Byzantine faults and thus behave adversarially. We assume that each agent i has a local cost function f_i, and the overarching goal of the good agents is to collaboratively minimize a global objective that properly aggregates these local cost functions. To the best of our knowledge, we are among the first to study Byzantine-resilient optimization where no central coordinating agent exists, and we are the first to characterize the structure of the convex coefficients of the achievable global objectives. Dealing with Byzantine faults is very challenging. For example, in contrast to fault-free networks, reaching Byzantine-resilient agreement even in the simplest setting is far from trivial. We take a step toward solving the proposed Byzantine-resilient multiagent optimization problem by focusing on scalar local cost functions. Our results may provide useful insights for general local cost functions.
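The abstract does not state the aggregation rule, so the sketch below shows a standard Byzantine-resilient primitive for scalar values, trimmed-mean aggregation, rather than the paper's actual construction.

```python
import numpy as np

def trimmed_mean(values, f):
    """Average after discarding the f largest and f smallest entries.

    With at most f Byzantine agents among the senders, every value that
    survives the trim is bracketed by values from good agents, so the
    aggregate stays within the range of honest inputs.
    """
    v = np.sort(np.asarray(values, dtype=float))
    assert len(v) > 2 * f, "need more than 2f values to tolerate f faults"
    return float(v[f:len(v) - f].mean())

# A good agent can use this to aggregate neighbors' reported gradients of
# their local f_i before taking a descent step on the aggregate.
```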