NSF PAR Search | NSF Public Access Repository

In recommender systems, preference elicitation (PE) is an effective way to learn about a user's preferences to improve recommendation quality. Expected value of information (EVOI), a Bayesian technique that computes expected gain in user utility, has proven to be effective in selecting useful PE queries. Most EVOI methods use probabilistic models of user preferences and query responses to compute posterior utilities. By contrast, we develop model-free variants of EVOI that rely on function approximation to obviate the need for specific modeling assumptions. Specifically, we learn user response and utility models from existing data (often available in real-world recommender systems), which are used to estimate EVOI rather than relying on explicit probabilistic inference. We augment our approach by using online planning, specifically, Monte Carlo tree search, to further enhance our elicitation policies. We show that our approach offers significant improvement in recommendation quality over standard baselines on several PE tasks.
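For readers unfamiliar with EVOI, the following is a minimal sketch of the classical Bayesian quantity the abstract contrasts itself with, not the model-free method the paper develops. It assumes a discrete belief over a handful of hypothetical user types; the function name `evoi` and the array layouts (`prior`, `utility`, `response_prob`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def evoi(prior, utility, response_prob):
    """Expected value of information of one query (illustrative sketch).

    prior:         shape (T,)   belief over T hypothetical user types
    utility:       shape (T, I) utility of each of I items for each type
    response_prob: shape (T, R) P(response r | user type t) for this query
    """
    prior = np.asarray(prior, dtype=float)
    utility = np.asarray(utility, dtype=float)
    response_prob = np.asarray(response_prob, dtype=float)

    # Expected utility of the best item if we recommend without asking.
    baseline = (prior @ utility).max()

    # joint[t, r] = P(type t, response r)
    joint = prior[:, None] * response_prob

    # (joint.T @ utility)[r, i] = P(r) * E[utility of item i | response r].
    # Taking the best item per response and summing over responses gives
    # the expected utility of recommending after observing the answer.
    posterior_value = (joint.T @ utility).max(axis=1).sum()

    return posterior_value - baseline


# Two user types, two items; each type likes exactly one item.
prior = [0.5, 0.5]
utility = [[1.0, 0.0], [0.0, 1.0]]
informative = [[1.0, 0.0], [0.0, 1.0]]    # response reveals the type
uninformative = [[0.5, 0.5], [0.5, 0.5]]  # response is pure noise
print(evoi(prior, utility, informative))
print(evoi(prior, utility, uninformative))
```

In this toy setup a query whose answer reveals the user type has positive EVOI, while a query whose answer is independent of the type has zero EVOI, which is why EVOI is a natural criterion for ranking candidate queries.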

IMO^3: Interactive Multi-Objective Off-Policy Optimization

https://doi.org/10.24963/ijcai.2022/489

Wang, Nan; Wang, Hongning; Karimzadehgan, Maryam; Kveton, Branislav; Boutilier, Craig (July 2022, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22)

Most real-world optimization problems have multiple objectives. A system designer needs to find a policy that trades off these objectives to reach a desired operating point. This problem has been studied extensively in the setting of known objective functions. However, we consider a more practical but challenging setting of unknown objective functions. In industry, optimization under this setting is mostly approached with online A/B testing, which is often costly and inefficient. As an alternative, we propose Interactive Multi-Objective Off-policy Optimization (IMO^3). The key idea of IMO^3 is to interact with a system designer, using policies evaluated in an off-policy fashion, to uncover which policy maximizes her unknown utility function. We theoretically show that IMO^3 identifies a near-optimal policy with high probability, depending on the amount of designer feedback and the amount of training data for off-policy estimation. We demonstrate its effectiveness empirically on several multi-objective optimization problems.
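To make "policies evaluated in an off-policy fashion" concrete, here is a minimal sketch of a standard inverse propensity scoring (IPS) estimate of a candidate policy's value on several objectives at once. It is an assumption-laden illustration, not the IMO^3 algorithm: the context-free setting, the function name `ips_objective_values`, and the data layout are all hypothetical.

```python
import numpy as np

def ips_objective_values(actions, propensities, rewards, target_probs):
    """IPS estimate of a target policy's value on each objective (sketch).

    actions:      shape (n,)   logged action indices
    propensities: shape (n,)   probability the logging policy gave each logged action
    rewards:      shape (n, d) observed reward vector, one column per objective
    target_probs: shape (K,)   target policy's probability of each of K actions
                               (context-free here, purely for simplicity)
    """
    actions = np.asarray(actions)
    propensities = np.asarray(propensities, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    target_probs = np.asarray(target_probs, dtype=float)

    # Importance weight: how much more (or less) likely the target policy
    # is to take the logged action than the logging policy was.
    weights = target_probs[actions] / propensities

    # Reweighted average reward, computed separately for every objective.
    return (weights[:, None] * rewards).mean(axis=0)


# Logged data from a uniform logging policy over two actions.
actions = [0, 1, 0, 1]
propensities = [0.5, 0.5, 0.5, 0.5]
rewards = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

# Estimated objective values of a policy that always plays action 0.
print(ips_objective_values(actions, propensities, rewards, [1.0, 0.0]))
```

Estimates like these let a designer compare candidate policies by their objective trade-offs using only logged data, without running each candidate in an online A/B test.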

