Title: A Ranked Bandit Approach for Multi-stakeholder Recommender Systems
Recommender systems traditionally find the most relevant products or services for users, tailored to their needs or interests, but they ignore the interests of the other sides of the market (aka stakeholders). In this paper, we propose a Ranked Bandit approach for an online multi-stakeholder recommender system that sequentially selects the top-k items according to the relevance and priority of all the involved stakeholders. We present three different criteria for incorporating the priority of each stakeholder when evaluating our approach. Our extensive experimental results on a movie dataset show that contextual multi-armed bandits with a relevance function achieve a higher level of satisfaction for all involved stakeholders in the long term. Keywords: Multi-stakeholder Recommender Systems; Multi-armed Bandits; Ranked Bandit
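A minimal sketch of the ranked-bandit idea described above: one bandit per rank position builds the top-k slate, and each slot's reward blends per-stakeholder relevance by a priority weight. The class names, the epsilon-greedy choice, and the weights/relevance inputs are illustrative assumptions, not the paper's implementation.

```python
import random

class EpsilonGreedy:
    """One simple bandit instance; the ranked bandit uses one per rank slot."""
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select(self, excluded):
        arms = [a for a in range(len(self.counts)) if a not in excluded]
        if random.random() < self.epsilon:
            return random.choice(arms)                   # explore
        return max(arms, key=lambda a: self.values[a])   # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def multi_stakeholder_reward(item, weights, relevance):
    # Blend each stakeholder's relevance score for the item by its priority weight.
    return sum(w * relevance[s][item] for s, w in weights.items())

def recommend_top_k(slot_bandits, weights, relevance):
    slate, chosen = [], set()
    for bandit in slot_bandits:                 # one bandit per rank position
        item = bandit.select(chosen)
        bandit.update(item, multi_stakeholder_reward(item, weights, relevance))
        slate.append(item)
        chosen.add(item)
    return slate
```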
Award ID(s):
1739413
NSF-PAR ID:
10392161
Author(s) / Creator(s):
Date Published:
Journal Name:
Workshop on Multi-Objective Recommender Systems (MORS'22)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Traditional recommender systems help users find the most relevant products or services to match their needs and preferences. However, they overlook the preferences of the other sides of the market (aka stakeholders) involved in the system. In this paper, we propose to use contextual bandit algorithms in multi-stakeholder platforms, where a multi-sided relevance function with adjustable weights is modeled to consider the preferences of all involved stakeholders. This algorithm sequentially recommends items based on the contextual features of users along with the priority of the stakeholders and their relevance to the items. Our extensive experimental results on a dataset consisting of MovieLens (1m), IMDB (81k+), and a synthetic dataset show that our proposed approach outperforms the baseline methods and provides a good trade-off between the satisfaction of different stakeholders over time.
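A hedged sketch of what such a contextual approach could look like: a LinUCB-style estimator per item, with the observed reward replaced by a multi-sided relevance score whose stakeholder weights can be adjusted over time. The scoring inputs and the alpha parameter are placeholder assumptions, not the authors' exact model.

```python
import numpy as np

class LinUCBArm:
    """Linear UCB estimator for one item (one arm)."""
    def __init__(self, d, alpha=1.0):
        self.alpha = alpha
        self.A = np.eye(d)       # regularized design matrix
        self.b = np.zeros(d)     # reward-weighted context sum

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def step(arms, x, stakeholder_scores, weights):
    # stakeholder_scores[s][i] = relevance of item i to stakeholder s;
    # weights[s] = current priority of stakeholder s (may be adjusted over time).
    best = max(range(len(arms)), key=lambda i: arms[i].ucb(x))
    reward = sum(weights[s] * stakeholder_scores[s][best] for s in weights)
    arms[best].update(x, reward)
    return best, reward
```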
  2.
    Federated multi-armed bandits (FMAB) is a new bandit paradigm that parallels the federated learning (FL) framework in supervised learning. It is inspired by practical applications in cognitive radio and recommender systems, and enjoys features that are analogous to FL. This paper proposes a general framework of FMAB and then studies two specific federated bandit models. We first study the approximate model where the heterogeneous local models are random realizations of the global model from an unknown distribution. This model introduces a new uncertainty of client sampling, as the global model may not be reliably learned even if the finite local models are perfectly known. Furthermore, this uncertainty cannot be quantified a priori without knowledge of the suboptimality gap. We solve the approximate model by proposing Federated Double UCB (Fed2-UCB), which constructs a novel "double UCB" principle accounting for uncertainties from both arm and client sampling. We show that gradually admitting new clients is critical in achieving an O(log(T)) regret while explicitly considering the communication loss. The exact model, where the global bandit model is the exact average of heterogeneous local models, is then studied as a special case. We show that, somewhat surprisingly, the order-optimal regret can be achieved independent of the number of clients with a careful choice of the update periodicity. Experiments using both synthetic and real-world datasets corroborate the theoretical analysis and demonstrate the effectiveness and efficiency of the proposed algorithms.
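An illustrative, deliberately simplified reading of the "double UCB" principle: an arm's index adds one exploration bonus for arm-sampling uncertainty and a second one for client-sampling uncertainty. The constants and exact bonus forms below are placeholder assumptions, not the bounds from the paper.

```python
import math

def double_ucb_index(mean, n_pulls, n_clients, t, c1=2.0, c2=2.0):
    # mean: empirical reward of the arm, averaged over admitted clients
    arm_bonus = math.sqrt(c1 * math.log(t) / max(n_pulls, 1))       # arm sampling
    client_bonus = math.sqrt(c2 * math.log(t) / max(n_clients, 1))  # client sampling
    return mean + arm_bonus + client_bonus
```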
  3.
    Speakers communicate to influence their partner's beliefs and shape their actions. Belief- and action-based objectives have been explored independently in recent computational models, but it has been challenging to explicitly compare or integrate them. Indeed, we find that they are conflated in standard referential communication tasks. To distinguish these accounts, we introduce a new paradigm called signaling bandits, generalizing classic Lewis signaling games to a multi-armed bandit setting where all targets in the context have some relative value. We develop three speaker models: a belief-oriented speaker with a purely informative objective; an action-oriented speaker with an instrumental objective; and a combined speaker which integrates the two by inducing listener beliefs that generally lead to desirable actions. We then present a series of simulations demonstrating that grounding production choices in future listener actions results in relevance effects and flexible uses of nonliteral language. More broadly, our findings suggest that language games based on richer decision problems are a promising avenue for insight into rational communication. 
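A toy sketch contrasting the speaker objectives described above, in an RSA-style setup where each state corresponds to one arm of known value. The literal listener, the log-probability belief objective, and the way the two objectives are mixed are all invented for illustration.

```python
import numpy as np

def literal_listener(meanings):
    # meanings[u, s] = 1 if utterance u is literally true of state s
    return meanings / meanings.sum(axis=1, keepdims=True)

def speaker_scores(meanings, true_state, arm_values, mix=0.5):
    L0 = literal_listener(meanings.astype(float))
    # belief-oriented: make the listener believe the true state
    belief = np.log(L0[:, true_state] + 1e-12)
    # action-oriented: value of the arm the listener would now pick
    action = np.array([arm_values[np.argmax(row)] for row in L0])
    # combined: trade off informativeness against induced action value
    return mix * belief + (1 - mix) * action
```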
    more » « less
  4. Contextual bandit algorithms have become widely used for recommendation in online systems (e.g. marketplaces, music streaming, news), where they now wield substantial influence on which items get shown to users. This raises questions of fairness to the items, and to the sellers, artists, and writers that benefit from this exposure. We argue that the conventional bandit formulation can lead to an undesirable and unfair winner-takes-all allocation of exposure. To remedy this problem, we propose a new bandit objective that guarantees merit-based fairness of exposure to the items while optimizing utility to the users. We formulate fairness regret and reward regret in this setting and present algorithms for both stochastic multi-armed bandits and stochastic linear bandits. We prove that the algorithms achieve sublinear fairness regret and reward regret. Beyond the theoretical analysis, we also provide empirical evidence that these algorithms can allocate exposure to different arms effectively.
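A minimal sketch of merit-based exposure in the bandit setting: rather than always pulling the single best arm, the policy plays a distribution that gives each arm exposure proportional to a merit function of its estimated reward. The merit function below is an illustrative assumption.

```python
import numpy as np

def exposure_policy(est_means, merit=lambda m: np.maximum(m, 0.0) + 1e-6):
    # P(pull arm i) is proportional to merit(estimated mean of arm i)
    merits = merit(np.asarray(est_means, dtype=float))
    return merits / merits.sum()

def pull(est_means, rng=None):
    rng = rng or np.random.default_rng()
    p = exposure_policy(est_means)
    return rng.choice(len(p), p=p)
```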
  5. Experience management (EM) agents in multiplayer serious games face unique challenges and responsibilities regarding the fair treatment of players. One such challenge is the Greedy Bandit Problem that arises when using traditional Multi-Armed Bandits (MABs) as EM agents, which results in some players being routinely prioritized while others may be ignored. We show that this problem can be a cause of player non-adherence in a multiplayer serious game played by human users. To mitigate this effect, we propose a new bandit strategy, the Shapley Bandit, which enforces fairness constraints in its treatment of players based on the Shapley value. We evaluate our approach via simulation with virtual players, finding that the Shapley Bandit can be effective in providing more uniform treatment of players while incurring only a slight cost in overall performance relative to a typical greedy approach. Our findings highlight the importance of fair treatment among players as a goal of multiplayer EM agents, and we discuss how addressing this issue may lead to more effective agent operation overall. The study contributes to the understanding of player modeling and EM in serious games and provides a promising approach for balancing fairness and engagement in multiplayer environments.
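A toy sketch of the Shapley idea applied to player treatment: score each player by their exact Shapley value under a group engagement function v(S), then have the EM agent attend to the player most under-served relative to that share. The engagement function and the attention bookkeeping are invented here, not the paper's formulation.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    # exact Shapley value of each player under coalition value v(S)
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            for S in combinations(others, r):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[p] += w * (v(set(S) | {p}) - v(set(S)))
    return phi

def next_player(players, v, attention_share):
    # attention_share[p]: fraction of the agent's attention p has received so far
    phi = shapley_values(players, v)
    total = sum(phi.values()) or 1.0
    return min(players, key=lambda p: attention_share[p] - phi[p] / total)
```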

     