skip to main content


Title: Competition, Alignment, and Equilibria in Digital Marketplaces

Competition between traditional platforms is known to improve user utility by aligning the platform's actions with user preferences. But to what extent is alignment exhibited in data-driven marketplaces? To study this question from a theoretical perspective, we introduce a duopoly market where platform actions are bandit algorithms and the two platforms compete for user participation. A salient feature of this market is that the quality of recommendations depends on both the bandit algorithm and the amount of data provided by interactions from users. This interdependency between the algorithm performance and the actions of users complicates the structure of market equilibria and their quality in terms of user utility. Our main finding is that competition in this market does not perfectly align market outcomes with user utility. Interestingly, market outcomes exhibit misalignment not only when the platforms have separate data repositories, but also when the platforms have a shared data repository. Nonetheless, the data sharing assumptions impact what mechanism drives misalignment and also affect the specific form of misalignment (e.g. the quality of the best-case and worst-case market outcomes). More broadly, our work illustrates that competition in digital marketplaces has subtle consequences for user utility that merit further investigation.

 
more » « less
Award ID(s):
2145898
PAR ID:
10482191
Author(s) / Creator(s):
; ;
Publisher / Repository:
AAAI
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
37
Issue:
5
ISSN:
2159-5399
Page Range / eLocation ID:
5689 to 5696
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A core tension in the operations of online marketplaces is between segmentation (wherein platforms can increase revenue by segmenting the market into ever smaller sub-markets) and thickness (wherein the size of the sub-market affects the utility experienced by an agent). An important example of this is in dynamic online marketplaces, where buyers and sellers, in addition to preferences for different matches, also have finite patience (or deadlines) for being matched. We formalize this trade-off via a novel optimization problem that we term as 'Two-sided Facility Location': we consider a market wherein agents arrive at nodes embedded in an underlying metric space, where the distance between a buyer and seller captures the quality of the corresponding match. The platform posts prices and wages at the nodes, and opens a set of virtual clearinghouses where agents are routed for matching. To ensure high match-quality, the platform imposes a distance constraint between an agent and its clearinghouse; to ensure thickness, the platform requires the flow to any clearinghouse be at least a pre-specified lower bound. Subject to these constraints, the goal of the platform is to maximize the social surplus subject to weak budget balance, i.e., profit being non-negative. Our work characterizes the complexity of this problem by providing both hardness results as well as algorithms for this setting; in particular, we present an algorithm that for any constant ε > 0 yields a (1 + ε ) approximation for the gains from trade, while relaxing the match quality (i.e., maximum distance of any match) by a constant factor. 
    more » « less
  2. A common explanation for negative user impacts of content recommender systems is misalignment between the platform’s objective and user welfare. In this work, we show that misalignment in the platform’s objective is not the only potential cause of unintended impacts on users: even when the platform’s objective is fully aligned with user welfare, the platform’s learning algorithm can induce negative downstream impacts on users. The source of these user impacts is that different pieces of content may generate observable user reactions (feedback information) at different rates; these feedback rates may correlate with content properties, such as controversiality or demographic similarity of the creator, that affect the user experience. Since differences in feedback rates can impact how often the learning algorithm engages with different content, the learning algorithm may inadvertently promote content with certain such properties. Using the multi-armed bandit framework with probabilistic feedback, we examine the relationship between feedback rates and a learning algorithm’s engagement with individual arms for different no-regret algorithms. We prove that no-regret algorithms can exhibit a wide range of dependencies: if the feedback rate of an arm increases, some no-regret algorithms engage with the arm more, some no-regret algorithms engage with the arm less, and other no-regret algorithms engage with the arm approximately the same number of times. From a platform design perspective, our results highlight the importance of looking beyond regret when measuring an algorithm’s performance, and assessing the nature of a learning algorithm’s engagement with different types of content as well as their resulting downstream impacts. 
    more » « less
  3. Contextual bandit algorithms have become widely used for recommendation in online systems (e.g. marketplaces, music streaming, news), where they now wield substantial influence on which items get shown to users. This raises questions of fairness to the items — and to the sellers, artists, and writers that benefit from this exposure. We argue that the conventional bandit formulation can lead to an undesirable and unfair winner-takes-all allocation of exposure. To remedy this problem, we propose a new bandit objective that guarantees merit-based fairness of exposure to the items while optimizing utility to the users. We formulate fairness regret and reward regret in this setting and present algorithms for both stochastic multi-armed bandits and stochastic linear bandits. We prove that the algorithms achieve sublinear fairness regret and reward regret. Beyond the theoretical analysis, we also provide empirical evidence that these algorithms can allocate exposure to different arms effectively. 
    more » « less
  4. Personalized recommendation based on multi-arm bandit (MAB) algorithms has shown to lead to high utility and efficiency as it can dynamically adapt the recommendation strategy based on feedback. However, unfairness could incur in personalized recommendation. In this paper, we study how to achieve user-side fairness in personalized recommendation. We formulate our fair personalized recommendation as a modified contextual bandit and focus on achieving fairness on the individual whom is being recommended an item as opposed to achieving fairness on the items that are being recommended. We introduce and define a metric that captures the fairness in terms of rewards received for both the privileged and protected groups. We develop a fair contextual bandit algorithm, Fair-LinUCB, that improves upon the traditional LinUCB algorithm to achieve group-level fairness of users. Our algorithm detects and monitors unfairness while it learns to recommend personalized videos to students to achieve high efficiency. We provide a theoretical regret analysis and show that our algorithm has a slightly higher regret bound than LinUCB. We conduct numerous experimental evaluations to compare the performances of our fair contextual bandit to that of LinUCB and show that our approach achieves group-level fairness while maintaining a high utility. 
    more » « less
  5. The multiple-user terminals in a satellite transponder’s communication channel compete for limited radio resources to meet their own data rate needs. Because inter-user interference limits on the satellite transponder’s performance, the transponder’s power-control system needs to coordinate all its users to reduce interference and maximizes overall performance of this channel. This paper studies Stackelberg competition among the asymmetrical users in a transponder’s channel, where some users called leader have priority to choose their power control strategy, but other users called followers have to optimize their power control strategy with given leader’s controls. A Stackelberg Differential Game (SDG) is set up to model the Stackelberg competition in a transponder’s communication channel. Each user’s utility function is a trade-off between transmission data rate and power consumption. The dynamics of the system is the changing of channel gain. The optimality condition of Stackelberg equilibrium of leaders and followers is a set of Differential Algebraic Equations (DAE) with an imbedded control strategies from its counterpart. In order to solve for Stackelberg equilibrium, an algorithm based on optimizing leaders’ and followers’ Hamiltonians iteratively is developed. The numerical solution of the SDG model provides the transponder’s power control system with each user’s power-control strategy at the Stackelberg equilibrium. 
    more » « less