Problem definition: Inspired by new developments in dynamic spectrum access, we study the dynamic pricing of wireless Internet access when demand and capacity (bandwidth) are stochastic. Academic/practical relevance: The demand for wireless Internet access has increased enormously. However, the spectrum available to wireless service providers is limited. The industry has, thus, altered conventional license-based spectrum access policies through unlicensed spectrum operations. The additional spectrum obtained through these operations has stochastic capacity. Thus, the pricing of this service by the service provider has novel challenges. The problem considered in this paper is, therefore, of high practical relevance and new to the academic literature. Methodology: We study this pricing problem using a Markov decision process model in which customers are posted dynamic prices based on their bandwidth requirement and the available capacity. Results: We characterize the structure of the optimal pricing policy as a function of the system state and of the input parameters. Because it is impossible to solve this problem for practically large state spaces, we propose a heuristic dynamic pricing policy that performs very well, particularly when the ratio of capacity to demand rate is low. Managerial implications: We demonstrate the value of using a dynamic heuristic pricing policy compared with the myopic and optimal static policies. The previous literature has studied similar systems with fixed capacity and has characterized conditions under which myopic policies perform well. In contrast, our setting has dynamic (stochastic) capacity, and we find that identifying good state-dependent heuristic pricing policies is of greater importance. Our heuristic policy is computationally more tractable and easier to implement than the optimal dynamic and static pricing policies. It also provides a significant performance improvement relative to the myopic and optimal static policies when capacity is scarce, a condition that holds for the practical setting that motivated this research.
more »
« less
On the Optimality of Greedy Policies in Dynamic Matching
Hindsight Optimality in Two-Way Matching Networks In “On the Optimality of Greedy Policies in Dynamic Matching”, Kerimov, Ashlagi, and Gurvich study centralized dynamic matching markets with finitely many agent types and heterogeneous match values. A matching policy is hindsight optimal if the policy can (nearly) maximize the total value simultaneously at all times. The article establishes that suitably designed greedy policies are hindsight optimal in two-way matching networks. This implies that there is essentially no positive externality from having agents waiting to form future matches. Proposed policies include the greedy longest-queue policy, with a minor variation, as well as a greedy static priority policy. The matching networks considered in this work satisfy a general position condition. General position is a weak (but necessary) condition that holds when the static-planning problem (a linear program that optimizes the first-order matching rates) has a unique and nondegenerate optimal solution.
more »
« less
- Award ID(s):
- 2137286
- PAR ID:
- 10538046
- Publisher / Repository:
- INFORMS
- Date Published:
- Journal Name:
- Operations Research
- ISSN:
- 0030-364X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We study how to optimally match agents in a dynamic matching market with heterogeneous match cardinalities and values. A network topology determines the feasible matches in the market. In general, a fundamental tradeoff exists between short-term value—which calls for performing matches frequently—and long-term value—which calls, sometimes, for delaying match decisions in order to perform better matches. We find that in networks that satisfy a general position condition, the tension between short- and long-term value is limited, and a simple periodic clearing policy (nearly) maximizes the total match value simultaneously at all times. Central to our results is the general position gap ϵ; a proxy for capacity slack in the market. With the exception of trivial cases, no policy can achieve an all-time regret that is smaller, in terms of order, than [Formula: see text]. We achieve this lower bound with a policy, which periodically resolves a natural matching integer linear program, provided that the delay between resolving periods is of the order of [Formula: see text]. Examples illustrate the necessity of some delay to alleviate the tension between short- and long-term value. This paper was accepted by David Simchi-Levi, revenue management and market analytics. Funding: This work was supported by the National Science Foundation [Grant CMM-2010940] and the U.S. Department of Defense [Grant STTR A18B-T007].more » « less
-
null (Ed.)We study the dynamic assortment planning problem, where for each arriving customer, the seller offers an assortment of substitutable products and the customer makes the purchase among offered products according to an uncapacitated multinomial logit (MNL) model. Because all the utility parameters of the MNL model are unknown, the seller needs to simultaneously learn customers’ choice behavior and make dynamic decisions on assortments based on the current knowledge. The goal of the seller is to maximize the expected revenue, or, equivalently, to minimize the expected regret. Although dynamic assortment planning problem has received an increasing attention in revenue management, most existing policies require the estimation of mean utility for each product and the final regret usually involves the number of products [Formula: see text]. The optimal regret of the dynamic assortment planning problem under the most basic and popular choice model—the MNL model—is still open. By carefully analyzing a revenue potential function, we develop a trisection-based policy combined with adaptive confidence bound construction, which achieves an item-independent regret bound of [Formula: see text], where [Formula: see text] is the length of selling horizon. We further establish the matching lower bound result to show the optimality of our policy. There are two major advantages of the proposed policy. First, the regret of all our policies has no dependence on [Formula: see text]. Second, our policies are almost assumption-free: there is no assumption on mean utility nor any “separability” condition on the expected revenues for different assortments. We also extend our trisection search algorithm to capacitated MNL models and obtain the optimal regret [Formula: see text] (up to logrithmic factors) without any assumption on the mean utility parameters of items.more » « less
-
We study adaptive video streaming for multiple users in wireless access edge networks with unreliable channels. The key challenge is to jointly optimize the video bitrate adaptation and resource allocation such that the users' cumulative quality of experience is maximized. This problem is a finite-horizon restless multi-armed multi-action bandit problem and is provably hard to solve. To overcome this challenge, we propose a computationally appealing index policy entitled Quality Index Policy, which is well-defined without the Whittle indexability condition and is provably asymptotically optimal without the global attractor condition. These two conditions are widely needed in the design of most existing index policies, which are difficult to establish in general. Since the wireless access edge network environment is highly dynamic with system parameters unknown and time-varying, we further develop an index-aware reinforcement learning (RL) algorithm dubbed QA-UCB. We show that QA-UCB achieves a sub-linear regret with a low-complexity since it fully exploits the structure of the Quality Index Policy for making decisions. Extensive simulations using real-world traces demonstrate significant gains of proposed policies over conventional approaches. We note that the proposed framework for designing index policy and index-aware RL algorithm is of independent interest and could be useful for other large-scale multi-user problems.more » « less
-
null (Ed.)We develop a new framework for designing online policies given access to an oracle providing statistical information about an off-line benchmark. Having access to such prediction oracles enables simple and natural Bayesian selection policies and raises the question as to how these policies perform in different settings. Our work makes two important contributions toward this question: First, we develop a general technique we call compensated coupling, which can be used to derive bounds on the expected regret (i.e., additive loss with respect to a benchmark) for any online policy and off-line benchmark. Second, using this technique, we show that a natural greedy policy, which we call the Bayes selector, has constant expected regret (i.e., independent of the number of arrivals and resource levels) for a large class of problems we refer to as “online allocation with finite types,” which includes widely studied online packing and online matching problems. Our results generalize and simplify several existing results for online packing and online matching and suggest a promising pathway for obtaining oracle-driven policies for other online decision-making settings. This paper was accepted by George Shanthikumar, big data analytics.more » « less
An official website of the United States government

