Search for: All records

Creators/Authors contains: "Zhou, Yuan"


  1. We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated [Formula: see text] policy is proved to be optimal, where s and S are the reorder point and order-up-to level for ordering strategy, and [Formula: see text], a function of on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal [Formula: see text] with a tight [Formula: see text] regret rate. A number of salient features differentiate our work from the existing online learning research in the operations management (OM) literature. First, computing the optimal [Formula: see text] policy requires solving a dynamic program (DP) over multiple periods involving unknown quantities, which is different from the majority of learning problems in OM that only require solving single-period optimization questions. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The necessity of analyzing action-dependent state transition over multiple periods resembles the reinforcement learning question, considerably more difficult than existing bandit learning algorithms. Second, the pricing function [Formula: see text] is of infinite dimension, and approaching it is much more challenging than approaching a finite number of parameters as seen in existing research. The demand-price relationship is estimated based on upper confidence bound, but the confidence interval cannot be explicitly calculated due to the complexity of the DP recursion. Finally, because of the multiperiod nature of [Formula: see text] policies, the actual distribution of the randomness in demand plays an important role in determining the optimal pricing strategy [Formula: see text], which is unknown to the learner a priori. In this paper, the demand randomness is approximated by an empirical distribution constructed using dependent samples, and a novel Wasserstein metric-based argument is employed to prove convergence of the empirical distribution. This paper was accepted by J. George Shanthikumar, big data analytics.
  2. Zooming in on cells reveals patterns on their outer surfaces. These patterns are actually a collection of distinct areas of the cell surface, each containing specific combinations of molecules. The outer layers of pollen grains consist of a cell wall, and a softer cell membrane that sits underneath. As a pollen grain develops, it recruits certain fats and proteins to specific areas of the cell membrane, known as ‘aperture domains’. The composition of these domains blocks the cell wall from forming over them, leading to gaps in the wall called ‘pollen apertures’. Pollen apertures can open and close, aiding reproduction and protecting pollen grains from dehydration. The number, location, and shape of pollen apertures vary between different plant species, but are consistent within the same species. In the plant species Arabidopsis thaliana, pollen normally develops three long and narrow, equally spaced apertures, but it remains unclear how pollen grains control the number and location of aperture domains. Zhou et al. found that mutations in two closely related A. thaliana proteins – ELMOD_A and MCR – alter the number and positions of pollen apertures. When A. thaliana plants were genetically modified so that they would produce different levels of ELMOD_A and MCR, Zhou et al. observed that when more of these proteins were present in a pollen grain, more apertures were generated on the pollen surface. This finding suggests that the levels of these proteins must be tightly regulated to control pollen aperture numbers. Further tests revealed that another related protein, called ELMOD_E, also has a role in domain formation. When artificially produced in developing pollen grains, it interfered with the activity of ELMOD_A and MCR, changing pollen aperture shape, number, and location. Zhou et al. identified a group of proteins that help control the formation of domains in the cell membranes of A. thaliana pollen grains. Further research will be required to determine what exactly these proteins do to promote formation of aperture domains and whether similar proteins control domain development in other organisms.
  3. We study the dynamic assortment planning problem, where for each arriving customer, the seller offers an assortment of substitutable products and the customer makes a purchase among the offered products according to an uncapacitated multinomial logit (MNL) model. Because all the utility parameters of the MNL model are unknown, the seller needs to simultaneously learn customers’ choice behavior and make dynamic decisions on assortments based on the current knowledge. The goal of the seller is to maximize the expected revenue, or, equivalently, to minimize the expected regret. Although the dynamic assortment planning problem has received increasing attention in revenue management, most existing policies require the estimation of mean utility for each product and the final regret usually involves the number of products [Formula: see text]. The optimal regret of the dynamic assortment planning problem under the most basic and popular choice model—the MNL model—is still open. By carefully analyzing a revenue potential function, we develop a trisection-based policy combined with adaptive confidence bound construction, which achieves an item-independent regret bound of [Formula: see text], where [Formula: see text] is the length of selling horizon. We further establish the matching lower bound result to show the optimality of our policy. There are two major advantages of the proposed policy. First, the regret of all our policies has no dependence on [Formula: see text]. Second, our policies are almost assumption-free: there is no assumption on mean utility nor any “separability” condition on the expected revenues for different assortments. We also extend our trisection search algorithm to capacitated MNL models and obtain the optimal regret [Formula: see text] (up to logarithmic factors) without any assumption on the mean utility parameters of items.
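
For readers unfamiliar with the MNL model mentioned in the third abstract, the following minimal sketch shows how the expected revenue of an offered assortment is computed when the preference weights are known. It is not the trisection-based learning policy from the paper, which works with unknown parameters. The function names, the example revenues and weights, and the normalization of the no-purchase weight to 1 are illustrative assumptions, not taken from the abstract.

```python
from typing import List, Sequence, Tuple


def mnl_expected_revenue(
    revenues: Sequence[float],
    weights: Sequence[float],
    assortment: Sequence[int],
) -> float:
    """Expected revenue per customer under an MNL choice model.

    weights[i] is the (assumed known) preference weight exp(u_i) of product i;
    the no-purchase option is normalized to weight 1 (an assumption).
    """
    denominator = 1.0 + sum(weights[i] for i in assortment)
    return sum(revenues[i] * weights[i] for i in assortment) / denominator


def best_revenue_ordered_assortment(
    revenues: Sequence[float],
    weights: Sequence[float],
) -> Tuple[List[int], float]:
    """Search the nested 'top-k by revenue' assortments and return the best.

    For the uncapacitated MNL model with known weights, an optimal assortment
    is known to be of this revenue-ordered form, so a linear scan suffices.
    """
    order = sorted(range(len(revenues)), key=lambda i: revenues[i], reverse=True)
    best_set: List[int] = []
    best_value = 0.0
    for k in range(1, len(order) + 1):
        candidate = order[:k]
        value = mnl_expected_revenue(revenues, weights, candidate)
        if value > best_value:
            best_set, best_value = candidate, value
    return best_set, best_value


if __name__ == "__main__":
    # Hypothetical data: three products with equal preference weights.
    example_revenues = [10.0, 8.0, 2.0]
    example_weights = [1.0, 1.0, 1.0]
    print(best_revenue_ordered_assortment(example_revenues, example_weights))
```

In the learning setting described in the abstract the weights are unknown, so a policy has to estimate or bound quantities like these from observed purchases while choosing assortments. The sketch only illustrates the full-information benchmark against which regret is measured.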