skip to main content

This content will become publicly available on March 29, 2023

Title: Hidden Integrality and Semirandom Robustness of SDP Relaxation for Sub-Gaussian Mixture Model
We consider the problem of estimating the discrete clustering structures under the sub-Gaussian mixture model. Our main results establish a hidden integrality property of a semidefinite programming (SDP) relaxation for this problem: while the optimal solution to the SDP is not integer-valued in general, its estimation error can be upper bounded by that of an idealized integer program. The error of the integer program, and hence that of the SDP, are further shown to decay exponentially in the signal-to-noise ratio. In addition, we show that the SDP relaxation is robust under the semirandom setting in which an adversary can modify the data generated from the mixture model. In particular, we generalize the hidden integrality property to the semirandom model and thereby show that SDP achieves the optimal error bound in this setting. These results together highlight the “global-to-local” mechanism that drives the performance of the SDP relaxation. To the best of our knowledge, our result is the first exponentially decaying error bound for convex relaxations of mixture models. A corollary of our results shows that in certain regimes, the SDP solutions are in fact integral and exact. More generally, our results establish sufficient conditions for the SDP to correctly recover more » the cluster memberships of [Formula: see text] fraction of the points for any [Formula: see text]. As a special case, we show that under the [Formula: see text]-dimensional stochastic ball model, SDP achieves nontrivial (sometimes exact) recovery when the center separation is as small as [Formula: see text], which improves upon previous exact recovery results that require constant separation. « less
Award ID(s):
Publication Date:
Journal Name:
Mathematics of Operations Research
Sponsoring Org:
National Science Foundation
More Like this
  1. In this work, we consider the popular tree-based search strategy within the framework of reinforcement learning, the Monte Carlo tree search (MCTS), in the context of the infinite-horizon discounted cost Markov decision process (MDP). Although MCTS is believed to provide an approximate value function for a given state with enough simulations, the claimed proof of this property is incomplete. This is because the variant of MCTS, the upper confidence bound for trees (UCT), analyzed in prior works, uses “logarithmic” bonus term for balancing exploration and exploitation within the tree-based search, following the insights from stochastic multiarm bandit (MAB) literature. In effect, such an approach assumes that the regret of the underlying recursively dependent nonstationary MABs concentrates around their mean exponentially in the number of steps, which is unlikely to hold, even for stationary MABs. As the key contribution of this work, we establish polynomial concentration property of regret for a class of nonstationary MABs. This in turn establishes that the MCTS with appropriate polynomial rather than logarithmic bonus term in UCB has a claimed property. Interestingly enough, empirically successful approaches use a similar polynomial form of MCTS as suggested by our result. Using this as a building block, we arguemore »that MCTS, combined with nearest neighbor supervised learning, acts as a “policy improvement” operator; that is, it iteratively improves value function approximation for all states because of combining with supervised learning, despite evaluating at only finitely many states. In effect, we establish that to learn an ε approximation of the value function with respect to [Formula: see text] norm, MCTS combined with nearest neighbor requires a sample size scaling as [Formula: see text], where d is the dimension of the state space. This is nearly optimal because of a minimax lower bound of [Formula: see text], suggesting the strength of the variant of MCTS we propose here and our resulting analysis.« less
  2. We study the dynamic assortment planning problem, where for each arriving customer, the seller offers an assortment of substitutable products and the customer makes the purchase among offered products according to an uncapacitated multinomial logit (MNL) model. Because all the utility parameters of the MNL model are unknown, the seller needs to simultaneously learn customers’ choice behavior and make dynamic decisions on assortments based on the current knowledge. The goal of the seller is to maximize the expected revenue, or, equivalently, to minimize the expected regret. Although dynamic assortment planning problem has received an increasing attention in revenue management, most existing policies require the estimation of mean utility for each product and the final regret usually involves the number of products [Formula: see text]. The optimal regret of the dynamic assortment planning problem under the most basic and popular choice model—the MNL model—is still open. By carefully analyzing a revenue potential function, we develop a trisection-based policy combined with adaptive confidence bound construction, which achieves an item-independent regret bound of [Formula: see text], where [Formula: see text] is the length of selling horizon. We further establish the matching lower bound result to show the optimality of our policy. There aremore »two major advantages of the proposed policy. First, the regret of all our policies has no dependence on [Formula: see text]. Second, our policies are almost assumption-free: there is no assumption on mean utility nor any “separability” condition on the expected revenues for different assortments. We also extend our trisection search algorithm to capacitated MNL models and obtain the optimal regret [Formula: see text] (up to logrithmic factors) without any assumption on the mean utility parameters of items.« less
  3. Price-based revenue management is an important problem in operations management with many practical applications. The problem considers a seller who sells one or multiple products over T consecutive periods and is subject to constraints on the initial inventory levels of resources. Whereas, in theory, the optimal pricing policy could be obtained via dynamic programming, computing the exact dynamic programming solution is often intractable. Approximate policies, such as the resolving heuristics, are often applied as computationally tractable alternatives. In this paper, we show the following two results for price-based network revenue management under a continuous price set. First, we prove that a natural resolving heuristic attains O(1) regret compared with the value of the optimal policy. This improves the [Formula: see text] regret upper bound established in the prior work by Jasin in 2014. Second, we prove that there is an [Formula: see text] gap between the value of the optimal policy and that of the fluid model. This complements our upper bound result by showing that the fluid is not an adequate information-relaxed benchmark when analyzing price-based revenue management algorithms. Funding: This work was supported in part by the National Science Foundation [Grant CMMI-2145661].
  4. Queueing models that are used to capture various service settings typically assume that customers require a single unit of resource (server) to be processed. However, there are many service settings where such an assumption may fail to capture the heterogeneity in resource requirements of different customers. We propose a multiserver queueing model with multiple customer classes in which customers from different classes may require different amounts of resources to be served. We study the optimal scheduling policy for such systems. To balance holding costs, service rates, resource requirement, and priority-induced idleness, we develop an index-based policy that we refer to as the idle-avoid [Formula: see text] rule. For a two-class two-server model, where policy-induced idleness can have a big impact on system performance, we characterize cases where the idle-avoid [Formula: see text] rule is optimal. In other cases, we establish a uniform performance bound on the amount of suboptimality incurred by the idle-avoid [Formula: see text] rule. For general multiclass multiserver queues, we establish the asymptotic optimality of the idle-avoid [Formula: see text] rule in the many-server regime. For long-time horizons, we show that the idle-avoid [Formula: see text] is throughput optimal. Our theoretical results, along with numerical experiments, providemore »support for the good and robust performance of the proposed policy.« less
  5. We consider a revenue-maximizing seller with m heterogeneous items and a single buyer whose valuation for the items may exhibit both substitutes and complements. We show that the better of selling the items separately and bundling them together— guarantees a [Formula: see text]-fraction of the optimal revenue, where d is a measure of the degree of complementarity; it extends prior work showing that the same simple mechanism achieves a constant-factor approximation when buyer valuations are subadditive (the most general class of complement-free valuations). Our proof is enabled by a recent duality framework, which we use to obtain a bound on the optimal revenue in the generalized setting. Our technical contributions are domain specific to handle the intricacies of settings with complements. One key modeling contribution is a tractable notion of “degree of complementarity” that admits meaningful results and insights—we demonstrate that previous definitions fall short in this regard.