skip to main content

Title: Proportional Volume Sampling and Approximation Algorithms for A-Optimal Design
We study optimal design problems in which the goal is to choose a set of linear measurements to obtain the most accurate estimate of an unknown vector. We study the [Formula: see text]-optimal design variant where the objective is to minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. We introduce the proportional volume sampling algorithm to obtain nearly optimal bounds in the asymptotic regime when the number [Formula: see text] of measurements made is significantly larger than the dimension [Formula: see text] and obtain the first approximation algorithms whose approximation factor does not degrade with the number of possible measurements when [Formula: see text] is small. The algorithm also gives approximation guarantees for other optimal design objectives such as [Formula: see text]-optimality and the generalized ratio objective, matching or improving the previously best-known results. We further show that bounds similar to ours cannot be obtained for [Formula: see text]-optimal design and that [Formula: see text]-optimal design is NP-hard to approximate within a fixed constant when [Formula: see text].  more » « less
Award ID(s):
Author(s) / Creator(s):
; ;  
Date Published:
Journal Name:
Mathematics of Operations Research
Page Range / eLocation ID:
847 to 877
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the following general network design problem. The input is an asymmetric metric (V, c), root [Formula: see text], monotone submodular function [Formula: see text], and budget B. The goal is to find an r-rooted arborescence T of cost at most B that maximizes f(T). Our main result is a simple quasi-polynomial time [Formula: see text]-approximation algorithm for this problem, in which [Formula: see text] is the number of vertices in an optimal solution. As a consequence, we obtain an [Formula: see text]-approximation algorithm for directed (polymatroid) Steiner tree in quasi-polynomial time. We also extend our main result to a setting with additional length bounds at vertices, which leads to improved [Formula: see text]-approximation algorithms for the single-source buy-at-bulk and priority Steiner tree problems. For the usual directed Steiner tree problem, our result matches the best previous approximation ratio but improves significantly on the running time. For polymatroid Steiner tree and single-source buy-at-bulk, our result improves prior approximation ratios by a logarithmic factor. For directed priority Steiner tree, our result seems to be the first nontrivial approximation ratio. Under certain complexity assumptions, our approximation ratios are the best possible (up to constant factors). 
    more » « less
  2. We study the assortment optimization problem when customer choices are governed by the paired combinatorial logit model. We study unconstrained, cardinality-constrained, and knapsack-constrained versions of this problem, which are all known to be NP-hard. We design efficient algorithms that compute approximately optimal solutions, using a novel relation to the maximum directed cut problem and suitable linear-program rounding algorithms. We obtain a randomized polynomial time approximation scheme for the unconstrained version and performance guarantees of 50% and [Formula: see text] for the cardinality-constrained and knapsack-constrained versions, respectively. These bounds improve significantly over prior work. We also obtain a performance guarantee of 38.5% for the assortment problem under more general constraints, such as multidimensional knapsack (where products have multiple attributes and there is a knapsack constraint on each attribute) and partition constraints (where products are partitioned into groups and there is a limit on the number of products selected from each group). In addition, we implemented our algorithms and tested them on random instances available in prior literature. We compared our algorithms against an upper bound obtained using a linear program. Our average performance bounds for the unconstrained, cardinality-constrained, knapsack-constrained, and partition-constrained versions are over 99%, 99%, 96%, and 99%, respectively. 
    more » « less
  3. We study the A-optimal design problem where we are given vectors υ1, …, υn ∊ ℝd, an integer k ≥ d, and the goal is to select a set S of k vectors that minimizes the trace of (∑i∊Svivi⊺)−1. Traditionally, the problem is an instance of optimal design of experiments in statistics [35] where each vector corresponds to a linear measurement of an unknown vector and the goal is to pick k of them that minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placement in wireless networks [22], sparse least squares regression [8], feature selection for k-means clustering [9], and matrix approximation [13, 14, 5]. In this paper, we introduce proportional volume sampling to obtain improved approximation algorithms for A-optimal design. Given a matrix, proportional volume sampling involves picking a set of columns S of size k with probability proportional to µ(S) times det(∑i∊Svivi⊺) for some measure µ. Our main result is to show the approximability of the A-optimal design problem can be reduced to approximate independence properties of the measure µ. We appeal to hardcore distributions as candidate distributions µ that allow us to obtain improved approximation algorithms for the A-optimal design. Our results include a d-approximation when k = d, an (1 + ∊)-approximation when and -approximation when repetitions of vectors are allowed in the solution. We also consider generalization of the problem for k ≤ d and obtain a k-approximation. We also show that the proportional volume sampling algorithm gives approximation algorithms for other optimal design objectives (such as D-optimal design [36] and generalized ratio objective [27]) matching or improving previous best known results. Interestingly, we show that a similar guarantee cannot be obtained for the E-optimal design problem. We also show that the A-optimal design problem is NP-hard to approximate within a fixed constant when k = d. 
    more » « less
  4. Experimental design is a classical area in statistics and has also found new applications in machine learning. In the combinatorial experimental design problem, the aim is to estimate an unknown m-dimensional vector x from linear measurements where a Gaussian noise is introduced in each measurement. The goal is to pick k out of the given n experiments so as to make the most accurate estimate of the unknown parameter x. Given a set S of chosen experiments, the most likelihood estimate x0 can be obtained by a least squares computation. One of the robust measures of error estimation is the D-optimality criterion which aims to minimize the generalized variance of the estimator. This corresponds to minimizing the volume of the standard confidence ellipsoid for the estimation error x − x0. The problem gives rise to two natural variants depending on whether repetitions of experiments is allowed or not. The latter variant, while being more general, has also found applications in geographical location of sensors. We show a close connection between approximation algorithms for the D-optimal design problem and constructions of approximately m-wise positively correlated distributions. This connection allows us to obtain first approximation algorithms for the D-optimal design problem with and without repetitions. We then consider the case when the number of experiments chosen is much larger than the dimension m and show one can obtain asymptotically optimal algorithms in this case. 
    more » « less
  5. World Scientific (Ed.)
    In general, it is difficult to measure distances in the Weil–Petersson metric on Teichmüller space. Here we consider the distance between strata in the Weil–Petersson completion of Teichmüller space of a surface of finite type. Wolpert showed that for strata whose closures do not intersect, there is a definite separation independent of the topology of the surface. We prove that the optimal value for this minimal separation is a constant [Formula: see text] and show that it is realized exactly by strata whose nodes intersect once. We also give a nearly sharp estimate for [Formula: see text] and give a lower bound on the size of the gap between [Formula: see text] and the other distances. A major component of the paper is an effective version of Wolpert’s upper bound on [Formula: see text], the inner product of the Weil–Petersson gradient of length functions. We further bound the distance to the boundary of Teichmüller space of a hyperbolic surface in terms of the length of the systole of the surface. We also obtain new lower bounds on the systole for the Weil–Petersson metric on the moduli space of a punctured torus. 
    more » « less