Experimental design is a classical area in statistics and has also found new applications in machine learning. In the combinatorial experimental design problem, the aim is to estimate an unknown mdimensional vector x from linear measurements where a Gaussian noise is introduced in each measurement. The goal is to pick k out of the given n experiments so as to make the most accurate estimate of the unknown parameter x. Given a set S of chosen experiments, the most likelihood estimate x0 can be obtained by a least squares computation. One of the robust measures of error estimation is the Doptimality criterion which aims to minimize the generalized variance of the estimator. This corresponds to minimizing the volume of the standard confidence ellipsoid for the estimation error x − x0. The problem gives rise to two natural variants depending on whether repetitions of experiments is allowed or not. The latter variant, while being more general, has also found applications in geographical location of sensors. We show a close connection between approximation algorithms for the Doptimal design problem and constructions of approximately mwise positively correlated distributions. This connection allows us to obtain first approximation algorithms for the Doptimal design problem withmore »
Approximation Algorithms for D optimal Design
Experimental design is a classical statistics problem, and its aim is to estimate an unknown vector from linear measurements where a Gaussian noise is introduced in each measurement. For the combinatorial experimental design problem, the goal is to pick a subset of experiments so as to make the most accurate estimate of the unknown parameters. In this paper, we will study one of the most robust measures of error estimation—the Doptimality criterion, which corresponds to minimizing the volume of the confidence ellipsoid for the estimation error. The problem gives rise to two natural variants depending on whether repetitions of experiments are allowed or not. We first propose an approximation algorithm with a 1/eapproximation for the Doptimal design problem with and without repetitions, giving the first constantfactor approximation for the problem. We then analyze another sampling approximation algorithm and prove that it is asymptotically optimal. Finally, for Doptimal design with repetitions, we study a different algorithm proposed by the literature and show that it can improve this asymptotic approximation ratio. All the sampling algorithms studied in this paper are shown to admit polynomialtime deterministic implementations.
 Award ID(s):
 1910423
 Publication Date:
 NSFPAR ID:
 10300810
 Journal Name:
 Mathematics of Operations Research
 Volume:
 45
 Issue:
 4
 Page Range or eLocationID:
 1512 to 1534
 ISSN:
 0364765X
 Sponsoring Org:
 National Science Foundation
More Like this


We study the Aoptimal design problem where we are given vectors υ1, …, υn ∊ ℝd, an integer k ≥ d, and the goal is to select a set S of k vectors that minimizes the trace of (∑i∊Svivi⊺)−1. Traditionally, the problem is an instance of optimal design of experiments in statistics [35] where each vector corresponds to a linear measurement of an unknown vector and the goal is to pick k of them that minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placement in wireless networks [22], sparse least squares regression [8], feature selection for kmeans clustering [9], and matrix approximation [13, 14, 5]. In this paper, we introduce proportional volume sampling to obtain improved approximation algorithms for Aoptimal design. Given a matrix, proportional volume sampling involves picking a set of columns S of size k with probability proportional to µ(S) times det(∑i∊Svivi⊺) for some measure µ. Our main result is to show the approximability of the Aoptimal design problem can be reduced to approximate independence properties of the measure µ. We appeal to hardcore distributions as candidate distributions µ that allow usmore »

We study optimal design problems in which the goal is to choose a set of linear measurements to obtain the most accurate estimate of an unknown vector. We study the [Formula: see text]optimal design variant where the objective is to minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. We introduce the proportional volume sampling algorithm to obtain nearly optimal bounds in the asymptotic regime when the number [Formula: see text] of measurements made is significantly larger than the dimension [Formula: see text] and obtain the first approximation algorithms whose approximation factor does not degrade with the number of possible measurements when [Formula: see text] is small. The algorithm also gives approximation guarantees for other optimal design objectives such as [Formula: see text]optimality and the generalized ratio objective, matching or improving the previously bestknown results. We further show that bounds similar to ours cannot be obtained for [Formula: see text]optimal design and that [Formula: see text]optimal design is NPhard to approximate within a fixed constant when [Formula: see text].

We consider the problem of designing sublinear time algorithms for estimating the cost of minimum] metric traveling salesman (TSP) tour. Specifically, given access to a n × n distance matrix D that specifies pairwise distances between n points, the goal is to estimate the TSP cost by performing only sublinear (in the size of D) queries. For the closely related problem of estimating the weight of a metric minimum spanning tree (MST), it is known that for any epsilon > 0, there exists an O^~(n/epsilon^O(1))time algorithm that returns a (1+epsilon)approximate estimate of the MST cost. This result immediately implies an O^~(n/epsilon^O(1)) time algorithm to estimate the TSP cost to within a (2 + epsilon) factor for any epsilon > 0. However, no o(n^2)time algorithms are known to approximate metric TSP to a factor that is strictly better than 2. On the other hand, there were also no known barriers that rule out existence of (1 + epsilon)approximate estimation algorithms for metric TSP with O^~ (n) time for any fixed epsilon > 0. In this paper, we make progress on both algorithms and lower bounds for estimating metric TSP cost. On the algorithmic side, we first consider the graphic TSP problemmore »

In an optimal design problem, we are given a set of linear experiments v1,…,vn∈Rd and k≥d, and our goal is to select a set or a multiset S⊆[n] of size k such that Φ((∑i∈Sviv⊤i)−1) is minimized. When Φ(M)=Determinant(M)1/d, the problem is known as the Doptimal design problem, and when Φ(M)=Trace(M), it is known as the Aoptimal design problem. One of the most common heuristics used in practice to solve these problems is the local search heuristic, also known as the Fedorov’s exchange method (Fedorov, 1972). This is due to its simplicity and its empirical performance (Cook and Nachtrheim, 1980; Miller and Nguyen, 1994; Atkinson et al., 2007). However, despite its wide usage no theoretical bound has been proven for this algorithm. In this paper, we bridge this gap and prove approximation guarantees for the local search algorithms for Doptimal design and Aoptimal design problems. We show that the local search algorithms are asymptotically optimal when kd is large. In addition to this, we also prove similar approximation guarantees for the greedy algorithms for Doptimal design and Aoptimal design problems when k/d is large.