skip to main content

Title: Combinatorial Algorithms for Optimal Design
In an optimal design problem, we are given a set of linear experiments v1,…,vn∈Rd and k≥d, and our goal is to select a set or a multiset S⊆[n] of size k such that Φ((∑i∈Sviv⊤i)−1) is minimized. When Φ(M)=Determinant(M)1/d, the problem is known as the D-optimal design problem, and when Φ(M)=Trace(M), it is known as the A-optimal design problem. One of the most common heuristics used in practice to solve these problems is the local search heuristic, also known as the Fedorov’s exchange method (Fedorov, 1972). This is due to its simplicity and its empirical performance (Cook and Nachtrheim, 1980; Miller and Nguyen, 1994; Atkinson et al., 2007). However, despite its wide usage no theoretical bound has been proven for this algorithm. In this paper, we bridge this gap and prove approximation guarantees for the local search algorithms for D-optimal design and A-optimal design problems. We show that the local search algorithms are asymptotically optimal when kd is large. In addition to this, we also prove similar approximation guarantees for the greedy algorithms for D-optimal design and A-optimal design problems when kd is large.
Authors:
; ; ;
Award ID(s):
1910423
Publication Date:
NSF-PAR ID:
10196079
Journal Name:
Conference on Learning Theory, COLT 2019
Volume:
99
Page Range or eLocation-ID:
2210-2258
Sponsoring Org:
National Science Foundation
More Like this
  1. In an optimal design problem, we are given a set of linear experiments v1,…,vn∈Rd and k≥d, and our goal is to select a set or a multiset S⊆[n] of size k such that Φ((∑i∈Sviv⊤i)−1) is minimized. When Φ(M)=Determinant(M)1/d, the problem is known as the D-optimal design problem, and when Φ(M)=Trace(M), it is known as the A-optimal design problem. One of the most common heuristics used in practice to solve these problems is the local search heuristic, also known as the Fedorov’s exchange method (Fedorov, 1972). This is due to its simplicity and its empirical performance (Cook and Nachtrheim, 1980; Millermore »and Nguyen, 1994; Atkinson et al., 2007). However, despite its wide usage no theoretical bound has been proven for this algorithm. In this paper, we bridge this gap and prove approximation guarantees for the local search algorithms for D-optimal design and A-optimal design problems. We show that the local search algorithms are asymptotically optimal when kd is large. In addition to this, we also prove similar approximation guarantees for the greedy algorithms for D-optimal design and A-optimal design problems when k/d is large.« less
  2. We study the A-optimal design problem where we are given vectors υ1, …, υn ∊ ℝd, an integer k ≥ d, and the goal is to select a set S of k vectors that minimizes the trace of (∑i∊Svivi⊺)−1. Traditionally, the problem is an instance of optimal design of experiments in statistics [35] where each vector corresponds to a linear measurement of an unknown vector and the goal is to pick k of them that minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placementmore »in wireless networks [22], sparse least squares regression [8], feature selection for k-means clustering [9], and matrix approximation [13, 14, 5]. In this paper, we introduce proportional volume sampling to obtain improved approximation algorithms for A-optimal design. Given a matrix, proportional volume sampling involves picking a set of columns S of size k with probability proportional to µ(S) times det(∑i∊Svivi⊺) for some measure µ. Our main result is to show the approximability of the A-optimal design problem can be reduced to approximate independence properties of the measure µ. We appeal to hardcore distributions as candidate distributions µ that allow us to obtain improved approximation algorithms for the A-optimal design. Our results include a d-approximation when k = d, an (1 + ∊)-approximation when and -approximation when repetitions of vectors are allowed in the solution. We also consider generalization of the problem for k ≤ d and obtain a k-approximation. We also show that the proportional volume sampling algorithm gives approximation algorithms for other optimal design objectives (such as D-optimal design [36] and generalized ratio objective [27]) matching or improving previous best known results. Interestingly, we show that a similar guarantee cannot be obtained for the E-optimal design problem. We also show that the A-optimal design problem is NP-hard to approximate within a fixed constant when k = d.« less
  3. We study the classic Maximum Independent Set problem under the notion of stability introduced by Bilu and Linial (2010): a weighted instance of Independent Set is γ-stable if it has a unique optimal solution that remains the unique optimal solution under multiplicative perturbations of the weights by a factor of at most γ ≥ 1. The goal then is to efficiently recover this “pronounced” optimal solution exactly. In this work, we solve stable instances of Independent Set on several classes of graphs: we improve upon previous results by solving \tilde{O}(∆/sqrt(log ∆))-stable instances on graphs of maximum degree ∆, (k −more »1)-stable instances on k-colorable graphs and (1 + ε)-stable instances on planar graphs (for any fixed ε > 0), using both combinatorial techniques as well as LPs and the Sherali-Adams hierarchy. For general graphs, we give an algorithm for (εn)-stable instances, for any fixed ε > 0, and lower bounds based on the planted clique conjecture. As a by-product of our techniques, we give algorithms as well as lower bounds for stable instances of Node Multiway Cut (a generalization of Edge Multiway Cut), by exploiting its connections to Vertex Cover. Furthermore, we prove a general structural result showing that the integrality gap of convex relaxations of several maximization problems reduces dramatically on stable instances. Moreover, we initiate the study of certified algorithms for Independent Set. The notion of a γ-certified algorithm was introduced very recently by Makarychev and Makarychev (2018) and it is a class of γ-approximation algorithms that satisfy one crucial property: the solution returned is optimal for a perturbation of the original instance, where perturbations are again multiplicative up to a factor of γ ≥ 1 (hence, such algorithms not only solve γ-stable instances optimally, but also have guarantees even on unstable instances). Here, we obtain ∆-certified algorithms for Independent Set on graphs of maximum degree ∆, and (1 + ε)-certified algorithms on planar graphs. Finally, we analyze the algorithm of Berman and Fürer (1994) and prove that it is a ((∆+1)/3 + ε)-certified algorithm for Independent Set on graphs of maximum degree ∆ where all weights are equal to 1.« less
  4. We study the classic set cover problem from the perspective of sub-linear algorithms. Given access to a collection of m sets over n elements in the query model, we show that sub-linear algorithms derived from existing techniques have almost tight query complexities. On one hand, first we show an adaptation of the streaming algorithm presented in [17] to the sub-linear query model, that returns an α-approximate cover using Õ(m(n/k)^1/(α–1) + nk) queries to the input, where k denotes the value of a minimum set cover. We then complement this upper bound by proving that for lower values of k, themore »required number of queries is , even for estimating the optimal cover size. Moreover, we prove that even checking whether a given collection of sets covers all the elements would require Ω(nk) queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter k. On the other hand, we show that this bound is not optimal for larger values of the parameter k, as there exists a (1 + ε)-approximation algorithm with Õ(mn/kε^2) queries. We show that this bound is essentially tight for sufficiently small constant ε, by establishing a lower bound of query complexity. Our lower-bound results follow by carefully designing two distributions of instances that are hard to distinguish. In particular, our first lower bound involves a probabilistic construction of a certain set system with a minimum set cover of size αk, with the key property that a small number of “almost uniformly distributed” modifications can reduce the minimum set cover size down to k. Thus, these modifications are not detectable unless a large number of queries are asked. We believe that our probabilistic construction technique might find applications to lower bounds for other combinatorial optimization problems.« less
  5. Experimental design is a classical area in statistics and has also found new applications in machine learning. In the combinatorial experimental design problem, the aim is to estimate an unknown m-dimensional vector x from linear measurements where a Gaussian noise is introduced in each measurement. The goal is to pick k out of the given n experiments so as to make the most accurate estimate of the unknown parameter x. Given a set S of chosen experiments, the most likelihood estimate x0 can be obtained by a least squares computation. One of the robust measures of error estimation is themore »D-optimality criterion which aims to minimize the generalized variance of the estimator. This corresponds to minimizing the volume of the standard confidence ellipsoid for the estimation error x − x0. The problem gives rise to two natural variants depending on whether repetitions of experiments is allowed or not. The latter variant, while being more general, has also found applications in geographical location of sensors. We show a close connection between approximation algorithms for the D-optimal design problem and constructions of approximately m-wise positively correlated distributions. This connection allows us to obtain first approximation algorithms for the D-optimal design problem with and without repetitions. We then consider the case when the number of experiments chosen is much larger than the dimension m and show one can obtain asymptotically optimal algorithms in this case.« less