skip to main content


Title: Combinatorial Algorithms for Optimal Design
In an optimal design problem, we are given a set of linear experiments v1,…,vn∈Rd and k≥d, and our goal is to select a set or a multiset S⊆[n] of size k such that Φ((∑i∈Sviv⊤i)−1) is minimized. When Φ(M)=Determinant(M)1/d, the problem is known as the D-optimal design problem, and when Φ(M)=Trace(M), it is known as the A-optimal design problem. One of the most common heuristics used in practice to solve these problems is the local search heuristic, also known as the Fedorov’s exchange method (Fedorov, 1972). This is due to its simplicity and its empirical performance (Cook and Nachtrheim, 1980; Miller and Nguyen, 1994; Atkinson et al., 2007). However, despite its wide usage no theoretical bound has been proven for this algorithm. In this paper, we bridge this gap and prove approximation guarantees for the local search algorithms for D-optimal design and A-optimal design problems. We show that the local search algorithms are asymptotically optimal when kd is large. In addition to this, we also prove similar approximation guarantees for the greedy algorithms for D-optimal design and A-optimal design problems when k/d is large.  more » « less
Award ID(s):
1717947
NSF-PAR ID:
10106918
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Annual Conference on Learning Theory
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In an optimal design problem, we are given a set of linear experiments v1,…,vn∈Rd and k≥d, and our goal is to select a set or a multiset S⊆[n] of size k such that Φ((∑i∈Sviv⊤i)−1) is minimized. When Φ(M)=Determinant(M)1/d, the problem is known as the D-optimal design problem, and when Φ(M)=Trace(M), it is known as the A-optimal design problem. One of the most common heuristics used in practice to solve these problems is the local search heuristic, also known as the Fedorov’s exchange method (Fedorov, 1972). This is due to its simplicity and its empirical performance (Cook and Nachtrheim, 1980; Miller and Nguyen, 1994; Atkinson et al., 2007). However, despite its wide usage no theoretical bound has been proven for this algorithm. In this paper, we bridge this gap and prove approximation guarantees for the local search algorithms for D-optimal design and A-optimal design problems. We show that the local search algorithms are asymptotically optimal when kd is large. In addition to this, we also prove similar approximation guarantees for the greedy algorithms for D-optimal design and A-optimal design problems when kd is large. 
    more » « less
  2. We study the A-optimal design problem where we are given vectors υ1, …, υn ∊ ℝd, an integer k ≥ d, and the goal is to select a set S of k vectors that minimizes the trace of (∑i∊Svivi⊺)−1. Traditionally, the problem is an instance of optimal design of experiments in statistics [35] where each vector corresponds to a linear measurement of an unknown vector and the goal is to pick k of them that minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placement in wireless networks [22], sparse least squares regression [8], feature selection for k-means clustering [9], and matrix approximation [13, 14, 5]. In this paper, we introduce proportional volume sampling to obtain improved approximation algorithms for A-optimal design. Given a matrix, proportional volume sampling involves picking a set of columns S of size k with probability proportional to µ(S) times det(∑i∊Svivi⊺) for some measure µ. Our main result is to show the approximability of the A-optimal design problem can be reduced to approximate independence properties of the measure µ. We appeal to hardcore distributions as candidate distributions µ that allow us to obtain improved approximation algorithms for the A-optimal design. Our results include a d-approximation when k = d, an (1 + ∊)-approximation when and -approximation when repetitions of vectors are allowed in the solution. We also consider generalization of the problem for k ≤ d and obtain a k-approximation. We also show that the proportional volume sampling algorithm gives approximation algorithms for other optimal design objectives (such as D-optimal design [36] and generalized ratio objective [27]) matching or improving previous best known results. Interestingly, we show that a similar guarantee cannot be obtained for the E-optimal design problem. We also show that the A-optimal design problem is NP-hard to approximate within a fixed constant when k = d. 
    more » « less
  3. We study the D-optimal Data Fusion (DDF) problem, which aims to select new data points, given an existing Fisher information matrix, so as to maximize the logarithm of the determinant of the overall Fisher information matrix. We show that the DDF problem is NP-hard and has no constant-factor polynomial-time approximation algorithm unless P = NP. Therefore, to solve the DDF problem effectively, we propose two convex integer-programming formulations and investigate their corresponding complementary and Lagrangian-dual problems. Leveraging the concavity of the objective functions in the two proposed convex integer-programming formulations, we design an exact algorithm, aimed at solving the DDF problem to optimality. We further derive a family of submodular valid inequalities and optimality cuts, which can significantly enhance the algorithm performance. We also develop scalable randomized-sampling and local-search algorithms with provable performance guarantees. Finally, we test our algorithms using real-world data on the new phasor-measurement-units placement problem for modern power grids, considering the existing conventional sensors. Our numerical study demonstrates the efficiency of our exact algorithm and the scalability and high-quality outputs of our approximation algorithms. History: Accepted by Andrea Lodi, Area Editor for Design & Analysis of Algorithms—Discrete. Funding: Y. Li and W. Xie were supported in part by Division of Civil, Mechanical and Manufacturing Innovation [Grant 2046414] and Division of Computing and Communication Foundations [Grant 2246417]. J. Lee was supported in part by Air Force Office of Scientific Research [Grants FA9550-19-1-0175 and FA9550-22-1-0172]. M. Fampa was supported in part by Conselho Nacional de Desenvolvimento Científico e Tecnológico [Grants 305444/2019-0 and 434683/2018-3]. Supplemental Material: The e-companion is available at https://doi.org/10.1287/ijoc.2022.0235 . 
    more » « less
  4. We study the classic Maximum Independent Set problem under the notion of stability introduced by Bilu and Linial (2010): a weighted instance of Independent Set is γ-stable if it has a unique optimal solution that remains the unique optimal solution under multiplicative perturbations of the weights by a factor of at most γ ≥ 1. The goal then is to efficiently recover this “pronounced” optimal solution exactly. In this work, we solve stable instances of Independent Set on several classes of graphs: we improve upon previous results by solving \tilde{O}(∆/sqrt(log ∆))-stable instances on graphs of maximum degree ∆, (k − 1)-stable instances on k-colorable graphs and (1 + ε)-stable instances on planar graphs (for any fixed ε > 0), using both combinatorial techniques as well as LPs and the Sherali-Adams hierarchy. For general graphs, we give an algorithm for (εn)-stable instances, for any fixed ε > 0, and lower bounds based on the planted clique conjecture. As a by-product of our techniques, we give algorithms as well as lower bounds for stable instances of Node Multiway Cut (a generalization of Edge Multiway Cut), by exploiting its connections to Vertex Cover. Furthermore, we prove a general structural result showing that the integrality gap of convex relaxations of several maximization problems reduces dramatically on stable instances. Moreover, we initiate the study of certified algorithms for Independent Set. The notion of a γ-certified algorithm was introduced very recently by Makarychev and Makarychev (2018) and it is a class of γ-approximation algorithms that satisfy one crucial property: the solution returned is optimal for a perturbation of the original instance, where perturbations are again multiplicative up to a factor of γ ≥ 1 (hence, such algorithms not only solve γ-stable instances optimally, but also have guarantees even on unstable instances). Here, we obtain ∆-certified algorithms for Independent Set on graphs of maximum degree ∆, and (1 + ε)-certified algorithms on planar graphs. Finally, we analyze the algorithm of Berman and Fürer (1994) and prove that it is a ((∆+1)/3 + ε)-certified algorithm for Independent Set on graphs of maximum degree ∆ where all weights are equal to 1. 
    more » « less
  5. Given a family of sets (S1, S2,... SM) over a universe Ω, estimating the size of their union in the data streaming model is a fundamental computational problem with a wide variety of applications. The holy grail in the field of streaming is to seek design of algorithms that achieve (ε, δ)-approximation with poly(log |Ω|, ε-1, log δ-1) space and update time complexity. Earlier investigations achieve algorithms with desired space and update time complexity for restricted cases such as singletons (Distinct Elements problem), one-dimensional ranges, arithmetic progressions, and sub-cubes. However, techniques used in these works fail for many other simple structured sets. A prominent example is that of Klee's Measure Problem (KMP), wherein every set Si is represented by an axis-parallel rectangle in d-dimensional spaces. Despite extensive prior work, the best-known streaming algorithms for many of these cases depend on the size of the stream, and therefore the problem of whether there exists a streaming algorithm for estimations of size of the union of sets with poly(log |Ω|, ε-1, log δ-1) space and update time complexity has remained open. In this work, we focus on certain general families of sets called Delphic families (which allows efficient membership, sampling, and cardinality queries). Such families of sets capture several well-known problems, including KMP, test coverage, and hypervolume estimation. The primary contribution of our work is to resolve the above-mentioned open problem for streams over Delphic families. In particular, we design the first streaming algorithm for estimating |⋃i=1M Si| with poly(log |Ω|, ε-1, log δ-1) space and update time complexity (independent of M, the length of the stream) when each Si is a member from a Delphic family of sets. We further generalize our results to larger families of sets, called approximate-Delphic families, for which the size of a set can be known approximately but not exactly. Our results resolve two of the open problems listed in Meel, Vinodchandran, Chakraborty (PODS-21). 
    more » « less