Title: Algorithms and Complexities of Matching Variants in Covariate Balancing
Here, we study several variants of matching problems that arise in covariate balancing. Covariate balancing problems can be viewed as variants of matching, or b-matching, with global side constraints. We present a comprehensive complexity study of these covariate balancing problems, providing either polynomial-time algorithms or proofs of NP-hardness. The polynomial-time algorithms described are mostly combinatorial and rely on network flow techniques. In addition, we present several fixed-parameter tractable results for problems in which the number of covariates and the number of levels of each covariate are treated as parameters.
Award ID(s): 1760102
NSF-PAR ID: 10356989
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: Operations Research
ISSN: 0030-364X
Format(s): Medium: X
Sponsoring Org: National Science Foundation
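The abstract above casts covariate balancing as matching or b-matching with global side constraints, solved with network flow techniques. As a rough, self-contained illustration of that viewpoint only (not the paper's formulation, and with the global balance side constraints left out), the sketch below pairs each treated unit with a distinct control unit at minimum total covariate distance via a min-cost flow; the unit names, covariate values, and L1 distance are invented for the example.

    # Toy min-cost-flow pairing of treated and control units (illustration only).
    # Requires networkx; all data below are made up.
    import networkx as nx

    treated = {"t1": (30, 1), "t2": (45, 0)}                 # unit -> (age, smoker)
    controls = {"c1": (32, 1), "c2": (60, 0), "c3": (44, 0)}

    def l1(x, y):
        return sum(abs(a - b) for a, b in zip(x, y))         # covariate distance

    G = nx.DiGraph()
    G.add_node("src", demand=-len(treated))                  # one unit of flow per treated unit
    G.add_node("snk", demand=len(treated))
    for t, xt in treated.items():
        G.add_edge("src", t, capacity=1, weight=0)
        for c, xc in controls.items():
            G.add_edge(t, c, capacity=1, weight=l1(xt, xc))  # cost of the pair (t, c)
    for c in controls:
        # Capacity 1 here means each control is used at most once; a b-matching
        # variant would simply raise this capacity.
        G.add_edge(c, "snk", capacity=1, weight=0)

    flow = nx.min_cost_flow(G)
    pairs = [(t, c) for t in treated for c in controls if flow[t].get(c, 0) == 1]
    print(pairs)                                             # [('t1', 'c1'), ('t2', 'c3')]

The paper's contribution lies precisely in the additional global balance constraints that such a plain assignment ignores; this sketch only shows why flow formulations are the natural starting point.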
More Like This
  1. In the model of local computation algorithms (LCAs), we aim to compute the queried part of the output by examining only a small (sublinear) portion of the input. Many recently developed LCAs for graph problems achieve time and space complexities with very low dependence on n, the number of vertices; nonetheless, these complexities are generally at least exponential in d, the upper bound on the degree of the input graph. We instead consider the case where the parameter d can be moderately dependent on n, and aim for complexities with subexponential dependence on d while maintaining polylogarithmic dependence on n. We present: (i) a randomized LCA for computing maximal independent sets whose time and space complexities are quasi-polynomial in d and polylogarithmic in n; and (ii) for constant ε > 0, a randomized LCA that provides a (1−ε)-approximation to maximum matching with high probability, whose time and space complexities are polynomial in d and polylogarithmic in n. (An illustrative sketch of an LCA-style query appears after record 5 below.)
  2. In this article, we study a wide range of variants for computing the (discrete and continuous) Fréchet distance between uncertain curves. An uncertain curve is a sequence of uncertainty regions, where each region is a disk, a line segment, or a set of points. A realisation of a curve is a polyline connecting one point from each region. Given an uncertain curve and a second (certain or uncertain) curve, we seek to compute the lower bound and upper bound Fréchet distance, which are the minimum and maximum Fréchet distance over all realisations of the curves. We prove that both problems are NP-hard for the Fréchet distance in several uncertainty models, and that the upper bound problem remains hard for the discrete Fréchet distance. In contrast, the lower bound (discrete [5] and continuous) Fréchet distance can be computed in polynomial time in some models. Furthermore, we show that computing the expected (discrete and continuous) Fréchet distance is #P-hard in some models. On the positive side, we present an FPTAS in constant dimension for the lower bound problem when Δ/δ is polynomially bounded, where δ is the Fréchet distance and Δ bounds the diameter of the regions. We also show a near-linear-time 3-approximation for the decision problem on roughly δ-separated convex regions. Finally, we study the setting with Sakoe–Chiba time bands, where we restrict the alignment between the curves, and give polynomial-time algorithms for the upper bound and expected discrete and continuous Fréchet distance when uncertainty is modelled as point sets. (A sketch of the classical discrete Fréchet dynamic program appears after record 5 below.)
  3. In recent years, several compressed indexes based on variants of the Burrows-Wheeler transformation have been introduced. Some of these are used to index structures far more complex than a single string, as was originally done with the FM-index [Ferragina and Manzini, J. ACM 2005]. As such, there has been an increasing effort to better understand under which conditions such an indexing scheme is possible. This has led to the introduction of Wheeler graphs [Gagie et al., Theor. Comput. Sci., 2017]. Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT-related structures can be represented as Wheeler graphs, and that Wheeler graphs can be indexed in a space-efficient way. Hence, being able to recognize whether a given graph is a Wheeler graph, or being able to approximate a given graph by a Wheeler graph, could have numerous applications in indexing. Here we resolve the open question of whether there exists an efficient algorithm for recognizing if a given graph is a Wheeler graph. We present the following results:
    - The problem of recognizing whether a given graph G = (V, E) is a Wheeler graph is NP-complete for any edge-label alphabet of size sigma >= 2, even when G is a DAG. This holds even on a restricted subset of graphs called d-NFAs for d >= 5, in contrast to recent results demonstrating that the problem can be solved in polynomial time for d-NFAs with d <= 2. We also show that the recognition problem can be solved in linear time for sigma = 1.
    - There exists a 2^{e log sigma + O(n + e)}-time exact algorithm, where n = |V| and e = |E|. This algorithm relies on graph isomorphism being computable in strictly sub-exponential time.
    - We define an optimization variant of the problem called Wheeler Graph Violation (WGV), where the aim is to remove the minimum number of edges in order to obtain a Wheeler graph. We show that WGV is APX-hard, even when G is a DAG, implying there exists a constant C >= 1 for which there is no C-approximation algorithm (unless P = NP). Also, conditioned on the Unique Games Conjecture, for all C >= 1 it is NP-hard to find a C-approximation.
    - We define the Wheeler Subgraph problem (WS), where the aim is to find the largest subgraph which is a Wheeler graph (the dual of WGV). In contrast to WGV, we prove that WS is in APX for sigma = O(1).
    The above findings suggest that most problems under this theme are computationally difficult. However, we identify a class of graphs for which the recognition problem is polynomial-time solvable, raising the open question of which parameters determine this problem's difficulty. (A small checker for the Wheeler-graph conditions, given a candidate vertex order, appears after record 5 below.)
  4. Mikołaj Bojańczyk, Emanuela Merelli (Eds.)
    We initiate a systematic study of algorithms that are both differentially private and run in sublinear time, for several problems in which the goal is to estimate natural graph parameters. Our main result is a differentially private (1+ρ)-approximation algorithm for computing the average degree of a graph, for every ρ > 0. The running time of the algorithm is roughly the same (for sparse graphs) as that of its non-private version proposed by Goldreich and Ron (Sublinear Algorithms, 2005). We also obtain the first differentially private sublinear-time approximation algorithms for the maximum matching size and the minimum vertex cover size of a graph. An overarching technique we employ is the notion of coupled global sensitivity of randomized algorithms. Related variants of this notion of sensitivity have been used in the literature in ad hoc ways; here we formalize the notion and develop it as a unifying framework for the privacy analysis of randomized approximation algorithms. (A generic sketch of the Laplace mechanism, the basic noise-calibration step in differential privacy, appears after record 5 below.)
  5. Waiting at the right location at the right time can be critically important in certain variants of time-dependent shortest path problems. We investigate the computational complexity of time-dependent shortest path problems in which there is either a penalty on waiting or a limit on the total time spent waiting at a given subset of the nodes. We show that some cases are NP-hard and others can be solved in polynomial time, depending on the choice of the subset of nodes, on whether waiting is penalized or constrained, and on the magnitude of the penalty/waiting-limit parameter. Summary of Contributions: This paper addresses simple yet relevant extensions of a fundamental problem in Operations Research: the Shortest Path Problem (SPP). It considers time-dependent variants of the SPP, which can account for changing traffic and/or weather conditions. The first variant allows waiting at certain nodes, but at a cost; the second instead places a limit on the total waiting. Both variants have applications in transportation, e.g., when it is worthwhile to wait at certain locations if the benefits outweigh the costs. The paper investigates these problems using complexity analysis and algorithm design, both tools from the field of computing. Different cases are considered depending on which nodes contribute to the waiting cost or waiting limit (all nodes, all nodes except the origin, a subset of nodes…). The computational complexity of all cases is determined, with hardness proofs for the variants that are NP-hard and polynomial-time algorithms for the variants that are in P. (A toy time-expanded shortest-path sketch with penalized waiting appears directly after this record.)
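Companion sketch for record 1 above. To make the LCA query model concrete, the toy code below answers "is vertex v in the maximal independent set?" by exploring only vertices reachable through neighbours of strictly lower random rank, so repeated queries are mutually consistent. The graph, seed, and memoisation are invented for illustration; the sketch makes no attempt to reproduce the complexity guarantees of the paper's algorithms.

    # Toy local query for membership in a fixed (random-order greedy) maximal independent set.
    import random
    from functools import lru_cache

    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # adjacency lists (toy input)
    SEED = 2024

    def rank(v):
        # Consistent pseudo-random rank; every query sees the same vertex order.
        return random.Random(f"{SEED}:{v}").random()

    @lru_cache(maxsize=None)
    def in_mis(v):
        # v joins the MIS iff no lower-ranked neighbour has joined. The recursion
        # only descends in rank, so each query touches a bounded local region.
        return all(not in_mis(u) for u in graph[v] if rank(u) < rank(v))

    print({v: in_mis(v) for v in graph})                   # per-vertex query answers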
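Companion sketch for record 2 above. The uncertain-curve variants build on the classical discrete Fréchet distance between two fixed point sequences, computed by the standard dynamic program (due to Eiter and Mannila) shown below. The two polylines are invented toy data; the lower-bound, upper-bound, and expected variants studied in the paper optimise this quantity over realisations of uncertainty regions, which this sketch does not do.

    # Standard dynamic program for the discrete Fréchet distance of two polylines.
    from functools import lru_cache
    from math import dist

    P = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
    Q = [(0.0, 1.0), (1.0, 1.0), (2.0, 2.0), (3.0, 2.0)]

    def discrete_frechet(P, Q):
        @lru_cache(maxsize=None)
        def c(i, j):
            d = dist(P[i], Q[j])
            if i == 0 and j == 0:
                return d
            if i == 0:
                return max(c(0, j - 1), d)
            if j == 0:
                return max(c(i - 1, 0), d)
            # advance along P, along Q, or along both; keep the cheapest coupling
            return max(min(c(i - 1, j), c(i, j - 1), c(i - 1, j - 1)), d)
        return c(len(P) - 1, len(Q) - 1)

    print(discrete_frechet(P, Q))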
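Companion sketch for record 3 above. The hardness results concern finding a vertex order satisfying the Wheeler-graph axioms; checking a proposed order, by contrast, is straightforward. The toy verifier below applies the three axioms directly to a tiny invented edge-labelled graph; it illustrates the definition only, not any algorithm from the paper.

    # Check whether a given vertex order satisfies the Wheeler-graph axioms.
    def is_wheeler_order(edges, order):
        """edges: list of (tail, head, label); order: dict vertex -> position."""
        has_in = {v for (_, v, _) in edges}                  # vertices with incoming edges
        zero_in = [x for x in order if x not in has_in]
        # Axiom 0: vertices with in-degree 0 precede all vertices with incoming edges.
        if zero_in and has_in and max(order[x] for x in zero_in) > min(order[v] for v in has_in):
            return False
        for (u, v, a) in edges:
            for (u2, v2, a2) in edges:
                # Axiom 1: a smaller label forces a strictly smaller head.
                if a < a2 and not order[v] < order[v2]:
                    return False
                # Axiom 2: equal labels and a smaller tail force a head that is not larger.
                if a == a2 and order[u] < order[u2] and not order[v] <= order[v2]:
                    return False
        return True

    edges = [("s", "x", "a"), ("s", "y", "b"), ("x", "y", "b")]   # toy labelled graph
    order = {"s": 0, "x": 1, "y": 2}                              # candidate order s < x < y
    print(is_wheeler_order(edges, order))                         # True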
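Companion sketch for record 4 above. Differential privacy is ultimately enforced by adding noise calibrated to a sensitivity bound. The toy code below applies the standard Laplace mechanism to an exact average-degree query; it does not reproduce the paper's sublinear-time estimators or its coupled-global-sensitivity analysis, and the degree data, ε, and the 2/n sensitivity bound (edge-level neighbouring with the number of vertices public) are assumptions made for the example.

    # Toy Laplace mechanism for releasing an average degree (not the paper's method).
    import random

    def private_average_degree(degrees, epsilon, rng):
        n = len(degrees)
        true_avg = sum(degrees) / n
        # Adding or removing one edge changes the degree sum by 2, so the average
        # degree changes by at most 2/n; scale the noise to sensitivity / epsilon.
        scale = (2.0 / n) / epsilon
        # Difference of two independent Exp(1) draws, scaled, is Laplace(0, scale).
        noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
        return true_avg + noise

    rng = random.Random(7)
    print(private_average_degree([3, 2, 2, 1], epsilon=0.5, rng=rng))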
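Companion sketch for record 5 above. One way to picture the penalised-waiting variant is as a shortest path on a time-expanded graph, where staying put for one step at an allowed node costs a penalty on top of the elapsed time. The toy Dijkstra below does exactly that on an invented three-node network with a discretised horizon and a made-up rush-hour travel-time function; it is a model illustration only, not one of the paper's algorithms, which also cover the waiting-limit variant and the hardness cases.

    # Toy time-expanded shortest path with penalised waiting.
    import heapq

    HORIZON = 10
    WAIT_NODES = {"B"}                                       # waiting allowed (at a price) only at B
    WAIT_PENALTY = 1.0
    ARCS = {"A": ["B", "C"], "B": ["C"], "C": []}

    def travel_time(u, v, t):
        # Time-dependent travel time on arc (u, v) when departing at time t (toy data).
        base = {("A", "B"): 2, ("B", "C"): 1, ("A", "C"): 6}[(u, v)]
        return base + (1 if 3 <= t <= 5 else 0)              # rush-hour slowdown

    def cheapest_cost(origin, dest):
        # Dijkstra over (node, time) states; cost = elapsed time + waiting penalties.
        settled = set()
        pq = [(0.0, origin, 0)]                              # (cost so far, node, current time)
        while pq:
            cost, u, t = heapq.heappop(pq)
            if (u, t) in settled:
                continue
            settled.add((u, t))
            if u == dest:
                return cost
            if u in WAIT_NODES and t + 1 <= HORIZON:
                heapq.heappush(pq, (cost + 1 + WAIT_PENALTY, u, t + 1))   # wait one step
            for v in ARCS[u]:
                dt = travel_time(u, v, t)
                if t + dt <= HORIZON:
                    heapq.heappush(pq, (cost + dt, v, t + dt))
        return float("inf")

    print(cheapest_cost("A", "C"))                           # 3.0 on this toy network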