

This content will become publicly available on August 22, 2024

Title: D-Optimal Data Fusion: Exact and Approximation Algorithms
We study the D-optimal Data Fusion (DDF) problem, which aims to select new data points, given an existing Fisher information matrix, so as to maximize the logarithm of the determinant of the overall Fisher information matrix. We show that the DDF problem is NP-hard and has no constant-factor polynomial-time approximation algorithm unless P = NP. Therefore, to solve the DDF problem effectively, we propose two convex integer-programming formulations and investigate their corresponding complementary and Lagrangian-dual problems. Leveraging the concavity of the objective functions in the two proposed convex integer-programming formulations, we design an exact algorithm, aimed at solving the DDF problem to optimality. We further derive a family of submodular valid inequalities and optimality cuts, which can significantly enhance the algorithm performance. We also develop scalable randomized-sampling and local-search algorithms with provable performance guarantees. Finally, we test our algorithms using real-world data on the new phasor-measurement-units placement problem for modern power grids, considering the existing conventional sensors. Our numerical study demonstrates the efficiency of our exact algorithm and the scalability and high-quality outputs of our approximation algorithms.

History: Accepted by Andrea Lodi, Area Editor for Design & Analysis of Algorithms—Discrete.

Funding: Y. Li and W. Xie were supported in part by Division of Civil, Mechanical and Manufacturing Innovation [Grant 2046414] and Division of Computing and Communication Foundations [Grant 2246417]. J. Lee was supported in part by Air Force Office of Scientific Research [Grants FA9550-19-1-0175 and FA9550-22-1-0172]. M. Fampa was supported in part by Conselho Nacional de Desenvolvimento Científico e Tecnológico [Grants 305444/2019-0 and 434683/2018-3].

Supplemental Material: The e-companion is available at https://doi.org/10.1287/ijoc.2022.0235 .
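As a rough illustration of the objective described in the abstract above, the following Python sketch evaluates the log-determinant of the overall information matrix and runs a naive swap-based local search over candidate data points. It is not the authors' algorithm. The rank-one contribution a_i a_i^T of each candidate, the cardinality budget k, and the function names (log_det, ddf_objective, swap_local_search) are all assumptions made for illustration only.

```python
import math


def log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(s)
                total += math.log(s)  # s equals the squared diagonal entry
            else:
                lower[i][j] = s / lower[j][j]
    return total


def ddf_objective(existing, candidates, selected):
    """log det(existing + sum of a_i a_i^T over selected i) (assumed model)."""
    n = len(existing)
    info = [row[:] for row in existing]
    for i in selected:
        a = candidates[i]
        for r in range(n):
            for c in range(n):
                info[r][c] += a[r] * a[c]
    return log_det(info)


def swap_local_search(existing, candidates, k):
    """Start from the first k candidates; swap one in, one out while it helps."""
    selected = set(range(k))
    best = ddf_objective(existing, candidates, selected)
    improved = True
    while improved:
        improved = False
        for i in sorted(selected):
            for j in range(len(candidates)):
                if j in selected:
                    continue
                trial = (selected - {i}) | {j}
                value = ddf_objective(existing, candidates, trial)
                if value > best + 1e-12:
                    selected, best, improved = trial, value, True
                    break
            if improved:
                break
    return sorted(selected), best


# Hypothetical toy instance: identity prior information, four candidates.
existing = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
chosen, value = swap_local_search(existing, candidates, 2)
print(chosen)  # [2, 3]
```

On this toy instance the search ends at candidates 2 and 3, which is also the best pair by enumeration. In general a swap search of this kind only reaches a local optimum.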
Award ID(s):
2246417 2246414
NSF-PAR ID:
10444682
Journal Name:
INFORMS Journal on Computing
ISSN:
1091-9856
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies a classic maximum entropy sampling problem (MESP), which aims to select the most informative principal submatrix of a prespecified size from a covariance matrix. By investigating its Lagrangian dual and primal characterization, we derive a novel convex integer program for MESP and show that its continuous relaxation yields a near-optimal solution. The results motivate us to develop a sampling algorithm and derive its approximation bound for MESP, which improves the best known bound in the literature. We then provide an efficient deterministic implementation of the sampling algorithm with the same approximation bound. In addition, we investigate the widely used local search algorithm and prove its first known approximation bound for MESP. The proof techniques further inspire an efficient implementation of the local search algorithm. Our numerical experiments demonstrate that these approximation algorithms can efficiently solve medium-size and large-scale instances to near optimality. Finally, we extend the analyses to the A-optimal MESP, for which the objective is to minimize the trace of the inverse of the selected principal submatrix. Funding: This work was supported by the National Science Foundation Division of Information and Intelligent Systems [Grant 2246417] and Division of Civil, Mechanical and Manufacturing Innovation [Grant 2246414]. Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2023.2488 .
  2. This paper studies the distributionally robust fair transit resource allocation model (DrFRAM) under the Wasserstein ambiguity set to optimize the public transit resource allocation during a pandemic. We show that the proposed DrFRAM is highly nonconvex and nonlinear, and it is NP-hard in general. Fortunately, we show that DrFRAM can be reformulated as a mixed-integer linear program (MILP) by leveraging the equivalent representation of distributionally robust optimization and monotonicity properties, binarizing integer variables, and linearizing nonconvex terms. To improve the proposed MILP formulation, we derive stronger ones and develop valid inequalities by exploiting the model structures. Additionally, we develop scenario decomposition methods using different MILP formulations to solve the scenario subproblems and introduce a simple yet effective no one left-based approximation algorithm with a provable approximation guarantee to solve the model to near optimality. Finally, we numerically demonstrate the effectiveness of the proposed approaches and apply them to real-world data provided by Blacksburg Transit.

    History: This paper has been accepted for the Transportation Science Special Issue on Emerging Topics in Transportation Science and Logistics.

    Funding: This work was supported by the Division of Computing and Communication Foundations [Grant 2153607] and the Division of Civil, Mechanical and Manufacturing Innovation [Grant 2046426].

    Supplemental Material: The online appendix is available at https://doi.org/10.1287/trsc.2022.1159 .

     
  3. Cliques and their generalizations are frequently used to model “tightly knit” clusters in graphs and identifying such clusters is a popular technique used in graph-based data mining. One such model is the s-club, which is a vertex subset that induces a subgraph of diameter at most s. This model has found use in a variety of fields because low-diameter clusters have practical significance in many applications. As this property is not hereditary on vertex-induced subgraphs, the diameter of a subgraph could increase upon the removal of some vertices and the subgraph could even become disconnected. For example, star graphs have diameter two but can be disconnected by removing the central vertex. The pursuit of a fault-tolerant extension of the s-club model has spawned two variants that we study in this article: robust s-clubs and hereditary s-clubs. We analyze the complexity of the verification and optimization problems associated with these variants. Then, we propose cut-like integer programming formulations for both variants whenever possible and investigate the separation complexity of the cut-like constraints. We demonstrate through our extensive computational experiments that the algorithmic ideas we introduce enable us to solve the problems to optimality on benchmark instances with several thousand vertices. This work lays the foundations for effective mathematical programming approaches for finding fault-tolerant s-clubs in large-scale networks. History: Accepted by David Alderson, Area Editor for Network Optimization: Algorithms & Applications. Funding: The computing for this project was performed at the High Performance Computing Center at Oklahoma State University supported in part through the National Science Foundation [Grant OAC-1531128]. This material is based upon work supported by the National Science Foundation under [Grants 1662757 and 1942065]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijoc.2022.1231 . 
  4. Ride-pooling, which accommodates multiple passenger requests in a single trip, has the potential to substantially enhance the throughput of mobility-on-demand (MoD) systems. This paper investigates MoD systems that operate mixed fleets composed of “basic supply” and “augmented supply” vehicles. When the basic supply is insufficient to satisfy demand, augmented supply vehicles can be repositioned to serve rides at a higher operational cost. We formulate the joint vehicle repositioning and ride-pooling assignment problem as a two-stage stochastic integer program, where repositioning augmented supply vehicles precedes the realization of ride requests. Sequential ride-pooling assignments aim to maximize total utility or profit on a shareability graph: a hypergraph representing the matching compatibility between available vehicles and pending requests. Two approximation algorithms for midcapacity and high-capacity vehicles are proposed in this paper; the respective approximation ratios are [Formula: see text] and [Formula: see text], where p is the maximum vehicle capacity plus one. Our study evaluates the performance of these approximation algorithms using an MoD simulator, demonstrating that these algorithms can parallelize computations and achieve solutions with small optimality gaps (typically within 1%). These efficient algorithms pave the way for various multimodal and multiclass MoD applications.

    History: This paper has been accepted for the Transportation Science Special Issue on Emerging Topics in Transportation Science and Logistics.

    Funding: This work was supported by the National Science Foundation [Grants CCF-2006778 and FW-HTF-P 2222806], the Ford Motor Company, and the Division of Civil, Mechanical, and Manufacturing Innovation [Grants CMMI-1854684, CMMI-1904575, and CMMI-1940766].

    Supplemental Material: The online appendix is available at https://doi.org/10.1287/trsc.2021.0349 .

     
  5. This paper studies several solution paths of sparse quadratic minimization problems as a function of the weighing parameter of the bi-objective of estimation loss versus solution sparsity. Three such paths are considered: the “$\ell_0$-path” where the discontinuous $\ell_0$-function provides the exact sparsity count; the “$\ell_1$-path” where the $\ell_1$-function provides a convex surrogate of sparsity count; and the “capped $\ell_1$-path” where the nonconvex nondifferentiable capped $\ell_1$-function aims to enhance the $\ell_1$-approximation. Serving different purposes, each of these three formulations is different from each other, both analytically and computationally. Our results deepen the understanding of (old and new) properties of the associated paths, highlight the pros, cons, and tradeoffs of these sparse optimization models, and provide numerical evidence to support the practical superiority of the capped $\ell_1$-path. Our study of the capped $\ell_1$-path is interesting in its own right as the path pertains to computable directionally stationary (= strongly locally minimizing in this context, as opposed to globally optimal) solutions of a parametric nonconvex nondifferentiable optimization problem. Motivated by classical parametric quadratic programming theory and reinforced by modern statistical learning studies, both casting an exponential perspective in fully describing such solution paths, we also aim to address the question of whether some of them can be fully traced in strongly polynomial time in the problem dimensions. A major conclusion of this paper is that a path of directional stationary solutions of the capped $\ell_1$-regularized problem offers interesting theoretical properties and practical compromise between the $\ell_0$-path and the $\ell_1$-path. Indeed, while the $\ell_0$-path is computationally prohibitive and greatly handicapped by the repeated solution of mixed-integer nonlinear programs, the quality of $\ell_1$-path, in terms of the two criteria (loss and sparsity) in the estimation objective, is inferior to the capped $\ell_1$-path; the latter can be obtained efficiently by a combination of a parametric pivoting-like scheme supplemented by an algorithm that takes advantage of the Z-matrix structure of the loss function.
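To make the three sparsity measures in item 5 concrete, here is a small Python sketch that evaluates each of them on a vector. The abstract does not spell out the capped function, so the form min(|t|/delta, 1) with a threshold delta > 0 is an assumption (a commonly used one), and the function names are hypothetical.

```python
def l0_count(x):
    """Exact sparsity count: number of nonzero entries (discontinuous)."""
    return sum(1 for t in x if t != 0.0)


def l1_norm(x):
    """Convex surrogate: sum of absolute values."""
    return sum(abs(t) for t in x)


def capped_l1(x, delta):
    """Assumed capped form: each entry contributes min(|t| / delta, 1)."""
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    return sum(min(abs(t) / delta, 1.0) for t in x)


# Hypothetical vector with one exact zero and one small entry.
x = [0.0, 0.05, -2.0, 0.5]
print(l0_count(x), l1_norm(x), capped_l1(x, 0.1))
```

Under this assumed form, every entry contributes at most 1 and a zero entry contributes nothing, so the capped value never exceeds the exact count. Unlike the plain sum of absolute values, it also does not grow with the magnitude of large entries.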

     
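The s-club definition quoted in item 3 (a vertex subset inducing a subgraph of diameter at most s) and its star-graph example can be checked with a short Python sketch. The adjacency-dictionary representation and the function names are assumptions for illustration; this is a plain breadth-first-search check, not the integer-programming approach of that paper.

```python
from collections import deque


def induced_diameter(adjacency, vertices):
    """Diameter of the subgraph induced by `vertices`; infinity if disconnected."""
    vertices = set(vertices)
    worst = 0
    for source in vertices:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                # Only walk edges whose endpoints both lie in the subset.
                if v in vertices and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(vertices):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst


def is_s_club(adjacency, vertices, s):
    """True if `vertices` induces a subgraph of diameter at most s."""
    return induced_diameter(adjacency, vertices) <= s


# Star graph: vertex 0 is the center, vertices 1..3 are leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(is_s_club(star, {0, 1, 2, 3}, 2))  # True
print(is_s_club(star, {1, 2, 3}, 2))     # False
```

The second call shows the non-hereditary behavior described in item 3: dropping the center leaves the three leaves disconnected, so the subset is no longer a 2-club.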