skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Makhija, Neha"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. We consider the problem of finding the minimal-size factorization of the provenance of self-join-free conjunctive queries, i.e.,we want to find a formula that minimizes the number of variable repetitions. This problem is equivalent to solving the fundamental Boolean formula factorization problem for the restricted setting of the provenance formulas of self-join free queries. While general Boolean formula minimization is Σp2-complete, we show that the problem is NP-Complete in our case. Additionally, we identify a large class of queries that can be solved in PTIME, expanding beyond the previously known tractable cases of read-once formulas and hierarchical queries. We describe connections between factorizations, Variable Elimination Orders (VEOs), and minimal query plans. We leverage these insights to create an Integer Linear Program (ILP) that can solve the minimal factorization problem exactly. We also propose a Max-Flow Min-Cut (MFMC) based algorithm that gives an efficient approximate solution. Importantly, we show that both the Linear Programming (LP) relaxation of our ILP, and our MFMC-based algorithm are always correct for all currently known PTIME cases. Thus, we present two unified algorithms (ILP and MFMC) that can both recover all known PTIME cases in PTIME, yet also solve NP-Complete cases either exactly (ILP) or approximately (MFMC), as desired. 
    more » « less
    Free, publicly-accessible full text available May 10, 2025
  2. What is a minimal set of tuples to delete from a database in order to eliminate all query answers? This problem is called the resilience of a query and is one of the key algorithmic problems underlying various forms of reverse data management, such as view maintenance, deletion propagation and causal responsibility. A long-open question is determining the conjunctive queries (CQs) for which resilience can be solved in PTIME. We shed new light on this problem by proposing a unified Integer Linear Programming (ILP) formulation. It is unified in that it can solve both previously studied restrictions (e.g., self-join-free CQs under set semantics that allow a PTIME solution) and new cases (all CQs under set or bag semantics). It is also unified in that all queries and all database instances are treated with the same approach, yet the algorithm is guaranteed to terminate in PTIME for all known PTIME cases. In particular, we prove that for all known easy cases, the optimal solution to our ILP is identical to a simpler Linear Programming (LP) relaxation, which implies that standard ILP solvers return the optimal solution to the original ILP in PTIME. Our approach allows us to explore new variants and obtain new complexity results. 1) It works under bag semantics, for which we give the first dichotomy results in the problem space. 2) We extend our approach to the related problem of causal responsibility and give a more fine-grained analysis of its complexity. 3) We recover easy instances for generally hard queries, including instances with read-once provenance and instances that become easy because of Functional Dependencies in the data. 4) We solve an open conjecture about a unified hardness criterion from PODS 2020 and prove the hardness of several queries of previously unknown complexity. 5) Experiments confirm that our findings accurately predict the asymptotic running times, and that our universal ILP is at times even quicker than a previously proposed dedicated flow algorithm. 
    more » « less