skip to main content


Search for: All records

Award ID contains: 2130608

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Kumar, Amit ; Ron-Zewi, Noga (Ed.)
    Estimating the size of the union of a stream of sets S₁, S₂, …, S_M where each set is a subset of a known universe Ω is a fundamental problem in data streaming. This problem naturally generalizes the well-studied 𝖥₀ estimation problem in the streaming literature, where each set contains a single element from the universe. We consider the general case when the sets S_i can be succinctly represented and allow efficient membership, cardinality, and sampling queries (called a Delphic family of sets). A notable example in this framework is the Klee’s Measure Problem (KMP), where every set S_i is an axis-parallel rectangle in d-dimensional spaces (Ω = [Δ]^d where [Δ] := {1, … ,Δ} and Δ ∈ ℕ). Recently, Meel, Chakraborty, and Vinodchandran (PODS-21, PODS-22) designed a streaming algorithm for (ε,δ)-estimation of the size of the union of set streams over Delphic family with space and update time complexity O((log³|Ω|)/ε² ⋅ log 1/δ) and Õ((log⁴|Ω|)/ε² ⋅ log 1/(δ)), respectively. This work presents a new, sampling-based algorithm for estimating the size of the union of Delphic sets that has space and update time complexity Õ((log²|Ω|)/ε² ⋅ log 1/(δ)). This improves the space complexity bound by a log|Ω| factor and update time complexity bound by a log² |Ω| factor. A critical question is whether quadratic dependence of log|Ω| on space and update time complexities is necessary. Specifically, can we design a streaming algorithm for estimating the size of the union of sets over Delphic family with space and complexity linear in log|Ω| and update time poly(log|Ω|)? While this appears technically challenging, we show that establishing a lower bound of ω(log|Ω|) with poly(log|Ω|) update time is beyond the reach of current techniques. Specifically, we show that under certain hard-to-prove computational complexity hypothesis, there is a streaming algorithm for the problem with optimal space complexity O(log|Ω|) and update time poly(log(|Ω|)). Thus, establishing a space lower bound of ω(log|Ω|) will lead to break-through complexity class separation results. 
    more » « less
    Free, publicly-accessible full text available September 16, 2025
  2. Free, publicly-accessible full text available July 21, 2025
  3. In today's digital age, it is becoming increasingly prevalent to retain digital footprints in the cloud indefinitely. Nonetheless, there is a valid argument that entities should have the authority to decide whether their personal data remains within a specific database or is expunged. Indeed, nations across the globe are increasingly enacting legislation to uphold the Right To Be Forgotten for individuals. Investigating computational challenges, including the formalization and implementation of this notion, is crucial due to its relevance in the domains of data privacy and management.

    This work introduces a new streaming model: the 'Right to be Forgotten Data Streaming Model' (RFDS model). The main feature of this model is that any element in the stream has the right to have its history removed from the stream. Formally, the input is a stream of updates of the form (a, Δ) where Δ ∈ {+, ⊥} and a is an element from a universe U. When the update Δ=+ occurs, the frequency of a, denoted as fa, is incremented to fa+1. When the update Δ=⊥, occurs, fais set to 0. This feature, which represents the forget request, distinguishes the present model from existing data streaming models.

    This work systematically investigates computational challenges that arise while incorporating the notion of the right to be forgotten. Our initial considerations reveal that even estimating F1(sum of the frequencies of elements) of the stream is a non-trivial problem in this model. Based on the initial investigations, we focus on a modified model which we call α-RFDS where we limit the number of forget operations to be at most α fraction. In this modified model, we focus on estimating F0(number of distinct elements) and F1. We present algorithms and establish almost-matching lower bounds on the space complexity for these computational tasks.

     
    more » « less
    Free, publicly-accessible full text available May 10, 2025
  4. Oshman, Rotem (Ed.)
    In this work, we study the class of problems solvable by (deterministic) Adaptive Massively Parallel Computations in constant rounds from a computational complexity theory perspective. A language L is in the class AMPC⁰ if, for every ε > 0, there is a deterministic AMPC algorithm running in constant rounds with a polynomial number of processors, where the local memory of each machine s = O(N^ε). We prove that the space-bounded complexity class ReachUL is a proper subclass of AMPC⁰. The complexity class ReachUL lies between the well-known space-bounded complexity classes Deterministic Logspace (DLOG) and Nondeterministic Logspace (NLOG). In contrast, we establish that it is unlikely that PSPACE admits AMPC algorithms, even with polynomially many rounds. We also establish that showing PSPACE is a subclass of nonuniform-AMPC with polynomially many rounds leads to a significant separation result in complexity theory, namely PSPACE is a proper subclass of EXP^{Σ₂^{𝖯}}. 
    more » « less
  5. Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain {0,1}^n. In particular, we establish the following results.1. The problem of exactly computing the TV distance of two product distributions is #P-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger which tensorize over the marginals leading to efficient algorithms.2. There is a fully polynomial-time deterministic approximation scheme (FPTAS) for computing the TV distance of two product distributions P and Q where Q is the uniform distribution. This result is extended to the case where Q has a constant number of distinct marginals. In contrast, we show that when P and Q are Bayes net distributions the relative approximation of their TV distance is NP-hard.

     
    more » « less
  6. Interpretations of logical formulas over semirings (other than the Boolean semiring) have applications in various areas of computer science including logic, AI, databases, and security. Such interpretations provide richer information beyond the truth or falsity of a statement. Examples of such semirings include Viterbi semiring, min-max or access control semiring, tropical semiring, and fuzzy semiring. The present work investigates the complexity of constraint optimization problems over semirings. The generic optimization problem we study is the following: Given a propositional formula phi over n variable and a semiring (K,+, . ,0,1), find the maximum value over all possible interpretations of phi over K. This can be seen as a generalization of the well-known satisfiability problem (a propositional formula is satisfiable if and only if the maximum value over all interpretations/assignments over the Boolean semiring is 1). A related problem is to find an interpretation that achieves the maximum value. In this work, we first focus on these optimization problems over the Viterbi semiring, which we call optConfVal and optConf. We first show that for general propositional formulas in negation normal form, optConfVal and optConf are in FP^NP. We then investigate optConf when the input formula phi is represented in the conjunctive normal form. For CNF formulae, we first derive an upper bound on the value of optConf as a function of the number of maximum satisfiable clauses. In particular, we show that if r is the maximum number of satisfiable clauses in a CNF formula with m clauses, then its optConf value is at most 1/4^(m-r). Building on this we establish that optConf for CNF formulae is hard for the complexity class FP^NP[log]. We also design polynomial-time approximation algorithms and establish an inapproximability for optConfVal. We establish similar complexity results for these optimization problems over other semirings including tropical, fuzzy, and access control semirings. 
    more » « less
  7. Given a data stream 𝒟 = ⟨ a₁, a₂, …, a_m ⟩ of m elements where each a_i ∈ [n], the Distinct Elements problem is to estimate the number of distinct elements in 𝒟. Distinct Elements has been a subject of theoretical and empirical investigations over the past four decades resulting in space optimal algorithms for it. All the current state-of-the-art algorithms are, however, beyond the reach of an undergraduate textbook owing to their reliance on the usage of notions such as pairwise independence and universal hash functions. We present a simple, intuitive, sampling-based space-efficient algorithm whose description and the proof are accessible to undergraduates with the knowledge of basic probability theory. 
    more » « less
  8. Stefano Leonardi and Anupam Gupta (Ed.)
    A probabilistic algorithm A is pseudodeterministic if, on every input, there exists a canonical value that is output with high probability. If the algorithm outputs one of k canonical values with high probability, then it is called a k-pseudodeterministic algorithm. In the study of pseudodeterminism, the Acceptance Probability Estimation Problem (APEP), which is to additively approximate the acceptance probability of a Boolean circuit, is emerging as a central computational problem. This problem admits a 2-pseudodeterministic algorithm. Recently, it was shown that a pseudodeterministic algorithm for this problem would imply that any multi-valued function that admits a k-pseudodeterministic algorithm for a constant k (including approximation algorithms) also admits a pseudodeterministic algorithm (Dixon, Pavan, Vinodchandran; ITCS 2021). The contribution of the present work is two-fold. First, as our main conceptual contribution, we establish that the existence of a pseudodeterministic algorithm for APEP is fundamentally related to the gap between probabilistic promise classes and the corresponding standard complexity classes. In particular, we show the following equivalence: APEP has a pseudodeterministic approximation algorithm if and only if every promise problem in PromiseBPP has a solution in BPP. A conceptual interpretation of this equivalence is that the algorithmic gap between 2-pseudodeterminism and pseudodeterminism is equivalent to the gap between PromiseBPP and BPP. Based on this connection, we show that designing pseudodeterministic algorithms for APEP leads to the solution of some open problems in complexity theory, including new Boolean circuit lower bounds. This equivalence also explains how multi-pseudodeterminism is connected to problems in SearchBPP. In particular, we show that if APEP has a pseudodeterministic algorithm, then every problem that admits a k(n)-pseudodeterministic algorithm (for any polynomial k) is in SearchBPP and admits a pseudodeterministic algorithm. Motivated by this connection, we also explore its connection to probabilistic search problems and establish that APEP is complete for certain notions of search problems in the context of pseudodeterminism. Our second contribution is establishing query complexity lower bounds for multi-pseudodeterministic computations. We prove that for every k ≥ 1, there exists a problem whose (k+1)-pseudodeterministic query complexity, in the uniform query model, is O(1) but has a k-pseudodeterministic query complexity of Ω(n), even in the more general nonadaptive query model. A key contribution of this part of the work is the utilization of Sperner’s lemma in establishing query complexity lower bounds. 
    more » « less
  9. Given a family of sets (S1, S2,... SM) over a universe Ω, estimating the size of their union in the data streaming model is a fundamental computational problem with a wide variety of applications. The holy grail in the field of streaming is to seek design of algorithms that achieve (ε, δ)-approximation with poly(log |Ω|, ε-1, log δ-1) space and update time complexity. Earlier investigations achieve algorithms with desired space and update time complexity for restricted cases such as singletons (Distinct Elements problem), one-dimensional ranges, arithmetic progressions, and sub-cubes. However, techniques used in these works fail for many other simple structured sets. A prominent example is that of Klee's Measure Problem (KMP), wherein every set Si is represented by an axis-parallel rectangle in d-dimensional spaces. Despite extensive prior work, the best-known streaming algorithms for many of these cases depend on the size of the stream, and therefore the problem of whether there exists a streaming algorithm for estimations of size of the union of sets with poly(log |Ω|, ε-1, log δ-1) space and update time complexity has remained open. In this work, we focus on certain general families of sets called Delphic families (which allows efficient membership, sampling, and cardinality queries). Such families of sets capture several well-known problems, including KMP, test coverage, and hypervolume estimation. The primary contribution of our work is to resolve the above-mentioned open problem for streams over Delphic families. In particular, we design the first streaming algorithm for estimating |⋃i=1M Si| with poly(log |Ω|, ε-1, log δ-1) space and update time complexity (independent of M, the length of the stream) when each Si is a member from a Delphic family of sets. We further generalize our results to larger families of sets, called approximate-Delphic families, for which the size of a set can be known approximately but not exactly. Our results resolve two of the open problems listed in Meel, Vinodchandran, Chakraborty (PODS-21). 
    more » « less
  10. Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and computation of zeroth frequency moments (F0) for data streams. 
    more » « less