Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Kumar, Amit ; RonZewi, Noga (Ed.)Estimating the size of the union of a stream of sets S₁, S₂, …, S_M where each set is a subset of a known universe Ω is a fundamental problem in data streaming. This problem naturally generalizes the wellstudied 𝖥₀ estimation problem in the streaming literature, where each set contains a single element from the universe. We consider the general case when the sets S_i can be succinctly represented and allow efficient membership, cardinality, and sampling queries (called a Delphic family of sets). A notable example in this framework is the Klee’s Measure Problem (KMP), where every set S_i is an axisparallel rectangle in ddimensional spaces (Ω = [Δ]^d where [Δ] := {1, … ,Δ} and Δ ∈ ℕ). Recently, Meel, Chakraborty, and Vinodchandran (PODS21, PODS22) designed a streaming algorithm for (ε,δ)estimation of the size of the union of set streams over Delphic family with space and update time complexity O((log³Ω)/ε² ⋅ log 1/δ) and Õ((log⁴Ω)/ε² ⋅ log 1/(δ)), respectively. This work presents a new, samplingbased algorithm for estimating the size of the union of Delphic sets that has space and update time complexity Õ((log²Ω)/ε² ⋅ log 1/(δ)). This improves the space complexity bound by a logΩ factor and update time complexity bound by a log² Ω factor. A critical question is whether quadratic dependence of logΩ on space and update time complexities is necessary. Specifically, can we design a streaming algorithm for estimating the size of the union of sets over Delphic family with space and complexity linear in logΩ and update time poly(logΩ)? While this appears technically challenging, we show that establishing a lower bound of ω(logΩ) with poly(logΩ) update time is beyond the reach of current techniques. Specifically, we show that under certain hardtoprove computational complexity hypothesis, there is a streaming algorithm for the problem with optimal space complexity O(logΩ) and update time poly(log(Ω)). Thus, establishing a space lower bound of ω(logΩ) will lead to breakthrough complexity class separation results.more » « lessFree, publiclyaccessible full text available September 16, 2025

Free, publiclyaccessible full text available July 21, 2025

In today's digital age, it is becoming increasingly prevalent to retain digital footprints in the cloud indefinitely. Nonetheless, there is a valid argument that entities should have the authority to decide whether their personal data remains within a specific database or is expunged. Indeed, nations across the globe are increasingly enacting legislation to uphold the Right To Be Forgotten for individuals. Investigating computational challenges, including the formalization and implementation of this notion, is crucial due to its relevance in the domains of data privacy and management.
This work introduces a new streaming model: the 'Right to be Forgotten Data Streaming Model' (RFDS model). The main feature of this model is that any element in the stream has the right to have its history removed from the stream. Formally, the input is a stream of updates of the form (a, Δ) where Δ ∈ {+, ⊥} and a is an element from a universe U. When the update Δ=+ occurs, the frequency of a, denoted as f_{a}, is incremented to f_{a}+1. When the update Δ=⊥, occurs, f_{a}is set to 0. This feature, which represents the forget request, distinguishes the present model from existing data streaming models.
This work systematically investigates computational challenges that arise while incorporating the notion of the right to be forgotten. Our initial considerations reveal that even estimating F_{1}(sum of the frequencies of elements) of the stream is a nontrivial problem in this model. Based on the initial investigations, we focus on a modified model which we call αRFDS where we limit the number of forget operations to be at most α fraction. In this modified model, we focus on estimating F_{0}(number of distinct elements) and F_{1}. We present algorithms and establish almostmatching lower bounds on the space complexity for these computational tasks.
Free, publiclyaccessible full text available May 10, 2025 
Oshman, Rotem (Ed.)In this work, we study the class of problems solvable by (deterministic) Adaptive Massively Parallel Computations in constant rounds from a computational complexity theory perspective. A language L is in the class AMPC⁰ if, for every ε > 0, there is a deterministic AMPC algorithm running in constant rounds with a polynomial number of processors, where the local memory of each machine s = O(N^ε). We prove that the spacebounded complexity class ReachUL is a proper subclass of AMPC⁰. The complexity class ReachUL lies between the wellknown spacebounded complexity classes Deterministic Logspace (DLOG) and Nondeterministic Logspace (NLOG). In contrast, we establish that it is unlikely that PSPACE admits AMPC algorithms, even with polynomially many rounds. We also establish that showing PSPACE is a subclass of nonuniformAMPC with polynomially many rounds leads to a significant separation result in complexity theory, namely PSPACE is a proper subclass of EXP^{Σ₂^{𝖯}}.more » « less

Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSP’s and computation of zeroth frequency moments (
F _{0}) for data streams.Our investigations lead us to observe a striking similarity in the core techniques employed in the algorithmic frameworks that have evolved separately for model counting and
F _{0}computation. We design a recipe for translating algorithms developed forF _{0}estimation to model counting, resulting in new algorithms for model counting. We also provide a recipe for transforming sampling algorithm over streams to constraint sampling algorithms. We then observe that algorithms in the context of distributed streaming can be transformed into distributed algorithms for model counting. We next turn our attention to viewing streaming from the lens of counting and show that framingF _{0}estimation as a special case of #DNF counting allows us to obtain a general recipe for a rich class of streaming problems, which had been subjected to casespecific analysis in prior works. In particular, our view yields an algorithm for multidimensional range efficientF _{0}estimation with a simpler analysis. 
Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and the computation of the number of distinct elements in a data stream, also known as the zeroth frequency moment (
F _{0}) of a data stream.Our investigations lead us to observe striking similarity in the core techniques employed in the algorithmic frameworks that have evolved separately for model counting and distinct elements computation. We design a recipe for the translation of algorithms developed for distinct elements estimation to that of model counting, resulting in new algorithms for model counting. We then observe that algorithms in the context of distributed streaming can be transformed into distributed algorithms for model counting. We next turn our attention to viewing streaming from the lens of counting and show that framing distinct elements estimation as a special case of #DNF counting allows us to obtain a general recipe for a rich class of streaming problems, which had been subjected to casespecific analysis in prior works.

Interpretations of logical formulas over semirings (other than the Boolean semiring) have applications in various areas of computer science including logic, AI, databases, and security. Such interpretations provide richer information beyond the truth or falsity of a statement. Examples of such semirings include Viterbi semiring, minmax or access control semiring, tropical semiring, and fuzzy semiring. The present work investigates the complexity of constraint optimization problems over semirings. The generic optimization problem we study is the following: Given a propositional formula phi over n variable and a semiring (K,+, . ,0,1), find the maximum value over all possible interpretations of phi over K. This can be seen as a generalization of the wellknown satisfiability problem (a propositional formula is satisfiable if and only if the maximum value over all interpretations/assignments over the Boolean semiring is 1). A related problem is to find an interpretation that achieves the maximum value. In this work, we first focus on these optimization problems over the Viterbi semiring, which we call optConfVal and optConf. We first show that for general propositional formulas in negation normal form, optConfVal and optConf are in FP^NP. We then investigate optConf when the input formula phi is represented in the conjunctive normal form. For CNF formulae, we first derive an upper bound on the value of optConf as a function of the number of maximum satisfiable clauses. In particular, we show that if r is the maximum number of satisfiable clauses in a CNF formula with m clauses, then its optConf value is at most 1/4^(mr). Building on this we establish that optConf for CNF formulae is hard for the complexity class FP^NP[log]. We also design polynomialtime approximation algorithms and establish an inapproximability for optConfVal. We establish similar complexity results for these optimization problems over other semirings including tropical, fuzzy, and access control semirings.more » « less

Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain {0,1}^n. In particular, we establish the following results.1. The problem of exactly computing the TV distance of two product distributions is #Pcomplete. This is in stark contrast with other distance measures such as KL, Chisquare, and Hellinger which tensorize over the marginals leading to efficient algorithms.2. There is a fully polynomialtime deterministic approximation scheme (FPTAS) for computing the TV distance of two product distributions P and Q where Q is the uniform distribution. This result is extended to the case where Q has a constant number of distinct marginals. In contrast, we show that when P and Q are Bayes net distributions the relative approximation of their TV distance is NPhard.

Given a data stream 𝒟 = ⟨ a₁, a₂, …, a_m ⟩ of m elements where each a_i ∈ [n], the Distinct Elements problem is to estimate the number of distinct elements in 𝒟. Distinct Elements has been a subject of theoretical and empirical investigations over the past four decades resulting in space optimal algorithms for it. All the current stateoftheart algorithms are, however, beyond the reach of an undergraduate textbook owing to their reliance on the usage of notions such as pairwise independence and universal hash functions. We present a simple, intuitive, samplingbased spaceefficient algorithm whose description and the proof are accessible to undergraduates with the knowledge of basic probability theory.more » « less

Given a family of sets (S1, S2,... SM) over a universe Ω, estimating the size of their union in the data streaming model is a fundamental computational problem with a wide variety of applications. The holy grail in the field of streaming is to seek design of algorithms that achieve (ε, δ)approximation with poly(log Ω, ε1, log δ1) space and update time complexity. Earlier investigations achieve algorithms with desired space and update time complexity for restricted cases such as singletons (Distinct Elements problem), onedimensional ranges, arithmetic progressions, and subcubes. However, techniques used in these works fail for many other simple structured sets. A prominent example is that of Klee's Measure Problem (KMP), wherein every set Si is represented by an axisparallel rectangle in ddimensional spaces. Despite extensive prior work, the bestknown streaming algorithms for many of these cases depend on the size of the stream, and therefore the problem of whether there exists a streaming algorithm for estimations of size of the union of sets with poly(log Ω, ε1, log δ1) space and update time complexity has remained open. In this work, we focus on certain general families of sets called Delphic families (which allows efficient membership, sampling, and cardinality queries). Such families of sets capture several wellknown problems, including KMP, test coverage, and hypervolume estimation. The primary contribution of our work is to resolve the abovementioned open problem for streams over Delphic families. In particular, we design the first streaming algorithm for estimating ⋃i=1M Si with poly(log Ω, ε1, log δ1) space and update time complexity (independent of M, the length of the stream) when each Si is a member from a Delphic family of sets. We further generalize our results to larger families of sets, called approximateDelphic families, for which the size of a set can be known approximately but not exactly. Our results resolve two of the open problems listed in Meel, Vinodchandran, Chakraborty (PODS21).more » « less