NSF PAR Search | NSF Public Access Repository

On the Feasibility of Forgetting in Data Streams

https://doi.org/10.1145/3651603

Pavan, A; Chakraborty, Sourav; Vinodchandran, N V; Meel, Kuldeep S (May 2024, Proceedings of the ACM on Management of Data)

In today's digital age, it is becoming increasingly prevalent to retain digital footprints in the cloud indefinitely. Nonetheless, there is a valid argument that entities should have the authority to decide whether their personal data remains within a specific database or is expunged. Indeed, nations across the globe are increasingly enacting legislation to uphold the Right To Be Forgotten for individuals. Investigating computational challenges, including the formalization and implementation of this notion, is crucial due to its relevance in the domains of data privacy and management. This work introduces a new streaming model: the 'Right to be Forgotten Data Streaming Model' (RFDS model). The main feature of this model is that any element in the stream has the right to have its history removed from the stream. Formally, the input is a stream of updates of the form (a, Δ) where Δ ∈ {+, ⊥} and a is an element from a universe U. When the update is Δ = +, the frequency of a, denoted f_a, is incremented to f_a + 1. When the update is Δ = ⊥, f_a is set to 0. This feature, which represents the forget request, distinguishes the present model from existing data streaming models. This work systematically investigates computational challenges that arise while incorporating the notion of the right to be forgotten. Our initial considerations reveal that even estimating F₁ (the sum of the frequencies of elements) of the stream is a non-trivial problem in this model. Based on the initial investigations, we focus on a modified model, which we call α-RFDS, where we limit the number of forget operations to be at most an α fraction. In this modified model, we focus on estimating F₀ (the number of distinct elements) and F₁. We present algorithms and establish almost-matching lower bounds on the space complexity for these computational tasks.
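The update semantics of the RFDS model can be illustrated with a minimal Python sketch. It is an exact baseline that stores every frequency explicitly, so its space grows with the number of distinct live elements; it is not one of the space-efficient algorithms from the paper. The class name ExactRFDS and the constants INSERT and FORGET are hypothetical names chosen for illustration.

```python
INSERT = "+"   # update that increments f_a by 1
FORGET = "⊥"   # forget request that resets f_a to 0


class ExactRFDS:
    """Exact (linear-space) tracker for a stream of (a, delta) updates."""

    def __init__(self):
        # Maps each element with nonzero frequency to its current f_a.
        self.freq = {}

    def update(self, a, delta):
        if delta == INSERT:
            self.freq[a] = self.freq.get(a, 0) + 1
        elif delta == FORGET:
            # Setting f_a to 0 is the same as dropping the element's history.
            self.freq.pop(a, None)
        else:
            raise ValueError(f"unknown update type: {delta!r}")

    def f0(self):
        """Number of distinct elements with nonzero frequency."""
        return len(self.freq)

    def f1(self):
        """Sum of the frequencies of all elements."""
        return sum(self.freq.values())


if __name__ == "__main__":
    stream = [("x", INSERT), ("x", INSERT), ("y", INSERT),
              ("x", FORGET), ("z", INSERT)]
    tracker = ExactRFDS()
    for a, delta in stream:
        tracker.update(a, delta)
    print(tracker.f0(), tracker.f1())
```

In the example stream, the forget request for x removes both of its earlier insertions at once, which is the behavior that separates this model from insertion-only and turnstile-style streams.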
List and Certificate Complexities in Replicable Learning

Dixon, Peter; Pavan, A; Vander Woude, Jason; Vinodchandran, N V (December 2023, Curran Associates, Inc.)
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
We investigate replicable learning algorithms. Informally, a learning algorithm is replicable if the algorithm outputs the same canonical hypothesis over multiple runs with high probability, even when different runs observe a different set of samples from the unknown data distribution. In general, such a strong notion of replicability is not achievable. Thus we consider two feasible notions of replicability called list replicability and certificate replicability. Intuitively, these notions capture the degree of (non) replicability. The goal is to design learning algorithms with optimal list and certificate complexities while minimizing the sample complexity. Our contributions are the following. 1. We first study the learning task of estimating the biases of $d$ coins, up to an additive error of $\varepsilon$, by observing samples. For this task, we design a $(d+1)$-list replicable algorithm. To complement this result, we establish that the list complexity is optimal, i.e., there are no learning algorithms with a list size smaller than $d+1$ for this task. We also design learning algorithms with certificate complexity $\tilde{O}(\log d)$. The sample complexity of both these algorithms is $\tilde{O}(\frac{d^2}{\varepsilon^2})$, where $\varepsilon$ is the approximation error parameter (for a constant error probability). 2. In the PAC model, we show that any hypothesis class that is learnable with $d$ nonadaptive statistical queries can be learned via a $(d+1)$-list replicable algorithm and also via a $\tilde{O}(\log d)$-certificate replicable algorithm. The sample complexity of both these algorithms is $\tilde{O}(\frac{d^2}{\nu^2})$, where $\nu$ is the approximation error of the statistical query. We also show that for the concept class \dtep, the list complexity is exactly $d+1$ with respect to the uniform distribution. To establish our upper bound results, we use rounding schemes induced by geometric partitions with certain properties. We use the Sperner/KKM Lemma to establish the lower bound results.
Brief Announcement: Relations Between Space-Bounded and Adaptive Massively Parallel Computations

https://doi.org/10.4230/LIPIcs.DISC.2023.37

Chen, Michael; Pavan, A; Vinodchandran, N V (October 2023, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Oshman, Rotem (Ed.)
In this work, we study the class of problems solvable by (deterministic) Adaptive Massively Parallel Computations in constant rounds from a computational complexity theory perspective. A language L is in the class AMPC⁰ if, for every ε > 0, there is a deterministic AMPC algorithm running in constant rounds with a polynomial number of processors, where each machine has local memory s = O(N^ε). We prove that the space-bounded complexity class ReachUL is a proper subclass of AMPC⁰. The complexity class ReachUL lies between the well-known space-bounded complexity classes Deterministic Logspace (DLOG) and Nondeterministic Logspace (NLOG). In contrast, we establish that it is unlikely that PSPACE admits AMPC algorithms, even with polynomially many rounds. We also establish that showing PSPACE is a subclass of nonuniform-AMPC with polynomially many rounds would lead to a significant separation result in complexity theory, namely that PSPACE is a proper subclass of EXP^{Σ₂^{𝖯}}.
On Approximating Total Variation Distance

https://doi.org/10.24963/ijcai.2023/387

Bhattacharyya, Arnab; Gayen, Sutanu; Meel, Kuldeep S.; Myrisiotis, Dimitrios; Pavan, A.; Vinodchandran, N. V. (August 2023, International Joint Conference on Artificial Intelligence)

Total variation distance (TV distance) is a fundamental notion of distance between probability distributions. In this work, we introduce and study the problem of computing the TV distance of two product distributions over the domain {0,1}^n. In particular, we establish the following results. 1. The problem of exactly computing the TV distance of two product distributions is #P-complete. This is in stark contrast with other distance measures such as KL, Chi-square, and Hellinger, which tensorize over the marginals, leading to efficient algorithms. 2. There is a fully polynomial-time deterministic approximation scheme (FPTAS) for computing the TV distance of two product distributions P and Q where Q is the uniform distribution. This result is extended to the case where Q has a constant number of distinct marginals. In contrast, we show that when P and Q are Bayes net distributions, the relative approximation of their TV distance is NP-hard.
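The contrast drawn in the first result can be made concrete with a small Python sketch. It assumes each product distribution over {0,1}^n is given as a list of per-coordinate probabilities of the bit being 1. The TV distance is computed by brute-force enumeration of all 2^n points, which is a naive baseline and not the paper's FPTAS. The Bhattacharyya coefficient, from which the squared Hellinger distance is obtained as 1 minus the coefficient under one common normalization, factors into a product over coordinates and so takes time linear in n. All function names are hypothetical.

```python
from itertools import product
from math import prod, sqrt


def product_pmf(biases, x):
    """Probability of bit string x under independent bits with P[x_i = 1] = biases[i]."""
    return prod(b if bit else 1.0 - b for b, bit in zip(biases, x))


def tv_distance_bruteforce(p, q):
    """TV distance by enumerating all 2^n points (exponential time)."""
    n = len(p)
    return 0.5 * sum(abs(product_pmf(p, x) - product_pmf(q, x))
                     for x in product((0, 1), repeat=n))


def bhattacharyya_bruteforce(p, q):
    """Sum over all x of sqrt(P(x) * Q(x)), by enumeration."""
    n = len(p)
    return sum(sqrt(product_pmf(p, x) * product_pmf(q, x))
               for x in product((0, 1), repeat=n))


def bhattacharyya_tensorized(p, q):
    """Same quantity as a product of per-coordinate terms (linear time)."""
    return prod(sqrt(pi * qi) + sqrt((1.0 - pi) * (1.0 - qi))
                for pi, qi in zip(p, q))


if __name__ == "__main__":
    p, q = [0.2, 0.7, 0.5], [0.5, 0.5, 0.5]
    print(tv_distance_bruteforce(p, q))
    print(bhattacharyya_bruteforce(p, q), bhattacharyya_tensorized(p, q))
```

The two Bhattacharyya computations agree up to floating-point error, while no analogous per-coordinate factorization is used for the TV distance, which is why the sketch falls back to enumeration there.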
Constraint Optimization over Semirings

A. Pavan; Kuldeep S Meel; N. V. Vinodchandran; Arnab Bhattacharyya (February 2023, Proceedings of the AAAI Conference on Artificial Intelligence)
Williams, Brian; Chen, Yiling; Neville, Jennifer (Ed.)
Interpretations of logical formulas over semirings (other than the Boolean semiring) have applications in various areas of computer science including logic, AI, databases, and security. Such interpretations provide richer information beyond the truth or falsity of a statement. Examples of such semirings include the Viterbi semiring, the min-max or access control semiring, the tropical semiring, and the fuzzy semiring. The present work investigates the complexity of constraint optimization problems over semirings. The generic optimization problem we study is the following: given a propositional formula $\varphi$ over $n$ variables and a semiring $(K,+,\cdot,0,1)$, find the maximum value over all possible interpretations of $\varphi$ over $K$. This can be seen as a generalization of the well-known satisfiability problem (a propositional formula is satisfiable if and only if the maximum value over all interpretations/assignments over the Boolean semiring is 1). A related problem is to find an interpretation that achieves the maximum value. In this work, we first focus on these optimization problems over the Viterbi semiring, which we call \optrustval and \optrust. We first show that for general propositional formulas in negation normal form, \optrustval and \optrust are in $\mathrm{FP}^{\mathrm{NP}}$. We then investigate \optrust when the input formula $\varphi$ is represented in conjunctive normal form. For CNF formulae, we first derive an upper bound on the value of \optrust as a function of the maximum number of satisfiable clauses. In particular, we show that if $r$ is the maximum number of satisfiable clauses in a CNF formula with $m$ clauses, then its \optrust value is at most $1/4^{m-r}$. Building on this, we establish that \optrust for CNF formulae is hard for the complexity class $\mathrm{FP}^{\mathrm{NP}[\log]}$. We also design polynomial-time approximation algorithms and establish an inapproximability result for \optrustval. We establish similar complexity results for these optimization problems over other semirings including tropical, fuzzy, and access control semirings.
Pseudodeterminism: promises and lowerbounds

https://doi.org/10.1145/3519935.3520043

Dixon, Peter; Pavan, A.; Vander Woude, Jason; Vinodchandran, N. V. (July 2022, Symposium on Theory of Computing (STOC))
Stefano Leonardi and Anupam Gupta (Ed.)
A probabilistic algorithm A is pseudodeterministic if, on every input, there exists a canonical value that is output with high probability. If the algorithm outputs one of k canonical values with high probability, then it is called a k-pseudodeterministic algorithm. In the study of pseudodeterminism, the Acceptance Probability Estimation Problem (APEP), which is to additively approximate the acceptance probability of a Boolean circuit, is emerging as a central computational problem. This problem admits a 2-pseudodeterministic algorithm. Recently, it was shown that a pseudodeterministic algorithm for this problem would imply that any multi-valued function that admits a k-pseudodeterministic algorithm for a constant k (including approximation algorithms) also admits a pseudodeterministic algorithm (Dixon, Pavan, Vinodchandran; ITCS 2021). The contribution of the present work is two-fold. First, as our main conceptual contribution, we establish that the existence of a pseudodeterministic algorithm for APEP is fundamentally related to the gap between probabilistic promise classes and the corresponding standard complexity classes. In particular, we show the following equivalence: APEP has a pseudodeterministic approximation algorithm if and only if every promise problem in PromiseBPP has a solution in BPP. A conceptual interpretation of this equivalence is that the algorithmic gap between 2-pseudodeterminism and pseudodeterminism is equivalent to the gap between PromiseBPP and BPP. Based on this connection, we show that designing pseudodeterministic algorithms for APEP leads to the solution of some open problems in complexity theory, including new Boolean circuit lower bounds. This equivalence also explains how multi-pseudodeterminism is connected to problems in SearchBPP. 
In particular, we show that if APEP has a pseudodeterministic algorithm, then every problem that admits a k(n)-pseudodeterministic algorithm (for any polynomial k) is in SearchBPP and admits a pseudodeterministic algorithm. Motivated by this connection, we also explore its connection to probabilistic search problems and establish that APEP is complete for certain notions of search problems in the context of pseudodeterminism. Our second contribution is establishing query complexity lower bounds for multi-pseudodeterministic computations. We prove that for every k ≥ 1, there exists a problem whose (k+1)-pseudodeterministic query complexity, in the uniform query model, is O(1) but has a k-pseudodeterministic query complexity of Ω(n), even in the more general nonadaptive query model. A key contribution of this part of the work is the utilization of Sperner’s lemma in establishing query complexity lower bounds.
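The statement that APEP admits a 2-pseudodeterministic algorithm can be illustrated with a standard sample-and-round argument, sketched here in Python under stated assumptions. The sketch is not taken from the paper: the circuit is modeled as a Python callable on a tuple of bits, the sample size comes from Hoeffding's inequality, and the names round_to_grid, sample_count, and approx_acceptance are hypothetical.

```python
import math
import random


def round_to_grid(estimate, eps):
    """Round to the nearest multiple of eps (ties go up)."""
    return math.floor(estimate / eps + 0.5) * eps


def sample_count(eps, delta):
    """Hoeffding bound: with this many samples the empirical mean is within
    eps/2 of the true acceptance probability with probability >= 1 - delta."""
    return math.ceil(2.0 * math.log(2.0 / delta) / eps ** 2)


def approx_acceptance(circuit, n_inputs, eps, delta, rng):
    """Additively approximate Pr_x[circuit(x) accepts] to within eps.

    The empirical estimate lies (with high probability) in an interval of
    width eps around the true value, and that interval contains at most one
    rounding boundary, so at most two grid points can be output.
    """
    m = sample_count(eps, delta)
    accepted = 0
    for _ in range(m):
        x = tuple(rng.randrange(2) for _ in range(n_inputs))
        accepted += 1 if circuit(x) else 0
    return round_to_grid(accepted / m, eps)


if __name__ == "__main__":
    # Hypothetical circuit: AND of two input bits, acceptance probability 1/4.
    and_circuit = lambda x: x[0] and x[1]
    rng = random.Random(0)
    print(approx_acceptance(and_circuit, 2, eps=0.1, delta=0.01, rng=rng))
```

The rounding step is what limits the output to two canonical values: any estimate within ε/2 of the true probability rounds to one of at most two adjacent grid points, each within ε of the true value. Which of the two is output can still vary between runs, which is the gap between 2-pseudodeterminism and pseudodeterminism that the abstract discusses.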
Model Counting Meets Distinct Elements in a Data Stream

https://doi.org/10.1145/3542700.3542721

Pavan, A.; Vinodchandran, N. V.; Bhattacharyya, Arnab; Meel, Kuldeep S. (May 2022, ACM SIGMOD Record)

Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and computation of zeroth frequency moments (F0) for data streams.