Real-time decision making in IoT applications relies upon space-efficient evaluation of queries over streaming data. To model the uncertainty in the classification of data being processed, we consider the model of probabilistic strings (sequences of discrete probability distributions over a finite set of events) and initiate the study of space complexity of streaming computation for different classes of queries over such probabilistic strings.
We first consider the problem of computing the probability that a word, sampled from the distribution defined by the probabilistic string read so far, is accepted by a given deterministic finite automaton (DFA). We show that this regular pattern matching problem can be solved using space that is only polylogarithmic in the string length (and polynomial in the size of the DFA) if we are allowed a multiplicative approximation error. Then we show how to generalize this result to quantitative queries specified by additive cost register automata: automata that map strings to numerical values using finite control and registers that get updated using linear transformations. Finally, we consider the case when updates in such an automaton involve tests, and in particular, when there is a counter variable that can be either incremented or decremented, but decrements only apply when the counter value is nonzero. In this case, the desired answer depends on the probability distribution over the set of possible counter values, which can range from 0 to n for a string of length n. Under a mild assumption, namely that the probabilities of the individual events are bounded away from 0 and 1, we show that there is an algorithm that can compute all n entries of this probability distribution vector to within additive 1/poly(n) error using space that is only Õ(n). In establishing these results, we introduce several new technical ideas that may prove useful for designing space-efficient algorithms for other query models over probabilistic strings.
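To make the regular pattern matching query concrete, the following is a minimal sketch of the straightforward exact computation, not the polylogarithmic-space approximation algorithm described above. It assumes that positions of the probabilistic string are sampled independently, and the names `acceptance_probability`, `delta`, and `prob_string` are hypothetical. It keeps one probability per DFA state and updates that vector as each distribution arrives; with exact rationals the numbers grow with the string length, which is the cost an approximation would need to avoid.

```python
from fractions import Fraction


def acceptance_probability(delta, start, accepting, prob_string):
    """Exact probability that a sampled word is accepted by the DFA.

    delta:       dict mapping (state, symbol) -> next state (total function)
    start:       initial state
    accepting:   set of accepting states
    prob_string: list of dicts symbol -> probability, one dict per position
                 (assumed independent across positions)
    """
    # dist[q] = probability that the DFA is in state q after the prefix read so far
    dist = {start: Fraction(1)}
    for position in prob_string:
        nxt = {}
        for state, mass in dist.items():
            for symbol, p in position.items():
                target = delta[(state, symbol)]
                nxt[target] = nxt.get(target, Fraction(0)) + mass * p
        dist = nxt
    return sum((m for q, m in dist.items() if q in accepting), Fraction(0))


# Hypothetical example: a two-state DFA over {a, b} accepting words with at least one 'a'.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}
prob_string = [
    {"a": Fraction(1, 2), "b": Fraction(1, 2)},
    {"a": Fraction(1, 4), "b": Fraction(3, 4)},
]
print(acceptance_probability(delta, 0, {1}, prob_string))  # 5/8
```

The only state carried between positions is the vector `dist`, so the memory used is one number per DFA state; the difficulty addressed in the abstract is the precision needed to store those numbers.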
Optimal Coding Theorems in Time-Bounded Kolmogorov Complexity
The classical coding theorem in Kolmogorov complexity states that if an n-bit string x is sampled with probability δ by an algorithm with prefix-free domain then K(x) ≤ log(1/δ) + O(1). In a recent work, Lu and Oliveira [31] established an unconditional time-bounded version of this result, by showing that if x can be efficiently sampled with probability δ then rKt(x) = O(log(1/δ)) + O(log n), where rKt denotes the randomized analogue of Levin’s Kt complexity. Unfortunately, this result is often insufficient when transferring applications of the classical coding theorem to the time-bounded setting, as it achieves an O(log(1/δ)) bound instead of the information-theoretic optimal log(1/δ). Motivated by this discrepancy, we investigate optimal coding theorems in the time-bounded setting. Our main contributions can be summarised as follows.
• Efficient coding theorem for rKt with a factor of 2. Addressing a question from [31], we show that if x can be efficiently sampled with probability at least δ then rKt(x) ≤ (2 + o(1)) · log(1/δ) + O(log n). As in previous work, our coding theorem is efficient in the sense that it provides a polynomial-time probabilistic algorithm that, when given x, the code of the sampler, and δ, outputs, with probability ≥ 0.99, a probabilistic representation of x that certifies this rKt complexity bound.
• Optimality under a cryptographic assumption. Under a hypothesis about the security of cryptographic pseudorandom generators, we show that no efficient coding theorem can achieve a bound of the form rKt(x) ≤ (2 − o(1)) · log(1/δ) + poly(log n). Under a weaker assumption, we exhibit a gap between efficient coding theorems and existential coding theorems with near-optimal parameters.
• Optimal coding theorem for pKt and unconditional Antunes-Fortnow. We consider pKt complexity [17], a variant of rKt where the randomness is public and the time bound is fixed. We observe the existence of an optimal coding theorem for pKt, and employ this result to establish an unconditional version of a theorem of Antunes and Fortnow [5] which characterizes the worst-case running times of languages that are in average polynomial-time over all P-samplable distributions.
 Award ID(s):
 1811729
 NSFPAR ID:
 10366065
 Date Published:
 Journal Name:
 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)
 Page Range / eLocation ID:
 92:1-92:14
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


In this paper, we consider the problem of noiseless non-adaptive probabilistic group testing, in which the goal is high-probability recovery of the defective set. We show that in the case of $n$ items among which $k$ are defective, the smallest possible number of tests equals $\min \{ C_{k,n} k \log n, n\}$ up to lower-order asymptotic terms, where $C_{k,n}$ is a uniformly bounded constant (varying depending on the scaling of $k$ with respect to $n$) with a simple explicit expression. The algorithmic upper bound follows from a minor adaptation of an existing analysis of the Definite Defectives algorithm, and the algorithm-independent lower bound builds on existing works for the regimes $k \le n^{1-\varOmega(1)}$ and $k = \varTheta(n)$. In sufficiently sparse regimes (including $k = o\big( \frac{n}{\log n} \big)$), our main result generalizes that of Coja-Oghlan et al. (2020) by avoiding the assumption $k \le n^{1-\varOmega(1)}$, whereas in sufficiently dense regimes (including $k = \omega\big( \frac{n}{\log n} \big)$), our main result shows that individual testing is asymptotically optimal for any nonzero target success probability, thus strengthening an existing result of Aldridge (2019, IEEE Trans. Inf. Theory, 65, 2058–2061) in terms of both the error probability and the assumed scaling of $k$.
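
For readers unfamiliar with the Definite Defectives algorithm mentioned above, here is a minimal sketch of its standard two-step decoding rule for noiseless tests. It illustrates the decoder only, not the adapted analysis from the abstract; the function name `definite_defectives` and the representation of tests as sets of item indices are assumptions made for the example.

```python
def definite_defectives(tests, outcomes, n):
    """Decode noiseless group tests with the two-step Definite Defectives rule.

    tests:    list of sets of item indices in range(n), one set per pooled test
    outcomes: list of bools, True if the corresponding test was positive
    n:        total number of items
    Returns the set of items that are certainly defective.
    """
    # Step 1: every item that appears in a negative test is non-defective.
    cleared = set()
    for pool, positive in zip(tests, outcomes):
        if not positive:
            cleared |= pool
    possible = set(range(n)) - cleared

    # Step 2: a positive test containing exactly one possible defective
    # proves that this item is defective.
    definite = set()
    for pool, positive in zip(tests, outcomes):
        if positive:
            candidates = pool & possible
            if len(candidates) == 1:
                definite |= candidates
    return definite


# Hypothetical example: 4 items, item 2 is the only defective.
tests = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
outcomes = [False, True, True, False]
print(definite_defectives(tests, outcomes, 4))  # {2}
```

The decoder never reports a false positive in the noiseless setting; its errors are defectives that no positive test isolates, which is why the number of tests governs the success probability.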

Multivariate multipoint evaluation is the problem of evaluating a multivariate polynomial, given as a coefficient vector, simultaneously at multiple evaluation points. In this work, we show that there exists a deterministic algorithm for multivariate multipoint evaluation over any finite field F that outputs the evaluations of an m-variate polynomial of degree less than d in each variable at N points in time (d^m + N)^{1+o(1)} · poly(m, d, log |F|) for all m ∈ ℕ and all sufficiently large d ∈ ℕ. A previous work of Kedlaya and Umans (FOCS 2008, SICOMP 2011) achieved the same time complexity when the number of variables m is at most d^{o(1)} and had left the problem of removing this condition as an open problem. A recent work of Bhargava, Ghosh, Kumar and Mohapatra (STOC 2022) answered this question when the underlying field is not too large and has characteristic less than d^{o(1)}. In this work, we remove this constraint on the number of variables over all finite fields, thereby answering the question of Kedlaya and Umans over all finite fields. Our algorithm relies on a nontrivial combination of ideas from three seemingly different previously known algorithms for multivariate multipoint evaluation, namely the algorithms of Kedlaya and Umans, that of Björklund, Kaski and Williams (IPEC 2017, Algorithmica 2019), and that of Bhargava, Ghosh, Kumar and Mohapatra, together with a result of Bombieri and Vinogradov from analytic number theory about the distribution of primes in an arithmetic progression. We also present a second algorithm for multivariate multipoint evaluation that is completely elementary and, in particular, avoids the use of the Bombieri–Vinogradov Theorem. However, it requires a mild assumption that the field size is bounded by an exponential tower in d of bounded height.
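
As a point of comparison for the running time stated above, the sketch below shows the naive baseline, which evaluates each point separately and therefore performs on the order of d^m · N term evaluations. It is not the algorithm from the abstract. For simplicity it assumes a prime field F_p (the abstract covers all finite fields), and the names `evaluate_naive` and `multipoint_naive`, as well as the sparse dictionary representation of the coefficient vector, are hypothetical.

```python
def evaluate_naive(coeffs, point, p):
    """Evaluate one polynomial at one point over the prime field F_p.

    coeffs: dict mapping exponent tuples (e_1, ..., e_m) -> coefficient mod p
    point:  tuple (x_1, ..., x_m) of field elements
    p:      a prime modulus (assumption: prime fields only)
    """
    total = 0
    for exps, c in coeffs.items():
        term = c % p
        for x, e in zip(point, exps):
            term = term * pow(x, e, p) % p
        total = (total + term) % p
    return total


def multipoint_naive(coeffs, points, p):
    """Evaluate at every point independently: about (#terms) * N term evaluations."""
    return [evaluate_naive(coeffs, pt, p) for pt in points]


# Hypothetical example: f(x, y) = 1 + 2xy + 3y^2 over F_7.
f = {(0, 0): 1, (1, 1): 2, (0, 2): 3}
print(multipoint_naive(f, [(1, 1), (2, 3), (0, 5)], 7))  # [6, 5, 6]
```

With up to d^m coefficients, this baseline costs the product of the input sizes, whereas the bound in the abstract is nearly linear in their sum.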

This paper concerns designing distributed algorithms that are singularly optimal, i.e., algorithms that are simultaneously time and message optimal, for the fundamental leader election problem in networks. Our main result is a randomized distributed leader election algorithm for asynchronous complete networks that is essentially (up to a polylogarithmic factor) singularly optimal. Our algorithm uses O(n) messages with high probability and runs in O(log² n) time (with high probability) to elect a unique leader. The O(n) message complexity should be contrasted with the Ω(n log n) lower bounds for the deterministic message complexity of leader election algorithms (regardless of time), proven by Korach, Moran, and Zaks (TCS, 1989) for asynchronous algorithms and by Afek and Gafni (SIAM J. Comput., 1991) for synchronous networks. Hence, our result also separates the message complexities of randomized and deterministic leader election. More importantly, our (randomized) time complexity of O(log² n) for obtaining the optimal O(n) message complexity is significantly smaller than the long-standing Θ̃(n) time complexity obtained by Afek and Gafni and by Singh (SIAM J. Comput., 1997) for message-optimal (deterministic) election in asynchronous networks. Afek and Gafni also conjectured that Θ̃(n) time would be optimal for message-optimal asynchronous algorithms. Our result shows that randomized algorithms are significantly faster. Turning to synchronous complete networks, Afek and Gafni showed an essentially singularly optimal deterministic algorithm with O(log n) time and O(n log n) messages. Ramanathan et al. (Distrib. Comput. 2007) used randomization to improve the message complexity, and showed a randomized algorithm with O(n) messages but still with O(log n) time (with failure probability O(1 / log^{Ω(1)} n)).
Our second result shows that synchronous complete networks admit a tightly singularly optimal randomized algorithm, with O(1) time and O(n) messages (both bounds are optimal). Moreover, our algorithm’s time bound holds with certainty, and its message bound holds with high probability, i.e., with probability 1 - 1/n^c for a constant c. Our results demonstrate that leader election can be solved in a simultaneously message- and time-efficient manner in asynchronous complete networks using randomization. It is open whether this is possible in asynchronous general networks.

We study the problem of efficiently estimating the effect of an intervention on a single variable using observational samples. Our goal is to give algorithms with polynomial time and sample complexity in a non-parametric setting. Tian and Pearl (AAAI ’02) have exactly characterized the class of causal graphs for which causal effects of atomic interventions can be identified from observational data. We make their result quantitative. Suppose 𝒫 is a causal model on a set V of n observable variables with respect to a given causal graph G, and let do(x) be an identifiable intervention on a variable X. We show that, assuming that G has bounded in-degree and bounded c-components (k) and that the observational distribution satisfies a strong positivity condition: (i) [Evaluation] There is an algorithm that outputs with probability 2/3 an evaluator for a distribution P̂ that satisfies TV(P(V | do(x)), P̂(V)) < eps using m = O(n/eps^2) samples from P and O(mn) time. The evaluator can return in O(n) time the probability P̂(v) for any assignment v to V. (ii) [Sampling] There is an algorithm that outputs with probability 2/3 a sampler for a distribution P̂ that satisfies TV(P(V | do(x)), P̂(V)) < eps using m = O(n/eps^2) samples from P and O(mn) time. The sampler returns an iid sample from P̂ with probability 1 in O(n) time. We extend our techniques to estimate P(Y | do(x)) for a subset Y of variables of interest. We also show lower bounds for the sample complexity, demonstrating that our sample complexity has optimal dependence on the parameters n and eps, as well as, if k = 1, on the strong positivity parameter.