skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Distinct Elements in Streams: An Algorithm for the (Text) Book
Given a data stream š’Ÿ = ⟨ a₁, aā‚‚, …, a_m ⟩ of m elements where each a_i ∈ [n], the Distinct Elements problem is to estimate the number of distinct elements in š’Ÿ. Distinct Elements has been a subject of theoretical and empirical investigations over the past four decades resulting in space optimal algorithms for it. All the current state-of-the-art algorithms are, however, beyond the reach of an undergraduate textbook owing to their reliance on the usage of notions such as pairwise independence and universal hash functions. We present a simple, intuitive, sampling-based space-efficient algorithm whose description and the proof are accessible to undergraduates with the knowledge of basic probability theory.  more » « less
Award ID(s):
2130608
PAR ID:
10461957
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
30th Annual European Symposium on Algorithms (ESA 2022)
Volume:
244
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We investigate the adversarial robustness of streaming algorithms. In this context, an algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems in the streaming literature do not admit sublinear-space deterministic algorithms; on the other hand, classical space-efficient randomized algorithms for these problems are generally not adversarially robust. This raises the natural question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems. In this work, we show that the answer is positive for various important streaming problems in the insertion-only model, including distinct elements and more generally F p -estimation, F p -heavy hitters, entropy estimation, and others. For all of these problems, we develop adversarially robust (1+ε)-approximation algorithms whose required space matches that of the best known non-robust algorithms up to a poly(log n , 1/ε) multiplicative factor (and in some cases even up to a constant factor). Towards this end, we develop several generic tools allowing one to efficiently transform a non-robust streaming algorithm into a robust one in various scenarios. 
    more » « less
  2. Constraint satisfaction problems (CSPs) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred independently and with little interaction between them. In this work, we seek to investigate whether bridging the seeming communication gap between the two communities may pave the way to richer fundamental insights. To this end, we focus on two foundational problems: model counting for CSPs and the computation of the number of distinct elements in a data stream, also known as the zeroth frequency moment (F0) of a data stream. Our investigations lead us to observe striking similarity in the core techniques employed in the algorithmic frameworks that have evolved separately for model counting and distinct elements computation. We design a recipe for the translation of algorithms developed for distinct elements estimation to that of model counting, resulting in new algorithms for model counting. We then observe that algorithms in the context of distributed streaming can be transformed into distributed algorithms for model counting. We next turn our attention to viewing streaming from the lens of counting and show that framing distinct elements estimation as a special case of #DNF counting allows us to obtain a general recipe for a rich class of streaming problems, which had been subjected to case-specific analysis in prior works. 
    more » « less
  3. Georgiadis, Loukas (Ed.)
    We provide and study several algorithms for sorting an array of n comparable distinct elements subject to probabilistic comparison errors. In this model, the comparison of two elements returns the wrong answer according to a fixed probability, p_e < 1/2, and otherwise returns the correct answer. The dislocation of an element is the distance between its position in a given (current or output) array and its position in a sorted array. There are various algorithms that can be utilized for sorting or near-sorting elements subject to probabilistic comparison errors, but these algorithms are not data oblivious because they all make heavy use of noisy binary searching. In this paper, we provide new methods for sorting with comparison errors that are data oblivious while avoiding the use of noisy binary search methods. In addition, we experimentally compare our algorithms and other sorting algorithms. 
    more » « less
  4. Santhanam, Rahul (Ed.)
    Adversarially robust streaming algorithms are required to process a stream of elements and produce correct outputs, even when each stream element can be chosen as a function of earlier algorithm outputs. As with classic streaming algorithms, which must only be correct for the worst-case fixed stream, adversarially robust algorithms with access to randomness can use significantly less space than deterministic algorithms. We prove that for the Missing Item Finding problem in streaming, the space complexity also significantly depends on how adversarially robust algorithms are permitted to use randomness. (In contrast, the space complexity of classic streaming algorithms does not depend as strongly on the way randomness is used.) For Missing Item Finding on streams of length š“ with elements in {1,…,n}, and ≤ 1/poly(š“) error, we show that when š“ = O(2^√{log n}), "random seed" adversarially robust algorithms, which only use randomness at initialization, require š“^Ī©(1) bits of space, while "random tape" adversarially robust algorithms, which may make random decisions at any time, may use O(polylog(š“)) random bits. When š“ is between n^Ī©(1) and O(√n), "random tape" adversarially robust algorithms need š“^Ī©(1) space, while "random oracle" adversarially robust algorithms, which can read from a long random string for free, may use O(polylog(š“)) space. The space lower bound for the "random seed" case follows, by a reduction given in prior work, from a lower bound for pseudo-deterministic streaming algorithms given in this paper. 
    more » « less
  5. We initiate the study of biologically-inspired spiking neural networks from the perspective of streaming algorithms. Like computers, human brains face memory limitations, which pose a significant obstacle when processing large scale and dynamically changing data. In computer science, these challenges are captured by the well-known streaming model, which can be traced back to Munro and Paterson `78 and has had significant impact in theory and beyond. In the classical streaming setting, one must compute a function f of a stream of updates š’® = {u₁,…,u_m}, given restricted single-pass access to the stream. The primary complexity measure is the space used by the algorithm. In contrast to the large body of work on streaming algorithms, relatively little is known about the computational aspects of data processing in spiking neural networks. In this work, we seek to connect these two models, leveraging techniques developed for streaming algorithms to better understand neural computation. Our primary goal is to design networks for various computational tasks using as few auxiliary (non-input or output) neurons as possible. The number of auxiliary neurons can be thought of as the "space" required by the network. Previous algorithmic work in spiking neural networks has many similarities with streaming algorithms. However, the connection between these two space-limited models has not been formally addressed. We take the first steps towards understanding this connection. On the upper bound side, we design neural algorithms based on known streaming algorithms for fundamental tasks, including distinct elements, approximate median, and heavy hitters. The number of neurons in our solutions almost match the space bounds of the corresponding streaming algorithms. As a general algorithmic primitive, we show how to implement the important streaming technique of linear sketching efficiently in spiking neural networks. On the lower bound side, we give a generic reduction, showing that any space-efficient spiking neural network can be simulated by a space-efficient streaming algorithm. This reduction lets us translate streaming-space lower bounds into nearly matching neural-space lower bounds, establishing a close connection between the two models. 
    more » « less