skip to main content

Search for: All records

Creators/Authors contains: "Farach-Colton, Martin"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available January 1, 2024
  2. Free, publicly-accessible full text available January 1, 2024
  3. Free, publicly-accessible full text available January 1, 2024
  4. Free, publicly-accessible full text available March 25, 2024
  5. Free, publicly-accessible full text available November 1, 2023
  6. Free, publicly-accessible full text available June 10, 2023
  7. A data structure A is said to be dynamically optimal over a class of data structures C if A is constant- competitive with every data structure C ∈ C. Much of the research on binary search trees in the past forty years has focused on studying dynamic optimality over the class of binary search trees that are modified via rotations (and indeed, the question of whether splay trees are dynamically optimal has gained notoriety as the so-called dynamic-optimality conjecture). Recently, researchers have extended this to consider dynamic optimality over certain classes of external-memory search trees. In particular, Demaine, Iacono, Koumoutsos, and Langerman propose a class of external-memory trees that support a notion of tree rotations, and then give an elegant data structure, called the Belga B-tree, that is within an O(log log N )-factor of being dynamically optimal over this class. In this paper, we revisit the question of how dynamic optimality should be defined in external memory. A defining characteristic of external-memory data structures is that there is a stark asymmetry between queries and inserts/updates/deletes: by making the former slightly asymptotically slower, one can make the latter significantly asymptotically faster (even allowing for operations with sub-constant amortized I/Os). Thismore »asymmetry makes it so that rotation-based search trees are not optimal (or even close to optimal) in insert/update/delete-heavy external-memory workloads. To study dynamic optimality for such workloads, one must consider a different class of data structures. The natural class of data structures to consider are what we call buffered-propagation trees. Such trees can adapt dynamically to the locality properties of an input sequence in order to optimize the interactions between different inserts/updates/deletes and queries. We also present a new form of beyond-worst-case analysis that allows for us to formally study a continuum between static and dynamic optimality. Finally, we give a novel data structure, called the Jεllo Tree, that is statically optimal and that achieves dynamic optimality for a large natural class of inputs defined by our beyond-worst-case analysis.« less
  8. For nearly six decades, the central open question in the study of hash tables has been to determine the optimal achievable tradeoff curve between time and space. State-of-the-art hash tables offer the following guarantee: If keys/values are Θ(logn) bits each, then it is possible to achieve constant-time insertions/deletions/queries while wasting only O(loglogn) bits of space per key when compared to the information-theoretic optimum. Even prior to this bound being achieved, the target of O(log log n) wasted bits per key was known to be a natural end goal, and was proven to be optimal for a number of closely related problems (e.g., stable hashing, dynamic retrieval, and dynamically-resized filters). This paper shows that O(log log n) wasted bits per key is not the end of the line for hashing. In fact, for any k ∈ [log∗ n], it is possible to achieve O(k)-time insertions/deletions, O(1)-time queries, and O(log(k) n) = Ologlog···logn 􏰟 􏰞􏰝 􏰠 k wasted bits per key (all with high probability in n). This means that, each time we increase inser- tion/deletion time by an additive constant, we reduce the wasted bits per key exponentially. We further show that this tradeoff curve is the best achievable by anymore »of a large class of hash tables, including any hash table designed using the current framework for making constant-time hash tables succinct. Our results hold not just for fixed-capacity hash tables, but also for hash tables that are dynamically resized (this is a fundamental departure from what is possible for filters); and for hash tables that store very large keys/values, each of which can be up to no(1) bits (this breaks with the conventional wisdom that larger keys/values should lead to more wasted bits per key). For very small keys/values, we are able to tighten our bounds to o(1) wasted bits per key, even when k = O(1). Building on this, we obtain a constant-time dynamic filter that uses n􏰕logε−1􏰖+nloge+o(n) bits of space for a wide choice of« less
  9. Finding the connected components of a graph is a fundamental prob- lem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge inser- tions and deletions. A natural approach to computing the connected components on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data structures (CubeSketch) to solve the streaming connected components problem and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph. GraphZeppelin is optimized for massive dense graphs: GraphZeppelin can process millions of edge updates (both inser- tions and deletions) per second, even when the underlying graph is farmore »too large to fit in available RAM. As a result GraphZeppelin vastly increases the scale of graphs that can be processed.« less