skip to main content


This content will become publicly available on May 7, 2024

Title: The Geometry of Tree-Based Sorting
We study the connections between sorting and the binary search tree (BST) model, with an aim towards showing that the fields are connected more deeply than is currently appreciated. While any BST can be used to sort by inserting the keys one-by-one, this is a very limited relationship and importantly says nothing about parallel sorting. We show what we believe to be the first formal relationship between the BST model and sorting. Namely, we show that a large class of sorting algorithms, which includes mergesort, quicksort, insertion sort, and almost every instance-optimal sorting algorithm, are equivalent in cost to offline BST algorithms. Our main theoretical tool is the geometric interpretation of the BST model introduced by Demaine et al. [18], which finds an equivalence between searches on a BST and point sets in the plane satisfying a certain property. To give an example of the utility of our approach, we introduce the log-interleave bound, a measure of the information-theoretic complexity of a permutation π, which is within a lg lg n multiplicative factor of a known lower bound in the BST model; we also devise a parallel sorting algorithm with polylogarithmic span that sorts a permutation π using comparisons proportional to its log-interleave bound. Our aforementioned result on sorting and offline BST algorithms can be used to show existence of an offline BST algorithm whose cost is within a constant factor of the log-interleave bound of any permutation π.  more » « less
Award ID(s):
2119352 1901381 1919223
NSF-PAR ID:
10467348
Author(s) / Creator(s):
;
Publisher / Repository:
Dagstuhl Repository
Date Published:
ISSN:
1868-8969
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Parallel radix sorting has been extensively studied in the literature for decades. However, the most efficient implementations require auxiliary memory proportional to the input size, which can be prohibitive for large inputs. The standard serial in-place radix sorting algorithm is based on swapping each key to its correct place in the array on each level of recursion. However, it is not straightforward to parallelize this algorithm due to dependencies among the swaps. This paper presents Regions Sort, a new parallel in-place radix sorting algorithm that is efficient both in theory and in practice. Our algorithm uses a graph structure to model dependencies across elements that need to be swapped, and generates independent tasks from this graph that can be executed in parallel. For sorting $n$ integers from a range $r$, and a parameter $K$, Regions Sort requires only $O(K\log r\log n)$ auxiliary memory. Our algorithm requires $O(n\log r)$ work and $O((n/K+\log K)\log r)$ span, making it work-efficient and highly parallel. In addition, we present several optimizations that significantly improve the empirical performance of our algorithm. We compare the performance of Regions Sort to existing parallel in-place and out-of-place sorting algorithms on a variety of input distributions and show that Regions Sort is faster than optimized out-of-place radix sorting and comparison sorting algorithms, and is almost always faster than the fastest publicly-available in-place sorting algorithm. 
    more » « less
  2. The binary-forking model is a parallel computation model, formally defined by Blelloch et al., in which a thread can fork a concurrent child thread, recursively and asynchronously. The model incurs a cost of Theta(log n) to spawn or synchronize n tasks or threads. The binary-forking model realistically captures the performance of parallel algorithms implemented using modern multithreaded programming languages on multicore shared-memory machines. In contrast, the widely studied theoretical PRAM model does not consider the cost of spawning and synchronizing threads, and as a result, algorithms achieving optimal performance bounds in the PRAM model may not be optimal in the binary-forking model. Often, algorithms need to be redesigned to achieve optimal performance bounds in the binary-forking model and the non-constant synchronization cost makes the task challenging. In this paper, we show that in the binary-forking model we can achieve optimal or near-optimal span with negligible or no asymptotic blowup in work for comparison-based sorting, Strassen's matrix multiplication (MM), and the Fast Fourier Transform (FFT). Our major results are as follows: (1) A randomized comparison-based sorting algorithm with optimal O(log n) span and O(nlog n) work, both w.h.p. in n. (2) An optimal O(log n) span algorithm for Strassen's matrix multiplication (MM) with only a loglog n - factor blow-up in work as well as a near-optimal O(log n loglog log n) span algorithm with no asymptotic blow-up in work. (3) A near-optimal O(log n logloglog n) span Fast Fourier Transform (FFT) algorithm with less than a log n-factor blow-up in work for all practical values of n (i.e., n le 10 ^10,000) 
    more » « less
  3. null (Ed.)
    The binary-forking model is a parallel computation model, formally defined by Blelloch et al., in which a thread can fork a concurrent child thread, recursively and asynchronously. The model incurs a cost of Theta(log n) to spawn or synchronize n tasks or threads. The binary-forking model realistically captures the performance of parallel algorithms implemented using modern multithreaded programming languages on multicore shared-memory machines. In contrast, the widely studied theoretical PRAM model does not consider the cost of spawning and synchronizing threads, and as a result, algorithms achieving optimal performance bounds in the PRAM model may not be optimal in the binary-forking model. Often, algorithms need to be redesigned to achieve optimal performance bounds in the binary-forking model and the non-constant synchronization cost makes the task challenging. In this paper, we show that in the binary-forking model we can achieve optimal or near-optimal span with negligible or no asymptotic blowup in work for comparison-based sorting, Strassen's matrix multiplication (MM), and the Fast Fourier Transform (FFT). Our major results are as follows: (1) A randomized comparison-based sorting algorithm with optimal O(log n) span and O(nlog n) work, both w.h.p. in n. (2) An optimal O(log n) span algorithm for Strassen's matrix multiplication (MM) with only a loglog n - factor blow-up in work as well as a near-optimal O(log n loglog log n) span algorithm with no asymptotic blow-up in work. (3) A near-optimal O(log n logloglog n) span Fast Fourier Transform (FFT) algorithm with less than a log n-factor blow-up in work for all practical values of n (i.e., n le 10 ^10,000 ). 
    more » « less
  4. null (Ed.)
    Maximal Independent Set (MIS) is one of the fundamental problems in distributed computing. The round (time) complexity of distributed MIS has traditionally focused on the worst-case time for all nodes to finish. The best-known (randomized) MIS algorithms take O(log n) worst-case rounds on general graphs (where n is the number of nodes). Breaking the O(log n) worst-case bound has been a longstanding open problem, while currently the best-known lower bound is [EQUATION] rounds. Motivated by the goal to reduce total energy consumption in energy-constrained networks such as sensor and ad hoc wireless networks, we take an alternative approach to measuring performance. We focus on minimizing the total (or equivalently, the average) time for all nodes to finish. It is not clear whether the currently best-known algorithms yield constant-round (or even o(log n)) node-averaged round complexity for MIS in general graphs. We posit the sleeping model, a generalization of the traditional model, that allows nodes to enter either "sleep" or "waking" states at any round. While waking state corresponds to the default state in the traditional model, in sleeping state a node is "offline", i.e., it does not send or receive messages (and messages sent to it are dropped as well) and does not incur any time, communication, or local computation cost. Hence, in this model, only rounds in which a node is awake are counted and we are interested in minimizing the average as well as the worst-case number of rounds a node spends in the awake state, besides the traditional worst-case round complexity (i.e., the rounds for all nodes to finish including both the awake and sleeping rounds). Our main result is that we show that MIS can be solved in (expected) O(1) rounds under node-averaged awake complexity measure in the sleeping model. In particular, we present a randomized distributed algorithm for MIS that has expected O(1)-rounds node-averaged awake complexity and, with high probability1 has O(log n)-rounds worst-case awake complexity and O(log3.41 n)-rounds worst-case complexity. Our work is a step towards understanding the node-averaged complexity of MIS both in the traditional and sleeping models, as well as designing energy-efficient distributed algorithms for energy-constrained networks. 
    more » « less
  5. Given a metric space ℳ = (X,δ), a weighted graph G over X is a metric t-spanner of ℳ if for every u,v ∈ X, δ(u,v) ≤ δ_G(u,v) ≤ t⋅ δ(u,v), where δ_G is the shortest path metric in G. In this paper, we construct spanners for finite sets in metric spaces in the online setting. Here, we are given a sequence of points (s₁, …, s_n), where the points are presented one at a time (i.e., after i steps, we have seen S_i = {s₁, … , s_i}). The algorithm is allowed to add edges to the spanner when a new point arrives, however, it is not allowed to remove any edge from the spanner. The goal is to maintain a t-spanner G_i for S_i for all i, while minimizing the number of edges, and their total weight. Under the L₂-norm in ℝ^d for arbitrary constant d ∈ ℕ, we present an online (1+ε)-spanner algorithm with competitive ratio O_d(ε^{-d} log n), improving the previous bound of O_d(ε^{-(d+1)}log n). Moreover, the spanner maintained by the algorithm has O_d(ε^{1-d}log ε^{-1})⋅ n edges, almost matching the (offline) optimal bound of O_d(ε^{1-d})⋅ n. In the plane, a tighter analysis of the same algorithm provides an almost quadratic improvement of the competitive ratio to O(ε^{-3/2}logε^{-1}log n), by comparing the online spanner with an instance-optimal spanner directly, bypassing the comparison to an MST (i.e., lightness). As a counterpart, we design a sequence of points that yields a Ω_d(ε^{-d}) lower bound for the competitive ratio for online (1+ε)-spanner algorithms in ℝ^d under the L₁-norm. Then we turn our attention to online spanners in general metrics. Note that, it is not possible to obtain a spanner with stretch less than 3 with a subquadratic number of edges, even in the offline setting, for general metrics. We analyze an online version of the celebrated greedy spanner algorithm, dubbed ordered greedy. With stretch factor t = (2k-1)(1+ε) for k ≥ 2 and ε ∈ (0,1), we show that it maintains a spanner with O(ε^{-1}logε^{-1})⋅ n^{1+1/k} edges and O(ε^{-1}n^{1/k}log² n) lightness for a sequence of n points in a metric space. We show that these bounds cannot be significantly improved, by introducing an instance that achieves an Ω(1/k⋅ n^{1/k}) competitive ratio on both sparsity and lightness. Furthermore, we establish the trade-off among stretch, number of edges and lightness for points in ultrametrics, showing that one can maintain a (2+ε)-spanner for ultrametrics with O(ε^{-1}logε^{-1})⋅ n edges and O(ε^{-2}) lightness. 
    more » « less