skip to main content


Title: Joinable Parallel Balanced Binary Trees

In this article, we show how a single function,join, can be used to implement parallelbalanced binary search trees(BSTs) simply and efficiently. Based onjoin, our approach applies to multiple balanced tree data structures, and a variety of functions for ordered sets and maps. We describe our technique as an algorithmic framework calledjoin-based algorithms. We show that thejoinfunction fully captures what is needed for rebalancing trees for a variety of tree algorithms, as long as the balancing scheme satisfies certain properties, which we refer to asjoinabletrees. We discuss four balancing schemes that are joinable: AVL trees, red-black trees, weight-balanced trees, and treaps. We present a variety of tree algorithms that apply to joinable trees, includinginsert,delete,union,intersection,difference,split,range,filter, and so on, most of them also parallel. These algorithms are generic across balancing schemes. Many algorithms are optimal in the comparison model, and we provide a general proof to show the efficiency in work for joinable trees. The algorithms are highly parallel, all with polylogarithmic span (parallel dependence). Specifically, the set-set operationsunion,intersection, anddifferencehave work\( O(m\log (\frac{n}{m}+1)) \)and polylogarithmic span for input set sizes\( n \)and\( m\le n \).

We implemented and tested our algorithms on the four balancing schemes. In general, all four schemes have quite similar performance, but the weight-balanced tree slightly outperforms the others. They have the same speedup characteristics, getting around 73\( \times \)speedup on 72 cores (144 hyperthreads). Experimental results also show that our implementation outperforms existing parallel implementations, and our sequential version achieves close or much better performance than the sequential merging algorithm in C++ Standard Template Library (STL) on various input sizes.

 
more » « less
Award ID(s):
1910030
NSF-PAR ID:
10479472
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
ACM Transactions on Parallel Computing
Volume:
9
Issue:
2
ISSN:
2329-4949
Page Range / eLocation ID:
1 to 41
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the problem of approximating maximum Nash social welfare (NSW) when allocatingmindivisible items amongnasymmetric agents with submodular valuations. TheNSWis a well-established notion of fairness and efficiency, defined as the weighted geometric mean of agents’ valuations. For special cases of the problem with symmetric agents and additive(-like) valuation functions, approximation algorithms have been designed using approaches customized for these specific settings, and they fail to extend to more general settings. Hence, no approximation algorithm with a factor independent ofmwas known either for asymmetric agents with additive valuations or for symmetric agents beyond additive(-like) valuations before this work.

    In this article, we extend our understanding of theNSWproblem to far more general settings. Our main contribution is two approximation algorithms for asymmetric agents with additive and submodular valuations. Both algorithms are simple to understand and involve non-trivial modifications of a greedy repeated matchings approach. Allocations of high-valued items are done separately by un-matching certain items and re-matching them by different processes in both algorithms. We show that these approaches achieve approximation factors ofO(n) andO(nlogn) for additive and submodular cases, independent of the number of items. For additive valuations, our algorithm outputs an allocation that also achieves the fairness property of envy-free up to one item (EF1).

    Furthermore, we show that theNSWproblem under submodular valuations is strictly harder than all currently known settings with an\(\frac{\mathrm{e}}{\mathrm{e}-1}\)factor of the hardness of approximation, even for constantly many agents. For this case, we provide a different approximation algorithm that achieves a factor of\(\frac{\mathrm{e}}{\mathrm{e}-1}\), hence resolving it completely.

     
    more » « less
  2. Minimum flow decomposition (MFD) is the NP-hard problem of finding a smallest decomposition of a network flow/circulationXon a directed graphGinto weighted source-to-sink paths whose weighted sum equalsX. We show that, for acyclic graphs, considering thewidthof the graph (the minimum number of paths needed to cover all of its edges) yields advances in our understanding of its approximability. For the version of the problem that uses only non-negative weights, we identify and characterise a new class ofwidth-stablegraphs, for which a popular heuristic is aO(logVal(X))-approximation (Val(X) being the total flow ofX), and strengthen its worst-case approximation ratio from\(\Omega (\sqrt {m})\)to Ω (m/logm) for sparse graphs, wheremis the number of edges in the graph. We also study a new problem on graphs with cycles, Minimum Cost Circulation Decomposition (MCCD), and show that it generalises MFD through a simple reduction. For the version allowing also negative weights, we give a (⌈ log ‖ X ‖ ⌉ +1)-approximation (‖X‖ being the maximum absolute value ofXon any edge) using a power-of-two approach, combined with parity fixing arguments and a decomposition of unitary circulations (‖X‖ ≤ 1), using a generalised notion of width for this problem. Finally, we disprove a conjecture about the linear independence of minimum (non-negative) flow decompositions posed by Kloster et al. [2018], but show that its useful implication (polynomial-time assignments of weights to a given set of paths to decompose a flow) holds for the negative version.

     
    more » « less
  3. Vision Transformer (ViT) has demonstrated promising performance in various computer vision tasks, and recently attracted a lot of research attention. Many recent works have focused on proposing new architectures to improve ViT and deploying it into real-world applications. However, little effort has been made to analyze and understand ViT’s architecture design space and its implication of hardware-cost on different devices. In this work, by simply scaling ViT’s depth, width, input size, and other basic configurations, we show that a scaled vanilla ViT model without bells and whistles can achieve comparable or superior accuracy-efficiency trade-off than most of the latest ViT variants. Specifically, compared to DeiT-Tiny, our scaled model achieves a\(\uparrow 1.9\% \)higher ImageNet top-1 accuracy under the same FLOPs and a\(\uparrow 3.7\% \)better ImageNet top-1 accuracy under the same latency on an NVIDIA Edge GPU TX2. Motivated by this, we further investigate the extracted scaling strategies from the following two aspects: (1) “can these scaling strategies be transferred across different real hardware devices?”; and (2) “can these scaling strategies be transferred to different ViT variants and tasks?”. For (1), our exploration, based on various devices with different resource budgets, indicates that the transferability effectiveness depends on the underlying device together with its corresponding deployment tool; for (2), we validate the effective transferability of the aforementioned scaling strategies obtained from a vanilla ViT model on top of an image classification task to the PiT model, a strong ViT variant targeting efficiency, as well as object detection and video classification tasks. In particular, when transferred to PiT, our scaling strategies lead to a boosted ImageNet top-1 accuracy of from\(74.6\% \)to\(76.7\% \)(\(\uparrow 2.1\% \)) under the same 0.7G FLOPs; and when transferred to the COCO object detection task, the average precision is boosted by\(\uparrow 0.7\% \)under a similar throughput on a V100 GPU.

     
    more » « less
  4. Despite being one of the oldest data structures in computer science, hash tables continue to be the focus of a great deal of both theoretical and empirical research. A central reason for this is that many of the fundamental properties that one desires from a hash table are difficult to achieve simultaneously; thus many variants offering different trade-offs have been proposed.

    This article introduces Iceberg hashing, a hash table that simultaneously offers the strongest known guarantees on a large number of core properties. Iceberg hashing supports constant-time operations while improving on the state of the art for space efficiency, cache efficiency, and low failure probability. Iceberg hashing is also the first hash table to support a load factor of up to1 - o(1)while being stable, meaning that the position where an element is stored only ever changes when resizes occur. In fact, in the setting where keys are Θ (logn) bits, the space guarantees that Iceberg hashing offers, namely that it uses at most\(\log \binom{|U|}{n} + O(n \log \ \text{log} n)\)bits to storenitems from a universeU, matches a lower bound by Demaine et al. that applies to any stable hash table.

    Iceberg hashing introduces new general-purpose techniques for some of the most basic aspects of hash-table design. Notably, our indirection-free technique for dynamic resizing, which we call waterfall addressing, and our techniques for achieving stability and very-high probability guarantees, can be applied to any hash table that makes use of the front-yard/backyard paradigm for hash table design.

     
    more » « less
  5. A collection of sets displays aproximity gapwith respect to some property if for every set in the collection, either (i) all members areδ-close to the property in relative Hamming distance or (ii) only a tiny fraction of members areδ-close to the property. In particular, no set in the collection has roughly half of its membersδ-close to the property and the othersδ-far from it.

    We show that the collection of affine spaces displays a proximity gap with respect to Reed–Solomon (RS) codes, even over small fields, of size polynomial in the dimension of the code, and the gap applies to anyδsmaller than the Johnson/Guruswami–Sudan list-decoding bound of the RS code. We also show near-optimal gap results, over fields of (at least)linearsize in the RS code dimension, forδsmaller than the unique decoding radius. Concretely, ifδis smaller than half the minimal distance of an RS code\(V\subset {\mathbb {F}}_q^n \), every affine space is either entirelyδ-close to the code, or alternatively at most an (n/q)-fraction of it isδ-close to the code. Finally, we discuss several applications of our proximity gap results to distributed storage, multi-party cryptographic protocols, and concretely efficient proof systems.

    We prove the proximity gap results by analyzing the execution of classical algebraic decoding algorithms for Reed–Solomon codes (due to Berlekamp–Welch and Guruswami–Sudan) on aformal elementof an affine space. This involves working with Reed–Solomon codes whose base field is an (infinite) rational function field. Our proofs are obtained by developing an extension (to function fields) of a strategy of Arora and Sudan for analyzing low-degree tests.

     
    more » « less