skip to main content


Title: Joinable Parallel Balanced Binary Trees

In this article, we show how a single function,join, can be used to implement parallelbalanced binary search trees(BSTs) simply and efficiently. Based onjoin, our approach applies to multiple balanced tree data structures, and a variety of functions for ordered sets and maps. We describe our technique as an algorithmic framework calledjoin-based algorithms. We show that thejoinfunction fully captures what is needed for rebalancing trees for a variety of tree algorithms, as long as the balancing scheme satisfies certain properties, which we refer to asjoinabletrees. We discuss four balancing schemes that are joinable: AVL trees, red-black trees, weight-balanced trees, and treaps. We present a variety of tree algorithms that apply to joinable trees, includinginsert,delete,union,intersection,difference,split,range,filter, and so on, most of them also parallel. These algorithms are generic across balancing schemes. Many algorithms are optimal in the comparison model, and we provide a general proof to show the efficiency in work for joinable trees. The algorithms are highly parallel, all with polylogarithmic span (parallel dependence). Specifically, the set-set operationsunion,intersection, anddifferencehave work\( O(m\log (\frac{n}{m}+1)) \)and polylogarithmic span for input set sizes\( n \)and\( m\le n \).

We implemented and tested our algorithms on the four balancing schemes. In general, all four schemes have quite similar performance, but the weight-balanced tree slightly outperforms the others. They have the same speedup characteristics, getting around 73\( \times \)speedup on 72 cores (144 hyperthreads). Experimental results also show that our implementation outperforms existing parallel implementations, and our sequential version achieves close or much better performance than the sequential merging algorithm in C++ Standard Template Library (STL) on various input sizes.

 
more » « less
Award ID(s):
1910030
NSF-PAR ID:
10479472
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
ACM Transactions on Parallel Computing
Volume:
9
Issue:
2
ISSN:
2329-4949
Page Range / eLocation ID:
1 to 41
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. On hypergraphs withmhyperedges andnvertices, wherepdenotes the total size of the hyperedges, we provide the following results:

    We give an algorithm that runs in\(\widetilde{O}(mn^{2k-2})\)time for finding a minimumk-cut in hypergraphs of arbitrary rank. This algorithm betters the previous best running time for the minimumk-cut problem, fork> 2.

    We give an algorithm that runs in\(\widetilde{O}(n^{\max \lbrace r,2k-2\rbrace })\)time for finding a minimumk-cut in hypergraphs of constant rankr. This algorithm betters the previous best running times for both the minimum cut and minimumk-cut problems for dense hypergraphs.

    Both of our algorithms are Monte Carlo, i.e., they return a minimumk-cut (or minimum cut) with high probability. These algorithms are obtained as instantiations of a genericbranching randomized contractiontechnique on hypergraphs, which extends the celebrated work of Karger and Stein on recursive contractions in graphs. Our techniques and results also extend to the problems of minimum hedge-cut and minimum hedge-k-cut on hedgegraphs, which generalize hypergraphs.

     
    more » « less
  2. Vision Transformer (ViT) has demonstrated promising performance in various computer vision tasks, and recently attracted a lot of research attention. Many recent works have focused on proposing new architectures to improve ViT and deploying it into real-world applications. However, little effort has been made to analyze and understand ViT’s architecture design space and its implication of hardware-cost on different devices. In this work, by simply scaling ViT’s depth, width, input size, and other basic configurations, we show that a scaled vanilla ViT model without bells and whistles can achieve comparable or superior accuracy-efficiency trade-off than most of the latest ViT variants. Specifically, compared to DeiT-Tiny, our scaled model achieves a\(\uparrow 1.9\% \)higher ImageNet top-1 accuracy under the same FLOPs and a\(\uparrow 3.7\% \)better ImageNet top-1 accuracy under the same latency on an NVIDIA Edge GPU TX2. Motivated by this, we further investigate the extracted scaling strategies from the following two aspects: (1) “can these scaling strategies be transferred across different real hardware devices?”; and (2) “can these scaling strategies be transferred to different ViT variants and tasks?”. For (1), our exploration, based on various devices with different resource budgets, indicates that the transferability effectiveness depends on the underlying device together with its corresponding deployment tool; for (2), we validate the effective transferability of the aforementioned scaling strategies obtained from a vanilla ViT model on top of an image classification task to the PiT model, a strong ViT variant targeting efficiency, as well as object detection and video classification tasks. In particular, when transferred to PiT, our scaling strategies lead to a boosted ImageNet top-1 accuracy of from\(74.6\% \)to\(76.7\% \)(\(\uparrow 2.1\% \)) under the same 0.7G FLOPs; and when transferred to the COCO object detection task, the average precision is boosted by\(\uparrow 0.7\% \)under a similar throughput on a V100 GPU.

     
    more » « less
  3. We study the problem of approximating maximum Nash social welfare (NSW) when allocatingmindivisible items amongnasymmetric agents with submodular valuations. TheNSWis a well-established notion of fairness and efficiency, defined as the weighted geometric mean of agents’ valuations. For special cases of the problem with symmetric agents and additive(-like) valuation functions, approximation algorithms have been designed using approaches customized for these specific settings, and they fail to extend to more general settings. Hence, no approximation algorithm with a factor independent ofmwas known either for asymmetric agents with additive valuations or for symmetric agents beyond additive(-like) valuations before this work.

    In this article, we extend our understanding of theNSWproblem to far more general settings. Our main contribution is two approximation algorithms for asymmetric agents with additive and submodular valuations. Both algorithms are simple to understand and involve non-trivial modifications of a greedy repeated matchings approach. Allocations of high-valued items are done separately by un-matching certain items and re-matching them by different processes in both algorithms. We show that these approaches achieve approximation factors ofO(n) andO(nlogn) for additive and submodular cases, independent of the number of items. For additive valuations, our algorithm outputs an allocation that also achieves the fairness property of envy-free up to one item (EF1).

    Furthermore, we show that theNSWproblem under submodular valuations is strictly harder than all currently known settings with an\(\frac{\mathrm{e}}{\mathrm{e}-1}\)factor of the hardness of approximation, even for constantly many agents. For this case, we provide a different approximation algorithm that achieves a factor of\(\frac{\mathrm{e}}{\mathrm{e}-1}\), hence resolving it completely.

     
    more » « less
  4. In the field of multi-agent autonomous transportation, such as automated payload delivery or highway on-ramp merging, agents routinely exchange knowledge to optimize their shared objective and adapt to environmental novelties through Cooperative Multi-Agent Reinforcement Learning (CMARL) algorithms. This knowledge exchange between agents allows these systems to operate efficiently and adapt to dynamic environments. However, this cooperative learning process is susceptible to adversarial poisoning attacks, as highlighted by contemporary research. Particularly, the poisoning attacks where malicious agents inject deceptive information camouflaged within the differential noise, a pivotal element for differential privacy (DP)-based CMARL algorithms, pose formidable challenges to identify and overcome. The consequences of not addressing this issue are far-reaching, potentially jeopardizing safety-critical operations and the integrity of data privacy in these applications. Existing research has strived to develop anomaly detection-based defense models to counteract conventional poisoning methods. Nonetheless, the recurring necessity for model offloading and retraining with labeled anomalous data undermines their practicality, considering the inherently dynamic nature of the safety-critical autonomous transportation applications. Further, it is imperative to maintain data privacy, ensure high performance, and adapt to environmental changes. Motivated by these challenges, this paper introduces a novel defense mechanism against stealthy adversarial poisoning attacks in the autonomous transportation domain, termed Reinforcing Autonomous Multi-agent Protection through Adversarial Resistance in Transportation (RAMPART). Leveraging a GAN model at each local node, RAMPART effectively filters out malicious advice in an unsupervised manner, whilst generating synthetic samples for each state-action pair to accommodate environmental uncertainties and eliminate the need for labeled training data. Our extensive experimental analysis, conducted in a Private Payload Delivery Network (PPDN) —a common application in the autonomous multi-agent transportation domain—demonstrates thatRAMPART successfully defends against a DP-exploited poisoning attack with a\(30\% \)attack ratio, achieving an F1 score of 0.852 and accuracy of\(96.3\% \)in heavy-traffic environments.

     
    more » « less
  5. Many geometry processing techniques require the solution of partial differential equations (PDEs) on manifolds embedded in\(\mathbb {R}^2 \)or\(\mathbb {R}^3 \), such as curves or surfaces. Suchmanifold PDEsoften involve boundary conditions (e.g., Dirichlet or Neumann) prescribed at points or curves on the manifold’s interior or along the geometric (exterior) boundary of an open manifold. However, input manifolds can take many forms (e.g., triangle meshes, parametrizations, point clouds, implicit functions, etc.). Typically, one must generate a mesh to apply finite element-type techniques or derive specialized discretization procedures for each distinct manifold representation. We propose instead to address such problems in a unified manner through a novel extension of theclosest point method(CPM) to handle interior boundary conditions. CPM solves the manifold PDE by solving a volumetric PDE defined over the Cartesian embedding space containing the manifold, and requires only a closest point representation of the manifold. Hence, CPM supports objects that are open or closed, orientable or not, and of any codimension. To enable support for interior boundary conditions we derive a method that implicitly partitions the embedding space across interior boundaries. CPM’s finite difference and interpolation stencils are adapted to respect this partition while preserving second-order accuracy. Additionally, we develop an efficient sparse-grid implementation and numerical solver that can scale to tens of millions of degrees of freedom, allowing PDEs to be solved on more complex manifolds. We demonstrate our method’s convergence behaviour on selected model PDEs and explore several geometry processing problems: diffusion curves on surfaces, geodesic distance, tangent vector field design, harmonic map construction, and reaction-diffusion textures. Our proposed approach thus offers a powerful and flexible new tool for a range of geometry processing tasks on general manifold representations.

     
    more » « less