NSF PAR Search | NSF Public Access Repository


Fragile Complexity of Comparison-Based Algorithms

https://doi.org/10.4230/LIPIcs.ESA.2019.2

Afshani, P.; Fagerberg, R.; Hammer, D.; Jacob, R.; Kostitsyna, I.; Meyer, U.; Penschuck, M.; Sitchinava, N. (September 2019, Proceedings of the 27th European Symposium on Algorithms)

We initiate a study of algorithms with a focus on the computational complexity of individual elements, and introduce the fragile complexity of comparison-based algorithms as the maximal number of comparisons any individual element takes part in. We give a number of upper and lower bounds on the fragile complexity for fundamental problems, including Minimum, Selection, Sorting and Heap Construction. The results include both deterministic and randomized upper and lower bounds, and demonstrate a separation between the two settings for a number of problems. The depth of a comparator network is a straightforward upper bound on the worst-case fragile complexity of the corresponding fragile algorithm. We prove that fragile complexity is a different and strictly easier property than the depth of comparator networks, in the sense that for some problems a fragile complexity equal to the best network depth can be achieved with less total work, and that, with randomization, an even lower fragile complexity is possible.
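The definition in this abstract can be made concrete by counting, per element, how many comparisons it takes part in. The sketch below is an illustration only and is not taken from the paper: it contrasts a left-to-right scan, where the running minimum may be compared against every other element, with a knockout tournament, where no element is compared more than once per round. The function names `linear_min`, `tournament_min` and `fragile_cost` are hypothetical.

```python
from collections import Counter


def linear_min(values):
    """Left-to-right scan. Returns (minimum, comparisons per index)."""
    counts = Counter()
    best = 0
    for i in range(1, len(values)):
        # The current candidate and element i each take part in one comparison.
        counts[best] += 1
        counts[i] += 1
        if values[i] < values[best]:
            best = i
    return values[best], counts


def tournament_min(values):
    """Knockout tournament. Returns (minimum, comparisons per index)."""
    counts = Counter()
    alive = list(range(len(values)))
    while len(alive) > 1:
        winners = []
        # Pair up survivors; each survivor is compared at most once per round.
        for a, b in zip(alive[0::2], alive[1::2]):
            counts[a] += 1
            counts[b] += 1
            winners.append(a if values[a] <= values[b] else b)
        if len(alive) % 2 == 1:
            winners.append(alive[-1])  # odd element out advances without a comparison
        alive = winners
    return values[alive[0]], counts


def fragile_cost(counts):
    """Largest number of comparisons any single element took part in."""
    return max(counts.values(), default=0)


if __name__ == "__main__":
    data = list(range(16))  # sorted input: worst case for the scan
    print(fragile_cost(linear_min(data)[1]))      # grows linearly with n
    print(fragile_cost(tournament_min(data)[1]))  # bounded by the number of rounds
```

Both routines perform n - 1 comparisons in total; they differ only in how that work is spread across elements, which is the quantity the abstract calls fragile complexity.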
Full Text Available
Data Races and the Discrete Resource-time Tradeoff Problem with Resource Reuse over Paths

https://doi.org/10.1145/3323165.3323209

Das, Rathish; Tsai, Shih-Yu; Duppala, Sharmila; Lynch, Jayson; Arkin, Esther M.; Chowdhury, Rezaul; Mitchell, Joseph S.; Skiena, Steven (June 2019, 31st ACM Symposium on Parallelism in Algorithms and Architectures)

A determinacy race occurs if two or more logically parallel instructions access the same memory location and at least one of them tries to modify its content. Races are often undesirable as they can lead to nondeterministic and incorrect program behavior. A data race is a special case of a determinacy race which can be eliminated by associating a mutual-exclusion lock with the memory location in question or allowing atomic accesses to it. However, such solutions can reduce parallelism by serializing all accesses to that location. For associative and commutative updates to a memory cell, one can instead use a reducer, which allows parallel race-free updates at the expense of using some extra space. More extra space usually leads to more parallel updates, which in turn contributes to potentially lowering the overall execution time of the program. We start by asking the following question. Given a fixed budget of extra space for mitigating the cost of races in a parallel program, which memory locations should be assigned reducers and how should the space be distributed among those reducers in order to minimize the overall running time? We argue that under reasonable conditions the races of a program can be captured by a directed acyclic graph (DAG), with nodes representing memory cells and arcs representing read-write dependencies between cells. We then formulate our original question as an optimization problem on this DAG. We concentrate on a variation of this problem where space reuse among reducers is allowed by routing every unit of extra space along a (possibly different) source-to-sink path of the DAG and using it in the construction of multiple (possibly zero) reducers along the path. We consider two different ways of constructing a reducer and the corresponding duration functions (i.e., reduction time as a function of space budget).
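The reducer idea mentioned in this abstract can be sketched for the simplest associative and commutative update, integer addition. The code below is a toy illustration under stated assumptions, not either of the two reducer constructions the paper analyzes: the name `reduce_with_slots` and the round-robin split of updates are hypothetical, and Python threads are used only to show the access pattern, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def reduce_with_slots(updates, num_slots):
    """Sum `updates` (a list) using `num_slots` private accumulators."""
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    slots = [0] * num_slots  # the extra space spent on the reducer

    def worker(slot_id):
        # Each worker writes only to its own slot, so no two workers
        # ever modify the same memory cell.
        local = 0
        for value in updates[slot_id::num_slots]:
            local += value
        slots[slot_id] = local

    with ThreadPoolExecutor(max_workers=num_slots) as pool:
        list(pool.map(worker, range(num_slots)))  # wait for all workers

    # Final combine step: sequential, and its cost grows with the slot count.
    return sum(slots)


if __name__ == "__main__":
    print(reduce_with_slots(list(range(100)), 4))
```

With one slot every update is applied by a single worker; with more slots the updates are spread over more workers but more cells must be combined at the end, which is the space-time tradeoff the abstract asks how to budget.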
Full Text Available
Engineering a High-Performance GPU B-Tree

https://doi.org/10.1145/3293883.3295706

Awad, Muhammad A.; Ashkiani, Saman; Johnson, Rob; Farach-Colton, Martín; Owens, John D. (February 2019, Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming)

We engineer a GPU implementation of a B-Tree that supports concurrent queries (point, range, and successor) and updates (insertions and deletions). Our B-Tree outperforms the state of the art, a GPU log-structured merge tree (LSM) and a GPU sorted array. In particular, point and range queries are significantly faster than in a GPU LSM (the GPU LSM does not implement successor queries). Furthermore, B-Tree insertions are also faster than LSM and sorted array insertions unless insertions come in batches of more than roughly 100k. Because we cache the upper levels of the tree, we achieve lookup throughput that exceeds the DRAM bandwidth of the GPU. We demonstrate that the key limiter of performance on a GPU is contention and describe the design choices that allow us to achieve this high performance.
Full Text Available
Anagram-Free Chromatic Number Is Not Pathwidth-Bounded

https://doi.org/10.1007/978-3-030-00256-5_8

Carmi, Paz; Dujmović, Vida; Morin, Pat (September 2018, Graph-Theoretic Concepts in Computer Science)

Full Text Available
Adaptive MapReduce Similarity Joins

https://doi.org/10.1145/3206333.3206340

McCauley, Samuel; Silvestri, Francesco (June 2018, 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond)

Full Text Available
Analysis-driven Engineering of Comparison-based Sorting Algorithms on GPUs

https://doi.org/10.1145/3205289.3205298

Karsin, Ben; Weichert, Volker; Casanova, Henri; Iacono, John; Sitchinava, Nodari (June 2018, Proceedings of the 2018 International Conference on Supercomputing (ICS))

We study the relationship between memory accesses, bank conflicts, thread multiplicity (also known as over-subscription) and instruction-level parallelism in comparison-based sorting algorithms for Graphics Processing Units (GPUs). We experimentally validate a proposed formula that relates these parameters with asymptotic analysis of the number of memory accesses by an algorithm. Using this formula we analyze and compare several GPU sorting algorithms, identifying key performance bottlenecks in each one of them. Based on this analysis we propose a GPU-efficient multiway mergesort algorithm, GPU-MMS, which minimizes or eliminates these bottlenecks and balances various limiting factors for specific hardware. We realize an implementation of GPU-MMS and compare it to sorting algorithm implementations in state-of-the-art GPU libraries on three GPU architectures. Despite these library implementations being highly optimized, we find that GPU-MMS outperforms them by an average of 21% for random integer inputs and 14% for random key-value pairs.
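For readers unfamiliar with the term, a multiway mergesort splits the input into several runs, sorts each run, and merges all of them in one pass rather than two at a time. The sequential sketch below shows only that general structure. It is not GPU-MMS and models none of the GPU concerns the abstract discusses (bank conflicts, thread multiplicity, instruction-level parallelism). The function name and the `ways` and `base` parameters are hypothetical.

```python
import heapq


def multiway_mergesort(values, ways=4, base=8):
    """Sort a list by splitting it into up to `ways` runs and merging them."""
    if ways < 2:
        raise ValueError("ways must be at least 2")
    if len(values) <= max(base, 1):
        return sorted(values)  # small inputs: fall back to the built-in sort
    chunk = -(-len(values) // ways)  # ceiling division: length of each run
    runs = [
        multiway_mergesort(values[i:i + chunk], ways, base)
        for i in range(0, len(values), chunk)
    ]
    # heapq.merge performs a k-way merge of already sorted iterables.
    return list(heapq.merge(*runs))


if __name__ == "__main__":
    print(multiway_mergesort([9, 4, 7, 1, 8, 2, 6, 3, 5, 0, 11, 10], ways=3, base=2))
```

Raising `ways` lowers the recursion depth, so each element is moved through fewer merge passes, at the price of a more involved merge step.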
Full Text Available
