NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

X-OpenMP — eXtreme fine-grained tasking using lock-less work stealing

https://doi.org/10.1016/j.future.2024.05.019

Nookala, Poornima; Chard, Kyle; Raicu, Ioan (October 2024, Future Generation Computer Systems)

Full Text Available
SCIPIS: Scalable and concurrent persistent indexing and search in high-end computing systems

https://doi.org/10.1016/j.jpdc.2024.104878

Orhean, Alexandru Iulian; Giannakou, Anna; Ramakrishnan, Lavanya; Chard, Kyle; Glavic, Boris; Raicu, Ioan (July 2024, Journal of Parallel and Distributed Computing)

Full Text Available
Enabling Extremely Fine-grained Parallelism via Scalable Concurrent Queues on Modern Many-core Architectures

https://doi.org/10.1109/MASCOTS53633.2021.9614292

Nookala, Poornima; Dinda, Peter; Hale, Kyle C.; Chard, Kyle; Raicu, Ioan (November 2021, Proceedings of the 29th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS '21))

Enabling efficient fine-grained task parallelism is a significant challenge for hardware platforms with increasingly many cores. Existing techniques do not scale to hundreds of threads due to the high cost of synchronization in concurrent data structures. To overcome these limitations we present XQueue, a novel lock-less concurrent queuing system with relaxed ordering semantics that is geared towards realizing scalability up to hundreds of concurrent threads. We demonstrate the scalability of XQueue using microbenchmarks and show that XQueue can deliver concurrent operations with latencies as low as 110 cycles at scales of up to 192 cores (up to 6900× improvement compared to traditional synchronization mechanisms) across our diverse hardware, including x86, ARM, and Power9. The reduced latency allows XQueue to provide orders of magnitude (3300×) better throughput that existing techniques. To evaluate the real-world benefits of XQueue, we integrated XQueue with LLVM OpenMP and evaluated five unmodified benchmarks from the Barcelona OpenMP Task Suite (BOTS) as well as a graph traversal benchmark from the GAP benchmark suite. We compared the XQueue-enabled LLVM OpenMP implementation with the native LLVM and GNU OpenMP versions. Using fine-grained task workloads, XQueue can deliver 4× to 6× speedup compared to native GNU OpenMP and LLVM OpenMP in many cases, with speedups as high as 116× in some cases.
more » « less
Full Text Available
A High-Performance Distributed Relational Database System for Scalable OLAP Processing

https://doi.org/10.1109/IPDPS.2019.00083

Arnold, Jason; Glavic, Boris; Raicu, Ioan (May 2019, IPDPS)

We present HRDBMS, a novel distributed shared-nothing database system developed with the goal of improving scalability of MPP databases based on a principled combination of techniques from MPP and Big Data systems with novel communication and work-distribution techniques. HRDBMS runs on a custom distributed and asynchronous execution engine that features highly parallelized operator implementations. The system features a cost-based optimization framework, user-defined data partitioning, locality-aware query execution, a non-blocking and hierarchical shuffle, and data skipping based on caching predicate matches. Our experimental comparison with Hive, Spark SQL, and Greenplum confirms that HRDBMS’s scalability is on par with Hive and Spark SQL (up to 96 nodes) while its per-node performance can compete with MPP databases (Greenplum).
more » « less
Full Text Available

Search for: All records