A High-Performance Distributed Relational Database System for Scalable OLAP Processing

Arnold, Jason; Glavic, Boris; Raicu, Ioan

doi:10.1109/IPDPS.2019.00083

We present HRDBMS, a novel distributed shared-nothing database system developed with the goal of improving scalability of MPP databases based on a principled combination of techniques from MPP and Big Data systems with novel communication and work-distribution techniques. HRDBMS runs on a custom distributed and asynchronous execution engine that features highly parallelized operator implementations. The system features a cost-based optimization framework, user-defined data partitioning, locality-aware query execution, a non-blocking and hierarchical shuffle, and data skipping based on caching predicate matches. Our experimental comparison with Hive, Spark SQL, and Greenplum confirms that HRDBMS’s scalability is on par with Hive and Spark SQL (up to 96 nodes) while its per-node performance can compete with MPP databases (Greenplum).
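
The abstract lists data skipping based on caching predicate matches among HRDBMS's features. What follows is a minimal, hypothetical sketch of that general idea, not HRDBMS's actual implementation. It assumes a table stored as fixed-size blocks of rows, a caller-supplied string key that uniquely identifies each predicate, and a simple policy of clearing the cache whenever data is appended. The names `BlockStore`, `scan`, and `append_block` are invented for illustration. The first scan with a given predicate reads every block and records which blocks contained at least one match. Later scans with the same predicate read only those blocks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple

Row = Tuple[int, str]


@dataclass
class BlockStore:
    """Hypothetical table stored as a list of row blocks (illustration only)."""
    blocks: List[List[Row]]
    # Assumed cache layout: predicate key -> ids of blocks with >= 1 matching row.
    match_cache: Dict[str, FrozenSet[int]] = field(default_factory=dict)
    # Counts block reads so the effect of skipping is observable.
    blocks_read: int = 0

    def scan(self, pred_key: str, pred: Callable[[Row], bool]) -> List[Row]:
        cached = self.match_cache.get(pred_key)
        # Without a cache entry, every block is a candidate. With one, only
        # blocks known to contain matches are read and the rest are skipped.
        candidates = cached if cached is not None else range(len(self.blocks))
        result: List[Row] = []
        matched_blocks = set()
        for block_id in sorted(candidates):
            self.blocks_read += 1
            hits = [row for row in self.blocks[block_id] if pred(row)]
            if hits:
                matched_blocks.add(block_id)
                result.extend(hits)
        # Only a full scan yields a complete match set, so cache only then.
        if cached is None:
            self.match_cache[pred_key] = frozenset(matched_blocks)
        return result

    def append_block(self, rows: List[Row]) -> None:
        self.blocks.append(rows)
        # New rows may satisfy cached predicates. The simplest safe policy
        # (an assumption of this sketch) is to invalidate the whole cache.
        self.match_cache.clear()


if __name__ == "__main__":
    store = BlockStore(blocks=[[(1, "a"), (2, "b")], [(3, "c")], [(4, "a")]])

    def is_a(row: Row) -> bool:
        return row[1] == "a"

    print(store.scan("col1 = 'a'", is_a), store.blocks_read)  # full scan of 3 blocks
    print(store.scan("col1 = 'a'", is_a), store.blocks_read)  # reads only blocks 0 and 2
```

Keying the cache on a predicate identifier trades memory for skipped I/O. A real system would also need a finer-grained invalidation strategy than clearing everything on each append.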

Award ID(s): 1640864

PAR ID: 10129192

Author(s) / Creator(s): Arnold, Jason; Glavic, Boris; Raicu, Ioan

Date Published: 2019-05-01

Venue: IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Page Range: 738 to 748

Sponsoring Org: National Science Foundation

Full text (accepted manuscript, conference paper): https://doi.org/10.1109/IPDPS.2019.00083

More Like this