skip to main content


Title: Understanding the effect of data center resource disaggregation on production DBMSs
Resource disaggregation is a new architecture for data centers in which resources like memory and storage are decoupled from the CPU, managed independently, and connected through a high-speed network. Recent work has shown that although disaggregated data centers (DDCs) provide operational benefits, applications running on DDCs experience degraded performance due to extra network latency between the CPU and their working sets in main memory. DBMSs are an interesting case study for DDCs for two main reasons: (1) DBMSs normally process data-intensive workloads and require data movement between different resource components; and (2) disaggregation drastically changes the assumption that DBMSs can rely on their own internal resource management. We take the first step to thoroughly evaluate the query execution performance of production DBMSs in disaggregated data centers. We evaluate two popular open-source DBMSs (MonetDB and PostgreSQL) and test their performance with the TPC-H benchmark in a recently released operating system for resource disaggregation. We evaluate these DBMSs with various configurations and compare their performance with that of single-machine Linux with the same hardware resources. Our results confirm that significant performance degradation does occur, but, perhaps surprisingly, we also find settings in which the degradation is minor or where DDCs actually improve performance.  more » « less
Award ID(s):
1845749
NSF-PAR ID:
10229028
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
Volume:
13
Issue:
9
ISSN:
2150-8097
Page Range / eLocation ID:
1568 to 1581
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. While permissioned blockchains enable a family of data center applications, existing systems suffer from imbalanced loads across compute and memory, exacerbating the underutilization of cloud resources. This paper presents FlexChain , a novel permissioned blockchain system that addresses this challenge by physically disaggregating CPUs, DRAM, and storage devices to process different blockchain workloads efficiently. Disaggregation allows blockchain service providers to upgrade and expand hardware resources independently to support a wide range of smart contracts with diverse CPU and memory demands. Moreover, it ensures efficient resource utilization and hence prevents resource fragmentation in a data center. We have explored the design of XOV blockchain systems in a disaggregated fashion and developed a tiered key-value store that can elastically scale its memory and storage. Our design significantly speeds up the execution stage. We have also leveraged several techniques to parallelize the validation stage in FlexChain to further improve the overall blockchain performance. Our evaluation results show that FlexChain can provide independent compute and memory scalability, while incurring at most 12.8% disaggregation overhead. FlexChain achieves almost identical throughput as the state-of-the-art distributed approaches with significantly lower memory and CPU consumption for compute-intensive and memory-intensive workloads respectively. 
    more » « less
  2. null (Ed.)
    Resource-disaggregated architectures have risen in popularity for large datacenters. However, prior disaggregation systems are designed for native applications; in addition, all of them require applications to possess excellent locality to be efficiently executed. In contrast, programs written in managed languages are subject to periodic garbage collection (GC), which is a typical graph workload with poor locality. Although most datacenter applications are written in managed languages, current systems are far from delivering acceptable performance for these applications. This paper presents Semeru, a distributed JVM that can dramatically improve the performance of managed cloud applications in a memory-disaggregated environment. Its design possesses three major innovations: (1) a universal Java heap, which provides a unified abstraction of virtual memory across CPU and memory servers and allows any legacy program to run without modifications; (2) a distributed GC, which offloads object tracing to memory servers so that tracing is performed closer to data; and (3) a swap system in the OS kernel that works with the runtime to swap page data efficiently. An evaluation of Semeru on a set of widely-deployed systems shows very promising results. 
    more » « less
  3. null (Ed.)
    Fast networks and the desire for high resource utilization in data centers and the cloud have driven disaggregation. Application compute is separated from storage, but this leads to high overheads when data must move over the network for simple operations on it. Alternatively, systems could allow applications to run application logic within storage via user-defined functions. Unfortunately, this ties provisioning and utilization of storage and compute resources together again. We present a new approach to executing storage-level functions in an in-memory key-value store that avoids this problem by dynamically deciding where to execute functions over data. Users write storage functions that are logically decoupled from storage, but storage servers choose where to run invocations of these functions physically. By using a server-internal cost model and observing function execution, servers choose to directly run inexpensive functions, while preferring to execute functions with high CPU-cost at client machines. We show that with this approach storage servers can reduce network request processing costs, avoid server compute bottlenecks, and improve aggregate storage system throughput. We realize our approach on an in-memory key-value store that executes 3.2 million strict serializable user-defined storage functions per second with 100 us response times. When running a mix of logic from different applications, it provides throughput better than running that logic purely at storage servers (85% more) or purely at clients (10% more). For our workloads, it also reduces latency (up to 2x) and transactional aborts (up to 33%) over pure client-side execution. 
    more » « less
  4. Hardware memory disaggregation is an emerging trend in datacenters that provides access to remote memory as part of a shared pool or unused memory on machines across the network. Memory disaggregation aims to improve memory utilization and scale memory-intensive applications. Current state-of-the-art prototypes have shown that hardware disaggregated memory is a reality at the rack-scale. However, the memory utilization benefits of memory disaggregation can only be fully realized at larger scales enabled by a datacenter-wide network. Introduction of a datacenter network results in new performance and reliability failures that may manifest as higher network latency. Additionally, sharing of the network introduces new points of contention between multiple applications. In this work, we characterize the impact of variable network latency and contention in an open-source hardware disaggregated memory prototype - ThymesisFlow. To support our characterization, we have developed a delay injection framework that introduces delays in remote memory access to emulate network latency. Based on the characterization results, we develop insights into how reliability and resource allocation mechanisms should evolve to support hardware memory disaggregation beyond rack-scale in datacenters. 
    more » « less
  5. One recent trend of cloud data center design is resource disaggregation. Instead of having server units with “converged” compute, memory, and storage resources, a disaggregated data center (DDC) has pools of resources of each type connected via a network. While the systems community has been investigating the research challenges of DDC by designing new OS and network stacks, the implications of DDC for next-generation database systems remain unclear. In this paper, we take a first step towards understanding how DDCs might affect the design of relational databases, discuss the potential advantages and drawbacks in the context of data processing, and outline research challenges in addressing them. 
    more » « less