Title: The case for distributed shared-memory databases with RDMA-enabled memory disaggregation
Memory disaggregation (MD) allows for scalable and elastic data center design by separating compute (CPU) from memory. With MD, compute and memory are no longer coupled into the same server box. Instead, they are connected to each other via ultra-fast networking such as RDMA. MD can bring many advantages, e.g., higher memory utilization, better independent scaling (of compute and memory), and lower cost of ownership. This paper makes the case that MD can fuel the next wave of innovation in database systems. We observe that MD revives the great "shared what" debate in the database community. We envision that distributed shared-memory databases (DSM-DB, for short), which have not received much attention before, can be promising in the future with MD. We present a list of challenges and opportunities that can inspire next steps in system design, making the case for DSM-DB.
Award ID(s):
1815796 1910216
PAR ID:
10467781
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
VLDB Endowment. Proceedings of the VLDB Endowment
Date Published:
Journal Name:
Proceedings of the VLDB Endowment
Volume:
16
Issue:
1
ISSN:
2150-8097
Page Range / eLocation ID:
15 to 22
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Database peptide search is the primary computational technique for identifying peptides from mass spectrometry (MS) data. Graphics Processing Unit (GPU) computing is now ubiquitous in the current generation of high-performance computing (HPC) systems, yet its application in the database peptide search domain remains limited. Part of the reason is the use of sub-optimal algorithms in the existing GPU-accelerated methods, resulting in significantly inefficient hardware utilization. In this paper, we design and implement a new-age CPU-GPU HPC framework, called GiCOPS, for efficient and complete GPU-acceleration of modern database peptide search algorithms on supercomputers. Our experimentation shows that GiCOPS exhibits between 1.2× and 5× speed improvement over its CPU-only predecessor, HiCOPS, and over 10× improvement over several existing GPU-based database search algorithms for sufficiently large experiment sizes. We further assess and optimize the performance of our framework using the Roofline Model and report near-optimal results for several metrics including computations per second, occupancy rate, memory workload, branch efficiency and shared memory performance. Finally, the CPU-GPU methods and optimizations proposed in our work for complex integer- and memory-bounded algorithmic pipelines can also be extended to accelerate existing and future peptide identification algorithms. GiCOPS is now integrated with our umbrella HPC framework HiCOPS and is available at: https://github.com/pcdslab/gicops.
  2. In-memory processing offers a promising solution for enhancing the performance of data-intensive applications. While analog in-memory computing demonstrates remarkable efficiency, its limited precision is suitable only for approximate computing tasks. In contrast, digital in-memory computing delivers the deterministic precision necessary to accelerate high-assurance applications. Current digital in-memory computing methods typically involve manually breaking down arithmetic operations into in-memory compute kernels. In contrast, traditional digital circuits are synthesized through intricate and automated design workflows. In this article, we introduce a logic synthesis framework called LOGIC, which facilitates the translation of high-level applications into digital in-memory compute kernels that can be executed using non-volatile memory. We propose techniques for decomposing element-wise arithmetic operations into in-memory kernels while minimizing the number of in-memory operations. Additionally, we optimize the sequence of in-memory operations to reduce non-volatile memory utilization. To address the NP-hard execution sequencing optimization problem, we have developed two look-ahead algorithms that offer practical solutions. Additionally, we leverage data layout reorganization to efficiently accelerate applications that heavily rely on sparse matrix-vector multiplication operations. Our experimental evaluations demonstrate that our proposed synthesis approach improves the area and latency of fixed-point multiplication by 84% and 20% compared to the state-of-the-art, respectively. Moreover, when applied to scientific computing applications sourced from the SuiteSparse Matrix Collection, our design achieves remarkable improvements in area, latency, and energy efficiency by factors of 4.8×, 2.6×, and 11×, respectively.
  3. Traditional relational database systems handle data by dividing their memory into sections such as a buffer cache and working memory, assigning a memory budget to each section to efficiently manage a limited amount of overall memory. They also assign memory budgets to memory‐intensive operators such as sorts and joins and control the allocation of memory to these operators; each memory‐intensive operator attempts to maximize its memory usage to reduce disk I/O cost. Implementing such memory‐intensive operators requires a careful design and application of appropriate algorithms that properly utilize memory. Today's Big Data management systems need the ability to handle large amounts of data similarly, as it is unrealistic to assume that truly big data will fit into memory. In this article, we share our memory management experiences in Apache AsterixDB, an open‐source Big Data management software platform that scales out horizontally on shared‐nothing commodity computing clusters. We describe the implementation of AsterixDB's memory‐intensive operators and their designs related to memory management. We also discuss memory management at the global (cluster) level. We conducted an experimental study using several synthetic and real datasets to explore the impact of this work. We believe that future Big Data management system builders can benefit from these experiences.
  4. Compute and memory are tightly coupled within each server in traditional datacenters. Large-scale datacenter operators have identified this coupling as a root cause behind fleetwide resource underutilization and increasing Total Cost of Ownership (TCO). With the advent of ultra-fast networks and cache-coherent interfaces, memory disaggregation has emerged as a potential solution, whereby applications can leverage available memory even outside server boundaries. This paper summarizes the growing research landscape of memory disaggregation from a software perspective and introduces the challenges toward making it practical under current and future hardware trends. We also reflect on our seven-year journey in the SymbioticLab to build a comprehensive disaggregated memory system over ultra-fast networks. We conclude with some open challenges toward building next-generation memory disaggregation systems leveraging emerging cache-coherent interconnects. 
  5. As next-generation wireline and wireless systems are scaled to meet increasing data demands, existing signal processing approaches face significant power and latency challenges. To address these demands, we present CAMEL (Capacitive Analog In-Memory Equalization), a mixed-signal, discrete-time, analog in-memory switched-capacitor finite impulse response (FIR) filter designed in Intel16. Using this filter as a core, we develop a 16-tap antenna-domain I/Q equalizer with 8-bit accuracy, consuming 90 mW from a 1 V supply while achieving a data rate of 2 Gbps at a bit error rate (BER) of 10⁻⁴ in a realistic channel at 18 dB signal-to-noise ratio (SNR). Mismatch analysis and scaling studies indicate that this design can be extended to 12-bit and 48-tap configurations with a linear increase in power, while delivering full digital reconfigurability and data rates exceeding 5 Gbps with a power efficiency of 9.81 pJ/bit.