skip to main content

Title: Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories
To process real-world datasets, modern data-parallel systems often require extremely large amounts of memory, which are both costly and energy inefficient. Emerging non-volatile memory (NVM) technologies offer high capacity compared to DRAM and low energy compared to SSDs. Hence, NVMs have the potential to fundamentally change the dichotomy between DRAM and durable storage in Big Data processing. However, most Big Data applications are written in managed languages and executed on top of a managed runtime that already performs various dimensions of memory management. Supporting hybrid physical memories adds a new dimension, creating unique challenges in data replacement. This article proposes Panthera, a semantics-aware, fully automated memory management technique for Big Data processing over hybrid memories. Panthera analyzes user programs on a Big Data system to infer their coarse-grained access patterns, which are then passed to the Panthera runtime for efficient data placement and migration. For Big Data applications, the coarse-grained data division information is accurate enough to guide the GC for data layout, which hardly incurs overhead in data monitoring and moving. We implemented Panthera in OpenJDK and Apache Spark. Based on Big Data applications’ memory access pattern, we also implemented a new profiling-guided optimization strategy, which is transparent to more » applications. With this optimization, our extensive evaluation demonstrates that Panthera reduces energy by 32–53% at less than 1% time overhead on average. To show Panthera’s applicability, we extend it to QuickCached, a pure Java implementation of Memcached. Our evaluation results show that Panthera reduces energy by 28.7% at 5.2% time overhead on average. « less
; ; ; ; ; ; ; ; ; ;
Award ID(s):
2106838 1763172
Publication Date:
Journal Name:
ACM Transactions on Computer Systems
Page Range or eLocation-ID:
1 to 38
Sponsoring Org:
National Science Foundation
More Like this
  1. Using flash-based solid state drives (SSDs) as main memory has been proposed as a practical solution towards scaling memory capacity for data-intensive applications. However, almost all existing approaches rely on the paging mechanism to move data between SSDs and host DRAM. This inevitably incurs significant performance overhead and extra I/O traffic. Thanks to the byte-addressability supported by the PCIe interconnect and the internal memory in SSD controllers, it is feasible to access SSDs in both byte and block granularity today. Exploiting the benefits of SSD's byte-accessibility in today's memory-storage hierarchy is, however, challenging as it lacks systems support and abstractions for programs. In this paper, we present FlatFlash, an optimized unified memory-storage hierarchy, to efficiently use byte-addressable SSD as part of the main memory. We extend the virtual memory management to provide a unified memory interface so that programs can access data across SSD and DRAM in byte granularity seamlessly. We propose a lightweight, adaptive page promotion mechanism between SSD and DRAM to gain benefits from both the byte-addressable large SSD and fast DRAM concurrently and transparently, while avoiding unnecessary page movements. Furthermore, we propose an abstraction of byte-granular data persistence to exploit the persistence nature of SSDs, upon whichmore »we rethink the design primitives of crash consistency of several representative software systems that require data persistence, such as file systems and databases. Our evaluation with a variety of applications demonstrates that, compared to the current unified memory-storage systems, FlatFlash improves the performance for memory-intensive applications by up to 2.3x, reduces the tail latency for latency-critical applications by up to 2.8x, scales the throughput for transactional database by up to 3.0x, and decreases the meta-data persistence overhead for file systems by up to 18.9x. FlatFlash also improves the cost-effectiveness by up to 3.8x compared to DRAM-only systems, while enhancing the SSD lifetime significantly.« less
  2. Software Managed Manycore (SMM) architectures have been proposed as a solution for scaling the memory architecture. In a typical SMM architecture, Scratch Pad Memories (SPM) is used instead of caches, and data must be explicitly managed in software. While all code and data need to be managed, heap management on SMMs is especially challenging due to the highly dynamic nature of heap data access. Existing techniques spend over 90% of execution time on heap data management, which largely compromised the power efficiency of SMM architectures. This paper presents compiler-based efficient techniques that reduce heap management overhead. Experimental results on benchmarks from MiBench executing on an SMM processor modeled in Gem5 demonstrate that our approach implemented in LLVM 3.8 can improve execution time by an average of 80%, compared to the state-of-the-art.
  3. Due to the amount of data involved in emerging deep learning and big data applications, operations related to data movement have quickly become a bottleneck. Data-centric computing (DCC), as enabled by processing-in-memory (PIM) and near-memory processing (NMP) paradigms, aims to accelerate these types of applications by moving the computation closer to the data. Over the past few years, researchers have proposed various memory architectures that enable DCC systems, such as logic layers in 3D-stacked memories or charge-sharing-based bitwise operations in dynamic random-access memory (DRAM). However, application-specific memory access patterns, power and thermal concerns, memory technology limitations, and inconsistent performance gains complicate the offloading of computation in DCC systems. Therefore, designing intelligent resource management techniques for computation offloading is vital for leveraging the potential offered by this new paradigm. In this article, we survey the major trends in managing PIM and NMP-based DCC systems and provide a review of the landscape of resource management techniques employed by system designers for such systems. Additionally, we discuss the future challenges and opportunities in DCC management.
  4. The increasing demand of big data analytics for more main memory capacity in datacenters and exascale computing environments is driving the integration of heterogeneous memory technologies. The new technologies exhibit vastly greater differences in access latencies, bandwidth and capacity compared to the traditional NUMA systems. Leveraging this heterogeneity while also delivering application performance enhancements requires intelligent data placement. We present Kleio, a page scheduler with machine intelligence for applications that execute across hybrid memory components. Kleio is a hybrid page scheduler that combines existing, lightweight, history-based data tiering methods for hybrid memory, with novel intelligent placement decisions based on deep neural networks. We contribute new understanding toward the scope of benefits that can be achieved by using intelligent page scheduling in comparison to existing history-based approaches, and towards the choice of the deep learning algorithms and their parameters that are effective for this problem space. Kleio incorporates a new method for prioritizing pages that leads to highest performance boost, while limiting the resulting system resource overheads. Our performance evaluation indicates that Kleio reduces on average 80% of the performance gap between the existing solutions and an oracle with knowledge of future access pattern. Kleio provides hybrid memory systems with fastmore »and effective neural network training and prediction accuracy levels, which bring significant application performance improvements with limited resource overheads, so as to lay the grounds for its practical integration in future systems.« less
  5. The performance of modern Big Data frameworks, e.g. Spark, depends greatly on high-speed storage and shuffling, which impose a significant memory burden on production data centers. In many production situations, the persistence and shuffling intensive applications can suffer a major performance loss due to lack of memory. Thus, the common practice is usually to over-allocate the memory assigned to the data workers for production applications, which in turn reduces overall resource utilization. One efficient way to address the dilemma between the performance and cost efficiency of Big Data applications is through data center computing resource disaggregation. This paper proposes and implements a system that incorporates the Spark Big Data framework with a novel in-memory distributed file system to achieve memory disaggregation for data persistence and shuffling. We address the challenge of optimizing performance at affordable cost by co-designing the proposed in-memory distributed file system with large-volume DIMM-based persistent memory (PMEM) and RDMA technology. The disaggregation design allows each part of the system to be scaled independently, which is particularly suitable for cloud deployments. The proposed system is evaluated in a production-level cluster using real enterprise-level Spark production applications. The results of an empirical evaluation show that the system can achievemore »up to a 3.5- fold performance improvement for shuffle-intensive applications with the same amount of memory, compared to the default Spark setup. Moreover, by leveraging PMEM, we demonstrate that our system can effectively increase the memory capacity of the computing cluster with affordable cost, with a reasonable execution time overhead with respect to using local DRAM only.« less