Effectively exploiting emerging far-memory technology requires consideration of operating on richly connected data outside the context of the parent process. Operating-system technology in development offers help by exposing abstractions such as memory objects and globally invariant pointers that can be traversed by devices and newly instantiated compute. Such ideas will allow applications running on future heterogeneous distributed systems with disaggregated memory nodes to exploit near-memory processing for higher performance and to independently scale their memory and compute resources for lower cost.
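The abstract's central abstraction is a globally invariant pointer: an object reference that stays meaningful when handed to a far-memory device or to newly instantiated compute, rather than being tied to one process's address space. A minimal sketch of that idea is below; it is purely illustrative, and the GlobalPtr type, the (node_id, obj_id) object table, and the deref helper are our own assumptions, not an API from the article.

```python
from dataclasses import dataclass

# Hypothetical object table: (node_id, obj_id) -> bytes held on a far-memory node.
# In a real system this lookup would be an RDMA read or a near-memory operation.
FAR_MEMORY = {(2, 7): b"employee-record-0042"}

@dataclass(frozen=True)
class GlobalPtr:
    """A location-independent reference, valid on any compute node or device."""
    node_id: int   # which memory node owns the object
    obj_id: int    # object identifier within that node
    offset: int = 0

    def deref(self, length: int) -> bytes:
        """Resolve the pointer by asking the owning node for the bytes.
        Here it is a dictionary lookup; in practice it would be a network access."""
        obj = FAR_MEMORY[(self.node_id, self.obj_id)]
        return obj[self.offset:self.offset + length]

# Any process -- even one started after the object was created -- can traverse
# the same pointer without address-space-specific fixups.
p = GlobalPtr(node_id=2, obj_id=7, offset=9)
print(p.deref(6))  # b"record"
```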
- Award ID(s): 1841545
- PAR ID: 10493455
- Publisher / Repository: ACM
- Date Published:
- Journal Name: ACM Queue
- Volume: 21
- Issue: 3
- ISSN: 1542-7730
- Page Range / eLocation ID: 75 to 93
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Memory disaggregation (MD) allows for scalable and elastic data center design by separating compute (CPU) from memory. With MD, compute and memory are no longer coupled into the same server box. Instead, they are connected to each other via ultra-fast networking such as RDMA. MD can bring many advantages, e.g., higher memory utilization, better independent scaling (of compute and memory), and lower cost of ownership. This paper makes the case that MD can fuel the next wave of innovation in database systems. We observe that MD revives the great debate of "shared what" in the database community. We envision that distributed shared-memory databases (DSM-DB, for short) - which have not received much attention before - can be promising in the future with MD. We present a list of challenges and opportunities that can inspire next steps in system design, making the case for DSM-DB.
-
There is a growing demand for low-power, autonomously learning artificial intelligence (AI) systems that can be applied at the edge and rapidly adapt to the specific situation at the deployment site. However, current AI models struggle in such scenarios, often requiring extensive fine-tuning, computational resources, and data. In contrast, humans can effortlessly adjust to new tasks by transferring knowledge from related ones. The concept of learning-to-learn (L2L) mimics this process and enables AI models to rapidly adapt with only a little computational effort and data. In-memory computing neuromorphic hardware (NMHW) is inspired by the brain's operating principles and mimics its physical co-location of memory and compute. In this work, we pair L2L with in-memory computing NMHW based on phase-change memory devices to build efficient AI models that can rapidly adapt to new tasks. We demonstrate the versatility of our approach in two scenarios: a convolutional neural network performing image classification and a biologically inspired spiking neural network generating motor commands for a real robotic arm. Both models rapidly learn with few parameter updates. Deployed on the NMHW, they perform on par with their software equivalents. Moreover, meta-training of these models can be performed in software with high precision, alleviating the need for accurate hardware models.
-
As memory requirements grow, and advances in memory technology slow, the availability of sufficient main memory is increasingly the bottleneck in large compute clusters. One solution to this is memory disaggregation, where jobs can remotely access memory on other servers, or far memory. This paper first presents faster swapping mechanisms and a far-memory-aware cluster scheduler that make it possible to support far memory at rack scale. Then, it examines the conditions under which this use of far memory can increase job throughput. We find that while far memory is not a panacea, for memory-intensive workloads it can provide performance improvements on the order of 10% or more, even without changing the total amount of memory available.
-
Edge devices operating in dynamic environments critically need the ability to continually learn without catastrophic forgetting. The strict resource constraints in these devices pose a major challenge to achieving this, as continual learning entails memory and computational overhead. Crossbar architectures using memristor devices offer energy efficiency through compute-in-memory and hold promise for addressing this issue. However, memristors often exhibit low precision and high variability in conductance modulation, rendering them unsuitable for continual learning solutions that require precise modulation of weight magnitude for consolidation. Current approaches fall short of addressing this challenge directly and rely on auxiliary high-precision memory, leading to frequent memory access, high memory overhead, and energy dissipation. In this research, we propose probabilistic metaplasticity, which consolidates weights by modulating their update probability rather than their magnitude (a minimal sketch of this idea appears after this list). The proposed mechanism eliminates high-precision modification of weight magnitudes and, consequently, the need for auxiliary high-precision memory. We demonstrate the efficacy of the proposed mechanism by integrating probabilistic metaplasticity into a spiking network trained on an error threshold with low-precision memristor weights. Evaluations on continual learning benchmarks show that probabilistic metaplasticity achieves performance equivalent to state-of-the-art continual learning models with high-precision weights while consuming ~67% lower memory for additional parameters and up to ~60× lower energy during parameter updates compared to an auxiliary memory-based solution. The proposed model shows potential for energy-efficient continual learning with low-precision emerging devices.
-
Josephson-CMOS hybrid memory leverages the high speed and low power operation of single-flux quantum logic and the high integration density of CMOS technology. One of the commonly used types of interface circuit in Josephson-CMOS memory is the Suzuki stack, which is a latching high-voltage driver circuit. Suzuki stack circuits are typically powered by an AC bias voltage, which has several limitations such as synchronization and coupling effects. To address these issues, a novel DC-biased Suzuki stack circuit is proposed in this paper. Compared to a conventional AC-biased Suzuki stack circuit, the proposed DC-biased design can provide similar output voltage levels and parameter margins, approximately two times higher operating frequency, and three orders of magnitude lower heat load from the bias cables.
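The probabilistic-metaplasticity entry above (consolidating weights by modulating update probability rather than magnitude) can be illustrated with a small numerical sketch. This is only an illustration under assumed definitions: the exp(-m) gating probability and the fixed-size, sign-based step are our own simplifications, not the update rule from that paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_weights = 8
w = rng.normal(0.0, 0.1, n_weights)      # weights (conceptually low-precision)
m = np.zeros(n_weights)                  # metaplastic state: higher = more consolidated
step = 0.05                              # fixed-magnitude update, no fine-grained scaling

def probabilistic_update(w, m, grad):
    """Apply a fixed-size update to each weight only with probability exp(-m).

    Consolidated weights (large m) are updated rarely, protecting old tasks,
    while the update magnitude itself never needs precise modulation.
    """
    p_update = np.exp(-m)                    # assumed gating function (illustrative)
    gate = rng.random(w.shape) < p_update    # Bernoulli draw per weight
    return w - gate * step * np.sign(grad)   # sign-based, fixed-size step

# Example: one update step with a dummy gradient; weights with m = 3 rarely move.
m[:4] = 3.0
grad = rng.normal(size=n_weights)
w_new = probabilistic_update(w, m, grad)
print(np.round(w_new - w, 3))
```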