Log-based data management systems use storage as if it were an append-only medium, transforming random writes into sequential writes, which delivers significant benefits when logs are persisted on hard disks. Although solid-state drives (SSDs) offer improved random write capabilities, sequential writes continue to be advantageous due to locality and space efficiency. However, the inherent properties of flash-based SSDs induce major disadvantages when used with a random write block interface, causing write amplification, uneven wear, log stacking, and garbage collection overheads. To eliminate these disadvantages, Zoned Namespace (ZNS) SSDs have recently been introduced. They offer increased capacity, reduced write amplification, and open up data placement and garbage collection to the host through zones, which have sequential-write semantics and must be explicitly reset. We explain how the new ZNS Zone Append primitive, which supports pushing fine-grained data placement onto the device, along with our proposal for “Group Append”, which enables sub-block sized appends, could benefit log-structured data management systems. We explore advantages of ZNS SSDs with Zone Append, Group Append, and computational storage in four log-based data management areas: (i) log-based file systems, (ii) LSM trees such as RocksDB, (iii) database systems, and (iv) event logs/shared logs. Furthermore, we propose research directions for each of these data management systems using ZNS SSDs.
eZNS: Elastic Zoned Namespace for Enhanced Performance Isolation and Device Utilization
Emerging Zoned Namespace (ZNS) SSDs, which provide the coarse-grained zone abstraction, hold the potential to significantly enhance the cost efficiency of future storage infrastructure and mitigate performance unpredictability. However, existing ZNS SSDs have a static zoned interface, which cannot adapt to workload runtime behavior, does not scale with underlying hardware capabilities, and suffers interference among co-located zones. Applications either under-provision zone resources and get insufficient throughput, over-provision zones and incur extra costs, or experience unexpected I/O latencies. We propose eZNS, an elastic-ZNS interface that exposes an adaptive zone with predictable characteristics. eZNS comprises two major components: a zone arbiter that manages zone allocation and active resources on the control plane, and a hierarchical I/O scheduler with read congestion control and write admission control on the data plane. Together, these components enable the transparent use of a ZNS SSD and close the gap between application requirements and zone interface properties. Our evaluations over RocksDB demonstrate that eZNS outperforms a static zoned interface by up to 17.7% in throughput and up to 80.3% in tail latency.
- Award ID(s):
- 2212192
- PAR ID:
- 10525629
- Publisher / Repository:
- Association for Computing Machinery
- Date Published:
- Journal Name:
- ACM Transactions on Storage
- Volume:
- 20
- Issue:
- 3
- ISSN:
- 1553-3077
- Page Range / eLocation ID:
- 1 to 41
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Millions of sensors, mobile applications and machines now generate billions of events. Specialized many-core key-value stores (KVSs) can ingest and index these events at high rates (over 100 Mops/s on one machine) if events are generated on the same machine; however, to be practical and cost-effective they must ingest events over the network and scale across cloud resources elastically. We present Shadowfax, a new distributed KVS based on FASTER, that transparently spans DRAM, SSDs, and cloud blob storage while serving 130 Mops/s/VM over commodity Azure VMs using conventional Linux TCP. Beyond high single-VM performance, Shadowfax uses a unique approach to distributed reconfiguration that avoids any server-side key ownership checks or cross-core coordination both during normal operation and migration. Hence, Shadowfax can shift load in 17 s to improve system throughput by 10 Mops/s with little disruption. Compared to the state-of-the-art, it has 8x better throughput (than Seastar+memcached) and avoids costly I/O to move cold data during migration. On 12 machines, Shadowfax retains its high throughput to perform 930 Mops/s, which, to the best of our knowledge, is the highest reported throughput for a distributed KVS used for large-scale data ingestion and indexing.
-
Subduction zones host some of Earth's most damaging natural hazards, including megathrust earthquakes and earthquake‐induced tsunamis. A major control on the initiation and rupture characteristics of subduction megathrust earthquakes is how the coupled zone along the subduction interface accumulates elastic strain between events. We present results from observations of slow slip events (SSEs) in Cascadia occurring during the interseismic period downdip of the fully coupled zone, which imply that the orientation of strain accumulation within the coupled zone can vary with depth. Interseismic GPS motions suggest that forces derived from relative plate motions across a shallow, offshore locked plate interface dominate over decadal timescales. Deeper on the plate interface, below the locked (seismogenic) patch, slip during SSEs dominantly occurs in the updip direction, reflecting a dip‐parallel force acting on the slab, such as slab pull. This implies that in subduction zones with obliquely convergent plate motions, the seismogenic zone of the megathrust is loaded by forces acting in two discrete directions, leading to a depth‐varying orientation of strain accumulation on the plate interface.
-
The Virtual Instrument Software Architecture (VISA) (National Instruments, 2021, 2022c; Wikipedia Contributors, 2021) is a simple Application Programming Interface (API) for communicating with test and measurement instrumentation from a computer. VISA includes specifications for communicating with resources or instruments over GPIB (General Purpose Interface Bus, IEEE-488) and VXI (VME eXtensions for Instrumentation), which are I/O interfaces specific to test and measurement. It also provides protocols for communicating over PC-standard I/O interfaces, such as VXI-11 (over TCP/IP), USBTMC (USB Test and Measurement Class, over USB), and HiSLIP (High-Speed LAN Instrument Protocol) (Wikipedia Contributors, 2021). VISA’s ability to communicate with a wide variety of instruments over a broad range of I/O interfaces using a common set of APIs makes it an attractive choice for scientists and equipment manufacturers writing equipment control software.
-
This article traces the evolution of SSD (solid-state drive) interfaces, examining the transition from the block storage paradigm inherited from hard disk drives to SSD-specific standards customized to flash memory. Early SSDs conformed to the block abstraction for compatibility with the existing software storage stack, but studies and deployments show that this limits the performance potential for SSDs. As a result, new SSD-specific interface standards emerged to not only capitalize on the low latency and abundant internal parallelism of SSDs, but also include new command sets that diverge from the longstanding block abstraction. We first describe flash memory technology in the context of the block storage abstraction and the components within an SSD that provide the block storage illusion. We then describe the genealogy and relationships among academic research and industry standardization efforts for SSDs, along with some of their rise and fall in popularity. We classify these works into four evolving branches: (1) extending block abstraction with host-SSD hints/directives; (2) enhancing host-level control over SSDs; (3) offloading host-level management to SSDs; and (4) making SSDs byte-addressable. By dissecting these trajectories, the article also sheds light on the emerging challenges and opportunities, providing a roadmap for future research and development in SSD technologies.

