Title: Beating the I/O bottleneck: a case for log-structured virtual disks
With the increasing dominance of SSDs for local storage, today's network mounted virtual disks can no longer offer competitive performance. We propose a Log-Structured Virtual Disk (LSVD) that couples log-structured approaches at both the cache and storage layer to provide a virtual disk on top of S3-like storage. Both cache and backend store are order-preserving, enabling LSVD to provide strong consistency guarantees in case of failure. Our prototype demonstrates that the approach preserves all the advantages of virtual disks, while offering dramatic performance improvements over not only commonly used virtual disks, but the same disks combined with inconsistent (i.e. unsafe) local caching.
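To make the log-structured idea concrete, the following is a minimal sketch of a write path only (not the paper's implementation; names such as ObjectStore and SEGMENT_BYTES are assumptions): every write is appended to a local log, and sealed log segments are uploaded as immutable, sequence-numbered objects to an S3-like backend, so the object sequence preserves write order.

```python
# Illustrative sketch of a log-structured virtual-disk write path.
# ObjectStore and SEGMENT_BYTES are assumptions, not the paper's API.

SEGMENT_BYTES = 8 * 1024 * 1024  # seal and upload the log every ~8 MiB (assumed)


class ObjectStore:
    """Stand-in for an S3-like backend: immutable, named objects."""
    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = bytes(data)


class LogStructuredDisk:
    def __init__(self, store):
        self.store = store
        self.seq = 0              # monotonically increasing segment number
        self.log = bytearray()    # local write log (the "cache" tier)
        self.entries = []         # (lba, length, offset-in-log) records
        self.map = {}             # virtual LBA -> (segment, offset, length)

    def write(self, lba, data):
        # Append-only: every write goes to the tail of the local log first.
        self.entries.append((lba, len(data), len(self.log)))
        self.log += data
        if len(self.log) >= SEGMENT_BYTES:
            self.flush()

    def flush(self):
        # Seal the current log as one immutable object; the sequence number
        # encodes ordering, so recovery can replay segments in write order.
        key = f"segment-{self.seq:08d}"
        self.store.put(key, self.log)
        for lba, length, off in self.entries:
            self.map[lba] = (key, off, length)
        self.seq += 1
        self.log = bytearray()
        self.entries = []
```

Because segments are immutable and named by a monotonically increasing sequence number, recovery can replay them in order and lose at most the unsealed tail of the local log, which loosely mirrors the order-preserving behavior the abstract describes.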
Award ID(s): 1931714
NSF-PAR ID: 10392447
Author(s) / Creator(s): ; ; ; ; ; ;
Date Published:
Journal Name: EuroSys '22: Proceedings of the Seventeenth European Conference on Computer Systems
Page Range / eLocation ID: 628 to 643
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. The value of electronic waste is estimated to increase rapidly year after year and, with rapid advances in electronics, shows no signs of slowing down. Storage devices such as SATA hard disks and solid-state drives are electronic devices containing high-value recyclable raw materials that often go unrecovered. Most of the e-waste currently generated, including HDDs, is either managed by the informal recycling sector or improperly landfilled with municipal solid waste, primarily due to insufficient recovery infrastructure and a labor shortage in the recycling industry. This emphasizes the importance of developing modern advanced recycling technologies such as robotic disassembly. Performing smooth robotic disassembly of precision electronics requires fast and accurate geometric 3D profiling to provide quick and precise locations of key components. Fringe Projection Profilometry (FPP), a variation of the well-known structured-light technique, provides both the high speed and the high accuracy needed to accomplish this. However, using FPP for disassembly of high-precision electronics such as hard disks can be especially challenging, given that the hard disk platter is almost completely reflective. Furthermore, the metallic nature of its various components makes it difficult to render an accurate 3D reconstruction. To address this challenge, we have developed a single-shot approach to predict the 3D point cloud of these devices using a combination of computer graphics, fringe projection, and deep learning. We calibrate a physical FPP-based 3D shape measurement system and set up its digital twin using computer graphics. We capture HDD and SSD CAD models at various orientations to generate virtual training datasets consisting of fringe images and their point-cloud reconstructions. These datasets are used to train a U-Net, which can then predict the depth of the parts with high accuracy from only a single-shot fringe image. This proposed technology has the potential to serve as a valuable, fast 3D vision tool for robotic re-manufacturing and is a stepping stone toward a completely automated assembly system.
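As a rough illustration of the single-shot fringe-to-depth idea (not the authors' network; the architecture, sizes, and training details below are assumptions), a small U-Net-style encoder-decoder in PyTorch can map a one-channel fringe image to a dense depth map, trained on synthetic (fringe, depth) pairs rendered from CAD models:

```python
# Hedged sketch: a tiny encoder-decoder ("U-Net-style") that maps a single
# fringe image to a dense depth map. Layer sizes are illustrative guesses.
import torch
import torch.nn as nn


class TinyFringeToDepth(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU())
        self.out = nn.Conv2d(32, 1, 1)   # 16 decoded + 16 skip channels -> depth

    def forward(self, fringe):
        e1 = self.enc1(fringe)           # full-resolution features
        e2 = self.enc2(e1)               # downsampled features
        d1 = self.dec1(e2)               # upsample back to full resolution
        return self.out(torch.cat([d1, e1], dim=1))  # skip connection, depth map


# Training-step sketch: (synthetic fringe image, ground-truth depth) pairs
# rendered from CAD models would be fed as (N, 1, H, W) tensors.
model = TinyFringeToDepth()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

fringe = torch.rand(4, 1, 128, 128)      # placeholder batch of fringe images
depth_gt = torch.rand(4, 1, 128, 128)    # placeholder ground-truth depth maps
optimizer.zero_grad()
loss = loss_fn(model(fringe), depth_gt)
loss.backward()
optimizer.step()
```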
  2. Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Eds.)
    The goal of this work was to design a low-cost computing facility that can support the development of an open source digital pathology corpus containing 1M images [1]. A single image from a clinical-grade digital pathology scanner can range in size from hundreds of megabytes to five gigabytes. A 1M image database requires over a petabyte (PB) of disk space. To do meaningful work in this problem space requires a significant allocation of computing resources. The improvements and expansions to our HPC (high-performance computing) cluster, known as Neuronix [2], required to support working with digital pathology fall into two broad categories: computation and storage. To handle the increased computational burden and increase job throughput, we are using Slurm [3] as our scheduler and resource manager. For storage, we have designed and implemented a multi-layer filesystem architecture to distribute a filesystem across multiple machines. These enhancements, which are entirely based on open source software, have extended the capabilities of our cluster and increased its cost-effectiveness. Slurm has numerous features that allow it to generalize to a number of different scenarios. Among the most notable is its support for GPU (graphics processing unit) scheduling. GPUs can offer a tremendous performance increase in machine learning applications [4], and Slurm's built-in mechanisms for handling them were a key factor in making this choice. Slurm has a general resource (GRES) mechanism that can be used to configure and enable support for resources beyond the ones provided by the traditional HPC scheduler (e.g. memory, wall-clock time), and GPUs are among the GRES types that can be supported by Slurm [5]. In addition to being able to track resources, Slurm does strict enforcement of resource allocation. This becomes very important as the computational demands of the jobs increase, ensuring that each job has all the resources it needs and does not take resources from other jobs. It is a common practice among GPU-enabled frameworks to query the CUDA runtime library/drivers and iterate over the list of GPUs, attempting to establish a context on all of them. Slurm is able to affect the hardware discovery process of these jobs, which enables a number of these jobs to run alongside each other, even if the GPUs are in exclusive-process mode. To store large quantities of digital pathology slides, we developed a robust, extensible distributed storage solution. We utilized a number of open source tools to create a single filesystem, which can be mounted by any machine on the network. At the lowest layer of abstraction are the hard drives, which are split into four 60-disk chassis using 8 TB drives. To support these disks, we have two server units, each equipped with Intel Xeon CPUs and 128 GB of RAM. At the filesystem level, we have implemented a multi-layer solution that: (1) connects the disks together into a single filesystem/mountpoint using ZFS (the Zettabyte File System) [6], and (2) connects filesystems on multiple machines together to form a single mountpoint using Gluster [7]. ZFS, initially developed by Sun Microsystems, provides disk-level awareness and a filesystem which takes advantage of that awareness to provide fault tolerance. At the filesystem level, ZFS protects against data corruption and the infamous RAID write-hole bug by implementing a journaling scheme (the ZFS intent log, or ZIL) and copy-on-write functionality.
Each machine (one controller plus two disk chassis) has its own separate ZFS filesystem. Gluster, essentially a meta-filesystem, takes each of these and provides the means to connect them together over the network using distributed (similar to RAID 0, but without striping individual files) and mirrored (similar to RAID 1) configurations [8]. By implementing these improvements, it has been possible to expand the storage and computational power of the Neuronix cluster arbitrarily, scaling horizontally to support the most computationally intensive endeavors. We have greatly improved the scalability of the cluster while maintaining its excellent price/performance ratio [1].
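As an illustration of how such a GRES-enabled cluster is typically used (a hedged sketch; the job name, paths, and resource amounts are placeholders, not the Neuronix cluster's actual configuration), a batch job requests GPUs with --gres, and Slurm confines the job to the devices it was allocated:

```python
# Hedged sketch of submitting a GPU job to a Slurm cluster with GRES.
# The job name, resource amounts, and paths are placeholders.
import subprocess
import textwrap

job_script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=train-model
    #SBATCH --gres=gpu:2          # request 2 GPUs via Slurm's GRES mechanism
    #SBATCH --cpus-per-task=8
    #SBATCH --mem=32G
    #SBATCH --time=04:00:00

    # Slurm restricts the job to its allocated GPUs (e.g. via
    # CUDA_VISIBLE_DEVICES), so frameworks that enumerate CUDA devices
    # only see the cards granted to this job.
    echo "Allocated GPUs: $CUDA_VISIBLE_DEVICES"
    python train.py --data /mnt/gluster/pathology  # hypothetical mountpoint
    """)

with open("job.sbatch", "w") as f:
    f.write(job_script)

subprocess.run(["sbatch", "job.sbatch"], check=True)
```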
  3. Reasoning about storage systems is challenging because these systems make persistence guarantees even if the system crashes at any point. To achieve these crash-safety guarantees, storage systems include recovery procedures to restore the system to a consistent state after a crash. Moreover, large-scale systems are structured as multiple stacked layers and can require recovery at multiple layers of abstraction. Formal verification can ensure that crash-safety guarantees hold regardless of when the system crashes. To make verification tractable, large-scale systems should be verified in a modular fashion, layer-by-layer in the software stack. Layered recovery makes modularity challenging because the system can crash in the middle of a high-level recovery procedure and must start over from the low-level recovery procedure. We present Argosy, a framework for machine-checked proofs of storage systems that supports layered recovery implementations with modular proofs. The framework is based on combinators for transition relations that are inspired by Kleene algebra, which provides a convenient formalism for specifying and reasoning about crashes and recovery. On top of this framework, we implement Crash Hoare Logic (CHL), the program logic used by FSCQ. Using the logic, we have verified an example of layered recovery featuring a write-ahead log on top of a disk, which itself runs by replicating over two unreliable disks. The metatheory of the framework, the soundness of the program logic, and these examples are all verified in the Coq theorem prover.
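To give a flavor of the relational combinators this style of framework builds on (a loose Python analogue for intuition only; the actual development is in Coq), behaviors can be modeled as relations between states, with sequential composition, choice, Kleene star, and a combinator describing "crash during the operation, any number of crashed recovery attempts, then a completed recovery":

```python
# Hedged Python analogue of relational crash/recovery combinators.
# A "behavior" is a relation: a set of (pre_state, post_state) pairs.

def seq(r1, r2):
    """Sequential composition: run r1, then r2 from wherever r1 ended."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def choice(r1, r2):
    """Nondeterministic choice between two behaviors."""
    return r1 | r2

def star(r, states):
    """Kleene star: zero or more repetitions of r (reflexive-transitive closure)."""
    closure = {(s, s) for s in states}
    changed = True
    while changed:
        new = seq(closure, r) - closure
        changed = bool(new)
        closure |= new
    return closure

def crashed_then_recovered(op_crash, rec_crash, rec_ok, states):
    """Crash during the operation, any number of recovery attempts that
    themselves crash, then one recovery run that completes:
    op_crash ; (rec_crash)* ; rec_ok."""
    return seq(seq(op_crash, star(rec_crash, states)), rec_ok)

# Tiny example: states are (replica0, replica1); "write 1" updates both
# replicas, a crash may leave only replica0 updated, and recovery copies
# replica0 onto replica1, restoring the two-replica invariant.
states = {(0, 0), (1, 0), (1, 1)}
write_crash = {((0, 0), (0, 0)), ((0, 0), (1, 0))}   # crash before or between updates
rec_crash = {(s, s) for s in states}                 # recovery crashed before copying
rec_ok = {((a, b), (a, a)) for (a, b) in states}     # copy replica0 over replica1

after_crash = crashed_then_recovered(write_crash, rec_crash, rec_ok, states)
assert after_crash <= {((0, 0), (0, 0)), ((0, 0), (1, 1))}  # replicas agree afterwards
```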
  4. Key-value (KV) software has proven useful to a wide variety of applications including analytics, time-series databases, and distributed file systems. To satisfy the requirements of diverse workloads, KV stores have been carefully tailored to best match the performance characteristics of underlying solid-state block devices. Emerging KV storage devices are a promising technology for both simplifying the KV software stack and improving the performance of persistent storage-based applications. However, while providing fast, predictable put and get operations, existing KV storage devices do not natively support range queries, which are critical to all three types of applications described above. In this paper, we present KVRangeDB, a software layer that enables processing range queries for existing hash-based KV solid-state disks (KVSSDs). As an effort to adapt to the performance characteristics of emerging KVSSDs, KVRangeDB implements a log-structured merge tree key index that reduces compaction I/O, merges keys when possible, and provides separate caches for indexes and values. We evaluated KVRangeDB under a set of representative workloads and compared its performance with two existing database solutions: a RocksDB variant ported to work with the KVSSD, and WiscKey, a key-value database that is carefully tuned for conventional block devices. On filesystem aging workloads, KVRangeDB outperforms WiscKey by 23.7x in terms of throughput and reduces CPU usage and external write amplification by 14.3x and 9.8x, respectively.
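The general idea of layering range queries over a hash-only device can be sketched as follows (illustrative only, not KVRangeDB's code; HashOnlyKVDev stands in for a KVSSD with put/get, and a simple sorted list stands in for the LSM-tree key index): the host keeps keys ordered, and a range scan walks the index while fetching values point-wise from the device.

```python
# Hedged sketch: an ordered key index layered on a hash-only KV device.
# HashOnlyKVDev is a stand-in for a KVSSD that supports only put/get.
import bisect


class HashOnlyKVDev:
    """Stand-in for a hash-based KVSSD: fast put/get, no ordered iteration."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class RangeIndexedStore:
    def __init__(self, dev):
        self.dev = dev
        self.index = []     # sorted list of keys (an LSM tree in the real design)

    def put(self, key, value):
        self.dev.put(key, value)
        i = bisect.bisect_left(self.index, key)
        if i == len(self.index) or self.index[i] != key:
            self.index.insert(i, key)   # keep the key index ordered

    def range(self, lo, hi):
        """Yield (key, value) pairs for lo <= key < hi, in key order."""
        start = bisect.bisect_left(self.index, lo)
        for key in self.index[start:]:
            if key >= hi:
                break
            yield key, self.dev.get(key)


store = RangeIndexedStore(HashOnlyKVDev())
for k in ["user:3", "user:1", "user:7", "item:9"]:
    store.put(k, f"value-of-{k}")
print(list(store.range("user:", "user:~")))   # keys user:1, user:3, user:7
```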