Search for: All records

Creators/Authors contains: "Du, David H."


  1. Deoxyribonucleic acid (DNA), with its ultra-high storage density and long durability, is a promising long-term archival storage medium and is attracting much attention today. A DNA storage system encodes and stores digital data as synthetic DNA sequences and decodes DNA sequences back to digital data via sequencing. Many encoding schemes have been proposed to enlarge DNA storage capacity by increasing DNA encoding density. However, increasing encoding density alone is insufficient because enhancing DNA storage capacity is a multifaceted problem. This paper assumes that random accesses are necessary for practical DNA archival storage. We identify all factors affecting DNA storage capacity under current technologies and systematically investigate the practical DNA storage capacity with several popular encoding schemes. The investigation shows that collisions between primers and DNA payload sequences are a major factor limiting DNA storage capacity. Based on this finding, we designed a new encoding scheme called Collision Aware Code (CAC) that trades some encoding density for a reduction in primer-payload collisions. Compared with the best result among the five existing encoding schemes, CAC can extricate 120% more primers from collisions and increase the DNA tube capacity from 211.96 GB to 295.11 GB. We also evaluate CAC's recoverability from DNA storage errors. The results show that CAC's recoverability is comparable to that of existing encoding schemes.
  2. To bridge the large semantic gap between applications and modern storage systems, passing small pieces of useful information, called I/O access hints, from upper layers to the storage layer may greatly improve application performance and ease data management in storage systems. This is especially true for heterogeneous storage systems that consist of multiple types of storage devices. Since ingesting external access hints will likely involve laborious modifications of legacy I/O stacks, it is very hard to evaluate the effect of access hints and take advantage of them. In this article, we design a generic and flexible framework, called HintStor, to quickly experiment with a set of I/O access hints and evaluate their impact on heterogeneous storage systems. HintStor provides a new application/user-level interface and a file system plugin, and performs data management with a generic block storage data manager. We demonstrate the flexibility of HintStor by evaluating four types of access hints on a Linux platform: file system data classification, stream ID, cloud prefetch, and I/O task scheduling. The results show that HintStor can execute and evaluate various I/O access hints under different scenarios with minor modifications to the kernel and applications.
  3. Hybrid Shingled Magnetic Recording (H-SMR) drives are the most recently developed SMR drives, which allow dynamic conversion of the recording format between Conventional Magnetic Recording (CMR) and SMR on a single disk drive. We identify the unique opportunities H-SMR drives offer for managing the tradeoffs between performance and capacity, including the possibility of adjusting the SMR area capacity based on storage usage and the flexibility of dynamically swapping data between the CMR area and the SMR area. We design and implement FluidSMR, an adaptive management scheme for hybrid SMR drives, to fully utilize H-SMR drives under different workloads and capacity usages. FluidSMR has a two-phase allocation scheme to support growing usage of the H-SMR drive. The scheme can intelligently determine the sizes of the CMR and SMR spaces in an H-SMR drive based on dynamically changing workloads. Moreover, FluidSMR uses a cache in the CMR region, managed by a proposed loop-back log policy, to reduce the overhead of updates to the SMR region. Evaluations using enterprise traces demonstrate that FluidSMR outperforms baseline schemes across various workloads by decreasing the average I/O latency and effectively reducing and controlling the performance impact of format conversion between CMR and SMR.
  4. Computer systems utilizing byte-addressable Non-Volatile Memory (NVM) as memory/storage can provide low-latency data persistence. The widely used key-value stores based on the Log-Structured Merge Tree (LSM-Tree) are still beneficial for NVM systems in terms of space and write efficiency. However, the significant write amplification introduced by the leveled compaction of the LSM-Tree degrades the write performance of the key-value store and shortens the lifetime of the NVM devices. Existing studies propose new compaction methods to reduce write amplification. Unfortunately, they result in a relatively large read amplification. In this article, we propose NVLSM, a key-value store for NVM systems using an LSM-Tree with a new accumulative compaction. By fully utilizing the byte-addressability of NVM, accumulative compaction uses pointers to accumulate data into multiple floors in a logically sorted run, reducing the number of compactions required. We also propose a cascading search scheme for reads among the multiple floors to reduce read amplification. As a result, NVLSM reduces write amplification with only small increases in read amplification. We compare NVLSM with key-value stores using LSM-Trees with two other compaction methods: leveled compaction and fragmented compaction. Our evaluations show that NVLSM reduces write amplification by up to 67% compared with an LSM-Tree using leveled compaction, without significantly increasing read amplification. In write-intensive workloads, NVLSM reduces the average latency by 15.73%–41.2% compared to other key-value stores.
  5. null (Ed.)