Title: Q-ary Asymmetric LOCO Codes: Constrained Codes Supporting Flash Evolution
Flash memory devices are winning the competition for storage density against magnetic recording devices. This outcome results from advances in physics that allow storage of more than one bit per cell, coupled with advances in signal processing that reduce the effect of physical instabilities. Constrained codes are used in storage to avoid problematic patterns. Recently, we introduced binary symmetric lexicographically-ordered constrained codes (LOCO codes) for data storage and transmission. This paper introduces simple constrained codes that support non-binary physical gates in multi-, triple-, quad-, and the currently-in-development penta-level cell (M/T/Q/P-LC) Flash memories. The new codes can be easily modified if problematic patterns change with time. These codes are designed to mitigate inter-cell interference, which is a critical source of error in Flash devices. The new codes are called q-ary asymmetric LOCO codes (QA-LOCO codes), and the construction subsumes codes previously designed for single-level cell (SLC) Flash devices (A-LOCO codes). QA-LOCO codes work for a Flash device with any number, q, of levels per cell. For q ≥ 4, we show that QA-LOCO codes can achieve rates greater than 0.95 log₂ q information bits per coded symbol. Capacity-achieving rates, affordable encoding-decoding complexity, and ease of reconfigurability support the growing improvement of M/T/Q/P-LC Flash memory devices, as well as lifecycle management as the characteristics of these devices change with time.
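As a rough, hedged illustration of the lexicographic-ordering idea behind QA-LOCO codes, the Python sketch below builds a small q-ary codebook by brute force; the forbidden pattern (q−1, 0, q−1) is a hypothetical stand-in for an interference-prone charge pattern, and the real construction encodes and decodes through closed-form rank computations rather than enumeration.

```python
import math
from itertools import product

def contains(seq, pat):
    """True if tuple pat occurs as a contiguous substring of tuple seq."""
    m = len(pat)
    return any(seq[i:i + m] == pat for i in range(len(seq) - m + 1))

def constrained_codebook(q, n, forbidden):
    """All length-n q-ary sequences avoiding every pattern in forbidden,
    in lexicographic order, so list position doubles as the message index."""
    return [s for s in product(range(q), repeat=n)
            if not any(contains(s, p) for p in forbidden)]

q, n = 4, 5
# Hypothetical stand-in pattern: highest level, lowest level, highest level.
book = constrained_codebook(q, n, forbidden=[(q - 1, 0, q - 1)])
encode = lambda index: book[index]        # message index -> codeword
decode = lambda word: book.index(word)    # codeword -> message index
print(len(book), "codewords; rate",
      round(math.log2(len(book)) / n, 3),
      "bits/symbol vs 0.95*log2(q) =", 0.95 * math.log2(q))
```

Even this toy constraint at length 5 keeps 977 of the 1024 possible words, about 1.99 bits per symbol against the 0.95 log₂ q = 1.9 benchmark quoted above.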
Award ID(s):
1717602
NSF-PAR ID:
10191402
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE International Symposium on Information Theory (ISIT 2020)
Page Range / eLocation ID:
688 to 693
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. In data storage and data transmission, certain patterns are more likely to be subject to error when written (transmitted) onto the media. In magnetic recording systems with binary data and bipolar non-return-to-zero signaling, patterns that have insufficient separation between consecutive transitions exacerbate inter-symbol interference. Constrained codes are used to eliminate such error-prone patterns. A recent example is a new family of capacity-achieving constrained codes, named lexicographically-ordered constrained codes (LOCO codes). LOCO codes are symmetric, that is, the set of forbidden patterns is closed under taking pattern complements. LOCO codes are suboptimal in terms of rate when used in Flash devices, where block erasure is employed, since the complement of an error-prone pattern is not detrimental in these devices. This paper introduces asymmetric LOCO codes (A-LOCO codes), which are lexicographically-ordered constrained codes that forbid only those patterns that are detrimental for Flash performance. A-LOCO codes are also capacity-achieving, and at finite lengths, they offer higher rates than the available state-of-the-art constrained codes designed for the same goal. The mapping-demapping between the index and the codeword in A-LOCO codes allows low-complexity encoding and decoding algorithms that are simpler than their LOCO counterparts.
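To make the asymmetric-versus-symmetric rate gap concrete, here is a minimal sketch, assuming (purely for illustration) that 101 is the detrimental pattern; A-LOCO's actual forbidden set and its low-complexity rank-based encoding are specified in the paper, not reproduced here.

```python
from itertools import product
from math import log2

def codebook_size(n, forbidden):
    """Count binary strings of length n containing no pattern in forbidden."""
    def ok(bits):
        s = "".join(map(str, bits))
        return not any(p in s for p in forbidden)
    return sum(ok(b) for b in product((0, 1), repeat=n))

n = 12
asym = codebook_size(n, {"101"})          # forbid only the assumed ICI-prone pattern
sym = codebook_size(n, {"101", "010"})    # symmetric: also forbid its complement
print(f"asymmetric: {asym} words, rate {log2(asym)/n:.3f} bits/symbol")
print(f"symmetric:  {sym} words, rate {log2(sym)/n:.3f} bits/symbol")
```

Dropping the complement from the forbidden set strictly enlarges the codebook, which is exactly where the finite-length rate advantage comes from.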
  2. In order to meet the demands of data-hungry applications, data storage devices are required to be increasingly denser. Various sources of error appear with this increase in density. Multi-dimensional (MD) graph-based codes are capable of mitigating error sources like interference and channel non-uniformity in dense storage devices. Recently, a technique was proposed to enhance the performance of MD spatially-coupled codes that are based on circulants. The technique carefully relocates circulants to minimize the number of short cycles. However, cycles become more detrimental when they combine to form more advanced objects, e.g., absorbing sets, including low-weight codewords. In this paper, we show how MD relocations can be exploited to minimize the number of detrimental objects in the graph of an MD code. Moreover, we demonstrate the savings in the number of relocation arrangements earned by focusing on objects rather than cycles. Our technique is applicable to a wide variety of one-dimensional (OD) codes. Simulation results reveal that MD codes designed with our technique achieve significant lifetime gains in practical Flash systems compared with OD codes of similar parameters.
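For intuition on why the placement of circulants matters, the sketch below checks the standard four-cycle condition for circulant-based (quasi-cyclic) parity-check matrices: a length-4 cycle exists exactly when alternating exponent differences cancel modulo the circulant size. The paper's MD relocations target more advanced objects such as absorbing sets; this hedged toy covers only the elementary cycle check.

```python
from itertools import combinations

def four_cycle_count(P, Z):
    """Count 4-cycles in a quasi-cyclic parity-check matrix described by
    exponent matrix P with circulant size Z.  A 4-cycle exists iff
    P[i1][j1] - P[i1][j2] + P[i2][j2] - P[i2][j1] == 0 (mod Z), and each
    satisfied condition contributes Z cycles; -1 marks an all-zero block."""
    count = 0
    for i1, i2 in combinations(range(len(P)), 2):
        for j1, j2 in combinations(range(len(P[0])), 2):
            a, b, c, d = P[i1][j1], P[i1][j2], P[i2][j2], P[i2][j1]
            if -1 in (a, b, c, d):
                continue
            if (a - b + c - d) % Z == 0:
                count += Z
    return count

# Relocating/re-shifting a circulant changes one exponent and can remove cycles.
print(four_cycle_count([[0, 1, 2], [0, 2, 4]], Z=5))   # 0: no 4-cycles here
print(four_cycle_count([[0, 1, 2], [0, 1, 2]], Z=5))   # 15: every condition holds
```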
  3. We present the Hybrid Polar Decoder (HyPD), a hybrid classical-quantum decoder design for Polar error correction codes, which are becoming widespread in today's 5G and tomorrow's 6G networks. HyPD employs CMOS processing for the Polar decoder's binary tree traversal, and Quantum Annealing (QA) processing for the Quantum Polar Decoder (QPD), a maximum-likelihood QA-based Polar decoder submodule. QPD's design efficiently transforms a Polar decoder into a quadratic polynomial optimization form, then maps this polynomial onto the physical QA hardware via QPD-MAP, a customized problem mapping scheme tailored to QPD. We have experimentally evaluated HyPD on a state-of-the-art QA device with 5,627 qubits, for 5G-NR Polar codes with a block length of 1,024 bits, in Rayleigh fading channels. Our results show that HyPD outperforms Successive Cancellation List decoders of list size eight by half an order of magnitude in bit error rate, and achieves a 1,500-byte frame delivery rate of 99.1% at 1 dB signal-to-noise ratio. Further studies present QA compute time considerations. We also propose QPD-HW, a novel QA hardware topology tailored to the task of decoding Polar codes. QPD-HW is sparse, flexible in code rate and block length, and may be of potential interest to the designers of tomorrow's 6G wireless networks.
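As a minimal sketch of posing decoding as quadratic optimization (not HyPD's actual QPD transformation), the snippet below encodes one even-parity check as the QUBO penalty (x1 + x2 + x3 − 2a)² with an ancilla bit a, adds hypothetical channel log-likelihood terms, and exhaustively minimizes the energy in place of an annealer.

```python
from itertools import product

llr = [1.2, -0.4, 0.9]   # hypothetical channel LLRs; positive favors bit = 0

def energy(x1, x2, x3, a, penalty_weight=4.0):
    """QUBO energy: the parity penalty is 0 iff x1 ^ x2 ^ x3 == 0 (the
    ancilla a absorbs the even sums 0 and 2), plus linear channel terms."""
    parity = penalty_weight * (x1 + x2 + x3 - 2 * a) ** 2
    channel = sum(l * x for l, x in zip(llr, (x1, x2, x3)))
    return parity + channel

# Exhaustive search over (x1, x2, x3, a) stands in for quantum annealing.
best = min(product((0, 1), repeat=4), key=lambda v: energy(*v))
print("decoded bits:", best[:3])
```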
  4. Access libraries such as ROOT[1] and HDF5[2] allow users to interact with datasets using high-level abstractions, like coordinate systems and associated slicing operations. Unfortunately, the implementations of access libraries are based on outdated assumptions about storage system interfaces and are generally unable to fully benefit from modern fast storage devices. For example, access libraries often implement buffering and data layout that assume that large, single-threaded sequential access patterns incur less overall latency than small, parallel random accesses: while this is true for spinning media, it is not true for flash media. The situation is getting worse with rapidly evolving storage devices such as non-volatile memory and ever-larger datasets. This project explores distributed dataset mapping infrastructures that can integrate and scale out existing access libraries using Ceph's extensible object model, avoiding re-implementation or modification of these access libraries as much as possible. These programmable storage extensions coupled with our distributed dataset mapping techniques enable: 1) access library operations to be offloaded to storage system servers, 2) the independent evolution of access libraries and storage systems, and 3) full leveraging of the existing load balancing, elasticity, and failure management of distributed storage systems like Ceph. They also create more opportunities for server-local optimizations. For example, storage servers might include local key/value stores combined with chunk stores that require different optimizations than a local file system. As storage servers evolve to support new storage devices like non-volatile memory, these server-local optimizations can be implemented while minimizing disruptions to applications. We will report progress on the means by which distributed dataset mapping can be abstracted over particular access libraries, including access libraries for ROOT data, and on how we address some of the challenges revolving around data partitioning and composability of access operations.
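A hypothetical, much-simplified sketch of the dataset-mapping idea follows: translating a one-dimensional coordinate slice into the storage objects (and object-local ranges) that a server-side operation would need to touch. The chunking and key naming are assumptions for illustration, not the project's actual Ceph-based object model.

```python
CHUNK = 1024   # elements per storage object (assumed, fixed layout)

def objects_for_slice(dataset, start, stop):
    """Yield (object_key, local_start, local_stop) covering dataset[start:stop],
    so each partial read can be routed to the server holding that object."""
    first, last = start // CHUNK, (stop - 1) // CHUNK
    for obj in range(first, last + 1):
        lo = max(start, obj * CHUNK) - obj * CHUNK
        hi = min(stop, (obj + 1) * CHUNK) - obj * CHUNK
        yield f"{dataset}.{obj:08d}", lo, hi

print(list(objects_for_slice("experiment/run1", 1000, 3100)))
```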
  5. On large-scale high performance computing (HPC) systems, applications are provisioned with aggregated resources to meet their peak demands for brief periods. This results in resource underutilization because application requirements vary widely during execution. This problem is particularly pronounced for deep learning applications that are running on leadership HPC systems with a large pool of burst buffers in the form of flash or non-volatile memory (NVM) devices. In this paper, we examine the I/O patterns of deep neural networks and reveal their critical need to load many small samples in random order for successful training. We have designed a specialized Deep Learning File System (DLFS) that provides a thin set of APIs. In particular, we design the metadata management of DLFS through an in-memory tree-based sample directory, and its file services through the user-level SPDK protocol, which can disaggregate the capabilities of NVM Express (NVMe) devices to parallel training tasks. Our experimental results show that DLFS can dramatically improve the throughput of training for deep neural networks on NVMe over Fabrics, compared with the kernel-based Ext4 file system. Furthermore, DLFS achieves efficient user-level storage disaggregation with very little CPU utilization.
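The sketch below illustrates, under assumed names and a flat layout standing in for DLFS's tree-based directory, the idea behind the in-memory sample directory: per-sample locations held in memory so a training loop can draw random mini-batches without metadata I/O. Real data transfers would go through a user-level driver such as SPDK to NVMe devices, not through Python.

```python
import random

class SampleDirectory:
    """Flat in-memory stand-in for DLFS's tree-based sample directory:
    index -> (blob_id, offset, length) records resolved without disk I/O."""
    def __init__(self):
        self._samples = []

    def add(self, blob_id, offset, length):
        self._samples.append((blob_id, offset, length))

    def random_batch(self, batch_size, rng=random):
        """Pick the locations of a random mini-batch; actual reads would be
        issued through a user-level NVMe path (e.g., SPDK), not the kernel."""
        return rng.sample(self._samples, batch_size)

d = SampleDirectory()
for i in range(10_000):                      # 10k hypothetical 4 KiB samples
    d.add(blob_id=i // 1_000, offset=(i % 1_000) * 4_096, length=4_096)
print(d.random_batch(4))
```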