Title: DP-DNA: A Digital Pattern-Aware DNA Encoding Scheme to Improve Encoding Density of DNA Storage
With the rapid increase of available digital data, we are searching for a storage medium with high density and the capability of long-term preservation. Deoxyribonucleic Acid (DNA) storage is a promising candidate, especially for archival storage systems. However, encoding density (i.e., how many binary bits can be encoded into one nucleotide) and error handling are two major, intertwined factors in DNA storage. Theoretically, one nucleotide (i.e., A, T, G, or C) can encode two binary bits, which is the upper bound on encoding density. However, due to biochemical constraints and other necessary information associated with the payload, the encoding densities of current DNA storage systems are much lower than this upper bound. Additionally, all existing studies of DNA encoding schemes are based on static analysis and lack awareness of dynamically changing digital patterns. This gap between static encoding and dynamic binary patterns prevents DNA storage systems from achieving a higher encoding density. In this paper, we propose a new Digital Pattern-Aware DNA storage system, called DP-DNA, which can efficiently store digital data in DNA storage with high encoding density. DP-DNA maintains a set of encoding codes and uses a digital pattern-aware code (DPAC) to analyze the patterns of the binary sequence for a DNA strand and select an appropriate code for encoding that sequence with high encoding density. An additional encoding field is added to the DNA encoding format to identify the encoding scheme used for each DNA strand, so that DNA data can be decoded back to its original digital data. Moreover, to further improve the encoding density, a variable-length scheme is proposed to increase the feasibility of the code scheme with a high encoding density. Finally, the experimental results indicate that the proposed DP-DNA achieves up to 103.5% higher encoding densities than prior work.
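The idea of choosing a code per strand and recording that choice in an extra field can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DP-DNA design: the two codes (`encode_direct`, `encode_rotating`), the one-nucleotide code ID, and the homopolymer limit `max_run` are all hypothetical and chosen only to make the density trade-off visible.

```python
# Hypothetical sketch: neither code below is taken from DP-DNA.
ORDER = "ACGT"
DIRECT = {"00": "A", "01": "C", "10": "G", "11": "T"}  # 2 bits per nucleotide


def max_homopolymer(strand: str) -> int:
    """Return the length of the longest run of identical nucleotides."""
    longest = run = 0
    prev = ""
    for nt in strand:
        run = run + 1 if nt == prev else 1
        longest = max(longest, run)
        prev = nt
    return longest


def encode_direct(bits: str) -> str:
    """Map each bit pair to one nucleotide (the 2 bits/nt upper bound)."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(DIRECT[bits[i:i + 2]] for i in range(0, len(bits), 2))


def encode_rotating(bits: str, prev: str) -> str:
    """Map each bit to a nucleotide that differs from the previous one.

    Density is only 1 bit/nt, but no two adjacent nucleotides are equal.
    """
    out = []
    for b in bits:
        prev = ORDER[(ORDER.index(prev) + 1 + int(b)) % 4]
        out.append(prev)
    return "".join(out)


def encode_strand(bits: str, max_run: int = 3) -> str:
    """Pick the denser code when the bit pattern allows it.

    The first nucleotide is an assumed code ID: 'A' = direct, 'C' = rotating.
    """
    candidate = "A" + encode_direct(bits)
    if max_homopolymer(candidate) <= max_run:
        return candidate
    return "C" + encode_rotating(bits, prev="C")


def decode_strand(strand: str) -> str:
    """Read the code ID, then invert the matching code."""
    code_id, payload = strand[0], strand[1:]
    if code_id == "A":
        inverse = {nt: pair for pair, nt in DIRECT.items()}
        return "".join(inverse[nt] for nt in payload)
    bits, prev = [], code_id
    for nt in payload:
        bits.append(str((ORDER.index(nt) - ORDER.index(prev) - 1) % 4))
        prev = nt
    return "".join(bits)


def density(bits: str, strand: str) -> float:
    """Encoding density in bits per nucleotide, code ID included."""
    return len(bits) / len(strand)
```

In this toy setup a mixed pattern such as `0110110001` fits the denser code, while an all-zero sequence would produce a long homopolymer and falls back to the lower-density code. The code ID is what lets `decode_strand` recover the original bits in both cases.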
Award ID(s):
2204656
PAR ID:
10500537
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-1948-4
Page Range / eLocation ID:
1 to 8
Format(s):
Medium: X
Location:
Stony Brook, NY, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Deoxyribonucleic Acid (DNA), with its ultra-high storage density and long durability, is a promising long-term archival storage medium and is attracting much attention today. A DNA storage system encodes and stores digital data with synthetic DNA sequences and decodes DNA sequences back to digital data via sequencing. Many encoding schemes have been proposed to enlarge DNA storage capacity by increasing DNA encoding density. However, only increasing encoding density is insufficient because enhancing DNA storage capacity is a multifaceted problem. This paper assumes that random accesses are necessary for practical DNA archival storage. We identify all factors affecting DNA storage capacity under current technologies and systematically investigate the practical DNA storage capacity with several popular encoding schemes. The investigation shows that the collision between primers and DNA payload sequences is a major factor limiting DNA storage capacity. Based on this discovery, we designed a new encoding scheme called Collision Aware Code (CAC) to trade some encoding density for the reduction of primer-payload collisions. Compared with the best result among the five existing encoding schemes, CAC can extricate 120% more primers from collisions and increase the DNA tube capacity from 211.96 GB to 295.11 GB. We also evaluate CAC's recoverability from DNA storage errors. The result shows that CAC's recoverability is comparable to that of existing encoding schemes.
  2. Information storage in synthetic DNA oligomers is attractive due to the inherent physical density, stability, and energy efficiency of nucleic acids. Information retention during writing, storage, and retrieval requires the development of efficient encoding/decoding systems. Additionally, intrusion of artificial or organic malevolent biologically active molecular machines could raise catastrophic biosecurity concerns. Here we present an improved information storage method that focuses on efficiency and biosecurity. We have developed and experimentally tested an algorithm to write data in a pool of DNA strands by applying a fountain code (rateless erasure code), a Reed-Solomon code, and an oligomer mapping code that ensures biosecurity. We validated our method through wet-lab experiments and wrote, stored, and fully retrieved 105,360 bits of information. We validated the biosecurity aspects of our method through in-silico experimentation: a BLAST run to compare our generated oligomers to existing genes documented in public databases, a Plasmidhawk software analysis to determine that our oligomers could not be artificially traced to another lab, and open-source software to determine whether our oligomers could express any sequences that potentially originate or empower biologically meaningful functions.
  3. DNA is an incredibly dense storage medium for digital data. However, computing on the stored information is expensive and slow, requiring rounds of sequencing, in silico computation, and DNA synthesis. Prior work on accessing and modifying data using DNA hybridization or enzymatic reactions had limited computation capabilities. Inspired by the computational power of “DNA strand displacement,” we augment DNA storage with “in-memory” molecular computation using strand displacement reactions to algorithmically modify data in a parallel manner. We show programs for binary counting and Turing universal cellular automaton Rule 110, the latter of which is, in principle, capable of implementing any computer algorithm. Information is stored in the nicks of DNA, and a secondary sequence-level encoding allows high-throughput sequencing-based readout. We conducted multiple rounds of computation on 4-bit data registers, as well as random access of data (selective access and erasure). We demonstrate that large strand displacement cascades with 244 distinct strand exchanges (sequential and in parallel) can use naturally occurring DNA sequence from M13 bacteriophage without stringent sequence design, which has the potential to improve the scale of computation and decrease cost. Our work merges DNA storage and DNA computing, setting the foundation of entirely molecular algorithms for parallel manipulation of digital information preserved in DNA.
  4. As the volume of data produced every day grows rapidly, storage media need to keep up with the growth rate of digital data. Despite emerging storage solutions such as Solid State Drives (SSDs) with quad-level cells (QLC) or penta-level cells (PLC), Shingled Magnetic Recording (SMR), and LTO tape, these technologies still fall short of meeting the demand for preserving huge amounts of available data. Moreover, current storage solutions have a limited lifespan, often lasting just a few years. To ensure long-term preservation, data must be continuously migrated to new storage drives. Therefore, there is a need for alternative storage technologies that offer not only high storage capacity but also long persistency. In contrast to existing storage devices, synthetic Deoxyribonucleic Acid (DNA) storage emerges as a promising candidate for archival data storage, offering both high-density storage capacity and the potential for long-term data preservation. In this paper, we will introduce DNA storage, discuss the capabilities of DNA storage based on current biotechnologies, discuss possible improvements in DNA storage, and explore further improvements with future technologies. Currently, DNA storage is limited by weaknesses including high error rates and long access latency. We will focus on possible DNA storage research issues based on the relevant bio and computer technologies. We will also provide potential solutions and forward-looking predictions about the development and the future of DNA storage. We will discuss DNA storage from the following five perspectives: 1) We will describe the basic background of DNA storage, including the basic technologies for reading and writing DNA storage, data access processes such as Polymerase Chain Reaction (PCR) based random access, encoding schemes from digital data to DNA, and the required DNA storage format.
2) We will describe the issues of DNA storage based on current technologies, including bio-constraints during the encoding process (such as avoiding long homopolymers and maintaining a certain GC content), different types of errors in the synthesis and sequencing processes, low practical capacity with current technologies, slow read and write performance, and low encoding density for random accesses. 3) Based on the previously mentioned issues, we will summarize the current solutions for each issue and discuss potential solutions based on future technologies. 4) From a system perspective, we will discuss how a DNA storage system will look if DNA storage becomes commercialized and is widely deployed in archive systems. Questions to be discussed include: i) How to efficiently index data in DNA storage? ii) What is a good hierarchical storage system with DNA storage? iii) What will DNA storage be like as technology develops? 5) Finally, we will provide a comparison with other competitive technologies.
  5. Deoxyribonucleic Acid (DNA), as a storage medium with high density and long-term preservation properties, can satisfy the archival storage requirements of rapidly increasing digital data volume. The read and write processes of DNA storage are error-prone. Images, which are widely used in social media, are fault-tolerant and are therefore well suited to DNA storage. However, prior work only investigated the feasibility of storing different types of data in DNA storage and stored images without fully investigating their fault-tolerant potential. In this paper, we propose a new image-based DNA system called IMG-DNA, which can efficiently store images in DNA storage with improved robustness. First, a new DNA architecture is proposed to fit JPEG-based images and improve the images' robustness in DNA storage. Moreover, barriers inserted in DNA sequences efficiently prevent error propagation in images stored in DNA. The experimental results indicate that the proposed IMG-DNA achieves much higher fault tolerance than prior work.