Title: Digital data storage on DNA tape using CRISPR base editors
Abstract: While the archival digital memory industry approaches its physical limits, demand keeps rising significantly, which motivates the search for alternatives. Recent efforts have demonstrated DNA’s enormous potential as a digital storage medium with superior information durability, capacity, and energy consumption. However, the majority of the proposed systems require on-demand de novo DNA synthesis techniques that produce a large amount of toxic waste, and therefore are neither industrially scalable nor environmentally friendly. Inspired by the architecture of semiconductor memory devices and recent developments in gene editing, we created a molecular digital data storage system called “DNA Mutational Overwriting Storage” (DMOS) that stores information by leveraging combinatorial, addressable, orthogonal, and independent in vitro CRISPR base-editing reactions to write data on a blank pool of greenly synthesized DNA tapes. As a proof of concept, this work demonstrates writing and accurately reading both a bitmap representation of our school’s logo and the title of this study on the DNA tapes.
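The write-by-editing idea in the abstract can be pictured with a toy model. The sketch below is not the DMOS protocol: it assumes, purely for illustration, that every bit has its own address on a blank tape, that an unedited base reads as 0, and that a hypothetical C-to-T base edit at that address reads as 1. All function and constant names are invented for this example.

```python
# Toy model only: the tape length, the C->T convention, and all names below
# are illustrative assumptions, not details taken from the paper.
BLANK_BASE = "C"    # unedited position on the blank tape, read as bit 0
EDITED_BASE = "T"   # position after a hypothetical C->T base edit, read as bit 1


def text_to_bits(text: str) -> str:
    """Encode ASCII text as a string of '0'/'1' characters, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in text.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("ascii")


def write_tape(bits: str) -> str:
    """Start from a blank tape and edit every address whose bit is 1."""
    tape = [BLANK_BASE] * len(bits)
    for address, bit in enumerate(bits):
        if bit == "1":
            # Each edit touches only its own address, independent of the others.
            tape[address] = EDITED_BASE
    return "".join(tape)


def read_tape(tape: str) -> str:
    """Read the tape back: an edited base is 1, anything else is 0."""
    return "".join("1" if base == EDITED_BASE else "0" for base in tape)


if __name__ == "__main__":
    tape = write_tape(text_to_bits("DNA"))
    print(tape)
    print(bits_to_text(read_tape(tape)))
```

The point of the model is that writing never synthesizes a new sequence: the same blank tape is reused, and data exists only as the pattern of edited addresses.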
Award ID(s):
2027738
PAR ID:
10469044
Publisher / Repository:
Nature Publishing Group
Journal Name:
Nature Communications
Volume:
14
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. With expanding data storage capacity needs, DNA as an alternative archival storage medium offers potential advantages, including higher density and longer data retention [1,2]. However, the majority of DNA-based memory systems are write-once and read-only, although a few studies have suggested overwriting digital data on existing DNA using chemical modifications of bases [3]. Those strategies require constantly updating the entire data coding and iteratively synthesizing the DNA pool. Given that complexity and cost, those methods need amendments to become industrially scalable. Inspired by magnetic tapes [4] and multisession CDs [5], in this work we created a DNA storage system, coined the Molecular File System (MolFS), to organize, store, and edit digital information in a DNA pool. MolFS uses DNA pools that consist of multiple sessions, where each session contains data block and unique index sections to store and edit the files. We used indexes to describe the file system hierarchy, locate files along with the blocks, recognize the sessions, and identify the file versions. This approach reduces the editing cost compared to the state-of-the-art methods: editing or adding data requires only synthesizing a new DNA pool containing the DNA session of the differential file. As proof of concept, we encoded 2.3 Kbytes of graphic and text data into 2 DNA pools. To edit the existing DNA pool, we added 8 new differential data blocks to existing pools, reaching 13.8 Kbytes of data stored from sessions 1 to 5. We performed nanopore sequencing and recovered the data from the MolFS sessions accurately and precisely.
  2. DNA is a compelling alternative to non-volatile information storage technologies due to its information density, stability, and energy efficiency. Previous studies have used artificially synthesized DNA to store data and automated next-generation sequencing to read it back. Here, we report digital Nucleic Acid Memory (dNAM) for applications that require a limited amount of data to have high information density, redundancy, and copy number. In dNAM, data is encoded by selecting combinations of single-stranded DNA with (1) or without (0) docking-site domains. When self-assembled with scaffold DNA, staple strands form DNA origami breadboards. Information encoded into the breadboards is read by monitoring the binding of fluorescent imager probes using DNA-PAINT super-resolution microscopy. To enhance data retention, a multi-layer error correction scheme that combines fountain and bi-level parity codes is used. As a prototype, fifteen origami encoded with ‘Data is in our DNA!\n’ are analyzed. Each origami encodes unique data-droplet, index, orientation, and error-correction information. The error-correction algorithms fully recover the message when individual docking sites, or entire origami, are missing. Unlike other approaches to DNA-based data storage, reading dNAM does not require sequencing. As such, it offers an additional path to explore the advantages and disadvantages of DNA as an emerging memory material.
  3. Deoxyribonucleic acid (DNA) is emerging as an alternative archival memory technology. Recent advancements in DNA synthesis and sequencing have both increased the capacity and decreased the cost of storing information in de novo synthesized DNA pools. In this survey, we review methods for translating digital data to and/or from DNA molecules. An emphasis is placed on methods that have been validated by storing and retrieving real-world data via in vitro experiments.
  4. DNA is an incredibly dense storage medium for digital data. However, computing on the stored information is expensive and slow, requiring rounds of sequencing, in silico computation, and DNA synthesis. Prior work on accessing and modifying data using DNA hybridization or enzymatic reactions had limited computation capabilities. Inspired by the computational power of “DNA strand displacement,” we augment DNA storage with “in-memory” molecular computation using strand displacement reactions to algorithmically modify data in a parallel manner. We show programs for binary counting and Turing universal cellular automaton Rule 110, the latter of which is, in principle, capable of implementing any computer algorithm. Information is stored in the nicks of DNA, and a secondary sequence-level encoding allows high-throughput sequencing-based readout. We conducted multiple rounds of computation on 4-bit data registers, as well as random access of data (selective access and erasure). We demonstrate that large strand displacement cascades with 244 distinct strand exchanges (sequential and in parallel) can use naturally occurring DNA sequence from M13 bacteriophage without stringent sequence design, which has the potential to improve the scale of computation and decrease cost. Our work merges DNA storage and DNA computing, setting the foundation of entirely molecular algorithms for parallel manipulation of digital information preserved in DNA.
  5. As data is produced at a rapidly growing rate every day, storage media must keep up with the growth of digital data. Despite emerging storage solutions such as Solid State Drives (SSD) with quad-level cells (QLC) or penta-level cells (PLC), Shingled Magnetic Recording (SMR), and LTO tape, these technologies still fall short of the demand for preserving huge amounts of available data. Moreover, current storage solutions have a limited lifespan, often lasting just a few years. To ensure long-term preservation, data must be continuously migrated to new storage drives. Therefore, there is a need for alternative storage technologies that offer not only high storage capacity but also long-term persistence. In contrast to existing storage devices, synthetic deoxyribonucleic acid (DNA) storage emerges as a promising candidate for archival data storage, offering both high-density storage capacity and the potential for long-term data preservation. In this paper, we introduce DNA storage, discuss its capabilities based on current biotechnologies, discuss possible improvements, and explore further improvements with future technologies. Currently, DNA storage is limited by weaknesses including high error rates and long access latency. We focus on possible DNA storage research issues based on the relevant bio and computer technologies, and we provide potential solutions and forward-looking predictions about the development and future of DNA storage. We discuss DNA storage from the following five perspectives: 1) We describe the basic background of DNA storage, including the basic technologies for reading and writing DNA storage, data access processes such as Polymerase Chain Reaction (PCR) based random access, encoding schemes from digital data to DNA, and the required DNA storage format. 2) We describe the issues of DNA storage based on current technologies, including bio-constraints during the encoding process such as avoiding long homopolymers and keeping GC content within a certain range, different types of errors in the synthesis and sequencing processes, low practical capacity with current technologies, slow read and write performance, and low encoding density for random access. 3) Based on these issues, we summarize the current solutions for each issue and discuss potential solutions based on future technologies. 4) From a system perspective, we discuss how a DNA storage system will look if DNA storage becomes commercialized and is widely deployed in archive systems. The questions discussed include: i) How can data in DNA storage be indexed efficiently? ii) What is a good hierarchical storage system with DNA storage? iii) What will DNA storage be like as technology develops? 5) Finally, we provide a comparison with other competitive technologies.
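Item 5 above names two bio-constraints on encoded sequences: avoiding long homopolymers and keeping GC content in range. A minimal sketch of such a check follows. The thresholds (runs of at most 3 identical bases, GC fraction between 0.4 and 0.6) and the function names are assumptions chosen for illustration; the abstract does not state specific limits.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty sequence)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def satisfies_constraints(seq: str, max_run: int = 3,
                          gc_min: float = 0.4, gc_max: float = 0.6) -> bool:
    """Check a candidate strand against the assumed, illustrative limits."""
    if not seq:
        return False
    return (longest_homopolymer(seq) <= max_run
            and gc_min <= gc_content(seq) <= gc_max)


if __name__ == "__main__":
    for candidate in ("ACGTACGT", "AAAAACGT", "GGCCGGCC"):
        print(candidate, satisfies_constraints(candidate))
```

An encoder would typically run a check like this on each candidate strand and re-encode or reject strands that fail, which is one reason such constraints reduce the achievable encoding density.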