skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A new approach to organize, edit and add data for DNA-based data storage
With the expanding data storage capacity needs, DNA as an alternative to the archival storage medium offers potential advantages, including higher density and data retention for information storage1,2. However, the majority of DNA-based memory systems are write-once and read-only, although few studies have suggested overwriting digital data on the existing DNA using chemical modifications of bases 3. Using those strategies requires constantly updating the entire data coding and iteratively synthesizing the DNA pool. Therefore, considering the complexity and cost, those methods needed some amendments to become industrially scalable. Inspired by magnetic tapes4 and multisession-CD5, in this work, we created a DNA storage system coined the Molecular File System (MolFS), to organize, store, and edit digital information in a DNA pool. MolFS uses DNA pools that consist of multiple sessions, where each session contains data block and unique index sections to store and edit the files. We used indexes to describe the file system hierarchy, locate files along with the blocks, recognize the sessions, and identify the file versions. This approach reduces the editing cost compared to the state-of-the-art methods, and editing or adding data requires only synthesizing a new DNA pool containing the DNA session of the differential file. As proof of concept, we encoded 2.3 Kbytes of graphic and text data into 2 DNA pools. To edit the existing DNA pool, we added 8 new differential data blocks to existing pools, reaching 13.8 Kbytes of data stored from sessions 1 to 5. We performed nanopore sequencing and recovered the data from the MolFS sessions accurately and precisely.  more » « less
Award ID(s):
2027738
PAR ID:
10434162
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
FNANO
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract While the archival digital memory industry approaches its physical limits, the demand is significantly increasing, therefore alternatives emerge. Recent efforts have demonstrated DNA’s enormous potential as a digital storage medium with superior information durability, capacity, and energy consumption. However, the majority of the proposed systems require on-demand de-novo DNA synthesis techniques that produce a large amount of toxic waste and therefore are not industrially scalable and environmentally friendly. Inspired by the architecture of semiconductor memory devices and recent developments in gene editing, we created a molecular digital data storage system called “DNA Mutational Overwriting Storage” (DMOS) that stores information by leveraging combinatorial, addressable, orthogonal, and independent in vitro CRISPR base-editing reactions to write data on a blank pool of greenly synthesized DNA tapes. As a proof of concept, this work illustrates writing and accurately reading of both a bitmap representation of our school’s logo and the title of this study on the DNA tapes. 
    more » « less
  2. Ransomware is increasingly prevalent in recent years. To defend against ransomware in computing devices using flash memory as external storage, existing designs extract the entire raw flash memory data to restore the external storage to a good state. However, they cannot allow a fine-grained recovery in terms of user files as raw flash memory data do not have the semantics of "files". In this work, we design FFRecovery, a new ransomware defense strategy that can support fine-grained data recovery after the attacks. Our key idea is, to recover a file corrupted by the ransomware, we can 1) restore its file system metadata via file system forensics, and 2) extract its file data via raw data extraction from the flash translation layer, and 3) assemble the corresponding file system metadata and the file data. A simple prototype of FFRecovery has been developed and some preliminary results are provided. 
    more » « less
  3. Abstract Ransomware attacks are increasingly prevalent in recent years. Crypto-ransomware corrupts files on an infected device and demands a ransom to recover them. In computing devices using flash memory storage (e.g., SSD, MicroSD, etc.), existing designs recover the compromised data by extracting the entire raw flash memory image, restoring the entire external storage to a good prior state. This is feasible through taking advantage of the out-of-place updates feature implemented in the flash translation layer (FTL). However, due to the lack of “file” semantics in the FTL, such a solution does not allow a fine-grained data recovery in terms of files. Considering the file-centric nature of ransomware attacks, recovering the entire disk is mostly unnecessary. In particular, the user may just wish a speedy recovery of certain critical files after a ransomware attack. In this work, we have designed$$\textsf{FFRecovery}$$ FFRecovery , a new ransomware defense strategy that can support fine-grained per file data recovery after the ransomware attack. Our key idea is that, to restore a file corrupted by the ransomware, we (1) restore its file system metadata via file system forensics, and (2) extract its file data via raw data extraction from the FTL, and (3) assemble the corresponding file system metadata and the file data. Another essential aspect of$$\textsf{FFRecovery}$$ FFRecovery is that we add a garbage collection delay and freeze mechanism into the FTL so that no raw data will be lost prior to the recovery and, additionally, the raw data needed for the recovery can be always located. A prototype of$$\textsf{FFRecovery}$$ FFRecovery has been developed and our experiments using real-world ransomware samples demonstrate the effectiveness of$$\textsf{FFRecovery}$$ FFRecovery . We also demonstrate that$$\textsf{FFRecovery}$$ FFRecovery has negligible storage cost and performance impact. 
    more » « less
  4. null (Ed.)
    The total amount of data in the world has been increasing rapidly. However, the increase of data storage capacity is much slower than that of data generation. How to store and archive such a huge amount of data becomes critical and challenging. Synthetic Deoxyribonucleic Acid (DNA) storage is one of the promising candidates with high density and long-term preservation for archival storage systems. The existing works have focused on the achievable feasibility of a small amount of data when using DNA as storage. In this paper, we investigate the scalability and potentials of DNA storage when a huge amount of data, like all available data from the world, is to be stored. First, we investigate the feasible storage capability that can be achieved in a single DNA pool/tube based on current and future technologies. Then, the indexing of DNA storage is explored. Finally, the metadata overhead based on future technology trends is also investigated. 
    more » « less
  5. The challenge of deniability for sensitive data can be a life or death issue depending on location. Plausible deniability directly impacts groups such as democracy advocates relaying information in repressive regimes, journalists covering human rights stories in a war zone, and NGO workers hiding food shipment schedules from violent militias. All of whom would benefit from a plausibly deniable storage system. Previous de- niable storage solutions only offer pieces of an implementable solution. Artifice is the first tunable, operationally secure, self repairing, and fully deniable steganographic file system. Artifice operates through the use of a virtual block device driver stored separately from the hidden data. It uses external entropy sources and error-correcting codes to deniably and reliably store data within the unallocated space of an existing file system. A set of data blocks to be hidden are combined with entropy blocks through error-correcting codes to produce a set of obfuscated carrier blocks that are indistinguishable from other pseudorandom blocks on the disk. A subset of these blocks may then be used to reconstruct the data. Artifice presents a truly deniable storage solution through its use of external entropy and error-correcting codes while providing better reliability than other deniable storage systems. 
    more » « less