Title: Promiscuous molecules for smarter file operations in DNA-based data storage
Abstract: DNA holds significant promise as a data storage medium due to its density, longevity, and resource and energy conservation. These advantages arise from the inherent biomolecular structure of DNA, which differentiates it from conventional storage media. The unique molecular architecture of DNA storage also prompts important discussions on how data should be organized, accessed, and manipulated and what practical functionalities may be possible. Here we leverage thermodynamic tuning of biomolecular interactions to implement useful data access and organizational features. Specific sets of environmental conditions, including distinct DNA concentrations and temperatures, were screened for their ability to switchably access either all DNA strands encoding full image files from a GB-sized background database or subsets of those strands encoding low-resolution File Preview versions. We demonstrate File Preview with four JPEG images and provide an argument for the substantial and practical economic benefit of this generalizable strategy to organize data.
Award ID(s):
2027655 1901324 1650148
PAR ID:
10248725
Publisher / Repository:
Nature Publishing Group
Journal Name:
Nature Communications
Volume:
12
Issue:
1
ISSN:
2041-1723
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: The physical architectures of information storage systems often dictate how information is encoded, databases are organized, and files are accessed. Here we show that a simple architecture composed of a T7 promoter and a single-stranded overhang domain (ss-dsDNA) can unlock dynamic DNA-based information storage with powerful capabilities and advantages. The overhang provides a physical address for accessing specific DNA strands as well as implementing a range of in-storage file operations. It increases theoretical storage densities and capacities by expanding the encodable sequence space and simplifies the computational burden in designing sets of orthogonal file addresses. Meanwhile, the T7 promoter enables repeatable information access by transcribing information from DNA without destroying it. Furthermore, saturation mutagenesis around the T7 promoter and systematic analyses of environmental conditions reveal design criteria that can be used to optimize information access. This simple but powerful ss-dsDNA architecture lays the foundation for information storage with versatile capabilities.
  2. As the volume of data produced every day grows rapidly, storage media must keep pace with the growth of digital data. Despite emerging storage solutions such as Solid State Drives (SSDs) with quad-level cells (QLC) or penta-level cells (PLC), Shingled Magnetic Recording (SMR), and LTO tape, these technologies still fall short of meeting the demand for preserving huge amounts of available data. Moreover, current storage solutions have a limited lifespan, often lasting just a few years. To ensure long-term preservation, data must be continuously migrated to new storage drives. Therefore, there is a need for alternative storage technologies that offer not only high storage capacity but also long persistence. In contrast to existing storage devices, synthetic deoxyribonucleic acid (DNA) storage emerges as a promising candidate for archival data storage, offering both high-density storage capacity and the potential for long-term data preservation. In this paper, we introduce DNA storage, discuss its capabilities based on current biotechnologies, discuss possible improvements, and explore further improvements with future technologies. Currently, DNA storage is limited by weaknesses including high error rates and long access latency. We focus on possible DNA storage research issues based on the relevant bio and computer technologies, and we provide potential solutions and forward-looking predictions about the development and future of DNA storage. We discuss DNA storage from the following five perspectives: 1) We describe the basic background of DNA storage, including the basic technologies for reading and writing DNA storage, data access processes such as Polymerase Chain Reaction (PCR) based random access, encoding schemes from digital data to DNA, and the required DNA storage format. 2) We describe the issues of DNA storage based on current technologies, including bio-constraints during the encoding process such as avoiding long homopolymers and maintaining a certain GC content, different types of errors in the synthesis and sequencing processes, low practical capacity with current technologies, slow read and write performance, and low encoding density for random accesses. 3) Based on the previously mentioned issues, we summarize the current solutions for each issue and discuss potential solutions based on future technologies. 4) From a system perspective, we discuss how a DNA storage system will look if DNA storage becomes commercialized and is widely deployed in archive systems. Questions discussed include: i) How can data in DNA storage be indexed efficiently? ii) What is a good hierarchical storage system that includes DNA storage? iii) What will DNA storage be like as technology develops? 5) Finally, we provide a comparison with other competitive technologies.
  3. With the rapid increase of available digital data, we are searching for a storage medium with high density and the capability of long-term preservation. Deoxyribonucleic Acid (DNA) storage is identified as such a promising candidate, especially for archival storage systems. However, the encoding density (i.e., how many binary bits can be encoded into one nucleotide) and error handling are two major factors intertwined in DNA storage. Considering encoding density, theoretically, one nucleotide (i.e., A, T, G, or C) can encode two binary bits (upper bound). However, due to biochemical constraints and other necessary information associated with the payload, the encoding densities of current DNA storage systems are much lower than this upper bound. Additionally, all existing studies of DNA encoding schemes are based on static analysis and lack awareness of dynamically changing digital patterns. Therefore, the gap between the static encoding and dynamic binary patterns prevents achieving a higher encoding density for DNA storage systems. In this paper, we propose a new Digital Pattern-Aware DNA storage system, called DP-DNA, which can efficiently store digital data in DNA storage with high encoding density. DP-DNA maintains a set of encoding codes and uses a digital pattern-aware code (DPAC) to analyze the patterns of a binary sequence for a DNA strand and select an appropriate code for encoding the binary sequence to achieve a high encoding density. An additional encoding field is added to the DNA encoding format to identify the encoding scheme used for each DNA strand, so that DNA data can be decoded back to its original digital data. Moreover, to further improve the encoding density, a variable-length scheme is proposed to increase the feasibility of the code scheme with a high encoding density. Finally, the experimental results indicate that the proposed DP-DNA achieves up to 103.5% higher encoding densities than prior work.
  4. Deoxyribonucleic Acid (DNA), with its ultra-high storage density and long durability, is a promising long-term archival storage medium and is attracting much attention today. A DNA storage system encodes and stores digital data with synthetic DNA sequences and decodes DNA sequences back to digital data via sequencing. Many encoding schemes have been proposed to enlarge DNA storage capacity by increasing DNA encoding density. However, increasing encoding density alone is insufficient because enhancing DNA storage capacity is a multifaceted problem. This paper assumes that random accesses are necessary for practical DNA archival storage. We identify all factors affecting DNA storage capacity under current technologies and systematically investigate the practical DNA storage capacity with several popular encoding schemes. The investigation shows that collisions between primers and DNA payload sequences are a major factor limiting DNA storage capacity. Based on this discovery, we designed a new encoding scheme called Collision Aware Code (CAC) to trade some encoding density for the reduction of primer-payload collisions. Compared with the best result among the five existing encoding schemes, CAC can extricate 120% more primers from collisions and increase the DNA tube capacity from 211.96 GB to 295.11 GB. We also evaluate CAC's recoverability from DNA storage errors. The results show that CAC's recoverability is comparable to that of existing encoding schemes.
  5. Abstract: RNA-driven phase separation is emerging as a promising approach for engineering biomolecular condensates with diverse functionalities. Condensates form thanks to weak yet specific RNA–RNA interactions established by design via complementary sequence domains. Here, we demonstrate how RNA condensates formed by star-shaped RNA motifs, or nanostars, can be dynamically controlled when the motifs include additional linear or branch-loop domains that facilitate access of regulatory RNA molecules to the nanostar interaction domains. We show that condensates dissolve in the presence of RNA “invaders” that occlude selected nanostar bonds and reduce the valency of the nanostars, preventing phase separation. We further demonstrate that the introduction of “anti-invader” strands, complementary to the invaders, makes it possible to restore condensate formation. An important aspect of our experiments is that we demonstrate these behaviors in one-pot reactions, where RNA nanostars, invaders, and anti-invaders are simultaneously transcribed in vitro using short DNA templates. Our results lay the groundwork for engineering RNA-based assemblies with tunable, reversible condensation, providing a promising toolkit for synthetic biology applications requiring responsive, self-organizing biomolecular materials.
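Several of the abstracts above mention a theoretical upper bound of two binary bits per nucleotide, along with bio-constraints such as avoiding long homopolymers and keeping GC content in range. The following is a minimal Python sketch of those ideas, not the encoding used in any of the listed papers. The bit-to-nucleotide mapping and all function names are assumptions chosen for illustration only.

```python
# Hypothetical 2-bit-per-nucleotide mapping (an assumption for illustration;
# the listed papers use their own, constraint-aware encoding schemes).
BITS_TO_NT = {"00": "A", "01": "C", "10": "G", "11": "T"}
NT_TO_BITS = {nt: bits for bits, nt in BITS_TO_NT.items()}


def encode(data: bytes) -> str:
    """Encode bytes at the upper-bound density of 2 bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_NT[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(); assumes an error-free strand of length divisible by 4."""
    bits = "".join(NT_TO_BITS[nt] for nt in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def gc_content(strand: str) -> float:
    """Fraction of nucleotides that are G or C (0.0 for an empty strand)."""
    if not strand:
        return 0.0
    return sum(nt in "GC" for nt in strand) / len(strand)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of one repeated nucleotide."""
    if not strand:
        return 0
    longest = run = 1
    for prev, cur in zip(strand, strand[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand, gc_content(strand), longest_homopolymer(strand))
    # A run of zero bits becomes a homopolymer, which illustrates why
    # practical schemes give up some density to satisfy bio-constraints.
    print(encode(b"\x00"), longest_homopolymer(encode(b"\x00")))
```

A naive mapping like this reaches the two-bit bound but gives no control over homopolymer runs or GC content, which is one reason the practical encoding densities described above fall below the bound.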