This content will become publicly available on August 22, 2025

Title: A primordial DNA store and compute engine
There is a set of primordial features and functions expected of any modern information system: a substrate stably carrying data; the ability to repeatedly write, read, erase, reload, and compute on specific data from that substrate; and the overall ability to execute such functions in a seamless and programmable manner. For nascent molecular information technologies, proof-of-principle realization of this set of primordial capabilities would advance the vision for their continued development. Here, we present a DNA-based store and compute engine that captures these primordial capabilities. This system comprises multiple image files encoded into DNA and adsorbed onto ~50 µm diameter, highly porous, hierarchically branched, colloidal substrate particles composed of naturally abundant cellulose acetate. Their surface areas are over 200 cm²/mg, with binding capacities of over 10¹² DNA oligos/mg, 10 terabytes/mg, or 10⁴ terabytes/cm³. This “dendricolloid” holds DNA files more stably than bare DNA, with an extrapolated ability to be repeatedly lyophilized and rehydrated over 170 times, compared with 60 times for bare DNA. Accelerated aging studies project half-lives of ~6000 and 2 million years at 4 °C and −18 °C, respectively. The data can also be erased and replaced, and non-destructive file access is achieved through transcribing from distinct synthetic promoters. The resultant RNA molecules can be directly read via nanopore sequencing and can also be enzymatically computed to solve simplified 3x3 chess and sudoku problems. Our study establishes a feasible route for utilizing the high information density and parallel computational advantages of nucleic acids.
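As a rough consistency check on the capacity figures quoted in the abstract, the sketch below derives two quantities the abstract does not state: the payload per oligo and the bulk density implied by the per-mass and per-volume numbers. It assumes decimal terabytes (10^12 bytes) and treats the quoted lower bounds as exact values; the constant and function names are illustrative only and do not come from the paper.

```python
# Figures quoted in the abstract, each given there as a lower bound ("over ...").
OLIGOS_PER_MG = 1e12   # DNA oligos bound per mg of substrate
TB_PER_MG = 10.0       # terabytes stored per mg of substrate
TB_PER_CM3 = 1e4       # terabytes stored per cm^3 of substrate

# Assumption: "terabyte" means the decimal unit, 10^12 bytes.
BYTES_PER_TB = 1e12


def implied_bytes_per_oligo() -> float:
    """Payload per oligo implied by the per-mg oligo count and capacity."""
    return TB_PER_MG * BYTES_PER_TB / OLIGOS_PER_MG


def implied_bulk_density_mg_per_cm3() -> float:
    """Substrate mass per unit volume implied by the two capacity figures."""
    return TB_PER_CM3 / TB_PER_MG


if __name__ == "__main__":
    print(f"implied payload per oligo: {implied_bytes_per_oligo():.0f} bytes")
    print(f"implied bulk density: {implied_bulk_density_mg_per_cm3():.0f} mg/cm^3")
```

Under these assumptions the quoted figures are mutually consistent with roughly 10 bytes of payload per oligo and a bulk density near 1 g/cm³. Both are back-of-envelope inferences from the abstract, not values reported by the authors.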
Award ID(s):
1901324
PAR ID:
10549874
Publisher / Repository:
Springer
Journal Name:
Nature Nanotechnology
ISSN:
1748-3387
Subject(s) / Keyword(s):
DNA storage, molecular information, transcription, data, dendricolloid, computer, computation
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The potential of DNA as an information storage medium is rapidly growing due to advances in DNA synthesis and sequencing. However, the chemical stability of DNA challenges the complete erasure of information encoded in DNA sequences. Here, we encode information in a DNA information solution, a mixture of true message- and false message-encoded oligonucleotides, enabling rapid and permanent erasure of information. True messages are differentiated by their hybridization to a “truth marker” oligonucleotide, and only true messages can be read; binding of the truth marker can be effectively randomized even by a brief exposure to elevated temperature. We show that 8 separate bitmap images can be stably encoded and read after storage at 25 °C for 65 days with an average of over 99% correct information recall, which extrapolates to a half-life of over 15 years at 25 °C. Heating to 95 °C for 5 minutes, however, permanently erases the message.
  2. Abstract

    Escherichia coli SSB (EcSSB) is a model single-stranded DNA (ssDNA) binding protein critical in genome maintenance. EcSSB forms homotetramers that wrap ssDNA in multiple conformations to facilitate DNA replication and repair. Here we measure the binding and wrapping of many EcSSB proteins to a single long ssDNA substrate held at fixed tensions. We show EcSSB binds in a biphasic manner, where initial wrapping events are followed by unwrapping events as ssDNA-bound protein density passes critical saturation and high free protein concentration increases the fraction of EcSSBs in less-wrapped conformations. By destabilizing EcSSB wrapping through increased substrate tension, decreased substrate length, and protein mutation, we also directly observe an unstable bound but unwrapped state in which ∼8 nucleotides of ssDNA are bound by a single domain, which could act as a transition state through which rapid reorganization of the EcSSB–ssDNA complex occurs. When ssDNA is over-saturated, stimulated dissociation rapidly removes excess EcSSB, leaving an array of stably-wrapped complexes. These results provide a mechanism through which otherwise stably bound and wrapped EcSSB tetramers are rapidly removed from ssDNA to allow for DNA maintenance and replication functions, while still fully protecting ssDNA over a wide range of protein concentrations.
  3. Many scientific applications operate on data sets that span hundreds of gigabytes or even terabytes in size. Large data sets often use compression to reduce the size of the files. Yet as of today, parallel I/O libraries do not support reading and writing compressed files, necessitating either expensive sequential compression/decompression operations before/after the simulation, or omitting advanced features of parallel I/O libraries, such as collective I/O operations. This paper introduces parallel I/O on compressed data files and discusses the key challenges, requirements, and solutions for supporting compressed data files in MPI I/O, as well as limitations on some MPI I/O operations when using compressed data files. The paper details handling of individual read and write operations of compressed data files, and presents an extension to the two-phase collective I/O algorithm to support data compression. The paper further presents and evaluates an implementation based on the Snappy compression library and the OMPIO parallel I/O framework. The performance evaluation using multiple data sets demonstrates significant performance benefits when using data compression on a parallel BeeGFS file system.
  4. Abstract

    The storage of data in DNA typically involves encoding and synthesizing data into short oligonucleotides, followed by reading with a sequencing instrument. Major challenges include the molecular consumption of synthesized DNA, basecalling errors, and limitations with scaling up read operations for individual data elements. Addressing these challenges, we describe a DNA storage system called MDRAM (Magnetic DNA-based Random Access Memory) that enables repetitive and efficient readouts of targeted files with nanopore-based sequencing. By conjugating synthesized DNA to magnetic agarose beads, we enabled repeated data readouts while preserving the original DNA analyte and maintaining data readout quality. MDRAM utilizes an efficient convolutional coding scheme that leverages soft information in raw nanopore sequencing signals to achieve information reading costs comparable to Illumina sequencing despite higher error rates. Finally, we demonstrate a proof-of-concept DNA-based proto-filesystem that enables an exponentially-scalable data address space using only small numbers of targeting primers for assembly and readout.

     
  5. Abstract

    Land use change (LUC) alters the global carbon (C) stock, but our estimation of the alteration remains uncertain and is a major impediment to predicting the global C cycle. The uncertainty is partly due to the limited number and geographical bias of observations, and limited exploration of its predictors. Here we generated a comprehensive global database of 5,980 observations from 790 articles. The number of sites evaluated is at least seven times larger than in previous meta‐analyses. Our constrained estimates of different LUC's effects on soil organic C (SOC) and their variations across global climates reveal underestimation/overestimation in previous estimates. Converting forests and grasslands to croplands reduced SOC by 24.5% ± 1.53% (−11.03 ± 1.06 Mg ha⁻¹) and 22.7% ± 1.22% (−8.09 ± 0.67 Mg ha⁻¹), while 28.0% ± 1.56% (4.46 ± 0.42 Mg ha⁻¹) and 33.5% ± 1.68% (5.8 ± 0.38 Mg ha⁻¹) increases, respectively, were obtained in the reverse processes. Converting forests to grasslands decreased SOC by 2.1% ± 1.22% (−1.13 ± 0.44 Mg ha⁻¹), while the reverse process increased SOC by 18.6% ± 1.73% (3.31 ± 0.51 Mg ha⁻¹). Modeled relative importance of 10 drivers of LUC's impact on SOC revealed that higher initial SOC (iSOC) does not solely determine SOC loss in SOC‐negative LUC scenarios as previously proposed. Across four decades, reconverting croplands to forests and grasslands recovered only 49.5% (6.1 ± 0.51 Mg ha⁻¹) and 75.3% (7.0 ± 0.38 Mg ha⁻¹) of the iSOC, respectively, indicating the need for protecting C‐rich ecosystems. Our global data set advances information on LUC's effect on SOC and can be valuable to constrain Earth system models to reliably estimate global SOC stocks and plan climate change mitigation strategies.

     