There are a set of primordial features and functions expected of any modern information system: a substrate stably carrying data; the ability to repeatedly write, read, erase, reload, and compute on specific data from that substrate; and the overall ability to execute such functions in a seamless and programmable manner. For nascent molecular information technologies, proof-of-principle realization of this set of primordial capabilities would advance the vision for their continued development. Here, we present a DNA-based store and compute engine that captures these primordial capabilities. This system comprises multiple image files encoded into DNA and adsorbed onto ~50 µm diameter, highly porous, hierarchically branched, colloidal substrate particles comprised of naturally abundant cellulose acetate. Their surface areas are over 200 cm²/mg with binding capacities of over 10¹² DNA oligos/mg, 10 terabytes/mg, or 10⁴ terabytes/cm³. This “dendricolloid” stably holds DNA files better than bare DNA with an extrapolated ability to be repeatedly lyophilized and rehydrated over 170 times compared to 60 times, respectively. Accelerated aging studies project half-lives of ~6000 and 2 million years at 4 °C and -18 °C, respectively. The data can also be erased and replaced, and non-destructive file access is achieved through transcribing from distinct synthetic promoters. The resultant RNA molecules can be directly read via nanopore sequencing and can also be enzymatically computed to solve simplified 3x3 chess and sudoku problems. Our study establishes a feasible route for utilizing the high information density and parallel computational advantages of nucleic acids.
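The capacity figures quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the paper: it assumes decimal terabytes (10¹² bytes), treats the quoted "over" and "~" values as exact, and assumes simple first-order (exponential) decay for the half-life helper. All function names are hypothetical.

```python
# Figures quoted in the abstract (lower bounds or approximations, treated as exact here).
OLIGOS_PER_MG = 1e12       # DNA oligos bound per mg of substrate
TERABYTES_PER_MG = 10.0    # data capacity per mg of substrate
TERABYTES_PER_CM3 = 1e4    # data capacity per cm^3 of substrate

BYTES_PER_TERABYTE = 1e12  # assumption: decimal terabytes


def bytes_per_oligo() -> float:
    """Payload per oligo implied by the per-mg figures."""
    return TERABYTES_PER_MG * BYTES_PER_TERABYTE / OLIGOS_PER_MG


def implied_density_mg_per_cm3() -> float:
    """Substrate mass per unit volume implied by the two capacity figures."""
    return TERABYTES_PER_CM3 / TERABYTES_PER_MG


def fraction_remaining(years: float, half_life_years: float) -> float:
    """Fraction of DNA left after `years`, assuming first-order decay."""
    return 0.5 ** (years / half_life_years)


if __name__ == "__main__":
    print(bytes_per_oligo())                # bytes of data per oligo
    print(implied_density_mg_per_cm3())     # mg of substrate per cm^3
    print(fraction_remaining(100, 6000))    # after a century at 4 °C
    print(fraction_remaining(100, 2e6))     # after a century at -18 °C
```

Under these assumptions the per-mg figures imply roughly 10 bytes of data per oligo and a packing density of about 1 g/cm³; both are back-of-envelope values derived from the abstract, not numbers the authors report.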
Kim, Jangwon; Bae, Jin H.; Baym, Michael; Zhang, David Yu
(Nature Communications)
Abstract The potential of DNA as an information storage medium is rapidly growing due to advances in DNA synthesis and sequencing. However, the chemical stability of DNA makes it difficult to completely erase information encoded in DNA sequences. Here, we encode information in a DNA information solution, a mixture of true message- and false message-encoded oligonucleotides, which enables rapid and permanent erasure of information. True messages are differentiated by their hybridization to a "truth marker" oligonucleotide, and only true messages can be read; binding of the truth marker can be effectively randomized by even a brief exposure to elevated temperature. We show that 8 separate bitmap images can be stably encoded and read after storage at 25 °C for 65 days with an average of over 99% correct information recall, which extrapolates to a half-life of over 15 years at 25 °C. Heating to 95 °C for 5 minutes, however, permanently erases the message.
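As a rough consistency check on the two stability figures above, the sketch below relates a retained fraction to a half-life. It assumes first-order (exponential) decay and treats "correct information recall" as the retained fraction; the abstract does not state how the extrapolation was done, so both are assumptions, and the function names are hypothetical.

```python
import math

DAYS_PER_YEAR = 365.25  # assumption: Julian year


def fraction_intact(days: float, half_life_years: float) -> float:
    """Fraction retained after `days`, assuming first-order decay."""
    return 0.5 ** (days / (half_life_years * DAYS_PER_YEAR))


def half_life_from_fraction(days: float, fraction: float) -> float:
    """Half-life in years implied by retaining `fraction` after `days`."""
    return days * math.log(0.5) / math.log(fraction) / DAYS_PER_YEAR


if __name__ == "__main__":
    # Retained fraction after 65 days if the half-life were exactly 15 years.
    print(fraction_intact(65, 15))
    # Half-life implied by exactly 99% retention after 65 days.
    print(half_life_from_fraction(65, 0.99))
```

Under this model a 15-year half-life corresponds to a little over 99% retention after 65 days, which is in line with the figures quoted in the abstract.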
Volkel, Kevin D; Hook, Paul W; Keung, Albert; Timp, Winston; Tuck, James M
(Bioinformatics)
Mathelier, Anthony (Ed.)
Abstract Motivation: As nanopore technology reaches ever higher throughput and accuracy, it becomes an increasingly viable candidate for reading out DNA data storage. Nanopore sequencing offers considerable flexibility by allowing long reads, real-time signal analysis, and the ability to read both DNA and RNA. We need flexible and efficient designs that match nanopore's capabilities, but relatively few designs have been explored and many have significant inefficiency in read density, error rate, or compute time. To address these problems, we designed a new single-read per-strand decoder that achieves low byte error rates, offers high throughput, scales to long reads, and works well for both DNA and RNA molecules. We achieve these results through a novel soft decoding algorithm that can be effectively parallelized on a GPU. Our faster decoder allows us to study a wider range of system designs. Results: We demonstrate our approach on HEDGES, a state-of-the-art DNA-constrained convolutional code. We implement one hard decoder that runs serially and two soft decoders that run on GPUs. Each decoder is evaluated on the same population of nanopore reads collected from a synthesized library of strands. These same strands are synthesized with a T7 promoter to enable RNA transcription and decoding. Our results show that the hard decoder has a byte error rate over 25%, while the prior state-of-the-art soft decoder can achieve error rates of 2.25%. However, that design also suffers a low throughput of 183 s/read. Our new Alignment Matrix Trellis soft decoder improves throughput by 257× with the trade-off of a higher byte error rate of 3.52% compared to the state of the art. Furthermore, we use the faster speed of our algorithm to explore more design options. We show that read densities of 0.33 bits/base can be achieved, which is 4× larger than prior MSA-based decoders. We also compare RNA to DNA, and find that RNA has 85% as many error-free reads when compared to DNA. Availability and implementation: Source code for our soft decoder and data used to generate figures is available publicly in the GitHub repository https://github.com/dna-storage/hedges-soft-decoder (10.5281/zenodo.11454877). All raw FAST5/FASTQ data are available at 10.5281/zenodo.11985454 and 10.5281/zenodo.12014515.
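The headline numbers in this abstract can be turned into per-read times and densities with simple arithmetic. The sketch below is illustrative only: it reads "183 s/read" as decoding time per read and "257×" as a division of that time, and it back-computes the prior read density from the stated 4× ratio. These interpretations and the function names are assumptions, not values reported by the authors.

```python
# Figures quoted in the abstract.
BASELINE_SECONDS_PER_READ = 183.0  # prior soft decoder
SPEEDUP = 257.0                    # Alignment Matrix Trellis decoder vs. prior
NEW_READ_DENSITY = 0.33            # bits/base
DENSITY_RATIO = 4.0                # vs. prior MSA-based decoders


def seconds_per_read_after_speedup() -> float:
    """Decoding time per read if the 257x gain applies directly to s/read."""
    return BASELINE_SECONDS_PER_READ / SPEEDUP


def reads_per_hour(seconds_per_read: float) -> float:
    """Reads decoded per hour at a given time per read."""
    return 3600.0 / seconds_per_read


def implied_prior_density() -> float:
    """Read density of prior MSA-based decoders implied by the 4x ratio."""
    return NEW_READ_DENSITY / DENSITY_RATIO


if __name__ == "__main__":
    new_time = seconds_per_read_after_speedup()
    print(new_time)                                   # seconds per read
    print(reads_per_hour(BASELINE_SECONDS_PER_READ))  # prior decoder
    print(reads_per_hour(new_time))                   # new decoder
    print(implied_prior_density())                    # bits/base
```

On this reading the new decoder needs well under one second per read, and the prior MSA-based designs sit at roughly 0.08 bits/base.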
Naufer, M Nabuan; Morse, Michael; Möller, Guðfríður Björg; McIsaac, James; Rouzina, Ioulia; Beuning, Penny J; Williams, Mark C
(Nucleic Acids Research)
Abstract Escherichia coli SSB (EcSSB) is a model single-stranded DNA (ssDNA) binding protein critical in genome maintenance. EcSSB forms homotetramers that wrap ssDNA in multiple conformations to facilitate DNA replication and repair. Here we measure the binding and wrapping of many EcSSB proteins to a single long ssDNA substrate held at fixed tensions. We show EcSSB binds in a biphasic manner, where initial wrapping events are followed by unwrapping events as ssDNA-bound protein density passes critical saturation and high free protein concentration increases the fraction of EcSSBs in less-wrapped conformations. By destabilizing EcSSB wrapping through increased substrate tension, decreased substrate length, and protein mutation, we also directly observe an unstable bound but unwrapped state in which ∼8 nucleotides of ssDNA are bound by a single domain, which could act as a transition state through which rapid reorganization of the EcSSB–ssDNA complex occurs. When ssDNA is over-saturated, stimulated dissociation rapidly removes excess EcSSB, leaving an array of stably-wrapped complexes. These results provide a mechanism through which otherwise stably bound and wrapped EcSSB tetramers are rapidly removed from ssDNA to allow for DNA maintenance and replication functions, while still fully protecting ssDNA over a wide range of protein concentrations.
Abstract Land use change (LUC) alters the global carbon (C) stock, but our estimation of the alteration remains uncertain and is a major impediment to predicting the global C cycle. The uncertainty is partly due to the limited number and geographical bias of observations, and limited exploration of its predictors. Here we generated a comprehensive global database of 5,980 observations from 790 articles. The number of sites evaluated is at least seven times larger than in previous meta‐analyses. Our constrained estimates of different LUC's effects on soil organic C (SOC) and their variations across global climates reveal underestimation/overestimation in previous estimates. Converting forests and grasslands to croplands reduced SOC by 24.5% ± 1.53% (−11.03 ± 1.06 Mg ha⁻¹) and 22.7% ± 1.22% (−8.09 ± 0.67 Mg ha⁻¹), while 28.0% ± 1.56% (4.46 ± 0.42 Mg ha⁻¹) and 33.5% ± 1.68% (5.8 ± 0.38 Mg ha⁻¹) increases, respectively, were obtained in the reverse processes. Converting forests to grasslands decreased SOC by 2.1% ± 1.22% (−1.13 ± 0.44 Mg ha⁻¹), while the reverse process increased SOC by 18.6% ± 1.73% (3.31 ± 0.51 Mg ha⁻¹). Modeled relative importance of 10 drivers of LUC's impact on SOC revealed that higher initial SOC (iSOC) does not solely determine SOC loss in SOC‐negative LUC scenarios as previously proposed. Across four decades, reconverting croplands to forests and grasslands recovered only 49.5% (6.1 ± 0.51 Mg ha⁻¹) and 75.3% (7.0 ± 0.38 Mg ha⁻¹) of the iSOC, respectively, indicating the need for protecting C‐rich ecosystems. Our global data set advances information on LUC's effect on SOC and can be valuable to constrain Earth system models to reliably estimate global SOC stocks and plan climate change mitigation strategies.
Singh, Siddhesh Pratap; Gabriel, Edgar
(20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID))
Many scientific applications operate on data sets that span hundreds of gigabytes or even terabytes in size. Large data sets often use compression to reduce the size of the files. Yet as of today, parallel I/O libraries do not support reading and writing compressed files, necessitating either expensive sequential compression/decompression operations before/after the simulation, or omitting advanced features of parallel I/O libraries, such as collective I/O operations. This paper introduces parallel I/O on compressed data files, discusses the key challenges, requirements, and solutions for supporting compressed data files in MPI I/O, as well as limitations on some MPI I/O operations when using compressed data files. The paper details handling of individual read and write operations of compressed data files, and presents an extension to the two-phase collective I/O algorithm to support data compression. The paper further presents and evaluates an implementation based on the Snappy compression library and the OMPIO parallel I/O framework. The performance evaluation using multiple data sets demonstrates significant performance benefits when using data compression on a parallel BeeGFS file system.
Lin, Kevin N, Volkel, Kevin, Cao, Cyrus, Hook, Paul W, Polak, Rachel E, Clark, Andrew S, San_Miguel, Adriana, Timp, Winston, Tuck, James M, Velev, Orlin D, and Keung, Albert J. A primordial DNA store and compute engine. Retrieved from https://par.nsf.gov/biblio/10549874. Nature Nanotechnology. Web. doi:10.1038/s41565-024-01771-6.
Lin, Kevin N, Volkel, Kevin, Cao, Cyrus, Hook, Paul W, Polak, Rachel E, Clark, Andrew S, San_Miguel, Adriana, Timp, Winston, Tuck, James M, Velev, Orlin D, & Keung, Albert J. A primordial DNA store and compute engine. Nature Nanotechnology. Retrieved from https://par.nsf.gov/biblio/10549874. https://doi.org/10.1038/s41565-024-01771-6
Lin, Kevin N, Volkel, Kevin, Cao, Cyrus, Hook, Paul W, Polak, Rachel E, Clark, Andrew S, San_Miguel, Adriana, Timp, Winston, Tuck, James M, Velev, Orlin D, and Keung, Albert J. "A primordial DNA store and compute engine". Nature Nanotechnology. Springer. https://doi.org/10.1038/s41565-024-01771-6. https://par.nsf.gov/biblio/10549874.
@article{osti_10549874,
title = {A primordial DNA store and compute engine},
url = {https://par.nsf.gov/biblio/10549874},
DOI = {10.1038/s41565-024-01771-6},
abstractNote = {There are a set of primordial features and functions expected of any modern information system: a substrate stably carrying data; the ability to repeatedly write, read, erase, reload, and compute on specific data from that substrate; and the overall ability to execute such functions in a seamless and programmable manner. For nascent molecular information technologies, proof-of-principle realization of this set of primordial capabilities would advance the vision for their continued development. Here, we present a DNA-based store and compute engine that captures these primordial capabilities. This system comprises multiple image files encoded into DNA and adsorbed onto ~50 µm diameter, highly porous, hierarchically branched, colloidal substrate particles comprised of naturally abundant cellulose acetate. Their surface areas are over 200 cm²/mg with binding capacities of over 10¹² DNA oligos/mg, 10 terabytes/mg, or 10⁴ terabytes/cm³. This “dendricolloid” stably holds DNA files better than bare DNA with an extrapolated ability to be repeatedly lyophilized and rehydrated over 170 times compared to 60 times, respectively. Accelerated aging studies project half-lives of ~6000 and 2 million years at 4 °C and -18 °C, respectively. The data can also be erased and replaced, and non-destructive file access is achieved through transcribing from distinct synthetic promoters. The resultant RNA molecules can be directly read via nanopore sequencing and can also be enzymatically computed to solve simplified 3x3 chess and sudoku problems. Our study establishes a feasible route for utilizing the high information density and parallel computational advantages of nucleic acids.},
journal = {Nature Nanotechnology},
publisher = {Springer},
author = {Lin, Kevin N and Volkel, Kevin and Cao, Cyrus and Hook, Paul W and Polak, Rachel E and Clark, Andrew S and San_Miguel, Adriana and Timp, Winston and Tuck, James M and Velev, Orlin D and Keung, Albert J},
}