Title: SIMD||DNA: Single Instruction, Multiple Data Computation with DNA Strand Displacement Cascades
Typical DNA storage schemes do not support in-memory computation: transforming the stored data requires sequencing the DNA, computing the transformation electronically, and then synthesizing new DNA. In contrast, we propose a model of in-memory computation that avoids the time-consuming and expensive sequencing and synthesis steps by carrying out the computation through DNA strand displacement. We demonstrate the flexibility of our approach by developing schemes for massively parallel binary counting and for computing elementary cellular automaton Rule 110.
Award ID(s):
1652824 1618895
NSF-PAR ID:
10110021
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
International Conference on DNA Computing and Molecular Programming. DNA 2019.
Volume:
11648
Page Range / eLocation ID:
219-235
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
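The abstract above names elementary cellular automaton Rule 110 as one of the target computations. As a purely electronic illustration of that update rule, and not of the DNA strand displacement scheme itself, here is a minimal Python sketch. The function names `rule110_step` and `simd_step`, the list-of-bits register representation, and the fixed zero boundary cells are assumptions made for this example; they are not taken from the paper.

```python
RULE = 110  # Wolfram rule number; bit k gives the output for neighborhood value k


def rule110_step(cells, left=0, right=0):
    """Apply one Rule 110 update to a list of 0/1 cells.

    Assumption: cells beyond either end are held at fixed values
    (`left`, `right`), defaulting to 0.
    """
    padded = [left] + list(cells) + [right]
    new_cells = []
    for i in range(1, len(padded) - 1):
        # Encode (left neighbor, cell, right neighbor) as a 3-bit number.
        neighborhood = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        new_cells.append((RULE >> neighborhood) & 1)
    return new_cells


def simd_step(registers):
    """Apply the same update to every register (hypothetical helper).

    This mirrors the single-instruction, multiple-data idea: one instruction
    acts on many independent registers at once. Here the parallelism is only
    simulated with a loop.
    """
    return [rule110_step(register) for register in registers]


if __name__ == "__main__":
    registers = [[0, 0, 0, 1], [0, 0, 1, 1]]
    print(simd_step(registers))  # [[0, 0, 1, 1], [0, 1, 1, 1]]
```

Storing the rule as the integer 110 and indexing its bits by the neighborhood value keeps the lookup table implicit, so the same function can illustrate any elementary cellular automaton by changing `RULE`.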
More Like this
  1. Abstract

    The storage of data in DNA typically involves encoding and synthesizing data into short oligonucleotides, followed by reading with a sequencing instrument. Major challenges include the molecular consumption of synthesized DNA, basecalling errors, and limitations with scaling up read operations for individual data elements. Addressing these challenges, we describe a DNA storage system called MDRAM (Magnetic DNA-based Random Access Memory) that enables repetitive and efficient readouts of targeted files with nanopore-based sequencing. By conjugating synthesized DNA to magnetic agarose beads, we enabled repeated data readouts while preserving the original DNA analyte and maintaining data readout quality. MDRAM utilizes an efficient convolutional coding scheme that leverages soft information in raw nanopore sequencing signals to achieve information reading costs comparable to Illumina sequencing despite higher error rates. Finally, we demonstrate a proof-of-concept DNA-based proto-filesystem that enables an exponentially-scalable data address space using only small numbers of targeting primers for assembly and readout.

     
  2. Abstract

    Motivation

    In the past few years, researchers have proposed numerous indexing schemes for searching large datasets of raw sequencing experiments. Most of these proposed indexes are approximate (i.e. with one-sided errors) in order to save space. Recently, researchers have published exact indexes—Mantis, VariMerge and Bifrost—that can serve as colored de Bruijn graph representations in addition to serving as k-mer indexes. This new type of index is promising because it has the potential to support more complex analyses than simple searches. However, in order to be useful as indexes for large and growing repositories of raw sequencing data, they must scale to thousands of experiments and support efficient insertion of new data.

    Results

    In this paper, we show how to build a scalable and updatable exact raw sequence-search index. Specifically, we extend Mantis using the Bentley–Saxe transformation to support efficient updates; we call the resulting index Dynamic Mantis. We demonstrate Dynamic Mantis’s scalability by constructing an index of ≈40K samples from SRA by adding samples one at a time to an initial index of 10K samples. Compared to VariMerge and Bifrost, Dynamic Mantis is more efficient in terms of index-construction time and memory, query time and memory, and index size. In our benchmarks, VariMerge and Bifrost scaled to only 5K and 80 samples, respectively, while Dynamic Mantis scaled to more than 39K samples. Queries were over 24× faster in Mantis than in Bifrost (VariMerge does not immediately support the general search queries we require). Dynamic Mantis indexes were about 2.5× smaller than Bifrost’s indexes and about half as big as VariMerge’s indexes.

    Availability and implementation

    Dynamic Mantis implementation is available at https://github.com/splatlab/mantis/tree/mergeMSTs.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Classified as a complex big data analytics problem, DNA short read alignment is a major sequential bottleneck in processing the massive amounts of data generated by next-generation sequencing platforms. With von Neumann computing architectures struggling to handle such computationally expensive and memory-intensive tasks, Processing-in-Memory (PIM) platforms are attracting growing interest. In this paper, an energy-efficient and parallel PIM accelerator (AlignS) is proposed to execute DNA short read alignment based on an optimized and hardware-friendly alignment algorithm. We first develop the AlignS platform, which harnesses SOT-MRAM as computational memory and transforms it into a fundamental processing unit for short read alignment. Accordingly, we present a novel, customized, highly parallel read alignment algorithm that requires only the proposed simple and parallel in-memory operations (i.e., comparisons and additions). AlignS is then optimized through a new correlated data partitioning and mapping methodology that allows local storage and processing of DNA sequences to fully exploit algorithm-level parallelism and to accelerate both exact and inexact matches. The device-to-architecture co-simulation results show that AlignS improves the short read alignment throughput per Watt per mm^2 by ~12X compared to the ASIC accelerator. Compared to a recent FM-index-based ReRAM platform, AlignS achieves 1.6X higher throughput per Watt.
  4. With the emergence of portable DNA sequencers, such as the Oxford Nanopore Technologies MinION, metagenomic DNA sequencing can be performed in real time and directly in the field. However, because metagenomic DNA analysis is computationally and memory intensive, and current methods are designed for batch processing, current metagenomic tools are not well suited for mobile devices. In this paper, we propose a new memory-efficient method to identify Operational Taxonomic Units (OTUs) in metagenomic DNA streams. Our method is based on finding connected components in overlap graphs constructed over a real-time stream of long DNA reads as produced by the MinION platform. We propose an efficient algorithm to maintain connected components when an overlap graph is streamed, and show how redundant information can be removed from the stream by transitive closures. Through experiments on simulated and real-world metagenomic data, we demonstrate that the resulting solution is able to recover OTUs with high precision while remaining suitable for mobile computing devices.
  5. Abstract

    In nature, plants experience rapid changes in light intensity and quality throughout the day. To maximize growth, they have established molecular mechanisms to optimize photosynthetic output while protecting components of the light‐dependent reaction and CO2 fixation pathways. Plant phenotyping of mutant collections has become a powerful tool to unveil the genetic loci involved in environmental acclimation. Here, we describe the phenotyping of the transfer‐DNA (T‐DNA) insertion mutant line SALK_008491, previously known as nhd1‐1. Growth in a fluctuating light regime caused a loss in growth rate accompanied by a spike in photosystem (PS) II damage and increased non‐photochemical quenching (NPQ). Interestingly, an independent nhd1 null allele did not recapitulate the NPQ phenotype. Through bulk sequencing of a backcrossed segregating F2 pool, we identified a large ~14‐kb deletion on chromosome 3 (Chr3) in SALK_008491 affecting five genes upstream of NHD1. Besides NHD1, which encodes a putative plastid Na+/H+ antiporter, the stromal NAD‐dependent D‐3‐phosphoglycerate dehydrogenase 3 (PGDH3) locus was also deleted. Although some changes in the SALK_008491 mutant's photosynthesis can be assigned to the loss of PGDH3, our follow‐up studies employing respective single mutants and complementation with overlapping transformation‐competent artificial chromosome (TAC) vectors reveal that the exacerbated fluctuating light sensitivity in SALK_008491 mutants results from the simultaneous loss of PGDH3 and NHD1. Altogether, the data obtained from this large deletion‐carrying mutant provide new and unintuitive insights into the molecular mechanisms that function to protect the photosynthetic machinery. Moreover, our study renews calls for caution when setting up reverse genetic studies using T‐DNA lines. Although second‐site insertions, indels, and SNPs have been reported before, large deletions surrounding the insertion site pose yet another problem. Nevertheless, as shown through this research, such unpredictable genetic events following T‐DNA mutagenesis can provide unintuitive insights that allow for understanding complex phenomena such as plant acclimation to dynamic high light stress.