Abstract
DNA has emerged as a promising material for addressing growing data storage demands. We recently demonstrated a structure-based DNA data storage approach in which DNA probes are spatially oriented on the surface of DNA origami and decoded using DNA-PAINT. In this approach, larger origami structures could improve the efficiency of reading and writing data. However, larger origami require long single-stranded DNA scaffolds, which are not commonly available. Here, we report the engineering of a new, longer DNA scaffold designed to produce the larger rectangular origami needed to expand the origami-based digital nucleic acid memory (dNAM) approach. Using atomic force microscopy and DNA-PAINT super-resolution microscopy, we confirmed that this scaffold self-assembles into the intended origami platform and correctly positions DNA data strands. This larger structure enables a 67% increase in the number of data points per origami and will support efforts to scale up origami-based dNAM efficiently.
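To make the encoding idea concrete, here is a minimal sketch of mapping a bit string onto addressable sites of an origami grid and reading it back. The grid dimensions and helper names are our assumptions for illustration only, not the paper's actual layout or error-correction scheme.

```python
# Toy sketch only: dNAM writes bits as the presence or absence of data
# strands at addressable sites on an origami grid, read out by imaging.
# The 8x10 grid below is an assumption, not the paper's actual layout.

ROWS, COLS = 8, 10  # hypothetical addressable sites per origami

def bits_to_sites(bits):
    """Map a bit string to the grid coordinates that receive a data strand."""
    assert len(bits) <= ROWS * COLS, "message exceeds one origami's capacity"
    return [(i // COLS, i % COLS) for i, b in enumerate(bits) if b == "1"]

def sites_to_bits(sites, n_bits):
    """Decode: recover the bit string from imaged data-strand positions."""
    present = set(sites)
    return "".join("1" if (i // COLS, i % COLS) in present else "0"
                   for i in range(n_bits))

message = "0110100001101001"  # "hi" in ASCII
assert sites_to_bits(bits_to_sites(message), len(message)) == message
```

Under this picture, moving from a smaller grid to a larger one directly increases the number of bits each origami can carry, which is the capacity gain the abstract reports.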
DNA Origami Words and Rewriting Systems
We classify rectangular DNA origami structures according to their scaffold and staple organization by associating a graphical representation to each scaffold folding. Inspired by the well-studied Temperley-Lieb algebra, we identify basic modules that form the structures. The graphical description is obtained by ‘gluing’ basic modules one on top of the other. To each module we associate a symbol, such that gluing of modules corresponds to concatenating the associated symbols. Every word then corresponds to a graphical representation of a DNA origami structure. A set of rewriting rules defines equivalent words that correspond to the same graphical structure. We propose two different types of basic module structures with corresponding rewriting rules. For each type, we count all possible structures by counting the equivalence classes of words, and we give a polynomial-time algorithm that computes the shortest word in each equivalence class.
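The abstract does not list the rewriting rules themselves, so the sketch below uses hypothetical, Temperley-Lieb-flavored, length-reducing rules to illustrate how a word is reduced to a shortest representative of its equivalence class. Confluence of the rule set is assumed; the generators and rules are stand-ins, not the paper's.

```python
# Minimal sketch of a word rewriting system over module symbols. The
# generators 'a', 'b' and the rules below are hypothetical stand-ins;
# the paper's actual modules and relations are not given in the abstract.

RULES = [
    ("aa", "a"),    # hypothetical idempotent module
    ("aba", "a"),   # hypothetical Temperley-Lieb-style relations
    ("bab", "b"),
]

def shortest_word(word: str) -> str:
    """Apply length-reducing rules until none matches.

    Each rewrite strictly shortens the word, so the loop terminates;
    assuming the rule set is confluent, the result is the unique
    shortest representative of the word's equivalence class.
    """
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)
                changed = True
                break
    return word

assert shortest_word("abab") == "ab"
assert shortest_word("aabaa") == "a"
```

The paper's algorithm is certainly more refined; this fixpoint loop only conveys why length-reducing, confluent rules yield a polynomial-time route to a shortest word.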
- Award ID(s): 1800443
- PAR ID: 10097999
- Editor(s): McQuillan, I.; Seki, S.
- Date Published:
- Journal Name: Unconventional Computation and Natural Computation
- Volume: 11493
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Ćirić, M.; Droste, M.; Pin, J.-É. (Eds.): We initiate an algebraic approach to the study of DNA origami structures. We identify two types of basic building blocks and describe a DNA origami structure by their composition. These building blocks are taken as generators of a monoid, called the origami monoid. Motivated by the well-studied Temperley-Lieb algebras, we identify a set of relations that characterize the origami monoid. We present several observations about Green's relations for the origami monoid and study its relation to a direct product of Jones monoids, which is a morphic image of the origami monoid.
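As a concrete handle on relations of Temperley-Lieb type, the sketch below represents Jones-monoid elements as pairing diagrams and checks the standard relations e_i e_i = e_i, e_i e_{i±1} e_i = e_i, and e_i e_j = e_j e_i for |i-j| ≥ 2. The representation and function names are ours, not the paper's.

```python
# Illustrative sketch: elements of the Jones monoid J_n as pairings of
# 2n boundary points ('t', k) on top and ('b', k) on bottom, 1 <= k <= n.

def e(i, n):
    """The Temperley-Lieb generator e_i as a set of point pairings."""
    pairs = {frozenset({("t", i), ("t", i + 1)}),
             frozenset({("b", i), ("b", i + 1)})}
    for k in range(1, n + 1):
        if k not in (i, i + 1):
            pairs.add(frozenset({("t", k), ("b", k)}))
    return frozenset(pairs)

def compose(d1, d2, n):
    """Stack d1 above d2, splice strands at the shared middle row, and
    discard closed loops (the Jones monoid forgets loop counts)."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(x, y):
        parent[find(x)] = find(y)
    for p in d1:  # d1's bottom row becomes the middle row 'm'
        a, b = tuple(p)
        union(("m", a[1]) if a[0] == "b" else a,
              ("m", b[1]) if b[0] == "b" else b)
    for p in d2:  # d2's top row becomes the middle row 'm'
        a, b = tuple(p)
        union(("m", a[1]) if a[0] == "t" else a,
              ("m", b[1]) if b[0] == "t" else b)
    ext = [("t", k) for k in range(1, n + 1)] + [("b", k) for k in range(1, n + 1)]
    groups = {}
    for p in ext:  # each strand ends at exactly two boundary points
        groups.setdefault(find(p), []).append(p)
    return frozenset(frozenset(g) for g in groups.values())

n = 4
E = {i: e(i, n) for i in range(1, n)}
assert compose(E[1], E[1], n) == E[1]                    # e_i e_i = e_i
assert compose(compose(E[1], E[2], n), E[1], n) == E[1]  # e_i e_{i+1} e_i = e_i
assert compose(E[1], E[3], n) == compose(E[3], E[1], n)  # far generators commute
```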
Traditionally, many text-mining tasks treat individual word tokens as the finest meaningful semantic granularity. However, in many languages and specialized corpora, words are composed by concatenating semantically meaningful subword structures. Word-level analysis cannot leverage the semantic information present in such subword structures. With regard to word embedding techniques, this leads not only to poor embeddings for infrequent words in long-tailed text corpora but also to weak capabilities for handling out-of-vocabulary words. In this paper we propose MorphMine for unsupervised morpheme segmentation. MorphMine applies a parsimony criterion to hierarchically segment words into the fewest morphemes at each level of the hierarchy, which leads to longer shared morphemes at each level of segmentation. Experiments show that MorphMine segments words in a variety of languages into human-verified morphemes. Additionally, we experimentally demonstrate that utilizing MorphMine morphemes to enrich word embeddings consistently improves embedding quality on a variety of embedding evaluations and a downstream language-modeling task.
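As a flat, illustrative reduction of the parsimony idea (MorphMine itself induces its morpheme inventory from the corpus and segments hierarchically), the sketch below splits a word into pieces from a known inventory, charging unknown spans per character so that inventory reuse is preferred. The inventory and cost function are our stand-ins, not the authors' code.

```python
# Illustrative stand-in for parsimony-driven segmentation: choose the
# cheapest split of a word, where known morphemes cost 1 and unknown
# spans pay per character. This flat version only conveys the criterion.

from functools import lru_cache

MORPHEMES = {"un", "break", "able", "happi", "ness"}  # hypothetical inventory

def cost(piece):
    return 1 if piece in MORPHEMES else 1 + len(piece)

@lru_cache(maxsize=None)
def segment(s):
    """Return the cheapest segmentation of `s` as a tuple of pieces."""
    if not s:
        return ()
    options = [(s,)]  # fallback: keep the unknown span whole
    for i in range(1, len(s)):
        if s[:i] in MORPHEMES:
            options.append((s[:i],) + segment(s[i:]))
    return min(options, key=lambda seg: sum(map(cost, seg)))

assert segment("unbreakable") == ("un", "break", "able")
assert segment("happiness") == ("happi", "ness")
```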
Current multilingual vision-language models either require a large number of additional parameters for each supported language or suffer performance degradation as languages are added. In this paper, we propose a Scalable Multilingual Aligned Language Representation (SMALR) that supports many languages with few model parameters and without sacrificing downstream task performance. SMALR learns a fixed-size language-agnostic representation for most words in a multilingual vocabulary, keeping language-specific features for just a few. We use a masked cross-language modeling loss to align features with context from other languages. Additionally, we propose a cross-lingual consistency module that ensures that predictions made for a query and its machine translation are comparable. The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date. On multilingual image-sentence retrieval, SMALR outperforms prior work by 3-4% with less than 1/5th the training parameters of other word embedding methods.
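The hybrid vocabulary described above can be pictured as two lookup tables. The minimal sketch below (dimensions, class, and variable names are our assumptions, not SMALR's code) returns a language-specific vector for the few words that have one and falls back to the shared, language-agnostic table otherwise.

```python
# Minimal sketch of a hybrid vocabulary: one shared, language-agnostic
# table for most words plus small per-language tables for a few. All
# sizes and names here are illustrative assumptions, not SMALR's code.

import numpy as np

class HybridVocabEmbedding:
    def __init__(self, shared_vocab, per_lang_vocab, dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.shared = {w: rng.normal(size=dim) for w in shared_vocab}
        self.lang_specific = {(lang, w): rng.normal(size=dim)
                              for lang, words in per_lang_vocab.items()
                              for w in words}

    def embed(self, lang, word):
        # Language-specific entries win; everything else shares one table,
        # which is where the parameter savings come from.
        key = (lang, word)
        if key in self.lang_specific:
            return self.lang_specific[key]
        return self.shared.get(word, np.zeros(self.dim))

emb = HybridVocabEmbedding(shared_vocab=["dog", "cat"],
                           per_lang_vocab={"de": ["hund"], "fr": ["chien"]})
vec = emb.embed("de", "hund")  # language-specific lookup
```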