
Title: DNA Origami Words and Rewriting Systems
We classify rectangular DNA origami structures according to the organization of their scaffold and staples by associating a graphical representation with each scaffold folding. Inspired by the well-studied Temperley-Lieb algebras, we identify basic modules from which the structures are built. The graphical description is obtained by ‘gluing’ basic modules one on top of the other. To each module we associate a symbol, so that gluing of modules corresponds to concatenating the associated symbols; every word then corresponds to a graphical representation of a DNA origami structure. A set of rewriting rules defines equivalent words, which correspond to the same graphical structure. We propose two different types of basic module structures with corresponding rewriting rules. For each type, we count all possible structures by counting the equivalence classes of words, and we give a polynomial-time algorithm that computes the shortest word in each equivalence class.
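As a toy illustration of the word-and-rewriting viewpoint (the module symbols and rules below are hypothetical placeholders, not the paper's actual origami modules or relations), the following Python sketch applies length-reducing rewriting rules by breadth-first search to find a shortest representative of a word's equivalence class. The paper's polynomial-time shortest-word algorithm is specific to its two rule sets and is not reproduced here.

```python
from collections import deque

# Hypothetical length-reducing rewriting rules over a two-symbol
# module alphabet; each pair (lhs, rhs) rewrites lhs -> rhs.
# These are placeholders, NOT the paper's origami rules.
RULES = [
    ("aba", "a"),
    ("bab", "b"),
    ("aa", "a"),
    ("bb", "b"),
]

def shortest_equivalent(word, rules=RULES, max_steps=10_000):
    """Breadth-first search over words reachable by applying the
    rules left-to-right, returning the shortest word found."""
    seen = {word}
    queue = deque([word])
    best = word
    steps = 0
    while queue and steps < max_steps:
        w = queue.popleft()
        steps += 1
        if len(w) < len(best):
            best = w
        for lhs, rhs in rules:
            at = w.find(lhs)
            while at != -1:
                nxt = w[:at] + rhs + w[at + len(lhs):]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                at = w.find(lhs, at + 1)
    return best

print(shortest_equivalent("ababab"))  # -> "ab" under the toy rules
```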
Award ID(s): 1800443
PAR ID: 10097999
Author(s) / Creator(s):
Editor(s): McQuillan, I.; Seki, S.
Date Published:
Journal Name: Unconventional Computation and Natural Computation
Volume: 11493
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Ćirić, M.; Droste, M.; Pin, JÉ. (Ed.)
    We initiate an algebraic approach to the study of DNA origami structures. We identify two types of basic building blocks and describe a DNA origami structure by their composition. These building blocks are taken as generators of a monoid, called the origami monoid; motivated by the well-studied Temperley-Lieb algebras, we identify a set of relations that characterize it. We present several observations about Green’s relations for the origami monoid and study its relation to a direct product of Jones monoids, which is a morphic image of the origami monoid.
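The Jones monoid mentioned above has a standard presentation with generators h_1, ..., h_{n-1} (the Temperley-Lieb "hooks") and relations h_i h_i = h_i, h_i h_j h_i = h_i when |i - j| = 1, and h_i h_j = h_j h_i when |i - j| >= 2. The Python sketch below illustrates these standard relations only, not the origami monoid's presentation from the paper: it applies the two length-reducing relations to a word given as a tuple of generator indices, so it computes a reduced word rather than a canonical normal form (the commutation relation is not searched).

```python
def reduce_jones_word(word):
    """Apply the length-reducing Jones-monoid relations
    h_i h_i = h_i  and  h_i h_j h_i = h_i (|i - j| = 1)
    until no more apply. Commutations for |i - j| >= 2 are
    not used, so this is a partial reduction only."""
    w = list(word)
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(w) - 1:
            if w[i] == w[i + 1]:           # h_i h_i -> h_i
                del w[i + 1]
                changed = True
                continue
            if (i < len(w) - 2 and w[i] == w[i + 2]
                    and abs(w[i] - w[i + 1]) == 1):
                del w[i + 1:i + 3]         # h_i h_j h_i -> h_i
                changed = True
                continue
            i += 1
    return tuple(w)

print(reduce_jones_word((1, 2, 1, 1, 3)))  # -> (1, 3)
```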
  2. Traditionally, many text-mining tasks treat individual word-tokens as the finest meaningful semantic granularity. However, in many languages and specialized corpora, words are composed by concatenating semantically meaningful subword structures. Word-level analysis cannot leverage the semantic information present in such subword structures. For word embedding techniques, this leads not only to poor embeddings for infrequent words in long-tailed text corpora but also to weak capabilities for handling out-of-vocabulary words. In this paper we propose MorphMine for unsupervised morpheme segmentation. MorphMine applies a parsimony criterion to hierarchically segment words into the fewest morphemes at each level of the hierarchy. This leads to longer shared morphemes at each level of segmentation. Experiments show that MorphMine segments words in a variety of languages into human-verified morphemes. Additionally, we demonstrate experimentally that enriching word embeddings with MorphMine morphemes consistently improves embedding quality on a variety of embedding evaluations and a downstream language-modeling task.
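The parsimony criterion (segment a word into as few morphemes as possible) can be illustrated with a small dynamic program. The sketch below is a hypothetical illustration only: it assumes a candidate morpheme set is already given, whereas MorphMine induces the morphemes themselves unsupervised and segments hierarchically.

```python
def min_morpheme_split(word, morphemes):
    """Split `word` into the fewest substrings that all appear in
    `morphemes`, via dynamic programming over prefix lengths."""
    INF = float("inf")
    n = len(word)
    best = [0] + [INF] * n   # best[i]: fewest pieces covering word[:i]
    back = [-1] * (n + 1)    # back[i]: start index of the last piece
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] + 1 < best[i] and word[j:i] in morphemes:
                best[i] = best[j] + 1
                back[i] = j
    if best[n] == INF:
        return None          # word not covered by the morpheme set
    pieces = []
    i = n
    while i > 0:
        pieces.append(word[back[i]:i])
        i = back[i]
    return pieces[::-1]

print(min_morpheme_split("unhappiness", {"un", "happi", "ness", "happiness"}))
# -> ['un', 'happiness']  (two pieces beat 'un' + 'happi' + 'ness')
```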
  3. RNA origami is a framework for the modular design of nanoscaffolds that can be folded from a single strand of RNA and used to organize molecular components with nanoscale precision. The design of genetically expressible RNA origami, which must fold cotranscriptionally, requires modelling and design tools that simultaneously consider thermodynamics, the folding pathway, sequence constraints and pseudoknot optimization. Here, we describe RNA Origami Automated Design software (ROAD), which builds origami models from a library of structural modules, identifies potential folding barriers and designs optimized sequences. Using ROAD, we extend the scale and functional diversity of RNA scaffolds, creating 32 designs of up to 2,360 nucleotides, five that scaffold two proteins, and seven that scaffold two small molecules at precise distances. Micrographic and chromatographic comparisons of optimized and non-optimized structures validate that our principles for strand routing and sequence design substantially improve yield. By providing efficient design of RNA origami, ROAD may simplify the construction of custom RNA scaffolds for nanomedicine and synthetic biology.
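As a generic illustration of one sequence constraint such design tools must enforce (this is not ROAD's interface, and plain dot-bracket notation cannot express the pseudoknots that ROAD optimizes), the sketch below checks that a candidate RNA sequence is compatible with a target secondary structure: every paired position must form a canonical Watson-Crick or G-U wobble pair.

```python
# Canonical RNA base pairs: Watson-Crick plus the G-U wobble.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}

def structure_compatible(seq, dotbracket):
    """Check that `seq` can adopt the secondary structure given in
    dot-bracket notation: each '(' pairs with its matching ')'."""
    if len(seq) != len(dotbracket):
        return False
    stack = []
    for i, ch in enumerate(dotbracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                return False              # unbalanced structure
            j = stack.pop()
            if (seq[j], seq[i]) not in VALID_PAIRS:
                return False
    return not stack                      # every '(' must be closed

print(structure_compatible("GGGAAACCC", "(((...)))"))  # True: G-C stem
```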
  4. Current multilingual vision-language models either require a large number of additional parameters for each supported language or suffer performance degradation as languages are added. In this paper, we propose a Scalable Multilingual Aligned Language Representation (SMALR) that supports many languages with few model parameters without sacrificing downstream task performance. SMALR learns a fixed-size language-agnostic representation for most words in a multilingual vocabulary, keeping language-specific features for just a few. We use a masked cross-language modeling loss to align features with context from other languages. Additionally, we propose a cross-lingual consistency module that ensures predictions made for a query and its machine translation are comparable. The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date. We evaluate on multilingual image-sentence retrieval and outperform prior work by 3–4% with less than 1/5th the training parameters of other word embedding methods.
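A toy sketch of the storage layout the abstract describes (all names and vectors here are made up, and none of SMALR's training losses or alignment machinery is shown): most vocabulary items share one language-agnostic embedding table, while a few words keep small per-language tables, so adding a language adds only a handful of parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Shared, language-agnostic embeddings for most of the vocabulary.
shared = {"dog": rng.normal(size=DIM), "eat": rng.normal(size=DIM)}

# Small per-language tables for the few words kept language-specific.
specific = {
    "en": {"bank": rng.normal(size=DIM)},
    "de": {"bank": rng.normal(size=DIM)},
}

def embed(word, lang):
    """Return the language-specific vector if one exists for
    (word, lang); otherwise fall back to the shared table."""
    table = specific.get(lang, {})
    if word in table:
        return table[word]
    return shared[word]

v_en = embed("bank", "en")   # language-specific vector for English
v_de = embed("bank", "de")   # a different vector for German
v_fr = embed("dog", "fr")    # falls back to the shared table
```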