This content will become publicly available on March 1, 2026
Title: Standard Lyndon Loop Words: Weighted Orders
We generalize the study of standard Lyndon loop words from [16] to a more general class of orders on the underlying alphabet, as suggested in [16, Remark 3.15]. The main new ingredient is the exponent-tightness of these words, which also allows one to generalize the construction of PBW bases of the untwisted quantum loop algebra $U_{q}(L\mathfrak{g})$ via the combinatorics of loop words.
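As background for the combinatorics involved (classical Lyndon words over a finite ordered alphabet, not the weighted loop-word generalization treated in the paper), all Lyndon words up to a given length can be enumerated with Duval's algorithm; a minimal sketch:

```python
def lyndon_words(k, n):
    """Generate all Lyndon words of length <= n over the alphabet
    {0, ..., k-1}, in lexicographic order, via Duval's algorithm."""
    w = [0]
    yield tuple(w)
    while True:
        m = len(w)
        while len(w) < n:            # extend the current word periodically up to length n
            w.append(w[len(w) - m])
        while w and w[-1] == k - 1:  # strip maximal letters from the right
            w.pop()
        if not w:
            return
        w[-1] += 1                   # increment the last remaining letter
        yield tuple(w)
```

For example, over a binary alphabet the Lyndon words of length at most 3 are 0, 001, 01, 011, 1, in lexicographic order.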
Harp, Nicholas R.; Brown, Catherine C.; Neta, Maital
(Social Psychological and Personality Science)
Ambiguous stimuli are useful for assessing emotional bias. For example, surprised faces could convey a positive or negative meaning, and the degree to which an individual interprets these expressions as positive or negative represents their “valence bias.” Currently, the most well-validated ambiguous stimuli for assessing valence bias include nonverbal signals (faces and scenes), overlooking an inherent ambiguity in verbal signals. This study identified 32 words with dual-valence ambiguity (i.e., relatively high intersubject variability in valence ratings and relatively slow response times) and length-matched clearly valenced words (16 positive, 16 negative). Preregistered analyses demonstrated that the words-based valence bias correlated with the bias for faces, r_s(213) = .27, p < .001, and scenes, r_s(204) = .46, p < .001. That is, the same people who interpret ambiguous faces/scenes as positive also interpret ambiguous words as positive. These findings provide a novel tool for measuring valence bias with greater generalizability, resulting in a more robust measure of this bias.
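The reported statistics are Spearman rank correlations. As an illustrative aside (the data below are hypothetical, not from the study), the statistic is simply the Pearson correlation computed on rank-transformed scores; a minimal version without tie correction:

```python
def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the
    rank-transformed data (no tie correction in this sketch)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1.0
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because only ranks matter, any monotone relationship (even a nonlinear one) yields r_s = 1.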
Mueller, Aaron; Frank, Robert; Linzen, Tal; Wang, Luheng; Schuster, Sebastian
(Findings of the Association for Computational Linguistics)
Relations between words are governed by hierarchical structure rather than linear ordering. Sequence-to-sequence (seq2seq) models, despite their success in downstream NLP applications, often fail to generalize in a hierarchy-sensitive manner when performing syntactic transformations—for example, transforming declarative sentences into questions. However, syntactic evaluations of seq2seq models have only examined models that were not pre-trained on natural language data before being trained to perform syntactic transformations, even though pre-training has been found to induce hierarchical linguistic generalizations in language models; in other words, the syntactic capabilities of seq2seq models may have been greatly understated. We address this gap using the pre-trained seq2seq models T5 and BART, as well as their multilingual variants mT5 and mBART. We evaluate whether they generalize hierarchically on two transformations in two languages: question formation and passivization in English and German. We find that pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not. This result presents evidence for the learnability of hierarchical syntactic information from non-annotated natural language text while also demonstrating that seq2seq models are capable of syntactic generalization, though only after exposure to much more language data than human learners receive.
Volkel, Kevin; Tomek, Kyle J.; Keung, Albert J.; Tuck, James M.
(ACM Journal on Emerging Technologies in Computing Systems)
As interest in DNA-based information storage grows, the costs of synthesis have been identified as a key bottleneck. A potential direction is to tune synthesis for data. Data strands tend to be composed of a small set of recurring code word sequences, and they contain longer sequences of repeated data. To exploit these properties, we propose a new framework called DINOS. DINOS consists of three key parts: (i) The first is a hierarchical strand assembly algorithm, inspired by gene assembly techniques, that can assemble arbitrary data strands from a small set of primitive blocks. (ii) The assembly algorithm relies on our novel formulation for how to construct primitive blocks, spanning a variety of useful configurations from a set of code words and overhangs. Each primitive block is a code word flanked by a pair of overhangs that are created by a cyclic pairing process that keeps the number of primitive blocks small. Using these primitive blocks, any data strand of arbitrary length can, in principle, be assembled. We show a minimal system for a binary code with as few as six primitive blocks, and we generalize our processes to support an arbitrary set of overhangs and code words. (iii) We exploit our hierarchical assembly approach to identify redundant sequences and coalesce the reactions that create them to make assembly more efficient. We evaluate DINOS and describe its key characteristics. For example, the number of reactions needed to make a strand can be reduced by increasing the number of overhangs or the number of code words, but increasing the number of overhangs offers a small advantage over increasing code words while requiring substantially fewer primitive blocks. However, density is improved more by increasing the number of code words.
We also find that a simple redundancy coalescing technique is able to reduce reactions by 90.6% and 41.2% on average for decompressed and compressed data, respectively, even when the smallest data fragments being assembled are 16 bits. With a simple padding heuristic that finds even more redundancy, we can further decrease reactions for the same operating point up to 91.1% and 59% for decompressed and compressed data, respectively, on average. Our approach offers greater density by up to 80% over a prior general purpose gene assembly technique. Finally, in an analysis of synthesis costs in which we produce a 1 GB volume using de novo synthesis versus making only the primitive blocks with de novo synthesis and otherwise assembling using DINOS, we estimate DINOS to be 10^5× cheaper than de novo synthesis.
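As a toy illustration of why coalescing redundant sub-assemblies saves reactions (a hypothetical model of hierarchical pairwise assembly, not the actual DINOS protocol or its cost model): if identical sub-assemblies are built once and reused, repeated fragments cost nothing after their first construction.

```python
def reactions(strands, coalesce=True):
    """Count pairwise-assembly reactions needed to build each strand
    (a tuple of code-word symbols) as a balanced binary tree. With
    coalescing, identical sub-assemblies are built once and reused."""
    built = set()
    count = 0

    def build(seg):
        nonlocal count
        if len(seg) == 1:
            return                      # primitive block: no reaction needed
        if coalesce and seg in built:
            return                      # already assembled earlier: reuse it
        mid = (len(seg) + 1) // 2
        build(seg[:mid])
        build(seg[mid:])
        count += 1                      # one reaction joins the two halves
        if coalesce:
            built.add(seg)

    for s in strands:
        build(tuple(s))
    return count
```

For two copies of the same 4-symbol strand, naive assembly needs 6 reactions, while coalescing needs only 3, since the second copy is recognized as already built.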
Mubayi, Dhruv
Verstraete
(The Electronic Journal of Combinatorics)
Counting paths and walks in graphs has numerous applications, such as finding bounds on the spectral radius and the energy of a graph, as well as in standard graph-theoretic reductions of combinatorial problems such as counting words over alphabets with forbidden patterns. In this paper, the authors strengthen and generalize Erdős and Simonovits's results on counting paths to counting isomorphic copies of trees in a graph of average degree d.
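For context on the walk-counting connection (a standard fact, not a result of the cited paper): the number of walks of length k between vertices i and j equals the (i, j) entry of the k-th power of the adjacency matrix. A small sketch using repeated squaring:

```python
def count_walks(adj, k):
    """Return the matrix whose (i, j) entry is the number of walks of
    length k from vertex i to vertex j: the k-th power of adj."""
    n = len(adj)

    def matmul(a, b):
        return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
                for i in range(n)]

    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # identity
    power = adj
    while k:                      # exponentiation by squaring
        if k & 1:
            result = matmul(result, power)
        power = matmul(power, power)
        k >>= 1
    return result
```

For the path graph 0–1–2, the two walks of length 2 from vertex 1 back to itself (via 0 and via 2) show up as the middle entry 2 of the squared adjacency matrix.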
Dong, Lu; Wang, Xiao; Nwogu, Ifeoma
(Association for Computational Linguistics)
Sign words are the building blocks of any sign language. In this work, we present wSignGen, a word-conditioned 3D American Sign Language (ASL) generation model dedicated to synthesizing realistic and grammatically accurate motion sequences for sign words. Our approach leverages a transformer-based diffusion model, trained on a curated dataset of 3D motion meshes from word-level ASL videos. By integrating CLIP, wSignGen offers two advantages: image-based generation, which is particularly useful for children learning sign language but not yet able to read, and the ability to generalize to unseen synonyms. Experiments demonstrate that wSignGen significantly outperforms the baseline model in the task of sign word generation. Moreover, human evaluation experiments show that wSignGen can generate high-quality, grammatically correct ASL signs effectively conveyed through 3D avatars.
Khomych, Severyn, Korniichuk, Nazar, Molokanov, Kostiantyn, and Tsymbaliuk, Alexander. "Standard Lyndon Loop Words: Weighted Orders." International Mathematics Research Notices 2025.5 (2025). Web. doi:10.1093/imrn/rnaf030. Retrieved from https://par.nsf.gov/biblio/10628578.
@article{osti_10628578,
title = {Standard Lyndon Loop Words: Weighted Orders},
url = {https://par.nsf.gov/biblio/10628578},
DOI = {10.1093/imrn/rnaf030},
abstractNote = {We generalize the study of standard Lyndon loop words from [16] to a more general class of orders on the underlying alphabet, as suggested in [16, Remark 3.15]. The main new ingredient is the exponent-tightness of these words, which also allows one to generalize the construction of PBW bases of the untwisted quantum loop algebra $U_{q}(L\mathfrak{g})$ via the combinatorics of loop words.},
journal = {International Mathematics Research Notices},
volume = {2025},
number = {5},
publisher = {IMRN},
author = {Khomych, Severyn and Korniichuk, Nazar and Molokanov, Kostiantyn and Tsymbaliuk, Alexander},
}