NSF PAR Search | NSF Public Access Repository

Unsupervised Morphological Tree Tokenizer

Zhu, Qingyang; Hu, Xiang; Ji, Pengyu; Wu, Wei; Tu, Kewei (July 2025, Findings of the Association for Computational Linguistics (ACL 2025))

As a cornerstone in language modeling, tokenization involves segmenting text inputs into pre-defined atomic units. Conventional statistical tokenizers often disrupt constituent boundaries within words, thereby corrupting semantic information. To address this drawback, we introduce morphological structure guidance to tokenization and propose a deep model to induce character-level structures of words. Specifically, the deep model jointly encodes internal structures and representations of words with a mechanism that ensures the indecomposability of morphemes. By training the model with self-supervised objectives, our method is capable of inducing character-level structures that align with morphological rules without annotated training data. Based on the induced structures, our algorithm tokenizes words through vocabulary matching in a top-down manner. Empirical results indicate that the proposed method effectively retains complete morphemes and outperforms widely adopted methods such as BPE and WordPiece on both morphological segmentation tasks and language modeling tasks.
Free, publicly-accessible full text available July 26, 2026
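
The top-down vocabulary matching mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the `build` helper, the hand-written tree, and the toy vocabulary are all hypothetical, and the tree is assumed to have already been induced by some model. The sketch only shows one plausible reading of "top-down matching": a subtree is emitted as a single token as soon as its full string is found in the vocabulary, and is otherwise split into its children.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    """A node in a binary tree over the characters of a word (hypothetical type)."""
    text: str                       # the substring this node spans
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# A tree spec is either a string (leaf) or a pair of specs (internal node).
Spec = Union[str, tuple]


def build(spec: Spec) -> Node:
    """Build a tree from nested 2-tuples of strings (illustrative helper)."""
    if isinstance(spec, str):
        return Node(spec)
    left, right = build(spec[0]), build(spec[1])
    return Node(left.text + right.text, left, right)


def tokenize(node: Node, vocab: set) -> list:
    """Traverse the tree top-down, emitting the largest spans found in vocab.

    Leaves are always emitted as-is, so every word can be tokenized even if
    some characters are missing from the vocabulary (an assumption of this sketch).
    """
    is_leaf = node.left is None or node.right is None
    if node.text in vocab or is_leaf:
        return [node.text]
    return tokenize(node.left, vocab) + tokenize(node.right, vocab)


# Assumed tree for "unkind": ((u n) ((k i) (n d)))
tree = build((("u", "n"), (("k", "i"), ("n", "d"))))

print(tokenize(tree, {"un", "kind"}))  # ['un', 'kind']
print(tokenize(tree, {"un"}))          # ['un', 'k', 'i', 'n', 'd']
```

Because matching follows the tree rather than raw character statistics, a token can never straddle a constituent boundary of the tree, which is the property the abstract attributes to structure-guided tokenization.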
Site-selective doublon-holon dynamics in a pumped one-dimensional Hubbard superlattice with staggered Coulomb interactions

https://doi.org/10.1103/PhysRevB.109.195121

Cheng, Zhenyu; Li, Ying; Lu, Hantao; Hu, Xiang; Huang, Zhongbing; Fiete, Gregory A; Du, Liang (May 2024, Physical Review B)
Quench dynamics in the one-dimensional mass-imbalanced ionic Hubbard model

https://doi.org/10.1103/PhysRevB.107.195147

Xie, Zhuotao; Zhao, Ming; Lu, Hantao; Huang, Zhongbing; Fiete, Gregory A; Hu, Xiang; Du, Liang (May 2023, Physical Review B)

Full Text Available
Second-order Dirac superconductors and magnetic field induced Majorana hinge modes

https://doi.org/10.1103/PhysRevB.100.020509

Ghorashi, Sayed Ali Akbar; Hu, Xiang; Hughes, Taylor L.; Rossi, Enrico (July 2019, Physical Review B)
