

Search for: All records

Creators/Authors contains: "Hu, Xiang"


  1. As a cornerstone in language modeling, tokenization involves segmenting text inputs into pre-defined atomic units. Conventional statistical tokenizers often disrupt constituent boundaries within words, thereby corrupting semantic information. To address this drawback, we introduce morphological structure guidance to tokenization and propose a deep model to induce character-level structures of words. Specifically, the deep model jointly encodes internal structures and representations of words with a mechanism named to ensure the indecomposability of morphemes. By training the model with self-supervised objectives, our method is capable of inducing character-level structures that align with morphological rules without annotated training data. Based on the induced structures, our algorithm tokenizes words through vocabulary matching in a top-down manner. Empirical results indicate that the proposed method effectively retains complete morphemes and outperforms widely adopted methods such as BPE and WordPiece on both morphological segmentation tasks and language modeling tasks. 
  2. Doublon-holon dynamics are investigated in a pumped one-dimensional Hubbard model with a staggered on-site Coulomb interaction at half-filling. When the system parameters are set in the Mott-insulating regime, the equilibrium sublattice density of states exhibits several characteristic peaks, corresponding to the lower and upper Hubbard bands as well as hybridization bands. We study the linear absorption spectrum and find two main peaks that mark the photon frequencies at which the ground state is excited to an excited state. For a system driven by a laser pulse of general intensity and frequency, both the energy absorption and the doublon-holon dynamics exhibit distinct behaviors as a function of laser amplitude and frequency. Single-photon processes are observed at low laser intensity, where energy is absorbed at resonant laser frequencies. At strong laser intensity, multiphoton-induced dynamics are observed and are confirmed by an evaluation of the Loschmidt amplitude. The contribution of multiphoton processes to the site-resolved double occupancy is also characterized by the generalized Loschmidt amplitude. Site-resolved doublon-holon dynamics are observed in both single-photon and multiphoton processes, and the site-resolved behavior is explained within a quasiparticle picture. Our study suggests strategies to optically engineer the doublon-holon dynamics in one-dimensional strongly correlated many-body systems. 
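The top-down vocabulary matching described in the first abstract can be illustrated with a minimal sketch. It assumes the character-level structure of a word is already available as a binary tree; here the tree is written by hand, not induced by the model from the paper. The names `Node`, `build_tree`, and `tokenize_top_down`, the toy vocabulary, and the fallback to single characters are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union

# A nested-tuple tree: a leaf is a single character, an inner node is (left, right).
Tree = Union[str, tuple]


@dataclass
class Node:
    text: str                       # the substring this node spans
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(tree: Tree) -> Node:
    """Convert a nested-tuple structure into Node objects (hypothetical helper)."""
    if isinstance(tree, str):
        return Node(tree)
    left, right = build_tree(tree[0]), build_tree(tree[1])
    return Node(left.text + right.text, left, right)


def tokenize_top_down(node: Node, vocab: set) -> list:
    """Emit the largest spans found in the vocabulary, starting from the root.

    Assumption: a leaf (single character) is always emitted as its own token,
    even if it is missing from the vocabulary.
    """
    if node.text in vocab or node.left is None or node.right is None:
        return [node.text]
    return tokenize_top_down(node.left, vocab) + tokenize_top_down(node.right, vocab)


# Hand-written structure for "unkind": [[u n] [[k i] [n d]]]
word_tree = build_tree((("u", "n"), (("k", "i"), ("n", "d"))))

tokens_full = tokenize_top_down(word_tree, {"un", "kind"})
tokens_partial = tokenize_top_down(word_tree, {"un"})
print(tokens_full)
print(tokens_partial)
```

Because matching starts at the root and descends only when a span is missing from the vocabulary, a token can never straddle a boundary of the tree. That is the property the abstract relies on to keep morphemes intact.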