

Search for: All records

Creators/Authors contains: "Lu, Yijing"


  1. Variability in speech pronunciation is widely observed across linguistic backgrounds and impacts the performance of modern automatic speech recognition systems. Here, we evaluate the performance of a self-supervised speech model on phoneme recognition using direct articulatory evidence. Findings indicate significant differences in phoneme recognition, especially for front vowels, between American English and Indian English speakers. To gain a deeper understanding of these differences, we conduct real-time MRI-based articulatory analysis, which reveals distinct velar region patterns during the production of specific front vowels. This underscores the need to deepen the scientific understanding of variation in self-supervised speech models to advance robust and inclusive speech technology.

     
    Free, publicly-accessible full text available November 1, 2025
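The group comparison described in this abstract reduces to computing per-phoneme recognition accuracy separately for each speaker group and contrasting the front vowels. A minimal sketch, assuming hypothetical aligned (reference, predicted) phoneme pairs; the toy data below are illustrative and are not the model outputs or corpora used in the study:

```python
from collections import defaultdict

def per_phoneme_accuracy(pairs):
    """Recognition accuracy per reference phoneme.
    pairs: iterable of (reference_phoneme, predicted_phoneme) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ref, pred in pairs:
        totals[ref] += 1
        if pred == ref:
            hits[ref] += 1
    return {p: hits[p] / totals[p] for p in totals}

# Hypothetical aligned phoneme pairs for two dialect groups (placeholder data).
american = [("IY", "IY"), ("IY", "IY"), ("EH", "EH"), ("AE", "AE")]
indian = [("IY", "IH"), ("IY", "IY"), ("EH", "AE"), ("AE", "AE")]

acc_us = per_phoneme_accuracy(american)
acc_in = per_phoneme_accuracy(indian)
# Compare accuracy on front vowels (IY, EH, AE) between the two groups.
front_vowel_gap = {p: acc_us[p] - acc_in[p] for p in ("IY", "EH", "AE")}
```

In practice the pairs would come from forced-aligned reference transcripts and the self-supervised model's frame-level phoneme predictions, but the accounting step is the same.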
  2. Individuals who have undergone treatment for oral cancer oftentimes exhibit compensatory behavior in consonant production. This pilot study investigates whether compensatory mechanisms utilized in the production of speech sounds with a given target constriction location vary systematically depending on target manner of articulation. The data reveal that compensatory strategies used to produce target alveolar segments vary systematically as a function of target manner of articulation in subtle yet meaningful ways. When target constriction degree at a particular constriction location cannot be preserved, individuals may leverage their ability to finely modulate constriction degree at multiple constriction locations along the vocal tract. 
  3. There is a lack of general agreement among previous studies (e.g., Bakst, 2016; Dediu & Moisik, 2019; Westbury et al., 1998) on whether measurements of vocal tract morphology are robust predictors of inter-speaker variation in tongue shaping for American English /ɹ/. One possible reason is the different quantifications of /ɹ/ tongue shapes that were employed. The current study compares the relationships between a single set of anatomical measurements and three different measures of lingual articulation for /ɹ/ in /ɑɹɑ/ in midsagittal real-time MRI data. A novel method was developed to quantify the palatal constriction location and length, which served as the first two measures of tongue shape. A linear Support Vector Machine divided the constriction location and length measures into regions that approximate the visually identified categories of “retroflex” and “bunched.” The third shape measurement is the signed distance of each token of /ɹ/ to the division boundary, representing the degree of “retroflexion” or “bunchedness” based on palatal constriction properties. These three measures showed marginally to moderately significant linear relationships with two specific measures of individual speakers’ vocal tract anatomy: the degree of mandibular inclination and the length of the oral cavity roof. Overall, the effect of anatomy on the lingual articulation of /ɹ/ is not strong. [Work supported by NSF, Grant 1908865.]

     
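The third shape measure in this abstract, the signed distance of each /ɹ/ token to the SVM division boundary, has a simple closed form: for a linear boundary w·x + b = 0, the signed distance of a point x is (w·x + b) / ||w||, with the sign indicating which side of the boundary the token falls on. A sketch with placeholder coefficients; these are not the fitted values from the study:

```python
import math

# Hypothetical linear SVM boundary in (constriction location, constriction
# length) space separating "retroflex" from "bunched" tokens. The weights
# and intercept below are illustrative, not values reported in the paper.
w = (1.2, -0.8)
b = -0.5

def signed_distance(x, w, b):
    """Signed distance from point x to the hyperplane w.x + b = 0.
    By convention here, positive falls on the 'retroflex' side."""
    return (w[0] * x[0] + w[1] * x[1] + b) / math.hypot(w[0], w[1])

token = (1.0, 0.4)  # one /r/ token's (location, length), arbitrary units
d = signed_distance(token, w, b)
label = "retroflex" if d > 0 else "bunched"
```

The magnitude of d then serves as a graded measure of "retroflexion" or "bunchedness" rather than a hard binary category, which is what makes it usable in a linear regression against the anatomical measures.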
  4. The theory of Task Dynamics provides a method of predicting articulatory kinematics from a discrete phonologically-relevant representation (“gestural score”). However, because implementations of that model (e.g., Nam et al., 2004) have generally used a simplified articulatory geometry (Mermelstein et al., 1981) whose forward model (from articulator to constriction coordinates) can be analytically derived, quantitative predictions of the model for individual human vocal tracts have not been possible. Recently, methods of deriving individual speaker forward models from real-time MRI data have been developed (Sorensen et al., 2019). This has in turn allowed development of task dynamic models for individual speakers, which make quantitative predictions. Thus far, however, these models (Alexander et al., 2019) could only synthesize limited types of utterances due to their inability to model temporally overlapping gestures. An updated implementation is presented, which can accommodate overlapping gestures and incorporates an optimization loop to improve the fit of modeled articulatory trajectories to the observed ones. Using an analysis-by-synthesis approach, the updated implementation can be utilized: (1) to refine the hypothesized speaker-general gestural parameters (target, stiffness) for individual speakers; (2) to test different degrees of temporal overlap among multiple gestures, such as in a CCVC syllable. [Work supported by NSF, Grant 1908865.]

     
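In standard Task Dynamics, each gesture drives a tract variable toward its target as a critically damped second-order system, with the target and stiffness as the speaker-general gestural parameters the abstract mentions refining. A rough sketch of one gesture's trajectory under that textbook formulation (unit mass, semi-implicit Euler integration; all numeric values are illustrative and not taken from the paper):

```python
import math

def gesture_trajectory(x0, target, stiffness, dt=0.001, steps=300):
    """Integrate one gesture as a critically damped second-order system:
    x'' = -k (x - target) - 2*sqrt(k) * x'   (unit mass, damping at
    critical value 2*sqrt(k), so the tract variable approaches the
    target without overshoot)."""
    x, v = x0, 0.0
    traj = [x]
    for _ in range(steps):
        a = -stiffness * (x - target) - 2.0 * math.sqrt(stiffness) * v
        v += a * dt       # semi-implicit Euler: update velocity first
        x += v * dt
        traj.append(x)
    return traj

# Placeholder values: tract variable starts at 10.0, gestural target 2.0.
traj = gesture_trajectory(x0=10.0, target=2.0, stiffness=100.0)
```

An analysis-by-synthesis loop of the kind described would wrap a routine like this, adjusting (target, stiffness) and the temporal overlap of gestures to minimize the mismatch between synthesized and MRI-observed trajectories.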