

Title: 3D dynamic MRI of the vocal tract during natural speech
Purpose

To develop and evaluate a technique for 3D dynamic MRI of the full vocal tract at high temporal resolution during natural speech.

Methods

We demonstrate 2.4 × 2.4 × 5.8 mm³ spatial resolution, 61‐ms temporal resolution, and a 200 × 200 × 70 mm³ FOV. The proposed method uses 3D gradient‐echo imaging with a custom upper‐airway coil, a minimum‐phase slab excitation, a stack‐of‐spirals readout, pseudo golden‐angle view order in kx‐ky, linear Cartesian order along kz, and spatiotemporal finite‐difference constrained reconstruction, with 13‐fold acceleration. The technique is evaluated using in vivo vocal tract airway data from 2 healthy subjects (1 with synchronized audio), acquired on a 1.5T scanner during 2 natural speech production tasks, and via comparison with interleaved multislice 2D dynamic MRI.
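The view ordering named above (golden‐angle rotation of spiral arms in kx‐ky, linear stepping along kz) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `view_order`, the choice of kz as the inner loop, and the use of the plain golden angle (about 137.5°) in place of the paper's "pseudo" golden‐angle scheme are all assumptions made for illustration. The value of 12 kz partitions in the usage lines is likewise an assumption, taken from dividing the 70 mm slab FOV by the 5.8 mm through‐plane resolution.

```python
import math

# Plain golden angle in degrees (about 137.508). ASSUMPTION: the abstract says
# "pseudo golden-angle" without giving the increment; this sketch uses the
# standard golden angle as a stand-in.
GOLDEN_ANGLE_DEG = 180.0 * (3.0 - math.sqrt(5.0))


def view_order(n_arms, n_kz):
    """Return (arm_angle_deg, kz_index) pairs in a hypothetical acquisition order.

    ASSUMPTION: each spiral arm angle is played out on every kz partition,
    in linear order, before the angle is advanced by the golden angle.
    """
    order = []
    for arm in range(n_arms):
        # Rotate the spiral arm in the kx-ky plane by the golden angle.
        angle = (arm * GOLDEN_ANGLE_DEG) % 360.0
        # Step linearly through the Cartesian kz partitions.
        for kz in range(n_kz):
            order.append((angle, kz))
    return order


if __name__ == "__main__":
    # 12 partitions is an illustrative value (70 mm / 5.8 mm, rounded).
    for angle, kz in view_order(n_arms=2, n_kz=12):
        print(f"angle = {angle:8.3f} deg, kz = {kz}")
```

Because successive golden‐angle rotations never repeat exactly, any contiguous run of arms covers kx‐ky fairly uniformly, which is the usual motivation for this family of orderings in dynamic imaging.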

Results

This technique captured known dynamics of vocal tract articulators during natural speech tasks including tongue gestures during the production of consonants “s” and “l” and of consonant–vowel syllables, and was additionally consistent with 2D dynamic MRI. Coordination of lingual (tongue) movements for consonants is demonstrated via volume‐of‐interest analysis. Vocal tract area function dynamics revealed critical lingual constriction events along the length of the vocal tract for consonants and vowels.

Conclusion

We demonstrate feasibility of 3D dynamic MRI of the full vocal tract, with spatiotemporal resolution adequate to visualize lingual movements for consonants and vocal tract shaping during natural productions of consonant–vowel syllables, without requiring multiple repetitions.

 
PAR ID:
10078807
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Magnetic Resonance in Medicine
Volume:
81
Issue:
3
ISSN:
0740-3194
Page Range / eLocation ID:
p. 1511-1520
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Purpose

    To provide 3D real‐time MRI of speech production with improved spatio‐temporal sharpness using randomized, variable‐density, stack‐of‐spiral sampling combined with a 3D spatio‐temporally constrained reconstruction.

    Methods

    We evaluated five candidate (k,t) sampling strategies using a previously proposed gradient‐echo stack‐of‐spiral sequence and a 3D constrained reconstruction with spatial and temporal penalties. Regularization parameters were chosen by expert readers based on qualitative assessment. We experimentally determined the effect of spiral angle increment and kz temporal order. The strategy yielding highest image quality was chosen as the proposed method. We evaluated the proposed and original 3D real‐time MRI methods in 2 healthy subjects performing speech production tasks that invoke rapid movements of articulators seen in multiple planes, using interleaved 2D real‐time MRI as the reference. We quantitatively evaluated tongue boundary sharpness in three locations at two speech rates.

    Results

    The proposed data‐sampling scheme uses a golden‐angle spiral increment in the kx‐ky plane and variable‐density, randomized encoding along kz. It provided a statistically significant improvement in tongue boundary sharpness score (P < .001) in the blade, body, and root of the tongue during normal and 1.5‐times speeded speech. Qualitative improvements were substantial during natural speech tasks that alternate high and low tongue postures during vowels. The proposed method was also able to capture complex tongue shapes during fast alveolar consonant segments. Furthermore, the proposed scheme allows flexible retrospective selection of temporal resolution.

    Conclusion

    We have demonstrated improved 3D real‐time MRI of speech production using randomized, variable‐density, stack‐of‐spiral sampling with a 3D spatio‐temporally constrained reconstruction.

     
  2. Abstract

    Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 participants performing linguistically motivated speech tasks, alongside the corresponding public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each participant.

     
  3. Purpose

    To demonstrate a tagging method compatible with RT‐MRI for the study of speech production.

    Methods

    Tagging is applied as a brief interruption to a continuous real‐time spiral acquisition. Tagging can be initiated manually by the operator, cued to the speech stimulus, or be automatically applied with a fixed frequency. We use a standard 2D 1‐3‐3‐1 binomial SPAtial Modulation of Magnetization (SPAMM) sequence with 1 cm spacing in both in‐plane directions. Tag persistence in tongue muscle is simulated and validated in vivo. The ability to capture internal tongue deformations is tested during speech production of American English diphthongs in native speakers.

    Results

    We achieved an imaging window of 650‐800 ms at 1.5T, with imaging signal to noise ratio ≥ 17 and tag contrast to noise ratio ≥ 5 in human tongue, providing 36 frames/s temporal resolution and 2 mm in‐plane spatial resolution with real‐time interactive acquisition and view‐sharing reconstruction. The proposed method was able to capture tongue motion patterns and their relative timing with adequate spatiotemporal resolution during the production of American English diphthongs and consonants.

    Conclusion

    Intermittent tagging during real‐time MRI of speech production is able to reveal the internal deformations of the tongue. This capability will allow new investigations of valuable spatiotemporal information on the biomechanics of the lingual subsystems during speech without reliance on binning speech utterance repetition.

     
  4. Purpose

    To improve the depiction and tracking of vocal tract articulators in spiral real‐time MRI (RT‐MRI) of speech production by estimating and correcting for dynamic changes in off‐resonance.

    Methods

    The proposed method computes a dynamic field map from the phase of single‐TE dynamic images after a coil phase compensation where complex coil sensitivity maps are estimated from the single‐TE dynamic scan itself. This method is tested using simulations and in vivo data. The depiction of air–tissue boundaries is evaluated quantitatively using a sharpness metric and visual inspection.

    Results

    Simulations demonstrate that the proposed method provides robust off‐resonance correction for spiral readout durations up to 5 ms at 1.5T. In vivo experiments during human speech production demonstrate that image sharpness is improved in a majority of data sets at air–tissue boundaries including the upper lip, hard palate, soft palate, and tongue boundaries, whereas the lower lip shows little improvement in edge sharpness after correction.

    Conclusion

    Dynamic off‐resonance correction is feasible from single‐TE spiral RT‐MRI data, and provides a practical performance improvement in articulator sharpness when applied to speech production imaging.

     
  5. Objectives

    To evaluate a novel method for real‐time tagged MRI with increased tag persistence using phase sensitive tagging (REALTAG), demonstrated for speech imaging.

    Methods

    Tagging is applied as a brief interruption to a continuous real‐time spiral acquisition. REALTAG is implemented using a total tagging flip angle of 180° and a novel frame‐by‐frame phase sensitive reconstruction to remove smooth background phase while preserving the sign of the tag lines. Tag contrast‐to‐noise ratio of REALTAG and conventional tagging (total flip angle of 90°) is simulated and evaluated in vivo. The ability to extend tag persistence is tested during the production of vowel‐to‐vowel transitions by American English speakers.

    Results

    REALTAG resulted in a doubling of contrast‐to‐noise ratio at each time point and increased tag persistence by more than 1.9‐fold. The tag persistence was 1150 ms with contrast‐to‐noise ratio >6 at 1.5T, providing 2 mm in‐plane resolution, 179 frames/s, with 72.6 ms temporal window width, and phase sensitive reconstruction. The new imaging window is able to capture internal tongue deformation over word‐to‐word transitions in natural speech production.

    Conclusion

    Tag persistence is substantially increased in intermittently tagged real‐time MRI by using the improved REALTAG method. This makes it possible to capture longer motion patterns in the tongue, such as cross‐word vowel‐to‐vowel transitions, and provides a powerful new window to study tongue biomechanics.

     