-
Humans rarely speak without producing co-speech gestures of the hands, head, and other parts of the body. Co-speech gestures are also highly restricted in how they are timed with speech, typically synchronizing with prosodically-prominent syllables. What functional principles underlie this relationship? Here, we examine how the production of co-speech manual gestures influences spatiotemporal patterns of the oral articulators during speech production. We provide novel evidence that words uttered with accompanying co-speech gestures are produced with more extreme tongue and jaw displacement, and that the presence of a co-speech gesture contributes to greater temporal stability of oral articulatory movements. This effect, which we term coupling enhancement, differs from stress-based hyperarticulation in that differences in articulatory magnitude are not vowel-specific in their patterning. Speech and gesture synergies therefore constitute an independent variable to consider when modeling the effects of prosodic prominence on articulatory patterns. Our results are consistent with work in language acquisition and speech-motor control suggesting that synchronizing speech to gesture can entrain acoustic prominence.
-
Co-speech gestures are timed to occur with prosodically prominent syllables in several languages. In prior work on Indo-European languages, gestures are found to be attracted to stressed syllables, with gesture apexes preferentially aligning with syllables bearing higher and more dynamic pitch accents. Little research has examined the temporal alignment of co-speech gestures in African tonal languages, where metrical prominence is often hard to identify due to a lack of canonical stress correlates, and where a key function of pitch is in distinguishing between words, rather than marking intonational prominence. Here, we examine the alignment of co-speech gestures in two different Niger-Congo languages with very different word structures, Medʉmba (Grassfields Bantu, Cameroon) and Igbo (Igboid, Nigeria). Our findings suggest that the initial position in the stem tends to attract gestures in Medʉmba, while the final syllable in the word is the default position for gesture alignment in Igbo; phrase position also influences gesture alignment, but in language-specific ways. Though neither language showed strong evidence of elevated prominence of any individual tone value, gesture patterning in Igbo suggests that metrical structure at the level of the tonal foot is relevant to the speech-gesture relationship. Our results demonstrate how the speech-gesture relationship can be a window into patterns of word- and phrase-level prosody cross-linguistically. They also show that the relationship between gesture and tone (and the related notion of 'tonal prominence') is mediated by tone's function in a language.
-
Fluid conversation depends on conversation partners' ability to make predictions about one another's speech in order to forecast turn ends and prepare upcoming turns. One model used to explain this process of temporal prediction is the coupled oscillator model of turn-taking. A generalization that the model captures is the relative scarcity of interruption in turn-taking, as it predicts that partners' turns should be counter-phased to one another, with minimal pause time between turns. However, in naturalistic conversation, turns are often delayed rather than occurring in perfect succession. We hypothesize that these delays are not of arbitrary duration but are structured in their timing, just as they are between turns with immediate transitions. We demonstrate that the relative timing of prosodic events occurring at turn ends is key to modelling pause duration between turns, providing evidence that interturn pauses exist in a temporal trading relation with the final syllable and prosodic word of the immediately preceding turn.
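As a rough illustration of the coupled oscillator idea invoked above, the sketch below is a hypothetical Kuramoto-style toy model (not the authors' model or code): anti-phase coupling between two partners' phase oscillators drives their relative phase toward pi, i.e., counter-phased turns with minimal overlap. All parameter names and values are illustrative assumptions.

```python
# Toy sketch of anti-phase coupling between two "speaker" phase oscillators.
# Hypothetical illustration only; not the implementation used in the study.
import math

def simulate(omega_a=2.0, omega_b=2.1, k=0.5, dt=0.01, steps=5000):
    theta_a, theta_b = 0.0, 0.3              # arbitrary initial phases
    for _ in range(steps):
        # each oscillator is pulled toward anti-phase with the other
        dtheta_a = omega_a + k * math.sin(theta_b - theta_a + math.pi)
        dtheta_b = omega_b + k * math.sin(theta_a - theta_b + math.pi)
        theta_a += dtheta_a * dt
        theta_b += dtheta_b * dt
    # relative phase, wrapped to [0, 2*pi); settles near pi
    # (exactly pi when the natural frequencies match)
    return (theta_b - theta_a) % (2 * math.pi)

if __name__ == "__main__":
    print(f"relative phase after settling: {simulate():.2f} rad (pi = {math.pi:.2f})")
```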
-
While motion capture is rapidly becoming the gold standard for research on the intricacies of co-speech gesture and its relationship to speech, traditional marker-based motion capture technology is not always feasible, meaning researchers must code video data manually. We compare two methods for coding co-speech gestures of the hands and arms in video data of spontaneous speech: manual coding and semi-automated coding using OpenPose, markerless motion capture software. We provide a comparison of the temporal alignment of gesture apexes based on video recordings of interviews with speakers of Medumba (Grassfields Bantu). Our results show a close correlation between the computationally calculated apexes and our hand-annotated apexes, suggesting that both methods are equally valid for coding video data. The use of markerless motion capture technology for gesture coding will enable more rapid coding of manual gestures, while still allowing […]
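To make the semi-automated approach concrete, here is a minimal sketch of one way a gesture apex could be estimated from per-frame wrist keypoints of the kind markerless motion capture produces. The heuristic (first speed minimum after the stroke's velocity peak), the function names, and the toy data are illustrative assumptions, not the pipeline reported in the study.

```python
# Hypothetical sketch: estimating a gesture apex from per-frame wrist keypoints.
import numpy as np

def gesture_apex_time(wrist_xy, fps=30.0):
    """wrist_xy: (n_frames, 2) array of wrist positions for one gesture stroke."""
    # frame-to-frame wrist speed (position units per second)
    speed = np.linalg.norm(np.diff(wrist_xy, axis=0), axis=1) * fps
    peak = int(np.argmax(speed))              # fastest point of the stroke
    # treat the apex as the first speed minimum after the velocity peak
    apex_frame = peak + int(np.argmin(speed[peak:]))
    return apex_frame / fps                   # apex time in seconds

if __name__ == "__main__":
    # toy trajectory: the wrist rises quickly, holds, then retracts slowly
    y = np.concatenate([np.linspace(0, 100, 11), np.full(5, 100), np.linspace(100, 0, 16)])
    xy = np.stack([np.zeros_like(y), y], axis=1)
    print(f"estimated apex at {gesture_apex_time(xy):.2f} s")
```

Apex times estimated this way could then be compared frame by frame against hand-annotated apexes.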
-
The timing of both manual co-speech gestures and head gestures is sensitive to the prosodic structure of speech. However, head gestures are used not only by speakers, but also by listeners as a backchanneling device. Little research exists on the timing of gestures in backchanneling. To address this gap, we compare the timing of listener and speaker head gestures in an interview context. Results reveal the dual role that head gestures play in speech and conversational interaction: while they are coordinated in key ways to one's own speech, they are also coordinated to the gestures (and hence, the speech) of a conversation partner when one is actively listening to them. We also show that head gesture timing is sensitive to social dynamics between interlocutors. This study provides a novel contribution to the literature on head gesture timing and has implications for studies of discourse and accommodation.
-
Stressed syllables in languages which have them tend to show two interesting properties: they show patterns of phonetic 'enhancement' at the articulatory and acoustic levels, and they also show coordinative properties. They typically play a key role in coordinating speech with co-speech gesture and with a musical beat, and in other sensorimotor synchronization tasks such as speech-coordinated beat tapping and metronome timing. While various phonological theories have considered stress from both of these perspectives, there is as yet no clear explanation as to how these properties relate to one another. The present work tests the hypothesis that aspects of phonetic enhancement may in fact be driven by coordination itself, by observing how phonetic patterns produced by speakers of two prosodically distinct languages, English and Medʉmba (Grassfields Bantu), vary as a function of timing relations with an imaginary metronome beat. Results indicate that production of syllables in time with the imaginary beat (versus on the 'offbeat') led to increased duration and first formant frequency, two widely observed correlates of syllable stress, for speakers of both languages. These results support the idea that some patterns of phonetic enhancement may have their roots in coordinative practices.
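As a rough illustration of what "in time versus on the offbeat" could mean operationally, the sketch below labels syllable onsets by their distance to the nearest beat of a notional metronome period and compares duration and F1 means per group. The tolerance window, all names, and the toy values are illustrative assumptions, not the analysis used in the study.

```python
# Hypothetical sketch: labelling productions as on-beat vs. off-beat and
# comparing mean duration and F1 per group. Illustrative assumptions only.
from statistics import mean

def beat_label(onset, period, tol=0.08):
    """Label a syllable onset (s) by its distance to the nearest beat."""
    phase = onset % period                    # position within the beat cycle (s)
    dist = min(phase, period - phase)         # distance to the nearest beat (s)
    return "on-beat" if dist <= tol else "off-beat"

if __name__ == "__main__":
    period = 0.6                              # notional 100 bpm beat
    tokens = [                                # (onset s, duration s, F1 Hz) toy values
        (0.02, 0.21, 640), (0.61, 0.20, 655), (0.33, 0.17, 600), (0.92, 0.18, 610),
    ]
    for label in ("on-beat", "off-beat"):
        group = [(d, f1) for onset, d, f1 in tokens if beat_label(onset, period) == label]
        durs, f1s = zip(*group)
        print(f"{label}: mean duration {mean(durs):.3f} s, mean F1 {mean(f1s):.0f} Hz")
```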