Among the many multilingual speakers of the world, code-switching (CSW) is a common linguistic phenomenon. Prior sociolinguistic work has shown that factors such as expressing group identity and solidarity, performing affective function, and reflecting shared experiences are related to CSW prevalence in multilingual speech. We build on prior studies by asking: is the expression of empathy a motivation for CSW in speech? To begin to answer this question, we examine several multilingual speech corpora representing diverse language families and apply recent modeling advances in the study of empathetic monolingual speech. We find a generally stronger positive relationship of spoken CSW with the lexical correlates of empathy than with acoustic-prosodic ones, which holds across three language pairs. Our work is a first step toward establishing a motivation for CSW that has thus far mainly been studied qualitatively.
Capturing Formality in Speech Across Domains and Languages
The linguistic notion of formality is one dimension of stylistic variation in human communication. A universal characteristic of language production, formality has surface-level realizations in written and spoken language. In this work, we explore ways of measuring the formality of such realizations in multilingual speech corpora across a wide range of domains. We compare measures of formality, contrasting textual and acoustic-prosodic metrics. We believe that a combination of these should correlate well with downstream applications. Our findings include: an indication that certain prosodic variables might play a stronger role than others; no correlation between prosodic and textual measures; limited evidence for anticipated inter-domain trends, but some evidence of consistency of measures between languages. We conclude that non-lexical indicators of formality in speech may be more subtle than our initial expectations, motivating further work on reliably encoding spoken formality.
- Award ID(s):
- 2327564
- PAR ID:
- 10507914
- Publisher / Repository:
- ISCA
- Date Published:
- Journal Name:
- Interspeech 2023
- Page Range / eLocation ID:
- 1030 to 1034
- Subject(s) / Keyword(s):
- formality code-switching speech
- Format(s):
- Medium: X
- Location:
- Dublin, Ireland
- Sponsoring Org:
- National Science Foundation
More Like this
-
It is well-known that speakers who entrain to one another have more successful conversations than those who do not. Previous research has shown that interlocutors entrain on linguistic features in both written and spoken monolingual domains. More recent work on code-switched communication has also shown preliminary evidence of entrainment on certain aspects of code-switching (CSW). However, such studies of entrainment in code-switched domains have been extremely few and restricted to human-machine textual interactions. Our work studies code-switched spontaneous speech between humans, finding that (1) patterns of written and spoken entrainment in monolingual settings largely generalize to code-switched settings, and (2) some patterns of entrainment on code-switching in dialogue agent-generated text generalize to spontaneous code-switched speech. Our findings give rise to important implications for the potentially "universal" nature of entrainment as a communication phenomenon, and potential applications in inclusive and interactive speech technology.
-
Prosody perception is fundamental to spoken language communication as it supports comprehension, pragmatics, morphosyntactic parsing of speech streams, and phonological awareness. A particular aspect of prosody, perceptual sensitivity to speech rhythm patterns in words (i.e., lexical stress sensitivity), is also a robust predictor of reading skills, though it has received much less attention than phonological awareness in the literature. Given the importance of prosody and reading in educational outcomes, reliable and valid tools are needed to conduct large-scale health and genetic investigations of individual differences in prosody, as groundwork for investigating the biological underpinnings of the relationship between prosody and reading. Motivated by this need, we present the Test of Prosody via Syllable Emphasis (“TOPsy”) and highlight its merits as a phenotyping tool to measure lexical stress sensitivity in as little as 10 min, in scalable internet-based cohorts. In this 28-item speech rhythm perception test [modeled after the stress identification test from Wade-Woolley (2016)], participants listen to multi-syllabic spoken words and are asked to identify lexical stress patterns. Psychometric analyses in a large internet-based sample show excellent reliability and predictive validity for self-reported difficulties with speech-language, reading, and musical beat synchronization. Further, items loaded onto two distinct factors corresponding to initially stressed vs. non-initially stressed words. These results are consistent with previous reports that speech rhythm perception abilities correlate with musical rhythm sensitivity and speech-language/reading skills, and are implicated in reading disorders (e.g., dyslexia).
We conclude that TOPsy can serve as a useful tool for studying prosodic perception at large scales in a variety of different settings, and importantly can act as a validated brief phenotype for future investigations of the genetic architecture of prosodic perception and its relationship to educational outcomes.
-
Grounded language acquisition is a major area of research combining aspects of natural language processing, computer vision, and signal processing, compounded by domain issues requiring sample efficiency and other deployment constraints. In this work, we present a multimodal dataset of RGB+depth objects with spoken as well as textual descriptions. We analyze the differences between the two types of descriptive language and our experiments demonstrate that the different modalities affect learning. This will enable researchers studying the intersection of robotics, NLP, and HCI to better investigate how the multiple modalities of image, depth, text, speech, and transcription interact, as well as how differences in the vernacular of these modalities impact results.

