Title: Machine Learning Facilitated Investigations of Intonational Meaning: Prosodic Cues to Epistemic Shifts in American English Utterances
Authors: Veilleux, Shattuck-Hufnagel, Jeong, Brugos, Ahn

This work analyzes experimentally elicited speech to capture the relationship between prosody and semantic/pragmatic meaning. Production prompts were comic strips whose contexts were manipulated along axes prominently discussed in the semantics/pragmatics literature. Participants were asked to read lines as the depicted speaker would, uttering a target phrase that communicated a proposition p (e.g., “only marble is available”) to a hearer who had epistemic authority on p. Prompts varied whether the speaker’s initial belief (prior bias) was confirmed (condition A: bias = p) or corrected (condition B: bias = ¬p). This meaning difference was reinforced by the response particles that preceded the target phrase (A: “okay so” vs. B: “oh really”). Over 475 productions were annotated with phonologically-informed phonetic labels (PoLaR). To model many-to-many mappings between features (prosodic form) and classes (semantic/pragmatic meaning), Random Forest (RF) classifiers were built on these labels and on derived measures (including f0 ranges, slopes, and Tonal Center of Gravity, TCoG) from 299 recordings. The RFs classified meaning with high accuracy (>85%). They identified condition-distinguishing prosodic cues in both the response particles and the target phrases, which raises the question of how, or whether, functionally overlapping lexical content affects prosodic realization. The RFs also identified phrase-final f0 as important, prompting a closer exploration of edge tones. These results highlight how explanatory ML models can help iteratively improve targeted analysis.
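The derived f0 measures named in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it assumes a pre-extracted f0 track containing only voiced frames, measures range as max minus min, uses an ordinary least-squares slope, and computes TCoG in the time domain as the f0-weighted mean time. The function name `f0_features` and its input format are hypothetical.

```python
from statistics import fmean


def f0_features(times, f0):
    """Derive simple f0 measures from one voiced contour (illustrative sketch).

    times: strictly increasing sample times in seconds (assumed input format).
    f0:    matching f0 values in Hz, with unvoiced frames already removed.
    Returns the f0 range (Hz), the least-squares f0 slope (Hz/s), and the
    time-domain Tonal Center of Gravity (s).
    """
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need at least two matching (time, f0) samples")
    if any(f <= 0 for f in f0):
        raise ValueError("f0 values must be positive (voiced frames only)")

    # Range: distance between the highest and lowest f0 in the contour.
    f0_range = max(f0) - min(f0)

    # Slope: ordinary least-squares fit of f0 against time.
    t_bar, f_bar = fmean(times), fmean(f0)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - t_bar) * (f - f_bar) for t, f in zip(times, f0))
    slope = sxy / sxx

    # TCoG (assumed definition): f0-weighted mean time, sum(f0_i * t_i) / sum(f0_i).
    # A later TCoG places more f0 "mass" toward the end of the contour.
    tcog = sum(t * f for t, f in zip(times, f0)) / sum(f0)

    return {"f0_range": f0_range, "slope": slope, "tcog": tcog}


# Usage with a toy rising contour (illustrative values, not data from the study).
rising = f0_features([0.0, 0.1, 0.2], [100.0, 150.0, 200.0])
print(rising)
```

Each returned dictionary could serve as one row of a feature table for a classifier such as scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute offers one (impurity-based) view of which cues matter. The study's actual feature set, windows, and model settings are not specified in this record.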
Award ID(s):
2042702 2042748 2042694
PAR ID:
10527900
Author(s) / Creator(s):
Veilleux; Shattuck-Hufnagel; Jeong; Brugos; Ahn
Publisher / Repository:
ISCA
Page Range / eLocation ID:
931 to 935
Subject(s) / Keyword(s):
speech recognition, human-computer interaction, computational paralinguistics, intonational meaning
Sponsoring Org:
National Science Foundation
More Like this
  1. Phrase-level prosodic prominence in American English is understood, in the AM tradition, to be marked by pitch accents. While such prominences are characterized via tonal labels in ToBI (e.g. H*), their cues are not exclusively in the pitch domain: timing, loudness and voice quality are known to contribute to prominence perception. All of these cues occur with a wide degree of variability in naturally produced speech, and this variation may be informative. In this study, we advance towards a system of explicit labelling of individual cues to prosodic structure, here focusing on phrase-level prominence. We examine correlations between the presence of a set of 6 cues to prominence (relating to segment duration, loudness, and non-modal phonation, in addition to f0) and pitch accent labels in a corpus of ToBI-labelled American English speech. Results suggest that tokens with more cues are more likely to receive a pitch accent label. 
  2. Lengthening and creaky voice are associated with prosodic finality in English. Listeners can use lengthening to identify both utterance-internal and final prosodic phrase boundaries and can use creak to locate utterance endings. Less is known about listeners' use of creak to locate internal prosodic boundaries and the relative importance assigned to duration and creak when both are present. Participants in two experiments segmented structurally ambiguous sentences in which duration and creak were manipulated to signal prosodic boundaries. When duration- and creak-based cues provided redundant information, their effects were additive. When these cues conflicted, the effect of creak was subtractive. 
  3. This study examined how inferences about epistemic competence and generalized labeling errors influence children’s selective word learning. Three- to 4-year-olds (N = 128) learned words from informants who asked questions about objects, mentioning either correct or incorrect labels. Such questions do not convey stark differences in informants’ epistemic competence. Inaccurate labels, however, generate error signals that can lead to weaker encoding of novel information. Preschoolers retained novel labels from both informants but were slower to respond in the Inaccurate Labeler condition. When the test procedure was not sensitive to the strength of information encoding, children performed above chance in both conditions and their response times did not differ. These results suggest that epistemic-level inferences and error generalizations influence preschoolers’ selective word learning concurrently. 
  4. Irrelevant salient distractors can trigger early quitting in visual search, causing observers to miss targets they might otherwise find. Here, we asked whether task-relevant salient cues can produce a similar early quitting effect on the subset of trials where those cues fail to highlight the target. We presented participants with a difficult visual search task and used two cueing conditions. In the high-predictive condition, a salient cue in the form of a red circle highlighted the target most of the time a target was present. In the low-predictive condition, the cue was far less accurate and did not reliably predict the target (i.e., the cue was often a false positive). These were contrasted against a control condition in which no cues were presented. In the high-predictive condition, we found clear evidence of early quitting on trials where the cue was a false positive, as evidenced by both increased miss errors and shorter response times on target-absent trials. No such effects were observed with low-predictive cues. Together, these results suggest that salient cues which are false positives can trigger early quitting, though perhaps only when the cues have a high predictive value. These results have implications for real-world searches, such as medical image screening, where salient cues (referred to as computer-aided detection or CAD) may be used to highlight potentially relevant areas of images but are sometimes inaccurate. 
  5. This study examines whether second language (L2) learners' processing of an intonationally cued lexical contrast is facilitated when intonational cues signal a segmental contrast in the native language (L1). It does so by investigating Seoul Korean and French listeners' processing of intonationally cued lexical-stress contrasts in English. Neither Seoul Korean nor French has lexical stress; instead, the two languages have similar intonational systems where prominence is realized at the level of the Accentual Phrase. A critical difference between the two systems is that French has only one tonal pattern underlying the realization of the Accentual Phrase, whereas Korean has two underlying tonal patterns that depend on the laryngeal feature of the phrase-initial segment. The L and H tonal cues thus serve to distinguish segments at the lexical level in Korean but not in French; Seoul Korean listeners are therefore hypothesized to outperform French listeners when processing English lexical stress realized only with tonal cues (H* on the stressed syllable). Seoul Korean and French listeners completed a sequence-recall task with four-item sequences of English words that differed in intonationally cued lexical stress (experimental condition) or in word-initial segment (control condition). The results showed higher accuracy for Seoul Korean listeners than for French listeners only when processing English lexical stress, suggesting that the processing of an intonationally cued lexical contrast in the L2 is facilitated when intonational cues signal a segmental contrast in the L1. These results are interpreted within the scope of the cue-based transfer approach to L2 prosodic processing. 