
Title: Polite Speech Emerges From Competing Social Goals
Language is a remarkably efficient tool for transmitting information. Yet human speakers make statements that are inefficient, imprecise, or even contrary to their own beliefs, all in the service of being polite. What rational machinery underlies polite language use? Here, we show that polite speech emerges from the competition of three communicative goals: to convey information, to be kind, and to present oneself in a good light. We formalize this goal tradeoff using a probabilistic model of utterance production, which predicts human utterance choices in socially sensitive situations with high quantitative accuracy, and we show that our full model is superior to its variants with subsets of the three goals. This utility-theoretic approach to speech acts takes a step toward explaining the richness and subtlety of social language use.
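The goal competition described in the abstract can be sketched as a softmax speaker that scores each utterance by a weighted sum of goal-specific utilities. This is a minimal illustration, not the paper's model: the utterances, utility values, weights, and rationality parameter below are all made up for the example.

```python
import math

def speaker_choice(utterances, utilities, weights, rationality=3.0):
    """Score each utterance by a weighted sum of goal-specific utilities
    (informational, social, self-presentational) and return a softmax
    distribution over utterances. All numbers here are illustrative."""
    scores = {
        u: rationality * sum(w * utilities[u][goal] for goal, w in weights.items())
        for u in utterances
    }
    z = sum(math.exp(s) for s in scores.values())
    return {u: math.exp(s) / z for u, s in scores.items()}

# Hypothetical utilities for commenting on a friend's poor performance:
# the blunt truth is informative but unkind; flattery is kind but false.
utilities = {
    "It was terrible":   {"informational": 1.0, "social": 0.0, "presentational": 0.2},
    "It wasn't amazing": {"informational": 0.7, "social": 0.6, "presentational": 0.8},
    "It was amazing":    {"informational": 0.0, "social": 1.0, "presentational": 0.4},
}
weights = {"informational": 0.4, "social": 0.4, "presentational": 0.2}
probs = speaker_choice(list(utilities), utilities, weights)
```

With these toy weights, the indirect utterance ("It wasn't amazing") gets the highest probability: it trades a little informativeness for kindness and face-saving, which is the tradeoff the paper formalizes.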
Award ID(s):
1911790
PAR ID:
10279431
Author(s) / Creator(s):
Date Published:
Journal Name:
Open Mind
Volume:
4
ISSN:
2470-2986
Page Range / eLocation ID:
71 to 87
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. For robots to successfully operate as members of human-robot teams, it is crucial for robots to correctly understand the intentions of their human teammates. This task is particularly difficult because of human sociocultural norms: for reasons of social courtesy (e.g., politeness), people rarely express their intentions directly, instead typically employing polite utterance forms such as Indirect Speech Acts (ISAs). It is thus critical for robots to be capable of inferring the intentions behind their teammates' utterances, based on both their interaction context (including, e.g., social roles) and their knowledge of the sociocultural norms that apply within that context. This work builds on previous research on the understanding and generation of ISAs using Dempster-Shafer Theoretic Uncertain Logic, showing how other recent work in Dempster-Shafer Theoretic rule learning can be used to learn appropriate uncertainty intervals for robots' representations of sociocultural politeness norms.
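The Dempster-Shafer machinery behind those uncertainty intervals can be illustrated with Dempster's rule of combination and belief/plausibility bounds. This is a generic textbook sketch, not the paper's Uncertain Logic system; the two-hypothesis frame and evidence masses below are hypothetical.

```python
def combine(m1, m2):
    """Dempster's rule of combination over mass functions whose focal
    elements are frozensets. Mass assigned to contradictory pairs
    (empty intersection) is discarded and the rest renormalized."""
    combined, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc
    norm = 1.0 - conflict
    return {a: v / norm for a, v in combined.items()}

def belief_interval(m, a):
    """Uncertainty interval [Bel(a), Pl(a)]: belief sums mass on subsets
    of a; plausibility sums mass on anything consistent with a."""
    bel = sum(v for s, v in m.items() if s <= a)
    pl = sum(v for s, v in m.items() if s & a)
    return bel, pl

# Hypothetical frame: is "Could you open the door?" a polite Request (R)
# or a literal Question (Q)? Each evidence source leaves mass on ignorance.
R, Q = frozenset({"R"}), frozenset({"Q"})
theta = R | Q
m_context = {R: 0.6, theta: 0.4}          # social-role evidence
m_wording = {R: 0.5, Q: 0.2, theta: 0.3}  # surface-form evidence
m = combine(m_context, m_wording)
lo, hi = belief_interval(m, R)            # the robot's uncertainty interval for R
```

The interval [Bel, Pl] widens when evidence is vague and narrows as sources agree, which is the representation the learned politeness norms take in this line of work.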
  2. ISCA (Ed.)
    In this paper, we explore automatic prediction of dialect density of the African American English (AAE) dialect, where dialect density is defined as the percentage of words in an utterance that contain characteristics of the non-standard dialect. We investigate several acoustic and language modeling features, including the commonly used X-vector representation and ComParE feature set, in addition to information extracted from ASR transcripts of the audio files and prosodic information. To address issues of limited labeled data, we use a weakly supervised model to project prosodic and X-vector features into low-dimensional task-relevant representations. An XGBoost model is then used to predict the speaker's dialect density from these features and to show which features are most significant during inference. We evaluate the utility of these features both alone and in combination for the given task. This work, which does not rely on hand-labeled transcripts, is performed on audio segments from the CORAAL database. We show a significant correlation between our predicted and ground truth dialect density measures for AAE speech in this database and propose this work as a tool for explaining and mitigating bias in speech technology.
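The dialect density measure itself is simple to state in code: the fraction of words in an utterance carrying a dialect feature. A minimal sketch, where the token annotations and predicate are illustrative stand-ins (not real AAE annotations or the paper's feature extractors):

```python
def dialect_density(tokens, has_dialect_feature):
    """Dialect density as defined in the abstract: the fraction of words
    in an utterance that carry characteristics of the non-standard
    dialect. `has_dialect_feature` is a caller-supplied predicate."""
    if not tokens:
        return 0.0
    flagged = sum(1 for t in tokens if has_dialect_feature(t))
    return flagged / len(tokens)

# Hypothetical annotation: each token pre-tagged with a boolean flag.
utterance = [("he", False), ("be", True), ("workin", True), ("today", False)]
density = dialect_density(utterance, lambda t: t[1])  # 2 of 4 words flagged
```

In the paper this target value is predicted from acoustic and prosodic features by a regression model rather than computed from annotations, but the ground-truth label has this form.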
  3. Predicting the intelligibility of noisy recordings is difficult and most current algorithms treat all speech energy as equally important to intelligibility. Our previous work on human perception used a listening test paradigm and correlational analysis to show that some energy is more important to intelligibility than other energy. In this paper, we propose a system called the Bubble Cooperative Network (BCN), which aims to predict important areas of individual utterances directly from clean speech. Given such a prediction, noise is added to the utterance in unimportant regions and then presented to a recognizer. The BCN is trained with a loss that encourages it to add as much noise as possible while preserving recognition performance, encouraging it to identify important regions precisely and place the noise everywhere else. Empirical evaluation shows that the BCN can obscure 97.7% of the spectrogram with noise while maintaining recognition accuracy for a simple speech recognizer that compares a noisy test utterance with a clean reference utterance. The masks predicted by a single BCN on several utterances show patterns that are similar to analyses derived from human listening tests that analyze each utterance separately, while exhibiting better generalization and less context-dependence than previous approaches. 
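The BCN training idea — add as much noise as recognition tolerates — can be sketched as a mask-based corruption step plus an objective that penalizes clean coverage. This is a schematic stand-in (flat lists instead of 2-D spectrograms, a fixed rather than learned mask); names like `bcn_objective` are hypothetical, not from the paper.

```python
import random

def apply_noise_outside_mask(spec, mask, noise_scale=1.0, seed=0):
    """Keep bins the mask marks important (mask=1) clean and replace
    everything else with Gaussian noise. `spec` and `mask` are flat
    lists standing in for time-frequency spectrograms."""
    rng = random.Random(seed)
    return [s if m else rng.gauss(0.0, noise_scale) for s, m in zip(spec, mask)]

def bcn_objective(recognition_loss, mask, coverage_weight=1.0):
    """Training objective sketch: recognition loss plus a penalty on the
    fraction of bins kept clean, pushing the model to obscure as much
    of the input as recognition performance allows."""
    kept_clean = sum(mask) / len(mask)
    return recognition_loss + coverage_weight * kept_clean

# A sparse mask that preserves recognition scores better than a dense one.
noisy = apply_noise_outside_mask([1.0, 1.0, 1.0, 1.0], [1, 0, 0, 1])
```

Minimizing this objective with a fixed recognition loss favors sparser masks, which is how the network is driven toward the 97.7% noise coverage reported above.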
  4. For enhanced performance and privacy, companies deploying voice-activated technologies such as virtual assistants and robots are increasingly tending toward designs in which technologies only begin attending to speech once a specified wakeword is heard. Due to concerns that interactions with such technologies could lead users, especially children, to develop impolite habits, some companies have begun to develop use modes in which interactants are required to use ostensibly polite wakewords such as "Please". In this paper, we argue that these "please-centering" wakewords are likely to backfire and actually discourage polite interactions due to the particular types of lexical and syntactic priming induced by those wakewords. We then present the results of a human-subject experiment (n=90) that validates those claims.
  5. Computational models of infant word‐finding typically operate over transcriptions of infant‐directed speech corpora. It is now possible to test models of word segmentation on speech materials, rather than transcriptions of speech. We propose that such modeling efforts be conducted over the speech of the experimental stimuli used in studies measuring infants' capacity for learning from spoken sentences. Correspondence with infant outcomes in such experiments is an appropriate benchmark for models of infants. We demonstrate such an analysis by applying the DP-Parse model of Algayres and colleagues to auditory stimuli used in infant psycholinguistic experiments by Pelucchi and colleagues. The DP-Parse model takes speech as input and creates multiple overlapping embeddings from each utterance. Prospective words are identified as clusters of similar embedded segments. This allows segmentation of each utterance into possible words, using a dynamic programming method that maximizes the frequency of constituent segments. We show that DP-Parse mimics American English learners' performance in extracting words from Italian sentences, favoring the segmentation of words with high syllabic transitional probability. This kind of computational analysis over actual stimuli from infant experiments may be helpful in tuning future models to match human performance.
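The syllabic-transitional-probability cue that drives the segmentation result can be shown with a classic toy computation (this is the statistical-learning cue itself, not the DP-Parse embedding pipeline; the syllable corpus and threshold are invented for the example):

```python
from collections import Counter

def transitional_probabilities(syllable_streams):
    """Forward transitional probability TP(a -> b) = count(ab) / count(a),
    estimated over a corpus of syllabified utterances."""
    pair_counts, syl_counts = Counter(), Counter()
    for stream in syllable_streams:
        syl_counts.update(stream[:-1])
        pair_counts.update(zip(stream, stream[1:]))
    return {pair: n / syl_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(stream, tps, threshold=0.5):
    """Place a word boundary wherever the syllable-to-syllable TP dips
    below the threshold; high-TP transitions stay word-internal."""
    words, current = [], [stream[0]]
    for prev, nxt in zip(stream, stream[1:]):
        if tps.get((prev, nxt), 0.0) < threshold:
            words.append(current)
            current = []
        current.append(nxt)
    words.append(current)
    return words

# Toy corpus: "fu-ga" and "me-lu" behave as words in varying orders, so
# within-word TPs are high and across-word TPs are low.
corpus = [["fu", "ga", "me", "lu"], ["me", "lu", "fu", "ga"], ["fu", "ga", "fu", "ga"]]
tps = transitional_probabilities(corpus)
words = segment(["fu", "ga", "me", "lu"], tps, threshold=0.6)
```

Words whose internal transitions are highly predictable get segmented out, mirroring the high-syllabic-TP advantage the abstract reports for both infants and the model.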