The study of language shift, the replacement of one language by another in a community or in a subgroup of a speech community, is a prime topic for sociolinguistic analysis: shift is almost always the result of social factors. This paper argues for focusing research on the study of shift in process and, to that end, studying the different kinds of speakers in shifting communities. Linguists' prevalent response to massive, global language shift is language documentation. Although the need for documentation is clear, there have been inadvertent consequences: valorizing last speakers, promoting linguistic purism, and devaluing L2 language learners who, in many communities, represent the future of the language. The urgency of documenting and describing languages with relatively small numbers of elderly speakers has led the linguistic community to focus almost exclusively on such groups and to overlook both larger speech communities in earlier stages of shift and the wide range of speaker types in shift communities. From a social standpoint, the result is that we are often failing to do the language work in precisely those communities where reversing language shift is still relatively easy. From a scientific standpoint, we are missing the opportunity to study language change in process and the chance to study speaker variation in a shift situation. Variation in proficiency and performance across shifting speakers is not random but systematic, and it correlates with a set of social and cognitive factors.
Characterizing English Variation across Social Media Communities with BERT
Much previous work characterizing language variation across Internet social groups has focused on the types of words used by these groups. We extend this type of study by employing BERT to characterize variation in the senses of words as well, analyzing two months of English comments in 474 Reddit communities. The specificity of different sense clusters to a community, combined with the specificity of a community’s unique word types, is used to identify cases where a social group’s language deviates from the norm. We validate our metrics using user-created glossaries and draw on sociolinguistic theories to connect language variation with trends in community behavior. We find that communities with highly distinctive language are medium-sized, and their loyal and highly engaged users interact in dense networks.
- Award ID(s): 1813470
- PAR ID: 10274010
- Editor(s): Daelemans, Walter
- Date Published:
- Journal Name: Transactions of the Association for Computational Linguistics
- Volume: 9
- ISSN: 2307-387X
- Page Range / eLocation ID: 538-556
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
This paper outlines a new model of language revitalisation that understands language to be a characteristic of a nexus of social activities rather than an independent object. Language use is one of an overall set of factors contributing to the wellbeing of a particular community. Our model treats language as one node (or a cluster of nodes) in a complex system of interacting behaviours. Changes to another node or in the language node(s) itself can impact overall social wellbeing, something often ignored by linguists (but not by other social scientists working in Indigenous communities). Disruption to an existing network occurs within a time frame; the longer the disruption, the more likely that the network redefines the group. Variables that define the language ecology operate on multiple levels. For the group and for individuals within the group, there can be considerable variation in usage and proficiency over time. Sustainability cannot be reduced to simple cause-and-effect relationships between sociocultural variables. The next phase of language revitalisation projects should be built around the concept of language activity as part of promoting community wellbeing. The use of complex networks, which have been applied to human wellbeing in other contexts, supports our argument.
-
Much work in the space of NLP has used computational methods to explore sociolinguistic variation in text. In this paper, we argue that memes, as multimodal forms of language comprised of visual templates and text, also exhibit meaningful social variation. We construct a computational pipeline to cluster individual instances of memes into templates and semantic variables, taking advantage of their multimodal structure in doing so. We apply this method to a large collection of meme images from Reddit and make available the resulting SEMANTICMEMES dataset of 3.8M images clustered by their semantic function. We use these clusters to analyze linguistic variation in memes, discovering not only that socially meaningful variation in meme usage exists between subreddits, but that patterns of meme innovation and acculturation within these communities align with previous findings on written language.
-
Transformative media fandom is a remarkably coherent, long-lived, and diverse community united primarily by shared engagement in the varied activities of fandom. Its social norms are highly developed and frequently debated, and have been studied by the CSCW and Media Studies communities in the past, but rarely using the tools and theories of privacy, despite fannish norms often bearing strongly on privacy. We use privacy scholarship and existing theories thereof to examine these norms and bring an additional perspective to understanding fandom communities. In this work, we analyze over 250,000 words of "meta" essays and comments on those essays, reflecting the views and debates of hundreds of fans on these privacy norms. Drawing on Solove's theory of privacy as an aggregation of different ideas and on a variety of other academic theories of privacy, we analyze these norms as highly effective at protecting the integrity of fannish activities. We then articulate the value of studying these sorts of diverse "activity-defined" communities, arguing that such approaches grant us greater power to understand privacy experiences in ways that are specific, contextual, and intersectional yet still generalizable where possible.
-
Host-associated microbial communities, henceforth ‘microbiota’, can affect the physiology and behavior of their hosts. In mammals, host ecological, social and environmental variables are associated with variation in microbial communities. Within individuals in a given mammalian species, the microbiota also partitions by body site. Here, we build on this work and sequence the bacterial 16S rRNA gene to profile the microbiota at six distinct body sites (ear, nasal and oral cavities, prepuce, rectum and anal scent gland) in a population of wild spotted hyenas (Crocuta crocuta), which are highly social, large African carnivores. We inquired whether microbiota at these body sites vary with host sex or social rank among juvenile hyenas, and whether they differ between juvenile females and adult females. We found that the scent gland microbiota differed between juvenile males and juvenile females, whereas the prepuce and rectal microbiota differed between adult females and juvenile females. Social rank, however, was not a significant predictor of microbiota profiles. Additionally, the microbiota varied considerably among the six sampled body sites and exhibited strong specificity among individual hyenas. Thus, our findings suggest that site-specific niche selection is a primary driver of microbiota structure in mammals, but endogenous host factors may also be influential.