Title: Characterizing English Variation across Social Media Communities with BERT
Much previous work characterizing language variation across Internet social groups has focused on the types of words used by these groups. We extend this type of study by employing BERT to characterize variation in the senses of words as well, analyzing two months of English comments in 474 Reddit communities. The specificity of different sense clusters to a community, combined with the specificity of a community’s unique word types, is used to identify cases where a social group’s language deviates from the norm. We validate our metrics using user-created glossaries and draw on sociolinguistic theories to connect language variation with trends in community behavior. We find that communities with highly distinctive language are medium-sized, and their loyal and highly engaged users interact in dense networks.
Award ID(s):
1813470
NSF-PAR ID:
10274010
Author(s) / Creator(s):
Editor(s):
Daelemans, Walter
Date Published:
Journal Name:
Transactions of the Association for Computational Linguistics
Volume:
9
ISSN:
2307-387X
Page Range / eLocation ID:
538-556
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The study of language shift, the replacement of one language by another in a community or a subgroup of a speech community, is a prime topic for sociolinguistic analysis: shift is almost always the result of social factors. This paper argues for focusing research on the study of shift in process and, to that end, studying the different kinds of speakers in shifting communities. The prevalent response to massive, global language shift by linguists is language documentation. Although the need for documentation is clear, there have been inadvertent consequences: valorizing last speakers, promoting linguistic purism, and devaluing L2 language learners who, in many communities, represent the future of the language. The urgency of documenting and describing languages with relatively small numbers of elderly speakers has led the linguistic community to focus almost exclusively on such groups and to overlook both larger speech communities in earlier stages of shift and the wide range of speaker types in shift communities. From a social standpoint, the result is that we are often failing to do the language work in precisely those communities where reversing language shift is still relatively easy. From a scientific standpoint, we are missing the opportunity to study language change in process, and missing the chance to study speaker variation in a shift situation. Variation in proficiency and performance across shifting speakers is not random but systematic and correlates with a set of social and cognitive factors.
  2. Researchers, evaluators and designers from an array of academic disciplines and industry sectors are turning to participatory approaches as they seek to understand and address complex social problems. We refer to participatory approaches that collaboratively engage/partner with stakeholders in knowledge creation/problem solving for action/social change outcomes as collaborative change research, evaluation and design (CCRED). We further frame CCRED practitioners by their desire to move beyond knowledge creation for its own sake to implementation of new knowledge as a tool for social change. In March and May of 2018, we conducted a literature search of multiple discipline-specific databases seeking collaborative, change-oriented scholarly publications. The search was limited to include peer-reviewed journal articles, with English language abstracts available, published in the last five years. The search resulted in 526 citations, 236 of which met inclusion criteria. Though the search was limited to English abstracts, all major geographic regions (North America, Europe, Latin America/Caribbean, APAC, Africa and the Middle East) were represented within the results, although many articles did not state a specific region. Of those identified, most studies were located in North America, with the Middle East having only one identified study. We followed a qualitative thematic synthesis process to examine the abstracts of peer-reviewed articles to identify practices that transcend individual disciplines, sectors and contexts to achieve collaborative change. We surveyed the terminology used to describe CCRED, setting, content/topic of study, type of collaboration, and related benefits/outcomes in order to discern the words used to designate collaboration, the frameworks, tools and methods employed, and the presence of action, evaluation or outcomes.
Forty-three percent of the reviewed articles fell broadly within the social sciences, followed by 26 percent in education and 25 percent in health/medicine. In terms of participants and/or collaborators in the articles reviewed, the vast majority of the 236 articles (86%) described participants, that is, those whom the research was about or from whom data was collected. In contrast to participants, partners/collaborators (n=32; 14%) were individuals or groups who participated in the design or implementation of the collaborative change effort described. In terms of the goal for collaboration and/or for doing the work, the most frequently used terminology related to some aspect of engagement and empowerment. Common descriptors for the work itself were ‘social change’ (n=74; 31%), ‘action’ (n=33; 14%), ‘collaborative or participatory research/practice’ (n=13; 6%), ‘transformation’ (n=13; 6%) and ‘community engagement’ (n=10; 4%). Of the 236 articles that mentioned a specific framework or approach, the three most common were some variation of Participatory Action Research (n=30; 50%), Action Research (n=40; 16.9%) or Community-Based Participatory Research (n=17; 7.2%). Approximately a third of the 236 articles did not mention a specific method or tool in the abstract. The most commonly cited method/tool (n=30; 12.7%) was some variation of an arts-based method followed by interviews (n=18; 7.6%), case study (n=16; 6.7%), or an ethnographic-related method (n=14; 5.9%). While some articles implied action or change, only 14 of the 236 articles (6%) stated a specific action or outcome. Most often, the changes described were: the creation or modification of a model, method, process, framework or protocol (n=9; 4%), quality improvement, policy change and social change (n=8; 3%), or modifications to education/training methods and materials (n=5; 2%).
The infrequent use of collaboration as a descriptor of partner engagement, coupled with few reported findings of measurable change, raises questions about the nature of CCRED. It appears that conducting CCRED is as complex an undertaking as the problems that the work is attempting to address. 
  3. Abstract

    To what degree can we determine people's connections with groups through the language they use? In recent years, large archives of behavioral data from social media communities have become available to social scientists, opening the possibility of tracking naturally occurring group identity processes. A feature of most digital groups is that they rely exclusively on the written word. Across 3 studies, we developed and validated a language-based metric of group identity strength and demonstrated its potential in tracking identity processes in online communities. In Studies 1a–1c, 873 people wrote about their connections to various groups (country, college, or religion). A total of 2 language markers of group identity strength were found: high affiliation (more words like we, togetherness) and low cognitive processing or questioning (fewer words like think, unsure). Using these markers, a language-based unquestioning affiliation index was developed and applied to in-class stream-of-consciousness essays of 2,161 college students (Study 2). Greater levels of unquestioning affiliation expressed in language predicted not only self-reported university identity but also students’ likelihood of remaining enrolled in college a year later. In Study 3, the index was applied to naturalistic Reddit conversations of 270,784 people in 2 online communities of supporters of the 2016 presidential candidates—Hillary Clinton and Donald Trump. The index predicted how long people would remain in the group (3a) and revealed temporal shifts mirroring members’ joining and leaving of groups (3b). Together, the studies highlight the promise of a language-based approach for tracking and studying group identity processes in online groups.

     
  4. Using archived social media data, the language signatures of people going through breakups were mapped. Text analyses were conducted on 1,027,541 posts from 6,803 Reddit users who had posted about their breakups. The posts include users’ Reddit history in the 2 years surrounding their breakups across the various domains of their life, not just posts pertaining to their relationship. Language markers of an impending breakup were evident 3 months before the event, peaking on the week of the breakup and returning to baseline 6 months later. Signs included an increase in I-words, we-words, and cognitive processing words (characteristic of depression, collective focus, and the meaning-making process, respectively) and drops in analytic thinking (indicating more personal and informal language). The patterns held even when people were posting to groups unrelated to breakups and other relationship topics. People who posted about their breakup for longer time periods were less well-adjusted a year after their breakup compared to short-term posters. The language patterns seen for breakups replicated for users going through divorce (n = 5,144; 1,109,867 posts) or other types of upheavals (n = 51,357; 11,081,882 posts). The cognitive underpinnings of emotional upheavals are discussed using language as a lens.

     
  5. Abstract

    Plant species vary in how they regulate moisture and this has implications for their flammability during wildfires. We explored how fuel moisture is shaped by variation within five hydraulic traits: saturated moisture content, cell wall rigidity, cell solute potential, symplastic water fraction and tissue capacitance.

    Using pressure–volume curves, we measured these hydraulic traits in twigs and distal shoots (i.e. twigs + leaves) in 62 plant species across four wooded communities in south‐eastern Australia. Moisture content of fine fuels was then estimated for circumstances typical of fire weather. These projections were made assuming that under the hot, dry, windy conditions typical of large wildfires, leaves and fine twigs would function at internal water pressures close to wilting point (i.e. turgor loss point, TLP). The effect of different moisture contents at TLP on ignition time was then modelled using a fully mechanistic, finite element model of biomass ignition based on standard principles of physical chemistry.

    We also measured predawn water potential, an indication of plant access to soil water that is influenced by root architecture. These data were used to model how root traits influence fuel moisture and ignition time.

    Most variation among species in fuel moisture under fire weather conditions arose from differences in saturated moisture content (3.4‐ to 3.6‐fold variation). Twig capacitance was also an important driver of fuel moisture under these weather conditions (1.9‐ to 2.2‐fold variation in moisture content). A suite of other leaf and root traits influencing how much shoots dry out as they approach wilting point each contributed 1.0‐ to 1.6‐fold variation in projected fuel moisture during fire weather. Fuel moisture variation in turn drove variation in flammability by modifying predicted ignition time.

    Two main life‐history types in fire‐prone habitats are obligate seeders and resprouters. There were no significant differences between these species groups in estimated fuel moisture during fire weather, nor in any measured hydraulic traits.

    Live fuel moisture is an important determinant of wildfire activity. Our data show that variation in tissue saturated moisture content among co‐occurring species represents an important ecological store of variation in flammability in the study communities.

    A free Plain Language Summary can be found within the Supporting Information of this article.

     