Title: Understanding the Diverging User Trajectories in Highly-related Online Communities during the COVID-19 Pandemic
As the COVID-19 pandemic is disrupting life worldwide, related online communities are popping up. In particular, two “new” communities, /r/China_flu and /r/Coronavirus, emerged on Reddit and have been dedicated to COVID-related discussions from the very beginning of this pandemic. With /r/Coronavirus promoted as the official community on Reddit, it remains an open question how users choose between these two highly-related communities. In this paper, we characterize user trajectories in these two communities from the beginning of COVID-19 to the end of September 2020. We show that new users of /r/China_flu and /r/Coronavirus were similar from January to March. After that, their differences steadily increase, both in language distance and membership prediction, as the pandemic continues to unfold. Furthermore, users who started at /r/China_flu from January to March were more likely to leave, while those who started in later months tend to remain highly “loyal”. To understand this difference, we develop a movement analysis framework to understand membership changes in these two communities and identify a significant proportion of /r/China_flu members (around 50%) that moved to /r/Coronavirus in February. This movement turns out to be highly predictable based on other subreddits that users were previously active in. Our work demonstrates how two highly-related communities emerge and develop their own identity in a crisis, and highlights the important role of existing communities in understanding such an emergence.
Award ID(s):
1910225
NSF-PAR ID:
10297879
Journal Name:
Proceedings of the International AAAI Conference on Weblogs and Social Media
Volume:
15
Issue:
1
ISSN:
2162-3449
Page Range / eLocation ID:
888-899
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Research has rarely examined how the COVID-19 pandemic may affect teens’ social media engagement and psychological wellbeing, and even less research has compared the difference between teens with and without mental health concerns. We collected and analyzed weekly data from January to December 2020 from teens in four Reddit communities (subreddits), including teens in r/Teenagers and teens who participated in three mental health subreddits (r/Depression, r/Anxiety, and r/SuicideWatch). The results showed that teens’ weekly subreddit participation, posting/commenting frequency, and emotion expression were related to significant pandemic events. Teen Redditors on r/Teenagers had a higher posting/commenting frequency but lower negative emotion than teen Redditors on the three mental health subreddits. When comparing posts/comments on r/Teenagers, teens who ever visited one of the three mental health subreddits posted/commented twice as frequently as teens who did not, but their emotion expression was similar. The results from the Interrupted Time Series Analysis (ITSA) indicated that both teens with and without mental health concerns reversed the trend in posting frequency and negative emotion from declining to increasing right after the pandemic outbreak, and teens with mental health concerns had a more rapidly increasing trend in posting/commenting. The findings suggest that teens’ social media engagement and emotion expression reflect the pandemic evolution. Teens with mental health concerns are more likely to reveal their emotions on specialized mental health subreddits rather than on the general r/Teenagers subreddit. In addition, the findings indicated that teens with mental health concerns had a strong social interaction desire that various barriers in the real world may inhibit. 
The findings call for more attention to understand the pandemic’s influence on teens by monitoring and analyzing social media data and offering adequate support to teens regarding their mental health wellbeing. 
  2. Abstract This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the Covid-19 pandemic spread is challenging. But there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5 % of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80 % of those infected either are asymptomatic or have mild symptoms; a finding that implies that demand for advanced medical services might apply to only 20 % of the total infected. 
Of patients infected with Covid-19, about 15 % have severe illness and 5 % have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25 % to as high as 3.0 % (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (> 14 %) and those with coexisting conditions (10 % for those with cardiovascular disease and 7 % for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1 %. Public health efforts depend heavily on predicting how diseases such as Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development of a model of Corona spread using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC/KEL System. 
In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to successfully model Corona spread, obtain new results, and help reduce the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with informative virus spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR and spreading indicators based on a causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking of various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis, from the global level down to the county level. It also provides statistical analysis for each level, such as new cases per 100,000 population. The primary analysis, such as Contagion Risk and Infection State, is based on a causal model with a seven-day sliding window. Our work has been released as a publicly available website to the world and attracted a great volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems, which is briefly described in the paper. 
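    For readers unfamiliar with the SIR model named in this abstract, the following is a minimal sketch of a discrete-time SIR simulation together with the "new cases per 100,000 population" statistic. It is not the tracker's implementation: the function names, the daily Euler step, and the parameter values (beta, gamma) are illustrative assumptions, and the tracker's causal-model indicators are not reproduced here.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model with a one-day Euler step.

    beta  : assumed transmission rate per day (hypothetical value chosen by the caller)
    gamma : assumed recovery rate per day (hypothetical value chosen by the caller)
    Returns a list of (susceptible, infected, recovered) tuples, one per day,
    starting with the initial state.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population  # S -> I flow
        new_recoveries = gamma * i                  # I -> R flow
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def cases_per_100k(new_cases, population):
    """Normalize a case count to cases per 100,000 population."""
    return new_cases * 100_000 / population
```

    With beta larger than gamma the infected compartment grows at first, which matches the abstract's remark that each infected person initially spreads the virus to more than one other person. The total S + I + R stays equal to the population at every step.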
  3. Background

    The COVID-19 pandemic has resulted in heightened levels of depression, anxiety, and other mental health issues due to sudden changes in daily life, such as economic stress, social isolation, and educational irregularity. Accurately assessing emotional and behavioral changes in response to the pandemic can be challenging, but it is essential to understand the evolving emotions, themes, and discussions surrounding the impact of COVID-19 on mental health.

    Objective

    This study aims to understand the evolving emotions and themes associated with the impact of COVID-19 on mental health support groups (eg, r/Depression and r/Anxiety) on Reddit (Reddit Inc) during the initial phase and after the peak of the pandemic using natural language processing techniques and statistical methods.

    Methods

    This study used data from the r/Depression and r/Anxiety Reddit communities, which consisted of posts contributed by 351,409 distinct users over a period spanning from 2019 to 2022. Topic modeling and Word2Vec embedding models were used to identify key terms associated with the targeted themes within the data set. A range of trend and thematic analysis techniques, including time-to-event analysis, heat map analysis, factor analysis, regression analysis, and k-means clustering analysis, were used to analyze the data.

    Results

    The time-to-event analysis revealed that the first 28 days following a major event could be considered a critical window for mental health concerns to become more prominent. The theme trend analysis revealed key themes such as economic stress, social stress, suicide, and substance use, with varying trends and impacts in each community. The factor analysis highlighted pandemic-related stress, economic concerns, and social factors as primary themes during the analyzed period. Regression analysis showed that economic stress consistently demonstrated the strongest association with the suicide theme, whereas the substance theme had a notable association in both data sets. Finally, the k-means clustering analysis showed that in r/Depression, the number of posts related to the “depression, anxiety, and medication” cluster decreased after 2020, whereas the “social relationships and friendship” cluster showed a steady decrease. In r/Anxiety, the “general anxiety and feelings of unease” cluster peaked in April 2020 and remained high, whereas the “physical symptoms of anxiety” cluster showed a slight increase.

    Conclusions

    This study sheds light on the impact of COVID-19 on mental health and the related themes discussed in 2 web-based communities during the pandemic. The results offer valuable insights for developing targeted interventions and policies to support individuals and communities in similar crises.

     
  4. The COVID-19 pandemic has dramatically altered family life in the United States. Over the long duration of the pandemic, parents had to adapt to shifting work conditions, virtual schooling, the closure of daycare facilities, and the stress of not only managing households without domestic and care supports but also worrying that family members may contract the novel coronavirus. Reports early in the pandemic suggest that these burdens have fallen disproportionately on mothers, creating concerns about the long-term implications of the pandemic for gender inequality and mothers’ well-being. Nevertheless, less is known about how parents’ engagement in domestic labor and paid work has changed throughout the pandemic, what factors may be driving these changes, and what the long-term consequences of the pandemic may be for the gendered division of labor and gender inequality more generally.

    The Study on U.S. Parents’ Divisions of Labor During COVID-19 (SPDLC) collects longitudinal survey data from partnered U.S. parents that can be used to assess changes in parents’ divisions of domestic labor, divisions of paid labor, and well-being throughout and after the COVID-19 pandemic. The goal of SPDLC is to understand both the short- and long-term impacts of the pandemic for the gendered division of labor, work-family issues, and broader patterns of gender inequality.

    Survey data for this study is collected using Prolific (www.prolific.co), an opt-in online platform designed to facilitate scientific research. The sample comprises U.S. adults who were residing with a romantic partner and at least one biological child (at the time of entry into the study). In each survey, parents answer questions about both themselves and their partners. Wave 1 of SPDLC was conducted in April 2020, and parents who participated in Wave 1 were asked about their division of labor both prior to (i.e., early March 2020) and one month after the pandemic began. Wave 2 of SPDLC was collected in November 2020. Parents who participated in Wave 1 were invited to participate again in Wave 2, and a new cohort of parents was also recruited to participate in the Wave 2 survey. Wave 3 of SPDLC was collected in October 2021. Parents who participated in either of the first two waves were invited to participate again in Wave 3, and another new cohort of parents was also recruited to participate in the Wave 3 survey. This research design (follow-up survey of panelists and new cross-section of parents at each wave) will continue through 2024, culminating in six waves of data spanning the period from March 2020 through October 2024. An estimated total of approximately 6,500 parents will be surveyed at least once throughout the duration of the study.

    SPDLC data will be released to the public two years after data is collected; Waves 1 and 2 are currently publicly available. Wave 3 will be publicly available in October 2023, with subsequent waves becoming available yearly. Data will be available to download in both SPSS (.sav) and Stata (.dta) formats, and the following data files will be available: (1) a data file for each individual wave, which contains responses from all participants in that wave of data collection, (2) a longitudinal panel data file, which contains longitudinal follow-up data from all available waves, and (3) a repeated cross-section data file, which contains the repeated cross-section data (from new respondents at each wave) from all available waves. Codebooks for each survey wave and a detailed user guide describing the data are also available. Response Rates: Of the 1,157 parents who participated in Wave 1, 828 (72%) also participated in the Wave 2 study. Presence of Common Scales: The following established scales are included in the survey:
    • Self-Efficacy, adapted from Pearlin's mastery scale (Pearlin et al., 1981) and the Rosenberg self-esteem scale (Rosenberg, 2015) and taken from the American Changing Lives Survey
    • Communication with Partner, taken from the Marriage and Relationship Survey (Lichter & Carmalt, 2009)
    • Gender Attitudes, taken from the National Survey of Families and Households (Sweet & Bumpass, 1996)
    • Depressive Symptoms (CES-D-10)
    • Stress, measured using Cohen's Perceived Stress Scale (Cohen, Kamarck, & Mermelstein, 1983)
    Full details about these scales and all other items included in the survey can be found in the user guide and codebook
    The second wave of the SPDLC was fielded in November 2020 in two stages. In the first stage, all parents who participated in W1 of the SPDLC and who continued to reside in the United States were re-contacted and asked to participate in a follow-up survey. The W2 survey was posted on Prolific, and messages were sent via Prolific’s messaging system to all previous participants. Multiple follow-up messages were sent in an attempt to increase response rates to the follow-up survey. Of the 1,157 respondents who completed the W1 survey, 873 at least started the W2 survey. Data quality checks were employed in line with best practices for online surveys (e.g., removing respondents who did not complete most of the survey or who did not pass the attention filters). After data quality checks, 5.2% of respondents were removed from the sample, resulting in a final sample size of 828 parents (a response rate of 72%).

    In the second stage, a new sample of parents was recruited. New parents had to meet the same sampling criteria as in W1 (be at least 18 years old, reside in the United States, reside with a romantic partner, and be a parent living with at least one biological child). Also similar to the W1 procedures, we oversampled men, Black individuals, individuals who did not complete college, and individuals who identified as politically conservative to increase sample diversity. A total of 1,207 parents participated in the W2 survey. Data quality checks led to the removal of 5.7% of the respondents, resulting in a final sample size of new respondents at Wave 2 of 1,138 parents.

    In both stages, participants were informed that the survey would take approximately 20 minutes to complete. All panelists were provided monetary compensation in line with Prolific’s compensation guidelines, which require that all participants earn above minimum wage for their time participating in studies.
    To be included in SPDLC, respondents had to meet the following sampling criteria at the time they enter the study: (a) be at least 18 years old, (b) reside in the United States, (c) reside with a romantic partner (i.e., be married or cohabiting), and (d) be a parent living with at least one biological child. Follow-up respondents must be at least 18 years old and reside in the United States, but may experience changes in relationship and resident parent statuses. Smallest Geographic Unit: U.S. State

    This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. In accordance with this license, all users of these data must give appropriate credit to the authors in any papers, presentations, books, or other works that use the data. A suggested citation to provide attribution for these data is included below:            

    Carlson, Daniel L. and Richard J. Petts. 2022. Study on U.S. Parents’ Divisions of Labor During COVID-19 User Guide: Waves 1-2.  

    To help provide estimates that are more representative of U.S. partnered parents, the SPDLC includes sampling weights. Weights can be included in statistical analyses to make estimates from the SPDLC sample representative of U.S. parents who reside with a romantic partner (married or cohabiting) and a child aged 18 or younger based on age, race/ethnicity, and gender. National estimates for the age, racial/ethnic, and gender profile of U.S. partnered parents were obtained using data from the 2020 Current Population Survey (CPS). Weights were calculated using an iterative raking method, such that the full sample in each data file matches the nationally representative CPS data in regard to the gender, age, and racial/ethnic distributions within the data. This variable is labeled CPSweightW2 in the Wave 2 dataset, and CPSweightLW2 in the longitudinal dataset (which includes Waves 1 and 2). There is not a weight variable included in the W1-W2 repeated cross-section data file.
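    The iterative raking method mentioned above can be sketched as follows. This is a generic illustration of raking (iterative proportional fitting), not the SPDLC authors' code: the function name, the input layout, and the toy margins in the usage note are assumptions, and real target shares would come from the CPS.

```python
def rake_weights(rows, targets, max_iter=100, tol=1e-8):
    """Iterative raking (iterative proportional fitting) on categorical margins.

    rows    : list of dicts, one per respondent, e.g. {"gender": "F", "age": "young"}
    targets : {variable: {category: target population share}}, shares sum to 1
    Returns one weight per respondent; the weights sum to len(rows).
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted total in each category of this variable.
            totals = {}
            for row, w in zip(rows, weights):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            total_weight = sum(weights)
            # Scale each respondent so this variable's margin matches its target.
            for idx, row in enumerate(rows):
                cat = row[var]
                factor = shares[cat] * total_weight / totals[cat]
                max_change = max(max_change, abs(factor - 1.0))
                weights[idx] *= factor
        if max_change < tol:  # every margin already matches; stop early
            break
    return weights
```

    As a usage example with made-up margins, calling rake_weights on four respondents with targets {"gender": {"F": 0.5, "M": 0.5}, "age": {"young": 0.6, "old": 0.4}} returns weights whose weighted gender and age shares match those targets. Each pass adjusts one variable at a time, which is why the procedure has to be repeated until all margins agree simultaneously.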
     
  5. The sudden outbreak of the COVID-19 pandemic has brought drastic changes to people’s daily lives, work, and the surrounding environment. Investigations into these changes are very important for decision makers to implement policies on economic loss assessments and stimulation packages, city reopening, resilience of the environment, and arrangement of medical resources. In order to analyze the impact of COVID-19 on people’s lives, activities, and the natural environment, this paper investigates the spatial and temporal characteristics of Nighttime Light (NTL) radiance and Air Quality Index (AQI) before and during the pandemic in mainland China. The monthly mean NTL radiance, and daily and monthly mean AQI are calculated over mainland China and compared before and during the pandemic. Our results show that the monthly average NTL brightness is much lower during the quarantine period than before. This study categorizes NTL into three classes: residential area, transportation, and public facilities and commercial centers, with NTL radiance ranges of 5–20, 20–40 and greater than 40 (nW·cm⁻²·sr⁻¹), respectively. We found that the Number of Pixels (NOP) with NTL detection increased in the residential area and decreased in the commercial centers for most of the provinces after the shutdown, while transportation and public facilities generally stayed the same. More specifically, we examined these factors in Wuhan, where the first confirmed cases were reported, and where the earliest quarantine measures were taken. Pixels associated with commercial centers were observed to have lower NTL radiance values, indicating dimming, while residential area pixels recorded increased levels of brightness after the beginning of the lockdown. 
The study also discovered a significant decreasing trend in the daily average AQI for mainland China from January to March 2020, with cleaner air in most provinces during February and March, compared to January 2020. In conclusion, the outbreak and spread of COVID-19 has had a crucial impact on people’s daily lives and activity ranges through the increased implementation of lockdown and quarantine policies. On the other hand, the air quality of mainland China has improved with the reduction in non-essential industries and motor vehicle usage. This evidence demonstrates that the Chinese government has executed very stringent quarantine policies to deal with the pandemic. The decisive response to control the spread of COVID-19 provides a reference for other parts of the world. 
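    The three radiance ranges given in this abstract amount to a simple threshold rule, sketched below. This is an illustration rather than the authors' code: the function name and the short class labels are shorthand, and the handling of the exact boundary values (20 and 40) and of pixels below 5 is an assumption, since the abstract does not specify it.

```python
def classify_ntl(radiance):
    """Assign an NTL pixel to one of the abstract's three radiance classes.

    radiance is in nW·cm^-2·sr^-1. Boundary handling is assumed:
    [5, 20) -> first class, [20, 40] -> second class, > 40 -> third class.
    Pixels below 5 fall outside all three classes and return None.
    """
    if radiance < 5:
        return None
    if radiance < 20:
        return "residential"
    if radiance <= 40:
        return "transportation"
    return "commercial"


def count_pixels_by_class(radiances):
    """Number of Pixels (NOP) per class for a sequence of radiance values."""
    counts = {"residential": 0, "transportation": 0, "commercial": 0}
    for value in radiances:
        label = classify_ntl(value)
        if label is not None:
            counts[label] += 1
    return counts
```

    Comparing the counts returned for a pre-lockdown and a lockdown month is one way to express the NOP changes the abstract reports.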