This content will become publicly available on March 1, 2025

Title: A quantitative linguistic analysis of a cancer online health community with a smooth latent space model
Online health communities (OHCs) provide free, open, and well-resourced platforms for patients, family members, and others to discuss illnesses, express feelings, and connect with others. Linguistic analysis of OHC posts can help in understanding disease conditions and in monitoring the emotional and mental status of patients and those close to them. Many existing OHC linguistic analyses are limited by their focus on individual words. There are a handful of cooccurrence network analyses, but these have multiple methodological limitations. In this article, we analyze posts that are publicly available at the LUNGevity Foundation’s Lung Cancer Support Community (LCSC). The analyzed data contain 21,028 posts published between April 2018 and February 2022. For word cooccurrence network analysis, we develop a two-part latent space model, which advances beyond existing models by accommodating network weights. Further, we consider the scenario where there are change points in time: networks remain the same between two consecutive change points but differ on the two sides of a change point, and the number and locations of the change points are unknown. A penalized fusion approach is developed to determine the change points and estimate the networks in a data-dependent manner. In the data analysis, multiple change points are identified, which reflect significant changes in the emotional/mental status of lung cancer patients and their close affiliates and mostly align with changes in the COVID-19 pandemic. The obtained network structures and other findings are also sensible.
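To make the notion of a weighted word cooccurrence network concrete, the following is a minimal sketch in Python. It is an illustration only and not the article's model or preprocessing: the function name `cooccurrence_weights`, the tokenization rule, the toy posts, and the choice to define an edge weight as the number of posts containing both words are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrence_weights(posts, vocabulary):
    """Count, for each unordered word pair, how many posts contain both words.

    Hypothetical helper for illustration: the weight definition (post-level
    cooccurrence counts) is an assumption, not taken from the article.
    """
    vocab = set(vocabulary)
    weights = Counter()
    for post in posts:
        # Lowercase, split into word tokens, and keep only vocabulary words.
        tokens = set(re.findall(r"[a-z']+", post.lower())) & vocab
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(tokens), 2):
            weights[pair] += 1
    return weights


# Toy posts invented for this example (not from the LCSC data).
posts = [
    "The scan results made me anxious",
    "Anxious about the next scan",
    "Chemo was hard but the scan was clear",
]
weights = cooccurrence_weights(posts, ["scan", "anxious", "chemo", "clear"])
for pair, weight in sorted(weights.items()):
    print(pair, weight)
```

Pairs that never cooccur are simply absent from the result, so the output separates whether an edge exists from how heavy it is. Applying the same function to posts grouped by time window would give a sequence of weighted networks, which is the kind of input a change point analysis operates on.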
Award ID(s):
2209685
NSF-PAR ID:
10512836
Author(s) / Creator(s):
; ;
Publisher / Repository:
Institute of Mathematical Statistics
Date Published:
Journal Name:
The Annals of Applied Statistics
Volume:
18
Issue:
1
ISSN:
1932-6157
Subject(s) / Keyword(s):
Cancer , cooccurrence network , online health community , quantitative linguistic analysis , smooth latent space model
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This clinical study presents a comprehensive investigation into the utility of breath analysis as a non-invasive method for the early detection of lung cancer. The study enrolled 14 lung cancer patients, 14 non-lung cancer controls with diverse medical conditions, and 3 tuberculosis (TB) patients for biomarker discovery. Matching criteria including age, gender, smoking history, and comorbidities were strictly followed to ensure reliable comparisons. A systematic breath sampling protocol utilizing a BIO-VOC sampler was employed, followed by VOC analysis using Thermal Desorption–Gas Chromatography–Mass Spectrometry (TD-GC/MS). The resulting VOC profiles were subjected to stringent statistical analysis, including Orthogonal Projections to Latent Structures—Discriminant Analysis (OPLS-DA), Kruskal–Wallis test, and Receiver Operating Characteristic (ROC) analysis. Notably, 13 VOCs exhibited statistically significant differences between lung cancer patients and controls. The combination of eight VOCs (hexanal, heptanal, octanal, benzaldehyde, undecane, phenylacetaldehyde, decanal, and benzoic acid) demonstrated substantial discriminatory power with an area under the curve (AUC) of 0.85, a sensitivity of 82%, and a specificity of 76% in the discovery set. Validation in an independent cohort yielded an AUC of 0.78, a sensitivity of 78%, and a specificity of 64%. Further analysis revealed that elevated aldehyde levels in lung cancer patients’ breath could be attributed to overactivated Alcohol Dehydrogenase (ADH) pathways in cancerous tissues. Addressing methodological challenges, this study employed a matching of physiological and pathological confounders, controlled room air samples, and standardized breath sampling techniques. Despite the limitations, this study’s findings emphasize the potential of breath analysis as a diagnostic tool for lung cancer and suggest its utility in differentiating tuberculosis from lung cancer. However, further research and validation are warranted for the translation of these findings into clinical practice.
  2. Abstract Background

    Sexual differences across molecular levels profoundly impact cancer biology and outcomes. Patient gender significantly influences drug responses, with divergent reactions between men and women to the same drugs. Despite databases on sex differences in human tissues, understanding regulations of sex disparities in cancer is limited. These resources lack detailed mechanistic studies on sex-biased molecules.

    Methods

    In this study, we conducted a comprehensive examination of molecular distinctions and regulatory networks across 27 cancer types, delving into sex-biased effects. Our analyses encompassed sex-biased competitive endogenous RNA networks, regulatory networks involving sex-biased RNA binding protein-exon skipping events, sex-biased transcription factor-gene regulatory networks, as well as sex-biased expression quantitative trait loci, sex-biased expression quantitative trait methylation, sex-biased splicing quantitative trait loci, and the identification of sex-biased cancer therapeutic drug target genes. All findings from these analyses are accessible on SexAnnoDB (https://ccsm.uth.edu/SexAnnoDB/).

    Results

    From these analyses, we defined 126 cancer therapeutic target sex-associated genes. Among them, 9 genes showed sex bias at both the mRNA and protein levels. Specifically, S100A9 was the target of five drugs, of which calcium has been approved by the FDA for the treatment of colon and rectal cancers. Transcription factor (TF)-gene regulatory network analysis suggested that four TFs in the SARC male group targeted S100A9 and upregulated the expression of S100A9 in these patients. Promoter region methylation status was only associated with S100A9 expression in KIRP female patients. Hypermethylation inhibited S100A9 expression and was responsible for the downregulation of S100A9 in these female patients.

    Conclusions

    Comprehensive network and association analyses indicated that the sex differences at the transcriptome level were partially the result of corresponding sex-biased epigenetic and genetic molecules. Overall, SexAnnoDB offers a discipline-specific search platform that could potentially assist basic experimental researchers or physicians in developing personalized treatment plans.

     
  3. Finding genes that are directly or indirectly biologically related to lung cancer has drawn much attention, and many genes directly related to lung cancer have been reported. However, it has not been confirmed whether those published 'key' genes are truly critical to lung cancer formation, i.e., they may carry very limited useful information. As a result, finding essential genes remains a challenging lung cancer research problem. Using a recently developed competing linear factor analysis method in differentially expressed gene detection, we advance the study of lung cancer critical gene detection to a uniformly informative level. A common set of four genes and their functional effects are detected to be differentially expressed in tumor and non-tumor samples with 100% sensitivity and 100% specificity in one study of lung adenocarcinoma (LUAD) and one study of squamous cell lung cancers (LUSC) (two North American cohorts with 20429 genes, and 576 and 552 samples respectively). Two additional analyses attain 97.8% sensitivity and 100% specificity in one study of non-small cell lung carcinomas (NSCLC, a European cohort with 20356 genes and 156 samples), and 100% sensitivity and 95% specificity (1 out of 20 non-tumor samples misclassified) in one study of ALK-positive and EGFR/KRAS/ALK-negative lung adenocarcinomas (LUAD, a Japanese cohort with 20356 genes and 224 samples). There are some common genes, but different functional effects, within each set of four genes among the two North American cohorts and the European cohort and among the North American cohorts and the Japanese cohort. These results show that the four-gene-based classifiers are accurate and robust across different types of lung cancer and cohorts of different races. The functional effects of the four genes disclose significantly different mechanisms (mysteries) between LUAD and LUSC. These sets of four genes and their functional effects are considered to be essential for lung cancer studies and practice. These genes' functional effects naturally classify patients into different groups (more than seven subtypes). Subtype information is useful for personalized therapies. The new findings can motivate new lung cancer research in more focused and targeted directions to save lives, protect people, and reduce enormous economic costs in research and lung cancer treatments.
  4. Abstract

    Applying an abductive mixed‐methods approach, we investigate the informal status systems in three women's prison units (across two prisons) and one men's prison unit. Qualitative analyses suggest “old head” narratives—where age, time in prison, sociability, and prison wisdom confer unit status—are prevalent across all four contexts. Perceptions of maternal “caregivers” and manipulative “bullies,” however, are found only in the three women's units. The qualitative findings inform formal network analyses by differentiating “positive,” “neutral,” and “negative” status nominations, with “negative” ties primarily absent from the men's unit. Within the women's units, network analyses find that high‐status women are likely to receive both positive and negative peer nominations, such that evaluations depend on who is doing the evaluating. Comparing the women's and men's networks, the correlates of positive and neutral ties are generally the same and center on covariates of age, getting along with others, race, and religion. Overall, the study points to important similarities and differences in status across the gendered prison contexts, while demonstrating how a sequential mixed‐methods design can illuminate both the meaning and the structure of prison informal organization.

     
  5. Background: The COVID-19 pandemic has caused several disruptions in personal and collective lives worldwide. The uncertainties surrounding the pandemic have also led to multifaceted mental health concerns, which can be exacerbated with precautionary measures such as social distancing and self-quarantining, as well as societal impacts such as economic downturn and job loss. Despite noting this as a “mental health tsunami”, the psychological effects of the COVID-19 crisis remain unexplored at scale. Consequently, public health stakeholders are currently limited in identifying ways to provide timely and tailored support during these circumstances. Objective: Our study aims to provide insights regarding people’s psychosocial concerns during the COVID-19 pandemic by leveraging social media data. We aim to study the temporal and linguistic changes in symptomatic mental health and support expressions in the pandemic context. Methods: We obtained about 60 million Twitter streaming posts originating from the United States from March 24 to May 24, 2020, and compared these with about 40 million posts from a comparable period in 2019 to attribute the effect of COVID-19 on people’s social media self-disclosure. Using these data sets, we studied people’s self-disclosure on social media in terms of symptomatic mental health concerns and expressions of support. We employed transfer learning classifiers that identified the social media language indicative of mental health outcomes (anxiety, depression, stress, and suicidal ideation) and support (emotional and informational support). We then examined the changes in psychosocial expressions over time and language, comparing the 2020 and 2019 data sets. Results: We found that all of the examined psychosocial expressions have significantly increased during the COVID-19 crisis—mental health symptomatic expressions have increased by about 14%, and support expressions have increased by about 5%, both thematically related to COVID-19. We also observed a steady decline and eventual plateauing in these expressions during the COVID-19 pandemic, which may have been due to habituation or due to supportive policy measures enacted during this period. Our language analyses highlighted that people express concerns that are specific to and contextually related to the COVID-19 crisis. Conclusions: We studied the psychosocial effects of the COVID-19 crisis by using social media data from 2020, finding that people’s mental health symptomatic and support expressions significantly increased during the COVID-19 period as compared to similar data from 2019. However, this effect gradually lessened over time, suggesting that people adapted to the circumstances and their “new normal.” Our linguistic analyses revealed that people expressed mental health concerns regarding personal and professional challenges, health care and precautionary measures, and pandemic-related awareness. This study shows the potential to provide insights to mental health care and stakeholders and policy makers in planning and implementing measures to mitigate mental health risks amid the health crisis.