Title: A quantitative linguistic analysis of a cancer online health community with a smooth latent space model
Online health communities (OHCs) provide free, open, and well-resourced platforms for patients, family members, and others to discuss illnesses, express feelings, and connect with others. Linguistic analysis of OHC posts can assist in better understanding disease conditions as well as monitoring the emotional and mental status of patients and those who are closely related. Many existing OHC linguistic analyses are limited by focusing on individual words. There are a handful of cooccurrence network analyses, which have multiple methodological limitations. In this article, we analyze posts that are publicly available at the LUNGevity Foundation’s Lung Cancer Support Community (LCSC). The analyzed data contains 21,028 posts published between April 2018 and February 2022. For word cooccurrence network analysis, we develop a two-part latent space model, which advances from the existing ones by accommodating network weights. Further, we consider the scenario where there are change points in time, networks remain the same between two change points but differ on the two sides of a change point, and the number and locations of change points are unknown. A penalized fusion approach is developed to data-dependently determine change points and estimate networks. In the data analysis, multiple change points are identified, which reflect significant changes in lung cancer patients’ and their close affiliates’ emotional/mental status and mostly align with the changes in COVID-19. The obtained network structures and other findings are also sensible.
Award ID(s):
2209685
PAR ID:
10512836
Author(s) / Creator(s):
; ;
Publisher / Repository:
Institute of Mathematical Statistics
Date Published:
Journal Name:
The Annals of Applied Statistics
Volume:
18
Issue:
1
ISSN:
1932-6157
Subject(s) / Keyword(s):
Cancer , cooccurrence network , online health community , quantitative linguistic analysis , smooth latent space model
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Yanwu, Xu (Ed.)
    Lung cancer is a major cause of cancer-related deaths, and early diagnosis and treatment are crucial for improving patients’ survival outcomes. In this paper, we propose to employ convolutional neural networks to model the non-linear relationship between the risk of lung cancer and the lungs’ morphology revealed in the CT images. We apply a mini-batched loss that extends the Cox proportional hazards model to handle the non-convexity induced by neural networks, which also enables the training of large data sets. Additionally, we propose to combine mini-batched loss and binary cross-entropy to predict both lung cancer occurrence and the risk of mortality. Simulation results demonstrate the effectiveness of both the mini-batched loss with and without the censoring mechanism, as well as its combination with binary cross-entropy. We evaluate our approach on the National Lung Screening Trial data set with several 3D convolutional neural network architectures, achieving high AUC and C-index scores for lung cancer classification and survival prediction. These results, obtained from simulations and real data experiments, highlight the potential of our approach to improving the diagnosis and treatment of lung cancer. 
  2.
    Background: The COVID-19 pandemic has caused several disruptions in personal and collective lives worldwide. The uncertainties surrounding the pandemic have also led to multifaceted mental health concerns, which can be exacerbated with precautionary measures such as social distancing and self-quarantining, as well as societal impacts such as economic downturn and job loss. Despite noting this as a “mental health tsunami”, the psychological effects of the COVID-19 crisis remain unexplored at scale. Consequently, public health stakeholders are currently limited in identifying ways to provide timely and tailored support during these circumstances. Objective: Our study aims to provide insights regarding people’s psychosocial concerns during the COVID-19 pandemic by leveraging social media data. We aim to study the temporal and linguistic changes in symptomatic mental health and support expressions in the pandemic context. Methods: We obtained about 60 million Twitter streaming posts originating from the United States from March 24 to May 24, 2020, and compared these with about 40 million posts from a comparable period in 2019 to attribute the effect of COVID-19 on people’s social media self-disclosure. Using these data sets, we studied people’s self-disclosure on social media in terms of symptomatic mental health concerns and expressions of support. We employed transfer learning classifiers that identified the social media language indicative of mental health outcomes (anxiety, depression, stress, and suicidal ideation) and support (emotional and informational support). We then examined the changes in psychosocial expressions over time and language, comparing the 2020 and 2019 data sets. Results: We found that all of the examined psychosocial expressions have significantly increased during the COVID-19 crisis—mental health symptomatic expressions have increased by about 14%, and support expressions have increased by about 5%, both thematically related to COVID-19. We also observed a steady decline and eventual plateauing in these expressions during the COVID-19 pandemic, which may have been due to habituation or due to supportive policy measures enacted during this period. Our language analyses highlighted that people express concerns that are specific to and contextually related to the COVID-19 crisis. Conclusions: We studied the psychosocial effects of the COVID-19 crisis by using social media data from 2020, finding that people’s mental health symptomatic and support expressions significantly increased during the COVID-19 period as compared to similar data from 2019. However, this effect gradually lessened over time, suggesting that people adapted to the circumstances and their “new normal.” Our linguistic analyses revealed that people expressed mental health concerns regarding personal and professional challenges, health care and precautionary measures, and pandemic-related awareness. This study shows the potential to provide insights to mental health care and stakeholders and policy makers in planning and implementing measures to mitigate mental health risks amid the health crisis.
  3. This clinical study presents a comprehensive investigation into the utility of breath analysis as a non-invasive method for the early detection of lung cancer. The study enrolled 14 lung cancer patients, 14 non-lung cancer controls with diverse medical conditions, and 3 tuberculosis (TB) patients for biomarker discovery. Matching criteria including age, gender, smoking history, and comorbidities were strictly followed to ensure reliable comparisons. A systematic breath sampling protocol utilizing a BIO-VOC sampler was employed, followed by VOC analysis using Thermal Desorption–Gas Chromatography–Mass Spectrometry (TD-GC/MS). The resulting VOC profiles were subjected to stringent statistical analysis, including Orthogonal Projections to Latent Structures—Discriminant Analysis (OPLS-DA), Kruskal–Wallis test, and Receiver Operating Characteristic (ROC) analysis. Notably, 13 VOCs exhibited statistically significant differences between lung cancer patients and controls. The combination of eight VOCs (hexanal, heptanal, octanal, benzaldehyde, undecane, phenylacetaldehyde, decanal, and benzoic acid) demonstrated substantial discriminatory power with an area under the curve (AUC) of 0.85, a sensitivity of 82%, and a specificity of 76% in the discovery set. Validation in an independent cohort yielded an AUC of 0.78, a sensitivity of 78%, and a specificity of 64%. Further analysis revealed that elevated aldehyde levels in lung cancer patients’ breath could be attributed to overactivated Alcohol Dehydrogenase (ADH) pathways in cancerous tissues. Addressing methodological challenges, this study employed a matching of physiological and pathological confounders, controlled room air samples, and standardized breath sampling techniques. Despite the limitations, this study’s findings emphasize the potential of breath analysis as a diagnostic tool for lung cancer and suggest its utility in differentiating tuberculosis from lung cancer. However, further research and validation are warranted for the translation of these findings into clinical practice.
  4. Graphical models are powerful tools to investigate complex dependency structures in high-throughput datasets. However, most existing graphical models make one of two canonical assumptions: (i) a homogeneous graph with a common network for all subjects or (ii) an assumption of normality, especially in the context of Gaussian graphical models. Both assumptions are restrictive and can fail to hold in certain applications such as proteomic networks in cancer. To this end, we propose an approach termed robust Bayesian graphical regression (rBGR) to estimate heterogeneous graphs for non-normally distributed data. rBGR is a flexible framework that accommodates non-normality through random marginal transformations and constructs covariate-dependent graphs to accommodate heterogeneity through graphical regression techniques. We formulate a new characterization of edge dependencies in such models called conditional sign independence with covariates, along with an efficient posterior sampling algorithm. In simulation studies, we demonstrate that rBGR outperforms existing graphical regression models for data generated under various levels of non-normality in both edge and covariate selection. We use rBGR to assess proteomic networks in lung and ovarian cancers to systematically investigate the effects of immunogenic heterogeneity within tumors. Our analyses reveal several important protein–protein interactions that are differentially associated with the immune cell abundance; some corroborate existing biological knowledge, whereas others are novel findings.
  5. Finding genes biologically directly or indirectly related to lung cancer has been drawing much attention, and many genes directly related to lung cancer have been reported. However, it has not been confirmed whether those published 'key' genes are truly critical to lung cancer formation, i.e., they may carry very limited useful information. As a result, finding essential genes remains a challenging lung cancer research problem. Using a recently developed competing linear factor analysis method in differentially expressed gene detection, we advance the study of lung cancer critical genes detection to a uniformly informative level. A set of common four genes and their functional effects are detected to be differentially expressed in tumor and non-tumor samples with 100% sensitivity and 100% specificity in one study of lung adenocarcinoma (LUAD) and one study of squamous cell lung cancers (LUSC) (two North American cohorts with 20429 genes, 576 and 552 samples respectively). Two additional analyses also gain accuracy of 97.8% sensitivity and 100% specificity in one study of non-small cell lung carcinomas (NSCLC, a European cohort with 20356 genes and 156 samples), and an accuracy of 100% sensitivity and 95% specificity (1 out of 20 non-tumor samples) in one study of ALK-positive and EGFR/KRAS/ALK-negative lung adenocarcinomas (LUAD, a Japanese cohort with 20356 genes and 224 samples). There are some common genes, but different functional effects, within each set of four genes among two North American cohorts and a European cohort and among North American cohorts and the Japanese cohort. These results show the four-gene-based classifiers are accurate and robust across different types of lung cancers and different race cohorts. The functional effects of four genes disclose significantly other mechanisms (mysteries) between LUAD and LUSC. These sets of four genes and their functional effects are considered to be essential for lung cancer studies and practice. These genes' functional effects naturally classify patients into different groups (more than seven subtypes). Subtype information is useful for personalized therapies. The new findings can motivate new lung cancer research in more focused and targeted directions to save lives, protect people, and reduce enormous economic costs in research and lung cancer treatments.