skip to main content


Title: An NLP approach to identify SDoH-related circumstance and suicide crisis from death investigation narratives
Abstract Objectives

Suicide presents a major public health challenge worldwide, affecting people across the lifespan. While previous studies revealed strong associations between Social Determinants of Health (SDoH) and suicide deaths, existing evidence is limited by the reliance on structured data. To resolve this, we aim to adapt a suicide-specific SDoH ontology (Suicide-SDoHO) and use natural language processing (NLP) to effectively identify individual-level SDoH-related social risks from death investigation narratives.

Materials and Methods

We used the latest National Violent Death Report System (NVDRS), which contains 267 804 victim suicide data from 2003 to 2019. After adapting the Suicide-SDoHO, we developed a transformer-based model to identify SDoH-related circumstances and crises in death investigation narratives. We applied our model retrospectively to annotate narratives whose crisis variables were not coded in NVDRS. The crisis rates were calculated as the percentage of the group’s total suicide population with the crisis present.

Results

The Suicide-SDoHO contains 57 fine-grained circumstances in a hierarchical structure. Our classifier achieves AUCs of 0.966 and 0.942 for classifying circumstances and crises, respectively. Through the crisis trend analysis, we observed that not everyone is equally affected by SDoH-related social risks. For the economic stability crisis, our result showed a significant increase in crisis rate in 2007–2009, parallel with the Great Recession.

Conclusions

This is the first study curating a Suicide-SDoHO using death investigation narratives. We showcased that our model can effectively classify SDoH-related social risks through NLP approaches. We hope our study will facilitate the understanding of suicide crises and inform effective prevention strategies.

 
more » « less
Award ID(s):
2019844
NSF-PAR ID:
10433260
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the American Medical Informatics Association
Volume:
30
Issue:
8
ISSN:
1067-5027
Format(s):
Medium: X Size: p. 1408-1417
Size(s):
["p. 1408-1417"]
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Objective

    The impact of social determinants of health (SDoH) on patients’ healthcare quality and the disparity is well known. Many SDoH items are not coded in structured forms in electronic health records. These items are often captured in free-text clinical notes, but there are limited methods for automatically extracting them. We explore a multi-stage pipeline involving named entity recognition (NER), relation classification (RC), and text classification methods to automatically extract SDoH information from clinical notes.

    Materials and Methods

    The study uses the N2C2 Shared Task data, which were collected from 2 sources of clinical notes: MIMIC-III and University of Washington Harborview Medical Centers. It contains 4480 social history sections with full annotation for 12 SDoHs. In order to handle the issue of overlapping entities, we developed a novel marker-based NER model. We used it in a multi-stage pipeline to extract SDoH information from clinical notes.

    Results

    Our marker-based system outperformed the state-of-the-art span-based models at handling overlapping entities based on the overall Micro-F1 score performance. It also achieved state-of-the-art performance compared with the shared task methods. Our approach achieved an F1 of 0.9101, 0.8053, and 0.9025 for Subtasks A, B, and C, respectively.

    Conclusions

    The major finding of this study is that the multi-stage pipeline effectively extracts SDoH information from clinical notes. This approach can improve the understanding and tracking of SDoHs in clinical settings. However, error propagation may be an issue and further research is needed to improve the extraction of entities with complex semantic meanings and low-frequency entities. We have made the source code available at https://github.com/Zephyr1022/SDOH-N2C2-UTSA.

     
    more » « less
  2. Background

    The COVID-19 pandemic has resulted in heightened levels of depression, anxiety, and other mental health issues due to sudden changes in daily life, such as economic stress, social isolation, and educational irregularity. Accurately assessing emotional and behavioral changes in response to the pandemic can be challenging, but it is essential to understand the evolving emotions, themes, and discussions surrounding the impact of COVID-19 on mental health.

    Objective

    This study aims to understand the evolving emotions and themes associated with the impact of COVID-19 on mental health support groups (eg, r/Depression and r/Anxiety) on Reddit (Reddit Inc) during the initial phase and after the peak of the pandemic using natural language processing techniques and statistical methods.

    Methods

    This study used data from the r/Depression and r/Anxiety Reddit communities, which consisted of posts contributed by 351,409 distinct users over a period spanning from 2019 to 2022. Topic modeling and Word2Vec embedding models were used to identify key terms associated with the targeted themes within the data set. A range of trend and thematic analysis techniques, including time-to-event analysis, heat map analysis, factor analysis, regression analysis, and k-means clustering analysis, were used to analyze the data.

    Results

    The time-to-event analysis revealed that the first 28 days following a major event could be considered a critical window for mental health concerns to become more prominent. The theme trend analysis revealed key themes such as economic stress, social stress, suicide, and substance use, with varying trends and impacts in each community. The factor analysis highlighted pandemic-related stress, economic concerns, and social factors as primary themes during the analyzed period. Regression analysis showed that economic stress consistently demonstrated the strongest association with the suicide theme, whereas the substance theme had a notable association in both data sets. Finally, the k-means clustering analysis showed that in r/Depression, the number of posts related to the “depression, anxiety, and medication” cluster decreased after 2020, whereas the “social relationships and friendship” cluster showed a steady decrease. In r/Anxiety, the “general anxiety and feelings of unease” cluster peaked in April 2020 and remained high, whereas the “physical symptoms of anxiety” cluster showed a slight increase.

    Conclusions

    This study sheds light on the impact of COVID-19 on mental health and the related themes discussed in 2 web-based communities during the pandemic. The results offer valuable insights for developing targeted interventions and policies to support individuals and communities in similar crises.

     
    more » « less
  3. Abstract Background

    Pediatric hematopoietic stem cell transplantation (HCT) is an intensive medical procedure that places substantial financial and logistical burdens on families and is associated with significant health risks, such as graft‐versus‐host disease (GVHD), and infections. The influence of the social determinants of health (SDoH) on outcomes following pediatric HCT is understudied. This study aimed to examine whether SDoH predicts outcomes following pediatric HCT.

    Procedure

    Data were collected from 84 children who received HCT (Mage = 5.8 years, SD = 3.7) and their primary caregiver. Detailed demographic information was collected from caregivers at baseline, and child health information was extracted from the electronic medical records. Multivariate logistic regression was used to examine the association between SDoH and health outcomes within a 24‐month period following pediatric HCT.

    Results

    After controlling for malignancy as reason for transplant and donor type, lower family income predicted the incidence of chronic GVHD. Neighborhood deprivation, total family income, public health insurance, caregiver relationship status, caregiver educational attainment, and perceived family financial difficulties did not predict acute GVHD or the number of infections.

    Conclusions

    Total family income is a simple family indicator of SDoH that predicts chronic GVHD after pediatric allogeneic HCT. These findings provide further support for the importance of screening of child and family SDoH risks to ensure that fundamental needs can be met to mitigate potential health disparities for up to 2 years following pediatric HCT.

     
    more » « less
  4. Introduction

    With the increasing utilization of text-based suicide crisis counseling, new means of identifying at risk clients must be explored. Natural language processing (NLP) holds promise for evaluating the content of crisis counseling; here we use a data-driven approach to evaluate NLP methods in identifying client suicide risk.

    Methods

    De-identified crisis counseling data from a regional text-based crisis encounter and mobile tipline application were used to evaluate two modeling approaches in classifying client suicide risk levels. A manual evaluation of model errors and system behavior was conducted.

    Results

    The neural model outperformed a term frequency-inverse document frequency (tf-idf) model in the false-negative rate. While 75% of the neural model’s false negative encounters had some discussion of suicidality, 62.5% saw a resolution of the client’s initial concerns. Similarly, the neural model detected signals of suicidality in 60.6% of false-positive encounters.

    Discussion

    The neural model demonstrated greater sensitivity in the detection of client suicide risk. A manual assessment of errors and model performance reflected these same findings, detecting higher levels of risk in many of the false-positive encounters and lower levels of risk in many of the false negatives. NLP-based models can detect the suicide risk of text-based crisis encounters from the encounter’s content.

     
    more » « less
  5. null (Ed.)
    Background The COVID-19 pandemic has caused several disruptions in personal and collective lives worldwide. The uncertainties surrounding the pandemic have also led to multifaceted mental health concerns, which can be exacerbated with precautionary measures such as social distancing and self-quarantining, as well as societal impacts such as economic downturn and job loss. Despite noting this as a “mental health tsunami”, the psychological effects of the COVID-19 crisis remain unexplored at scale. Consequently, public health stakeholders are currently limited in identifying ways to provide timely and tailored support during these circumstances. Objective Our study aims to provide insights regarding people’s psychosocial concerns during the COVID-19 pandemic by leveraging social media data. We aim to study the temporal and linguistic changes in symptomatic mental health and support expressions in the pandemic context. Methods We obtained about 60 million Twitter streaming posts originating from the United States from March 24 to May 24, 2020, and compared these with about 40 million posts from a comparable period in 2019 to attribute the effect of COVID-19 on people’s social media self-disclosure. Using these data sets, we studied people’s self-disclosure on social media in terms of symptomatic mental health concerns and expressions of support. We employed transfer learning classifiers that identified the social media language indicative of mental health outcomes (anxiety, depression, stress, and suicidal ideation) and support (emotional and informational support). We then examined the changes in psychosocial expressions over time and language, comparing the 2020 and 2019 data sets. Results We found that all of the examined psychosocial expressions have significantly increased during the COVID-19 crisis—mental health symptomatic expressions have increased by about 14%, and support expressions have increased by about 5%, both thematically related to COVID-19. We also observed a steady decline and eventual plateauing in these expressions during the COVID-19 pandemic, which may have been due to habituation or due to supportive policy measures enacted during this period. Our language analyses highlighted that people express concerns that are specific to and contextually related to the COVID-19 crisis. Conclusions We studied the psychosocial effects of the COVID-19 crisis by using social media data from 2020, finding that people’s mental health symptomatic and support expressions significantly increased during the COVID-19 period as compared to similar data from 2019. However, this effect gradually lessened over time, suggesting that people adapted to the circumstances and their “new normal.” Our linguistic analyses revealed that people expressed mental health concerns regarding personal and professional challenges, health care and precautionary measures, and pandemic-related awareness. This study shows the potential to provide insights to mental health care and stakeholders and policy makers in planning and implementing measures to mitigate mental health risks amid the health crisis. 
    more » « less