Title: Natural language processing for mental health interventions: a systematic review and research framework
Abstract

Neuropsychiatric disorders pose a high societal cost, but their treatment is hindered by lack of objective outcomes and fidelity metrics. AI technologies and specifically Natural Language Processing (NLP) have emerged as tools to study mental health interventions (MHI) at the level of their constituent conversations. However, NLP’s potential to address clinical and research challenges remains unclear. We therefore conducted a pre-registered systematic review of NLP-MHI studies using PRISMA guidelines (osf.io/s52jh) to evaluate their models, clinical applications, and to identify biases and gaps. Candidate studies (n = 19,756), including peer-reviewed AI conference manuscripts, were collected up to January 2023 through PubMed, PsycINFO, Scopus, Google Scholar, and ArXiv. A total of 102 articles were included to investigate their computational characteristics (NLP algorithms, audio features, machine learning pipelines, outcome metrics), clinical characteristics (clinical ground truths, study samples, clinical focus), and limitations. Results indicate a rapid growth of NLP MHI studies since 2019, characterized by increased sample sizes and use of large language models. Digital health platforms were the largest providers of MHI data. Ground truth for supervised learning models was based on clinician ratings (n = 31), patient self-report (n = 29) and annotations by raters (n = 26). Text-based features contributed more to model accuracy than audio markers. Patients’ clinical presentation (n = 34), response to intervention (n = 11), intervention monitoring (n = 20), providers’ characteristics (n = 12), relational dynamics (n = 14), and data preparation (n = 4) were commonly investigated clinical categories. Limitations of reviewed studies included lack of linguistic diversity, limited reproducibility, and population bias. 
A research framework (NLPxMHI) is developed and validated to assist computational and clinical researchers in addressing the remaining gaps in applying NLP to MHI, with the goal of improving clinical utility, data access, and fairness.

 
NSF-PAR ID:
10467991
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Translational Psychiatry
Volume:
13
Issue:
1
ISSN:
2158-3188
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background Over the past 2 decades, various desktop and mobile telemedicine systems have been developed to support communication and care coordination among distributed medical teams. However, in the hands-busy care environment, such technologies could become cumbersome because they require medical professionals to manually operate them. Smart glasses have been gaining momentum because of their advantages in enabling hands-free operation and see-what-I-see video-based consultation. Previous research has tested this novel technology in different health care settings. Objective The aim of this study was to review how smart glasses were designed, used, and evaluated as a telemedicine tool to support distributed care coordination and communication, as well as highlight the potential benefits and limitations regarding medical professionals’ use of smart glasses in practice. Methods We conducted a literature search in 6 databases that cover research within both health care and computer science domains. We used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology to review articles. A total of 5865 articles were retrieved and screened by 3 researchers, with 21 (0.36%) articles included for in-depth analysis. Results All of the reviewed articles (21/21, 100%) used off-the-shelf smart glass devices and videoconferencing software, which had a high level of technology readiness for real-world use and deployment in care settings. The common system features used and evaluated in these studies included video and audio streaming, annotation, augmented reality, and hands-free interactions. These studies focused on evaluating the technical feasibility, effectiveness, and user experience of smart glasses. 
Although the smart glass technology has demonstrated numerous benefits and high levels of user acceptance, the reviewed studies noted a variety of barriers to successful adoption of this novel technology in actual care settings, including technical limitations, human factors and ergonomics, privacy and security issues, and organizational challenges. Conclusions User-centered system design, improved hardware performance, and software reliability are needed to realize the potential of smart glasses. More research is needed to examine and evaluate medical professionals’ needs, preferences, and perceptions, as well as elucidate how smart glasses affect the clinical workflow in complex care environments. Our findings inform the design, implementation, and evaluation of smart glasses that will improve organizational and patient outcomes. 
  2. Background In the last decade, there has been a rapid increase in research on the use of artificial intelligence (AI) to improve child and youth participation in daily life activities, which is a key rehabilitation outcome. However, existing reviews place variable focus on participation, are narrow in scope, and are restricted to select diagnoses, hindering interpretability regarding the existing scope of AI applications that target the participation of children and youth in a pediatric rehabilitation setting. Objective The aim of this scoping review is to examine how AI is integrated into pediatric rehabilitation interventions targeting the participation of children and youth with disabilities or other diagnosed health conditions in valued activities. Methods We conducted a comprehensive literature search using established Applied Health Sciences and Computer Science databases. Two independent researchers screened and selected the studies based on a systematic procedure. Inclusion criteria were as follows: participation was an explicit study aim or outcome or the targeted focus of the AI application; AI was applied as part of the provided and tested intervention; children or youth with a disability or other diagnosed health conditions were the focus of either the study or AI application or both; and the study was published in English. Data were mapped according to the types of AI, the mode of delivery, the type of personalization, and whether the intervention addressed individual goal-setting. Results The literature search identified 3029 documents, of which 94 met the inclusion criteria. Most of the included studies used multiple applications of AI with the highest prevalence of robotics (72/94, 77%) and human-machine interaction (51/94, 54%). Regarding mode of delivery, most of the included studies described an intervention delivered in-person (84/94, 89%), and only 11% (10/94) were delivered remotely. 
Most interventions were tailored to groups of individuals (93/94, 99%). Only 1% (1/94) of interventions was tailored to patients’ individually reported participation needs, and only one intervention (1/94, 1%) described individual goal-setting as part of their therapy process or intervention planning. Conclusions There is an increasing amount of research on interventions using AI to target the participation of children and youth with disabilities or other diagnosed health conditions, supporting the potential of using AI in pediatric rehabilitation. On the basis of our results, 3 major gaps for further research and development were identified: a lack of remotely delivered participation-focused interventions using AI; a lack of individual goal-setting integrated in interventions; and a lack of interventions tailored to individually reported participation needs of children, youth, or families. 
  3. Background While there are thousands of behavioral health apps available to consumers, users often quickly discontinue their use, which limits their therapeutic value. By varying the types and number of ways that users can interact with behavioral health mobile health apps, developers may be able to support greater therapeutic engagement and increase app stickiness. Objective The main objective of this analysis was to systematically characterize the types of user interactions that are available in behavioral health apps and then examine if greater interactivity was associated with greater user satisfaction, as measured by app metrics. Methods Using a modified PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology, we searched several different app clearinghouse websites and identified 76 behavioral health apps that included some type of interactivity. We then filtered the results to ensure we were examining behavioral health apps and further refined our search to include apps that identified one or more of the following terms: peer or therapist forum, discussion, feedback, professional, licensed, buddy, friend, artificial intelligence, chatbot, counselor, therapist, provider, mentor, bot, coach, message, comment, chat room, community, games, care team, connect, share, and support in the app descriptions. In the final group of 34 apps, we examined the presence of 6 types of human-machine interactivities: human-to-human with peers, human-to-human with providers, human-to–artificial intelligence, human-to-algorithms, human-to-data, and novel interactive smartphone modalities. We also downloaded information on app user ratings and visibility, as well as reviewed other key app features. Results We found that on average, the 34 apps reviewed included 2.53 (SD 1.05; range 1-5) features of interactivity. The most common types of interactivities were human-to-data (n=34, 100%), followed by human-to-algorithm (n=15, 44.2%). 
The least common type of interactivity was human–artificial intelligence (n=7, 20.5%). There were no significant associations between the total number of app interactivity features and user ratings or app visibility. We found that a full range of therapeutic interactivity features were not used in behavioral health apps. Conclusions Ideally, app developers would do well to include more interactivity features in behavioral health apps in order to fully use the capabilities of smartphone technologies and increase app stickiness. Theoretically, increased user engagement would occur by using multiple types of user interactivity, thereby maximizing the benefits that a person would receive when using a mobile health app. 
  4. Objective: We present a literature review on workplace physical fatigue interventions, focusing on evaluating the methodological quality and strength of evidence. Background: Physical fatigue is a recognized workplace problem, with negative effects on performance and health-related complaints. Although many studies have focused on the mechanisms and consequences of fatigue, few have considered the effectiveness of interventions to mitigate fatigue. Method: A systematic review of the workplace safety literature for controlled trials of physical fatigue interventions was conducted. Data on intervention type, subject characteristics, targeted tasks and body locations, outcome measures, and study design were extracted. The methodological quality for each study was evaluated using the PEDro scale, and the level of evidence was based on quality, amount, and consistency. Results: Forty-five controlled trials were reviewed, examining 18 interventions. We categorized those interventions into individual-focused ( N = 28 studies, nine interventions), workplace-focused ( N = 12 studies, five interventions), and multiple interventions ( N = 5 studies, four interventions). We identified moderate evidence for interventions related to assistive devices and task variation. There was moderate evidence supporting no fatigue attenuation for the garment change category of interventions. The interventions in the remaining categories had limited to minimal evidence of efficacy. The heterogeneity of the included trials precludes the determination of effect size. Conclusion: This review showed a lack of high levels of evidence for the effectiveness of most physical fatigue interventions. Application: Due to a lack of high levels of evidence for any category of reviewed physical fatigue interventions, further high-quality studies are needed to establish the efficacy of others. 
  5. Background

    Natural language processing (NLP) tasks in the health domain often deal with a limited amount of labeled data due to high annotation costs and naturally rare observations. To compensate for the lack of training data, health NLP researchers often have to leverage knowledge and resources external to the task at hand. Recently, pretrained large-scale language models such as the Bidirectional Encoder Representations from Transformers (BERT) have been proven to be a powerful way of learning rich linguistic knowledge from massive unlabeled text and transferring that knowledge to downstream tasks. However, previous downstream tasks often used training data at a scale that is unlikely to be obtainable in the health domain. In this work, we aim to study whether BERT can still benefit downstream tasks when training data are relatively small in the context of health NLP.

    Method

    We conducted a learning curve analysis to study the behavior of BERT and baseline models as training data size increases. We observed the classification performance of these models on two disease diagnosis data sets, where some diseases are naturally rare and have very limited observations (fewer than 2 out of 10,000). The baselines included commonly used text classification models such as sparse and dense bag-of-words models, long short-term memory networks, and their variants that leveraged external knowledge. To obtain learning curves, we incremented the amount of training examples per disease from small to large, and measured the classification performance in macro-averaged $F_{1}$ score.

    Results

    On the task of classifying all diseases, the learning curves of BERT were consistently above all baselines, significantly outperforming them across the spectrum of training data sizes. But under extreme situations where only one or two training documents per disease were available, BERT was outperformed by linear classifiers with carefully engineered bag-of-words features.

    Conclusion

    As long as the number of training documents is not extremely small, fine-tuning a pretrained BERT model is a highly effective approach to health NLP tasks like disease classification. However, in extreme cases where each class has only one or two training documents and no more will be available, simple linear models using bag-of-words features should be considered.

     