skip to main content


Title: ProKnow: Process knowledge for safety constrained and explainable question generation for mental health diagnostic assistance
Virtual Mental Health Assistants (VMHAs) are utilized in health care to provide patient services such as counseling and suggestive care. They are not used for patient diagnostic assistance because they cannot adhere to safety constraints and specialized clinical process knowledge ( ProKnow ) used to obtain clinical diagnoses. In this work, we define ProKnow as an ordered set of information that maps to evidence-based guidelines or categories of conceptual understanding to experts in a domain. We also introduce a new dataset of diagnostic conversations guided by safety constraints and ProKnow that healthcare professionals use ( ProKnow - data ). We develop a method for natural language question generation (NLG) that collects diagnostic information from the patient interactively ( ProKnow - algo ). We demonstrate the limitations of using state-of-the-art large-scale language models (LMs) on this dataset. ProKnow - algo incorporates the process knowledge through explicitly modeling safety, knowledge capture, and explainability. As computational metrics for evaluation do not directly translate to clinical settings, we involve expert clinicians in designing evaluation metrics that test four properties: safety, logical coherence, and knowledge capture for explainability while minimizing the standard cross entropy loss to preserve distribution semantics-based similarity to the ground truth. LMs with ProKnow - algo generated 89% safer questions in the depression and anxiety domain (tested property: safety ). Further, without ProKnow - algo generations question did not adhere to clinical process knowledge in ProKnow - data (tested property: knowledge capture ). In comparison, ProKnow - algo -based generations yield a 96% reduction in our metrics to measure knowledge capture. The explainability of the generated question is assessed by computing similarity with concepts in depression and anxiety knowledge bases. Overall, irrespective of the type of LMs, ProKnow - algo achieved an averaged 82% improvement over simple pre-trained LMs on safety, explainability, and process-guided question generation. For reproducibility, we will make ProKnow - data and the code repository of ProKnow - algo publicly available upon acceptance.  more » « less
Award ID(s):
2133842 1761931
NSF-PAR ID:
10429255
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Frontiers in Big Data
Volume:
5
ISSN:
2624-909X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Conversational Agents (CAs) powered with deep language models (DLMs) have shown tremendous promise in the domain of mental health. Prominently, the CAs have been used to provide informational or therapeutic services (e.g., cognitive behavioral therapy) to patients. However, the utility of CAs to assist in mental health triaging has not been explored in the existing work as it requires a controlled generation of follow-up questions (FQs), which are often initiated and guided by the mental health professionals (MHPs) in clinical settings. In the context of ‘depression’, our experiments show that DLMs coupled with process knowledge in a mental health questionnaire generate 12.54% and 9.37% better FQs based on similarity and longest common subsequence matches to questions in the PHQ-9 dataset respectively, when compared with DLMs without process knowledge support.Despite coupling with process knowledge, we find that DLMs are still prone to hallucination, i.e., generating redundant, irrelevant, and unsafe FQs. We demonstrate the challenge of using existing datasets to train a DLM for generating FQs that adhere to clinical process knowledge. To address this limitation, we prepared an extended PHQ-9 based dataset, PRIMATE, in collaboration with MHPs. PRIMATE contains annotations regarding whether a particular question in the PHQ-9 dataset has already been answered in the user’s initial description of the mental health condition. We used PRIMATE to train a DLM in a supervised setting to identify which of the PHQ-9 questions can be answered directly from the user’s post and which ones would require more information from the user. Using performance analysis based on MCC scores, we show that PRIMATE is appropriate for identifying questions in PHQ-9 that could guide generative DLMs towards controlled FQ generation (with minimal hallucination) suitable for aiding triaging. The dataset created as a part of this research can be obtained from https://github.com/primate-mh/Primate2022 
    more » « less
  2. Abstract Objective

    The use of electronic health records (EHRs) for clinical risk prediction is on the rise. However, in many practical settings, the limited availability of task-specific EHR data can restrict the application of standard machine learning pipelines. In this study, we investigate the potential of leveraging language models (LMs) as a means to incorporate supplementary domain knowledge for improving the performance of various EHR-based risk prediction tasks.

    Methods

    We propose two novel LM-based methods, namely “LLaMA2-EHR” and “Sent-e-Med.” Our focus is on utilizing the textual descriptions within structured EHRs to make risk predictions about future diagnoses. We conduct a comprehensive comparison with previous approaches across various data types and sizes.

    Results

    Experiments across 6 different methods and 3 separate risk prediction tasks reveal that employing LMs to represent structured EHRs, such as diagnostic histories, results in significant performance improvements when evaluated using standard metrics such as area under the receiver operating characteristic (ROC) curve and precision-recall (PR) curve. Additionally, they offer benefits such as few-shot learning, the ability to handle previously unseen medical concepts, and adaptability to various medical vocabularies. However, it is noteworthy that outcomes may exhibit sensitivity to a specific prompt.

    Conclusion

    LMs encompass extensive embedded knowledge, making them valuable for the analysis of EHRs in the context of risk prediction. Nevertheless, it is important to exercise caution in their application, as ongoing safety concerns related to LMs persist and require continuous consideration.

     
    more » « less
  3. Artificial Intelligence (AI) systems for mental healthcare (MHCare) have been ever-growing after realizing the importance of early interventions for patients with chronic mental health (MH) conditions. Social media (SocMedia) emerged as the go-to platform for supporting patients seeking MHCare. The creation of peer-support groups without social stigma has resulted in patients transitioning from clinical settings to SocMedia supported interactions for quick help. Researchers started exploring SocMedia content in search of cues that showcase correlation or causation between different MH conditions to design better interventional strategies. User-level Classification-based AI systems were designed to leverage diverse SocMedia data from various MH conditions, to predict MH conditions. Subsequently, researchers created classification schemes to measure the severity of each MH condition. Such ad-hoc schemes, engineered features, and models not only require a large amount of data but fail to allow clinically acceptable and explainable reasoning over the outcomes. To improve Neural-AI for MHCare, infusion of clinical symbolic knowledge that clinicans use in decision making is required. An impactful use case of Neural-AI systems in MH is conversational systems. These systems require coordination between classification and generation to facilitate humanistic conversation in conversational agents (CA). Current CAs with deep language models lack factual correctness, medical relevance, and safety in their generations, which intertwine with unexplainable statistical classification techniques. This lecture-style tutorial will demonstrate our investigations into Neuro-symbolic methods of infusing clinical knowledge to improve the outcomes of Neural-AI systems to improve interventions for MHCare:(a) We will discuss the use of diverse clinical knowledge in creating specialized datasets to train Neural-AI systems effectively. (b) Patients with cardiovascular disease express MH symptoms differently based on gender differences. We will show that knowledge-infused Neural-AI systems can identify gender-specific MH symptoms in such patients. (c) We will describe strategies for infusing clinical process knowledge as heuristics and constraints to improve language models in generating relevant questions and responses. 
    more » « less
  4. Abstract

    Although digital health solutions are increasingly popular in clinical psychiatry, one application that has not been fully explored is the utilization of survey technology to monitor patients outside of the clinic. Supplementing routine care with digital information collected in the “clinical whitespace” between visits could improve care for patients with severe mental illness. This study evaluated the feasibility and validity of using online self-report questionnaires to supplement in-person clinical evaluations in persons with and without psychiatric diagnoses. We performed a rigorous in-person clinical diagnostic and assessment battery in 54 participants with schizophrenia (N = 23), depressive disorder (N = 14), and healthy controls (N = 17) using standard assessments for depressive and psychotic symptomatology. Participants were then asked to complete brief online assessments of depressive (Quick Inventory of Depressive Symptomatology) and psychotic (Community Assessment of Psychic Experiences) symptoms outside of the clinic for comparison with the ground-truth in-person assessments. We found that online self-report ratings of severity were significantly correlated with the clinical assessments for depression (two assessments used: R = 0.63, p < 0.001; R = 0.73, p < 0.001) and psychosis (R = 0.62, p < 0.001). Our results demonstrate the feasibility and validity of collecting psychiatric symptom ratings through online surveys. Surveillance of this kind may be especially useful in detecting acute mental health crises between patient visits and can generally contribute to more comprehensive psychiatric treatment.

     
    more » « less
  5. Patient-generated health data (PGHD), created and captured from patients via wearable devices and mobile apps, are proliferating outside of clinical settings. Examples include sleep tracking, fitness trackers, continuous glucose monitors, and RFID-enabled implants, with many additional biometric or health surveillance applications in development or envisioned. These data are included in growing stockpiles of personal health data being mined for insight via big data analytics and artificial intelligence/deep learning technologies. Governing these data resources to facilitate patient care and health research while preserving individual privacy and autonomy will be challenging, as PGHD are the least regulated domains of digitalized personal health data (U.S. Department of Health and Human Services, 2018). When patients themselves collect digitalized PGHD using “apps” provided by technology firms, these data fall outside of conventional health data regulation, such as HIPAA. Instead, PGHD are maintained primarily on the information technology infrastructure of vendors, and data are governed under the IT firm’s own privacy policies and within the firm’s intellectual property rights. Dominant narratives position these highly personal data as valuable resources to transform healthcare, stimulate innovation in medical research, and engage individuals in their health and healthcare. However, ensuring privacy, security, and equity of benefits from PGHD will be challenging. PGHD can be aggregated and, despite putative “deidentification,” be linked with other health, economic, and social data for predictive analytics. As large tech companies enter the healthcare sector (e.g., Google Health is partnering with Ascension Health to analyze the PHI of millions of people across 21 U.S. states), the lack of harmonization between regulatory regimes may render existing safeguards to preserve patient privacy and control over their PHI ineffective. While healthcare providers are bound to adhere to health privacy laws, Big Tech comes under more relaxed regulatory regimes that will facilitate monetizing PGHD. We explore three existing data protection regimes relevant to PGHD in the United States that are currently in tension with one another: federal and state health-sector laws, data use and reuse for research and innovation, and industry self-regulation by large tech companies We then identify three types of structures (organizational, regulatory, technological/algorithmic), which synergistically could help enact needed regulatory oversight while limiting the friction and economic costs of regulation. This analysis provides a starting point for further discussions and negotiations among stakeholders and regulators to do so. 
    more » « less