Title: Design and Evaluation Challenges of Conversational Agents in Health Care and Well-being: Selective Review Study
Background Health care and well-being are 2 main interconnected application areas of conversational agents (CAs). There is a significant increase in research, development, and commercial implementations in this area. In parallel to the increasing interest, new challenges in designing and evaluating CAs have emerged. Objective This study aims to identify key design, development, and evaluation challenges of CAs in health care and well-being research. The focus is on very recent projects and their emerging challenges. Methods A review study was conducted with 17 invited studies, most of which were presented at the ACM (Association for Computing Machinery) CHI 2020 conference workshop on CAs for health and well-being. Eligibility criteria required the studies to involve a CA applied to a health or well-being project (ongoing or recently finished). The participating studies were asked to report on their projects’ design and evaluation challenges. We used thematic analysis to review the studies. Results The findings include a range of topics from primary care to caring for older adults to health coaching. We identified 4 major themes: (1) Domain Information and Integration, (2) User-System Interaction and Partnership, (3) Evaluation, and (4) Conversational Competence. Conclusions CAs proved their worth during the pandemic as health screening tools, and are expected to stay to further support various health care domains, especially personal health care. Growth in investment in CAs also shows their value as personal assistants. Our study shows that while some challenges are shared with other CA application areas, safety and privacy remain the major challenges in the health care and well-being domains. An increased level of collaboration across different institutions and entities may be a promising direction to address some of the major challenges that otherwise would be too complex to be addressed by the projects with their limited scope and budget.
Award ID(s):
2144880
PAR ID:
10404511
Journal Name:
Journal of Medical Internet Research
Volume:
24
Issue:
11
ISSN:
1438-8871
Page Range / eLocation ID:
e38525
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Importance Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas. Objective To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) tasks, (4) dimension of evaluation, and (5) medical specialty. Data Sources A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024. Study Selection Studies evaluating 1 or more LLMs in health care. Data Extraction and Synthesis Three independent reviewers categorized studies via keyword searches based on the data used, the health care tasks, the NLP and NLU tasks, the dimensions of evaluation, and the medical specialty. Results Of 519 studies reviewed, published between January 1, 2022, and February 19, 2024, only 5% used real patient care data for LLM evaluation. The most common health care tasks were assessing medical knowledge such as answering medical licensing examination questions (44.5%) and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions (0.2%) were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while tasks such as summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. 
Finally, in terms of medical specialty area, most studies were in generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics (0.2%) being the least represented. Conclusions and Relevance Existing evaluations of LLMs mostly focus on accuracy of question answering for medical examinations, without consideration of real patient care data. Dimensions such as fairness, bias, and toxicity and deployment considerations received limited attention. Future evaluations should adopt standardized applications and metrics, use clinical data, and broaden focus to include a wider range of tasks and specialties. 
  2. Older adults often rely on information provided during doctors’ visits or online to manage their health but can experience challenges accessing this information at home. Conversational assistants have recently been explored as an aid for navigating health information in online portals, but we still know little about users’ perceptions of using these tools for managing personal health information. In this paper, we conducted a wizard-of-oz study to better understand older adults’ perceptions of a conversational assistant, MIHA, to help with navigating personal health information. Participants saw value in using a tool such as MIHA to help facilitate access to their personal health information and to help them become more engaged in their health. Participants believed MIHA’s features helped build confidence in the responses returned, but made suggestions for improving the interactions. We share insights of potential uses and design implications for conversational assistants that help older adults navigate personal health information. 
  3. Background Home health aides (HHAs) provide necessary hands-on care to older adults and those with chronic conditions in their homes. Despite their integral role, HHAs experience numerous challenges in their work, including their ability to communicate with other health care professionals about patient care while caring for patients and access to educational resources. Although technological interventions have the potential to address these challenges, little is known about the technological landscape and existing technology-based interventions designed for and used by this workforce. Objective We conducted a scoping review of the scientific literature to identify existing studies that have described, designed, deployed, or tested technology-based tools and apps intended for use by HHAs to care for patients at home. To complement our literature review, we conducted a landscape analysis of existing mobile apps intended for HHAs providing in-home care. Methods We searched the following databases from their inception to October 2020: Ovid MEDLINE, Ovid Embase, Cochrane Library, and CINAHL (EBSCO). A total of 3 researchers screened the yield using prespecified inclusion and exclusion criteria. In addition, 4 researchers independently reviewed these articles, and a fifth researcher arbitrated when needed. Among studies that met the inclusion criteria, data were extracted and summarized narratively. An analysis of mobile health apps designed for HHAs was performed using a predefined set of terms to search Google Play and Apple App stores. Overall, 2 researchers independently screened the resulting apps, and those that met the inclusion criteria were categorized according to their intended purpose and functionality. Results Of the 8643 studies retrieved, 182 (2.11%) underwent full-text review, and 4.9% (9/182) met our inclusion criteria. 
Approximately half (4/9, 44%) of the studies were descriptive in nature, proposing technology-based systems (eg, web portals and dashboards) or prototypes without a technical or user-based evaluation of the technology. In most (7/9, 78%) papers, HHAs were just one of several users and not the sole or primary intended users of the technology. Our review of mobile apps yielded 166 Android and iOS apps, of which 48 (29%) met the inclusion criteria. These apps provided HHAs with one or more of the following functions: electronic visit verification (29/48, 60%), clocking in and out (23/48, 48%), documentation (22/48, 46%), task checklist (19/48, 40%), communication between HHA and agency (14/48, 29%), patient information (6/48, 13%), resources (5/48, 10%), and communication between HHA and patients (4/48, 8%). Of the 48 apps, 25 (52%) performed monitoring functions, 4 (8%) performed supporting functions, and 19 (40%) performed both. Conclusions A limited number of studies and mobile apps have been designed to support HHAs in their work. Further research and rigorous evaluation of technology-based tools are needed to assess their impact on the work HHAs provide in patients’ homes. 
  4. A Mavragani (Ed.)
    Background Posttraumatic stress disorder (PTSD) is a serious public health concern. However, individuals with PTSD often do not have access to adequate treatment. A conversational agent (CA) can help to bridge the treatment gap by providing interactive and timely interventions at scale. Toward this goal, we have developed PTSDialogue, a CA to support the self-management of individuals living with PTSD. PTSDialogue is designed to be highly interactive (eg, brief questions, ability to specify preferences, and quick turn-taking) and supports social presence to promote user engagement and sustain adherence. It includes a range of support features, including psychoeducation, assessment tools, and several symptom management tools. Objective This paper focuses on the preliminary evaluation of PTSDialogue from clinical experts. Given that PTSDialogue focuses on a vulnerable population, it is critical to establish its usability and acceptance with clinical experts before deployment. Expert feedback is also important to ensure user safety and effective risk management in CAs aiming to support individuals living with PTSD. Methods We conducted remote, one-on-one, semistructured interviews with clinical experts (N=10) to gather insight into the use of CAs. All participants have completed their doctoral degrees and have prior experience in PTSD care. The web-based PTSDialogue prototype was then shared with the participant so that they could interact with different functionalities and features. We encouraged them to “think aloud” as they interacted with the prototype. Participants also shared their screens throughout the interaction session. A semistructured interview script was also used to gather insights and feedback from the participants. The sample size is consistent with that of prior works. We analyzed interview data using a qualitative interpretivist approach resulting in a bottom-up thematic analysis. 
Results Our data establish the feasibility and acceptance of PTSDialogue, a supportive tool for individuals with PTSD. Most participants agreed that PTSDialogue could be useful for supporting self-management of individuals with PTSD. We have also assessed how features, functionalities, and interactions in PTSDialogue can support different self-management needs and strategies for this population. These data were then used to identify design requirements and guidelines for a CA aiming to support individuals with PTSD. Experts specifically noted the importance of empathetic and tailored CA interactions for effective PTSD self-management. They also suggested steps to ensure safe and engaging interactions with PTSDialogue. Conclusions Based on interviews with experts, we have provided design recommendations for future CAs aiming to support vulnerable populations. The study suggests that well-designed CAs have the potential to reshape effective intervention delivery and help address the treatment gap in mental health. 
  5. The field of intelligent virtual agents (IVAs) has evolved immensely over the past 15 years, introducing new application opportunities in areas such as training, health care, and virtual assistants. In this survey paper, we provide a systematic review of the most influential user studies published in the IVA conference from 2001 to 2015 focusing on IVA development, human perception, and interactions. A total of 247 papers with 276 user studies have been classified and reviewed based on their contributions and impact. We identify the different areas of research and provide a summary of the papers with the highest impact. With the trends of past user studies and the current state of technology, we provide insights into future trends and research challenges. 