NSF PAR Search | NSF Public Access Repository

User Interaction Patterns and Breakdowns in Conversing with LLM-Powered Voice Assistants

https://doi.org/10.1016/j.ijhcs.2024.103406

Mahmood, Amama; Wang, Junxiang; Yao, Bingsheng; Wang, Dakuo; Huang, Chien-Ming (January 2025, International Journal of Human-Computer Studies)

Full Text Available
Evaluating the LLM Agents for Simulating Humanoid Behavior

Chen, Chaoran; Yao, Bingsheng; Ye, Yanfang; Wang, Dakuo; Li, Toby Jia-Jun (October 2024, CHI conference proceedings, CHI Conference)

Full Text Available
SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing

https://doi.org/10.1145/3637528.3671586

Yin, Changchang; Chen, Pin-Yu; Yao, Bingsheng; Wang, Dakuo; Caterino, Jeffrey; Zhang, Ping (August 2024, ACM)

Full Text Available
Evaluating the LLM Agents for Simulating Humanoid Behavior

Chen, Chaoran; Yao, Bingsheng; Ye, Yanfang; Wang, Dakuo; Li, Toby Jia-Jun (May 2024, The First Workshop on Human-Centered Evaluation and Auditing of Language Models (CHI Workshop HEAL))

Full Text Available
Evaluating the LLM Agents for Simulating Humanoid Behavior

Chen, Chaoran; Yao, Bingsheng; Ye, Yanfang; Wang, Dakuo; Li, Toby Jia-Jun (May 2024, CHI conference proceedings, CHI Conference)

Full Text Available
Rethinking Human-AI Collaboration in Complex Medical Decision Making: A Case Study in Sepsis Diagnosis

https://doi.org/10.1145/3613904.3642343

Zhang, Shao; Yu, Jianing; Xu, Xuhai; Yin, Changchang; Lu, Yuxuan; Yao, Bingsheng; Tory, Melanie; Padilla, Lace M; Caterino, Jeffrey; Zhang, Ping; et al. (May 2024, ACM)

Full Text Available
Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data

https://doi.org/10.1145/3643540

Xu, Xuhai; Yao, Bingsheng; Dong, Yuanzhe; Gabriel, Saadia; Yu, Hong; Hendler, James; Ghassemi, Marzyeh; Dey, Anind K; Wang, Dakuo (March 2024, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)

Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4, on various mental health prediction tasks via online text data. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate a promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction fine-tuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best fine-tuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (25 and 15 times bigger) by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study on LLMs' capability on mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. Meanwhile, we also emphasize the important limitations that must be addressed before achieving deployability in real-world mental health settings, such as known racial and gender bias. We highlight the important ethical risks accompanying this line of research.
Full Text Available
