-
Free, publicly-accessible full text available May 11, 2025
-
Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate a promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction fine-tuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best fine-tuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (25 and 15 times bigger) by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study on LLMs' capability on mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. Meanwhile, we also emphasize the important limitations that must be addressed before achieving deployability in real-world mental health settings, such as known racial and gender bias. We highlight the important ethical risks accompanying this line of research.
Free, publicly-accessible full text available March 6, 2025
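The comparison above is reported in balanced accuracy. As a minimal sketch of that metric (the mean of per-class recall), the following uses hypothetical toy labels rather than any data from the paper; the function name and variables are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical, imbalanced toy labels: 8 negatives and 2 positives.
y_true = [0] * 8 + [1] * 2
# A degenerate classifier that always predicts the negative class.
always_negative = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
balanced = balanced_accuracy(y_true, always_negative)
print(plain_accuracy, balanced)
```

On this toy input, plain accuracy rewards the always-negative classifier for the class imbalance, while balanced accuracy does not, which is presumably why it is preferred for imbalanced prediction tasks.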
-
During the COVID-19 pandemic, risk negotiation became an important precursor to in-person contact. For young adults, social planning generally occurs through computer-mediated communication. Given the importance of social connectedness for mental health and academic engagement, we sought to understand how young adults plan in-person meetups over computer-mediated communication in the context of the pandemic. We present a qualitative study that explores young adults’ risk negotiation during the COVID-19 pandemic, a period of conflicting public health guidance. Inspired by cultural probe studies, we invited participants to express their preferred precautions for one week as they planned in-person meetups. We interviewed and surveyed participants about their experiences. Through qualitative analysis, we identify strategies for risk negotiation, social complexities that impede risk negotiation, and emotional consequences of risk negotiation. Our findings have implications for AI-mediated support for risk negotiation and assertive communication more generally. We explore tensions between risks and potential benefits of such systems.
-
There is a growing body of research revealing that longitudinal passive sensing data from smartphones and wearable devices can capture daily behavior signals for human behavior modeling, such as depression detection. Most prior studies build and evaluate machine learning models using data collected from a single population. However, to ensure that a behavior model can work for a larger group of users, its generalizability needs to be verified on multiple datasets from different populations. We present the first work evaluating cross-dataset generalizability of longitudinal behavior models, using depression detection as an application. We collect multiple longitudinal passive mobile sensing datasets with over 500 users from two institutes over a two-year span, leading to four institute-year datasets. Using the datasets, we closely re-implement and evaluate nine prior depression detection algorithms. Our experiment reveals the lack of model generalizability of these methods. We also implement eight recently popular domain generalization algorithms from the machine learning community. Our results indicate that these methods also do not generalize well on our datasets, with barely any advantage over the naive baseline of guessing the majority. We then present two new algorithms with better generalizability. Our new algorithm, Reorder, significantly and consistently outperforms existing methods on most cross-dataset generalization setups. However, the overall advantage is incremental and still has great room for improvement. Our analysis reveals that the individual differences (both within and between populations) may play the most important role in the cross-dataset generalization challenge. Finally, we provide an open-source benchmark platform, GLOBEM (short for Generalization of Longitudinal BEhavior Modeling), to consolidate all 19 algorithms. GLOBEM can support researchers in using, developing, and evaluating different longitudinal behavior modeling methods.
We call for researchers' attention to model generalizability evaluation for future longitudinal human behavior modeling studies.
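The "naive baseline of guessing the majority" in a cross-dataset setup can be sketched as follows. This is an illustration under stated assumptions, not the GLOBEM platform's API: the leave-one-dataset-out protocol, the function names, the dataset names, and the labels are all hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label in the training pool."""
    return Counter(labels).most_common(1)[0][0]


def leave_one_dataset_out(datasets):
    """Score a majority-guess baseline on each held-out dataset.

    datasets maps a dataset name to its list of binary labels. For each
    held-out dataset, the baseline is "trained" on the labels of all other
    datasets and evaluated by plain accuracy on the held-out one.
    """
    scores = {}
    for held_out, test_labels in datasets.items():
        train_labels = [
            label
            for name, labels in datasets.items()
            if name != held_out
            for label in labels
        ]
        guess = majority_label(train_labels)
        scores[held_out] = sum(y == guess for y in test_labels) / len(test_labels)
    return scores


# Hypothetical labels for four institute-year datasets (1 = depressed).
toy = {
    "inst1_year1": [0, 0, 0, 1],
    "inst1_year2": [0, 0, 1, 1],
    "inst2_year1": [0, 0, 1, 1],
    "inst2_year2": [0, 0, 0, 1],
}
scores = leave_one_dataset_out(toy)
print(scores)
```

Any learned model evaluated under the same protocol would need to beat these per-dataset scores to show an advantage over guessing the majority.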
-
Recent research has demonstrated the capability of behavior signals captured by smartphones and wearables for longitudinal behavior modeling. However, there is a lack of a comprehensive public dataset that serves as an open testbed for fair comparison among algorithms. Moreover, prior studies mainly evaluate algorithms using data from a single population within a short period, without measuring the cross-dataset generalizability of these algorithms. We present the first multi-year passive sensing datasets, containing over 700 user-years and 497 unique users’ data collected from mobile and wearable sensors, together with a wide range of well-being metrics. Our datasets can support multiple cross-dataset evaluations of behavior modeling algorithms’ generalizability across different users and years. As a starting point, we provide the benchmark results of 18 algorithms on the task of depression detection. Our results indicate that both prior depression detection algorithms and domain generalization techniques show potential but need further research to achieve adequate cross-dataset generalizability. We envision our multi-year datasets can support the ML community in developing generalizable longitudinal behavior modeling algorithms. The GLOBEM website can be found at the-globem.github.io. Our datasets are available at physionet.org/content/globem. Our codebase is open-sourced at github.com/UW-EXP/GLOBEM.
-
Feeling a sense of belonging is a central human motivation that has consequences for mental health and well-being, yet surprisingly little research has examined how belonging shapes mental health among young adults. In three data sets from two universities (exploratory study: N = 157; Confirmatory Study 1: N = 121; Confirmatory Study 2: n = 188 in winter term, n = 172 in spring term), we found that lower levels of daily-assessed feelings of belonging early and across the academic term predicted higher depressive symptoms at the end of the term. Furthermore, these relationships held when models controlled for baseline depressive symptoms, sense of social fit, and other social factors (loneliness and frequency of social interactions). These results highlight the relationship between feelings of belonging and depressive symptoms over and above other social factors. This work underscores the importance of daily-assessed feelings of belonging in predicting subsequent depressive symptoms and has implications for early detection and mental health interventions among young adults.
-
This paper presents a computational framework for modeling biobehavioral rhythms - the repeating cycles of physiological, psychological, social, and environmental events - from mobile and wearable data streams. The framework incorporates four main components: mobile data processing, rhythm discovery, rhythm modeling, and machine learning. We evaluate the framework with two case studies using datasets from smartphones, Fitbit devices, and the OURA smart ring, assessing its ability to (1) detect cyclic biobehavior, (2) model commonality and differences in rhythms of human participants in the sample datasets, and (3) predict their health and readiness status using models of biobehavioral rhythms. Our evaluation demonstrates the framework’s ability to generate new knowledge and findings through rigorous micro- and macro-level modeling of human rhythms from mobile and wearable data streams collected in the wild and using them to assess and predict different life and health outcomes.
-
Smartphone overuse is related to a variety of issues such as lack of sleep and anxiety. We explore the application of Self-Affirmation Theory on smartphone overuse intervention in a just-in-time manner. We present TypeOut, a just-in-time intervention technique that integrates two components: an in-situ typing-based unlock process to improve user engagement, and self-affirmation-based typing content to enhance effectiveness. We hypothesize that the integration of typing and self-affirmation content can better reduce smartphone overuse. We conducted a 10-week within-subject field experiment (N=54) and compared TypeOut against two baselines: one only showing the self-affirmation content (a common notification-based intervention), and one only requiring typing non-semantic content (a state-of-the-art method). TypeOut reduces app usage by over 50%, and both app opening frequency and usage duration by over 25%, all significantly outperforming baselines. TypeOut can potentially be used in other domains where an intervention may benefit from integrating self-affirmation exercises with an engaging just-in-time mechanism.
-
Methods are fundamental to doing research and can directly impact who is included in scientific advances. Given accessibility research's increasing popularity and pervasive barriers to conducting and participating in research experienced by people with disabilities, it is critical to ask how methods are made accessible. Yet papers rarely describe their methods in detail. This paper reports on 17 interviews with accessibility experts about how they include both facilitators and participants with disabilities in popular user research methods. Our findings offer strategies for anticipating access needs while remaining flexible and responsive to unexpected access barriers. We emphasize the importance of considering accessibility at all stages of the research process, and contextualize access work in recent disability and accessibility literature. We explore how technology or processes could reflect a norm of accessibility. Finally, we discuss how various needs intersect and conflict and offer a practical structure for planning accessible research.