Title: Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models
The emotions we experience involve complex processes: besides physiological aspects, research in psychology has studied cognitive appraisals, whereby people assess their situations subjectively, according to their own values (Scherer, 2005). Thus, the same situation can often result in different emotional experiences. While the detection of emotion is a well-established task, there is very limited work so far on the automatic prediction of cognitive appraisals. This work fills the gap by presenting CovidET-Appraisals, the most comprehensive dataset to date that assesses 24 appraisal dimensions, each with a natural language rationale, across 241 Reddit posts. CovidET-Appraisals presents an ideal testbed to evaluate the ability of large language models, which excel at a wide range of NLP tasks, to automatically assess and explain cognitive appraisals. We found that while the best models are performant, open-source LLMs fall short at this task, presenting a new challenge for the future development of emotionally intelligent models. We release our dataset at https://github.com/honglizhan/CovidET-Appraisals-Public.
Award ID(s):
2107524
PAR ID:
10521349
Author(s) / Creator(s):
; ;
Publisher / Repository:
Findings of the Association for Computational Linguistics: EMNLP 2023
Date Published:
Page Range / eLocation ID:
14418 to 14446
Format(s):
Medium: X
Location:
Singapore
Sponsoring Org:
National Science Foundation
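To make the evaluation setup concrete, here is a minimal, runnable sketch of zero-shot appraisal assessment of the kind the abstract describes: prompting a model to rate one appraisal dimension for a Reddit post and justify the rating in natural language. The prompt wording, the 1-to-9 scale, and the `query_llm` stub are illustrative assumptions, not the paper's exact protocol.

```python
# Illustrative sketch: zero-shot appraisal rating with an LLM.
# `query_llm` is a placeholder for any chat-completion client; the
# dimension wording and 1-9 scale are assumptions for illustration.
from typing import Callable

PROMPT_TEMPLATE = (
    "Consider the following Reddit post:\n\n{post}\n\n"
    "On a scale of 1 (not at all) to 9 (very much), how much did the "
    "narrator feel that {dimension}? Answer with a rating followed by "
    "a one-sentence rationale, or 'N/A' if the dimension does not apply."
)

def assess_appraisal(post: str, dimension: str,
                     query_llm: Callable[[str], str]) -> str:
    """Ask the model to rate one appraisal dimension with a rationale."""
    return query_llm(PROMPT_TEMPLATE.format(post=post, dimension=dimension))

if __name__ == "__main__":
    # Stub model so the sketch runs end to end without API access.
    def stub_llm(prompt: str) -> str:
        return "7 -- The narrator describes the situation as beyond their control."

    print(assess_appraisal(
        post="I lost my job during the pandemic and don't know what comes next.",
        dimension="the situation was outside their own control",
        query_llm=stub_llm,
    ))
```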
More Like this
  1. Large language models (LLMs) have offered new opportunities for emotional support, and recent work has shown that they can produce empathic responses to people in distress. However, long-term mental well-being requires emotional self-regulation, where a one-time empathic response falls short. This work takes a first step by engaging with cognitive reappraisals, a strategy from psychology practitioners that uses language to change, in a targeted way, the negative appraisals an individual makes of a situation; such appraisals are known to sit at the root of human emotional experience. We hypothesize that psychologically grounded principles could enable such advanced psychological capabilities in LLMs, and design RESORT, which consists of a series of reappraisal constitutions across multiple dimensions that can be used as LLM instructions. We conduct a first-of-its-kind expert evaluation (by clinical psychologists with M.S. or Ph.D. degrees) of an LLM's zero-shot ability to generate cognitive reappraisal responses to medium-length social media messages asking for support. This fine-grained evaluation shows that even LLMs at the 7B scale, guided by RESORT, are capable of generating empathic responses that help users reappraise their situations.
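A hedged sketch of the constitution-guided prompting described above: per-dimension instructions composed into a zero-shot prompt. The constitution texts and dimension names below are hypothetical stand-ins, not RESORT's actual instructions.

```python
# Illustrative sketch of constitution-guided reappraisal prompting in the
# spirit of RESORT. The guideline text is a hypothetical stand-in.
REAPPRAISAL_CONSTITUTIONS = {
    "controllability": (
        "Gently guide the narrator to re-examine how much of the situation "
        "is within their control, separating what they can change from "
        "what they cannot."
    ),
    "future_expectancy": (
        "Help the narrator consider realistic ways the situation could "
        "improve over time, without dismissing their present distress."
    ),
}

def build_prompt(post: str, dimension: str) -> str:
    """Compose a zero-shot instruction for one appraisal dimension."""
    constitution = REAPPRAISAL_CONSTITUTIONS[dimension]
    return (
        "You are responding to someone seeking emotional support.\n"
        f"Guideline: {constitution}\n\n"
        f"Their message:\n{post}\n\n"
        "Write an empathic response that follows the guideline."
    )

print(build_prompt("I feel like everything is falling apart.", "controllability"))
```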
  2. This thesis investigates the computational modeling of belief and related cognitive states as expressed in text and speech. Understanding how speakers or authors convey commitment, certainty, and emotions is crucial for language understanding, yet poses significant challenges for current NLP systems. We present a comprehensive study spanning multiple facets of belief prediction. We begin by re-examining the widely used FactBank corpus, correcting a critical projection error and establishing new state-of-the-art results for author-only belief prediction through multi-task learning and error analysis. We then tackle the more complex task of source-and-target belief prediction, introducing a novel generative framework using Flan-T5. This includes developing a structured database representation for FactBank and proposing a linearized tree generation approach, culminating in the BeLeaf system for visualization and analysis, which achieves state-of-the-art performance on both FactBank and the MDP corpus. With the rise of large language models (LLMs), we investigate their zero-shot capabilities for the source-and-target belief task. We propose Unified and Hybrid prompting frameworks, finding that while current LLMs struggle, particularly with nested beliefs, our Hybrid approach paired with reasoning-focused LLMs achieves new state-of-the-art results on FactBank. Finally, we explore the role of multimodality among multiple cognitive states. We present the first study on multimodal belief prediction using the CB-Prosody corpus, demonstrating that integrating audio features via fine-tuned Whisper models significantly improves performance over text-only BERT models. We further introduce Synthetic Audio Data (SAD), showing that even synthetic audio generated by TTS systems provides orthogonal, beneficial signals for various cognitive state tasks (belief, emotion, sentiment). We conclude by presenting OmniVox, the first systematic evaluation of omni-LLMs for zero-shot emotion recognition directly from audio, demonstrating their competitiveness with fine-tuned models and analyzing their acoustic reasoning capabilities. 
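As a rough illustration of the linearized tree generation idea mentioned above, here is a sketch of a source-and-target belief tree flattened into a single string that a seq2seq model such as Flan-T5 could be trained to produce. The bracketed format and field layout are assumptions for illustration; FactBank's actual annotation scheme is richer.

```python
# Illustrative sketch: linearizing a nested source-and-target belief tree.
# The bracketed format is an assumption, not the thesis's exact scheme.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class BeliefNode:
    source: str                          # who holds the belief, e.g. "AUTHOR"
    target: str                          # the event or proposition
    label: str                           # e.g. "CT+" (certain that it is true)
    children: list[BeliefNode] = field(default_factory=list)

def linearize(node: BeliefNode) -> str:
    """Flatten a (possibly nested) belief tree into one bracketed string."""
    s = f"[{node.source} | {node.target} | {node.label}"
    for child in node.children:
        s += " " + linearize(child)
    return s + "]"

# Nested belief: the author is certain John said it; John asserts Mary left.
tree = BeliefNode("AUTHOR", "John said Mary left", "CT+",
                  children=[BeliefNode("John", "Mary left", "CT+")])
print(linearize(tree))
# -> [AUTHOR | John said Mary left | CT+ [John | Mary left | CT+]]
```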
  3. In this work, we present CogBeacon, a multi-modal dataset designed to study the effects of cognitive fatigue on human performance. The dataset consists of 76 sessions collected from 19 male and female users performing different versions of a cognitive task inspired by the principles of the Wisconsin Card Sorting Test (WCST), a popular cognitive test in experimental and clinical psychology designed to assess cognitive flexibility, reasoning, and specific aspects of cognitive functioning. During each session, we record and fully annotate users' EEG activity, facial keypoints, and real-time self-reports of cognitive fatigue, as well as detailed information on the performance metrics achieved during the cognitive task (success rate, response time, number of errors, etc.). Along with the dataset, we provide free access to the CogBeacon data-collection software, giving the community a standardized mechanism for collecting and annotating physiological and behavioral data for cognitive fatigue analysis. Our goal is to provide other researchers with the tools to expand or modify the functionalities of the CogBeacon data-collection framework in a hardware-independent way. As a proof of concept, we show some preliminary machine learning experiments on cognitive fatigue detection using the EEG information and the subjective user reports as ground truth. Our experiments highlight the meaningfulness of the current dataset and motivate our efforts toward expanding the CogBeacon platform. To our knowledge, this is the first multi-modal dataset specifically designed to assess cognitive fatigue, and the only free software available to allow experiment reproducibility for multi-modal cognitive fatigue analysis.
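A minimal sketch in the spirit of the proof-of-concept experiments above: predicting self-reported fatigue from EEG-derived features. The features and labels here are synthetic stand-ins so the sketch runs without the CogBeacon data; the feature choice and model are assumptions for illustration.

```python
# Illustrative sketch: cognitive fatigue detection from EEG-derived
# features, with self-reports as ground truth. Data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_windows = 200                          # EEG windows across sessions
X = rng.normal(size=(n_windows, 8))      # e.g. per-band power features
y = rng.integers(0, 2, size=n_windows)   # self-reported fatigue (0/1)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```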
  4. Assessment in the context of foreign language learning can be difficult and time-consuming for instructors. Unlike other domains, language learning often requires teachers to assess each student's ability to speak the language, making this process even more time-consuming in the large classrooms that are particularly common in post-secondary settings. Since language instructors often assess students through assignments requiring recorded audio, the lack of tools to support such teachers makes providing individual feedback even more challenging. In this work, we explore the development of tools to automatically assess audio responses within a college-level Chinese language-learning course. We build a model designed to grade student audio assignments, with the aim of incorporating such a model into tools that help both teachers and students in real classrooms. Building upon our prior work on features extracted from audio, this work explores additional features derived from tone and speech recognition models to assess students on two outcomes commonly observed in language learning classes: fluency and accuracy of speech. In addition to the exploration of features, this work explores the application of Siamese deep learning models to this assessment task. We find that models utilizing tonal features exhibit higher predictive performance for student fluency, while text-based features derived from speech recognition models exhibit higher predictive performance for student accuracy of speech.
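A hedged PyTorch sketch of the Siamese idea mentioned above: one shared encoder embeds a student response and a reference response, and their similarity serves as a score proxy. The feature dimensions, architecture, and scoring head are assumptions for illustration, not the paper's model.

```python
# Illustrative Siamese sketch: shared-weight encoder plus cosine similarity
# as a stand-in fluency/accuracy score. Dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseScorer(nn.Module):
    def __init__(self, in_dim: int = 64, emb_dim: int = 32):
        super().__init__()
        # One encoder shared by both inputs (the "Siamese" part).
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim)
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        za, zb = self.encoder(a), self.encoder(b)
        # Cosine similarity between embeddings as the score proxy.
        return F.cosine_similarity(za, zb, dim=-1)

model = SiameseScorer()
student = torch.randn(4, 64)    # e.g. tonal/ASR features of student audio
reference = torch.randn(4, 64)  # features of a reference recording
print(model(student, reference))
```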
  5. Video-language models (VLMs), large models pre-trained on numerous but noisy video-text pairs from the internet, have revolutionized activity recognition through their remarkable generalization and open-vocabulary capabilities. While complex human activities are often hierarchical and compositional, most existing tasks for evaluating VLMs focus only on high-level video understanding, making it difficult to accurately assess and interpret the ability of VLMs to understand complex and fine-grained human activities. Inspired by the recently proposed MOMA framework, we define activity graphs as a single universal representation of human activities that encompasses video understanding at the activity, sub-activity, and atomic action level. We redefine activity parsing as the overarching task of activity graph generation, requiring understanding human activities across all three levels. To facilitate the evaluation of models on activity parsing, we introduce MOMA-LRG (Multi-Object Multi-Actor Language-Refined Graphs), a large dataset of complex human activities with activity graph annotations that can be readily transformed into natural language sentences. Lastly, we present a model-agnostic and lightweight approach to adapting and evaluating VLMs by incorporating structured knowledge from activity graphs into VLMs, addressing the individual limitations of language and graphical models. We demonstrate strong performance on activity parsing and few-shot video classification, and our framework is intended to foster future research in the joint modeling of videos, graphs, and language.
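To illustrate the three-level activity graph representation described above, here is a hedged sketch of a hierarchy (activity, sub-activities, atomic actions) flattened into a natural language description, in the spirit of MOMA-LRG. The node types, example labels, and rendering are assumptions for illustration.

```python
# Illustrative sketch: a three-level activity hierarchy rendered as text.
# Node names and the rendering format are assumptions, not MOMA-LRG's schema.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class ActionNode:
    label: str                           # atomic action, e.g. "fill water tank"

@dataclass
class SubActivityNode:
    label: str
    actions: list[ActionNode] = field(default_factory=list)

@dataclass
class ActivityGraph:
    label: str
    sub_activities: list[SubActivityNode] = field(default_factory=list)

def to_sentence(g: ActivityGraph) -> str:
    """Flatten the hierarchy into one human-readable description."""
    parts = []
    for sub in g.sub_activities:
        acts = ", ".join(a.label for a in sub.actions)
        parts.append(f"{sub.label} ({acts})")
    return f"{g.label}: " + "; ".join(parts)

graph = ActivityGraph("making coffee", [
    SubActivityNode("prepare machine", [ActionNode("fill water tank")]),
    SubActivityNode("brew", [ActionNode("press button"), ActionNode("pour coffee")]),
])
print(to_sentence(graph))
# -> making coffee: prepare machine (fill water tank); brew (press button, pour coffee)
```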