skip to main content


Title: A Socratic epistemology for verbal emotional intelligence

We describe and experimentally validate a question-asking framework for machine-learned linguistic knowledge about human emotions. Using the Socratic method as a theoretical inspiration, we develop an experimental method and computational model for computers to learn subjective information about emotions by playing emotion twenty questions (EMO20Q), a game of twenty questions limited to words denoting emotions. Using human–human EMO20Q data we bootstrap a sequential Bayesian model that drives a generalized pushdown automaton-based dialog agent that further learns from 300 human–computer dialogs collected on Amazon Mechanical Turk. The human–human EMO20Q dialogs show the capability of humans to use a large, rich, subjective vocabulary of emotion words. Training on successive batches of human–computer EMO20Q dialogs shows that the automated agent is able to learn from subsequent human–computer interactions. Our results show that the training procedure enables the agent to learn a large set of emotion words. The fully trained agent successfully completes EMO20Q at 67% of human performance and 30% better than the bootstrapped agent. Even when the agent fails to guess the human opponent’s emotion word in the EMO20Q game, the agent’s behavior of searching for knowledge makes it appear human-like, which enables the agent to maintain user engagement and learn new, out-of-vocabulary words. These results lead us to conclude that the question-asking methodology and its implementation as a sequential Bayes pushdown automaton are a successful model for the cognitive abilities involved in learning, retrieving, and using emotion words by an automated agent in a dialog setting.

 
more » « less
NSF-PAR ID:
10013136
Author(s) / Creator(s):
 ;  ;  ;  ;  
Publisher / Repository:
PeerJ
Date Published:
Journal Name:
PeerJ Computer Science
Volume:
2
ISSN:
2376-5992
Page Range / eLocation ID:
Article No. e40
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Active learning identifies data points from a pool of unlabeled examples whose labels, if made available, are most likely to improve the predictions of a supervised model. Most research on active learning assumes that an agent has access to the entire pool of unlabeled data and can ask for labels of any data points during an initial training phase. However, when incorporated in a larger task, an agent may only be able to query some subset of the unlabeled pool. An agent can also opportunistically query for labels that may be useful in the future, even if they are not immediately relevant. In this paper, we demonstrate that this type of opportunistic active learning can improve performance in grounding natural language descriptions of everyday objects---an important skill for home and office robots. We find, with a real robot in an object identification setting, that inquisitive behavior---asking users important questions about the meanings of words that may be off-topic for the current dialog---leads to identifying the correct object more often over time. 
    more » « less
  2. A Natural Language Interface (NLI) enables the use of human languages to interact with computer systems, including smart phones and robots. Compared to other types of interfaces, such as command line interfaces (CLIs) or graphical user interfaces (GUIs), NLIs stand to enable more people to have access to functionality behind databases or APIs as they only require knowledge of natural languages. Many NLI applications involve structured data for the domain (e.g., applications such as hotel booking, product search, and factual question answering.) Thus, to fully process user questions, in addition to natural language comprehension, understanding of structured data is also crucial for the model. In this paper, we study neural network methods for building Natural Language Interfaces (NLIs) with a focus on learning structure data representations that can generalize to novel data sources and schemata not seen at training time. Specifically, we review two tasks related to natural language interfaces: i) semantic parsing where we focus on text-to-SQL for database access, and ii) task-oriented dialog systems for API access. We survey representative methods for text-to-SQL and task-oriented dialog tasks, focusing on representing and incorporating structured data. Lastly, we present two of our original studies on structured data representation methods for NLIs to enable access to i) databases, and ii) visualization APIs. 
    more » « less
  3. In this work, we present methods for using human-robot dialog to improve language understanding for a mobile robot agent. The agent parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red and heavy. The agent can be used for showing navigation routes, delivering objects to people, and relocating objects from one location to another. We use dialog clari_cation questions both to understand commands and to generate additional parsing training data. The agent employs opportunistic active learning to select questions about how words relate to objects, improving its understanding of perceptual concepts. We evaluated this agent on Amazon Mechanical Turk. After training on data induced from conversations, the agent reduced the number of dialog questions it asked while receiving higher usability ratings. Additionally, we demonstrated the agent on a robotic platform, where it learned new perceptual concepts on the y while completing a real-world task. 
    more » « less
  4. Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g. bar charts, pie charts, and line graphs. CQA requires capabilities that natural-image VQA algorithms lack: fine-grained measurements, optical character recognition, and handling out-of-vocabulary words in both questions and answers. Without modifications, state-of-the-art VQA algorithms perform poorly on this task. Here, we propose a novel CQA algorithm called parallel recurrent fusion of image and language (PReFIL). PReFIL first learns bimodal embeddings by fusing question and image features and then intelligently aggregates these learned embeddings to answer the given question. Despite its simplicity, PReFIL greatly surpasses state-of-the art systems and human baselines on both the FigureQA and DVQA datasets. Additionally, we demonstrate that PReFIL can be used to reconstruct tables by asking a series of questions about a chart. 
    more » « less
  5. We introduce a novel framework for emotional state detection from facial expression targeted to learning environments. Our framework is based on a convolutional deep neural network that classifies people’s emotions that are captured through a web-cam. For our classification outcome we adopt Russel’s model of core affect in which any particular emotion can be placed in one of four quadrants: pleasant-active, pleasant-inactive, unpleasant-active, and unpleasant-inactive. We gathered data from various datasets that were normalized and used to train the deep learning model. We use the fully-connected layers of the VGG_S network which was trained on human facial expressions that were manually labeled. We have tested our application by splitting the data into 80:20 and re-training the model. The overall test accuracy of all detected emotions was 66%. We have a working application that is capable of reporting the user emotional state at about five frames per second on a standard laptop computer with a web-cam. The emotional state detector will be integrated into an affective pedagogical agent system where it will serve as a feedback to an intelligent animated educational tutor. 
    more » « less