This content will become publicly available on March 31, 2025

Title: Improving Dialog Safety using Socially Aware Contrastive Learning
State-of-the-art conversational AI systems raise concerns because they risk generating unsafe, toxic, unethical, or dangerous content. Previous works have developed datasets to teach conversational agents the appropriate social paradigms to respond effectively to specifically designed hazardous content. However, models trained on these adversarial datasets still struggle to recognize subtle unsafe situations that appear naturally in conversations, and they may introduce an inappropriate response in a casual context. To understand the extent of this problem, we study prosociality in both adversarial and casual dialog contexts and audit the response quality of general-purpose language models in terms of their propensity to produce unsafe content. We propose a dual-step fine-tuning process to address these issues using a socially aware n-pair contrastive loss. Subsequently, we train a base model that integrates prosocial behavior by leveraging datasets such as the Moral Integrity Corpus (MIC) and ProsocialDialog. Experimental results on several dialog datasets demonstrate the effectiveness of our approach in generating socially appropriate responses.
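The abstract names an n-pair contrastive loss but does not spell it out. As a rough illustration only, the sketch below implements the generic N-pair loss, log(1 + sum_i exp(a·n_i - a·p)), for one anchor, one positive, and several negatives. It is not the paper's socially aware variant; the function names, the toy 2-dimensional embeddings, and the reading of "anchor" as a dialog context, "positive" as a prosocial response, and "negatives" as unsafe responses are all assumptions made for this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def n_pair_loss(anchor, positive, negatives):
    """Generic N-pair loss: log(1 + sum_i exp(a.n_i - a.p)).

    Hypothetical helper for illustration; not the paper's exact objective.
    """
    pos_score = dot(anchor, positive)
    # Score gap between each negative and the positive.
    diffs = [dot(anchor, neg) - pos_score for neg in negatives]
    # Numerically stable log(exp(0) + sum_i exp(d_i)).
    m = max([0.0] + diffs)
    return m + math.log(math.exp(-m) + sum(math.exp(d - m) for d in diffs))


# Toy embeddings (assumed): the anchor is a dialog context.
context = [1.0, 0.0]
prosocial = [1.0, 0.0]   # aligned with the context embedding
unsafe = [0.0, 1.0]      # orthogonal to the context embedding

# Low loss when the prosocial response is treated as the positive.
loss_good = n_pair_loss(context, prosocial, [unsafe])
# Higher loss when the roles are swapped.
loss_bad = n_pair_loss(context, unsafe, [prosocial])
print(loss_good < loss_bad)
```

The loss shrinks as the anchor scores the positive higher than every negative, which is the behavior a fine-tuning step of this kind would presumably encourage.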
Award ID(s):
2214070
PAR ID:
10543974
Author(s) / Creator(s):
;
Publisher / Repository:
ACL Anthology
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Neural dialog models are known to suffer from problems such as generating unsafe and inconsistent responses. Even though these problems are crucial and prevalent, they are mostly identified manually by model designers through interactions. Recently, some research has instructed crowdworkers to goad the bots into triggering such problems. However, humans rely on superficial clues such as hate speech, leaving systematic problems undiscovered. In this paper, we propose two methods, including reinforcement learning, to automatically trigger a dialog model into generating problematic responses. We show the effect of our methods in exposing safety and contradiction issues with state-of-the-art dialog models. 
  2. Large Language Models (LLMs) have made significant progress in integrating safety and knowledge alignment. However, adversarial actors can manipulate these models into generating unsafe responses, and excessive safety alignment can lead to unintended hallucinations. To address these challenges, we introduce UniWiz, a novel 2-step data orchestration framework that unifies safety and knowledge data generation. We propose a “safety-priming” method to generate synthetic safety data and overcome safety bottlenecks. We also inject relevant knowledge into conversations by retrieving factual information from curated sources. The UniWiz dataset consists of 17,638 quality-controlled conversations and 10,000 augmented preference data. Pretrained models fine-tuned on UniWiz show improvements across various metrics and outperform state-of-the-art instruction-tuned models trained on much larger datasets. 
  3. The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response to the dialogue. However, via manual analysis, we find that a large number of conversational questions can be answered by only looking at the image without any access to the context history, while others still need the conversation context to predict the correct answers. We demonstrate that due to this reason, previous joint-modality (history and image) models over-rely on and are more prone to memorizing the dialogue history (e.g., by extracting certain keywords or patterns in the context information), whereas image-only models are more generalizable (because they cannot memorize or extract keywords from history) and perform substantially better at the primary normalized discounted cumulative gain (NDCG) task metric which allows multiple correct answers. Hence, this observation encourages us to explicitly maintain two models, i.e., an image-only model and an image-history joint model, and combine their complementary abilities for a more balanced multimodal model. We present multiple methods for this integration of the two models, via ensemble and consensus dropout fusion with shared parameters. Empirically, our models achieve strong results on the Visual Dialog challenge 2019 (rank 3 on NDCG and high balance across metrics), and substantially outperform the winner of the Visual Dialog challenge 2018 on most metrics. 
  4. Recent years have witnessed the emergence of conversational systems, including both physical devices and mobile-based applications, such as Amazon Echo, Google Now, Microsoft Cortana, Apple Siri, and many others. Both the research community and industry believe that conversational systems will have a major impact on human-computer interaction, and specifically, the IR community has begun to focus on Conversational Search. Conversational search based on user-system dialog exhibits major differences from conventional search in that 1) the user and system can interact for multiple semantically coherent rounds on a task through natural language dialog, and 2) it becomes possible for the system to understand user needs or to help users clarify their needs by asking the users appropriate questions directly. In this paper, we propose and evaluate a unified conversational search framework. Specifically, we define the major components for conversational search, assemble them into a unified framework, and test an implementation of the framework using a conversational product search scenario in Amazon. To accomplish this, we propose the Multi-Memory Network (MMN) architecture, which is end-to-end trainable based on large-scale collections of user reviews in e-commerce. The system is capable of asking aspect-based questions in the right order so as to understand user needs, while (personalized) search is conducted during the conversation and results are provided when the system feels confident. Experiments on real-world user purchasing data verified the advantages of conversational search against conventional search algorithms in terms of standard evaluation measures such as NDCG. 
  5. Automatic evaluation metrics are a crucial component of dialog systems research. Standard language evaluation metrics are known to be ineffective for evaluating dialog. As such, recent research has proposed a number of novel, dialog-specific metrics that correlate better with human judgements. Due to the fast pace of research, many of these metrics have been assessed on different datasets and there has as yet been no time for a systematic comparison between them. To this end, this paper provides a comprehensive assessment of recently proposed dialog evaluation metrics on a number of datasets. In this paper, 23 different automatic evaluation metrics are evaluated on 10 different datasets. Furthermore, the metrics are assessed in different settings, to better qualify their respective strengths and weaknesses. Metrics are assessed (1) on both the turn level and the dialog level, (2) for different dialog lengths, (3) for different dialog qualities (e.g., coherence, engaging), (4) for different types of response generation models (i.e., generative, retrieval, simple models and state-of-the-art models), (5) taking into account the similarity of different metrics and (6) exploring combinations of different metrics. This comprehensive assessment offers several takeaways pertaining to dialog evaluation metrics in general. It also suggests how to best assess evaluation metrics and indicates promising directions for future work. 