UNIWIZ: A Unified Large Language Model Orchestrated Wizard for Safe Knowledge Grounded Conversations

Das, Souvik; Srihari, Rohini

doi:10.18653/v1/2024.findings-acl.102

Citation Details

UNIWIZ: A Unified Large Language Model Orchestrated Wizard for Safe Knowledge Grounded Conversations

Large Language Models (LLMs) have made significant progress in integrating safety and knowledge alignment. However, adversarial actors can manipulate these models into generating unsafe responses, and excessive safety alignment can lead to unintended hallucinations. To address these challenges, we introduce UniWiz, a novel 2-step data orchestration framework that unifies safety and knowledge data generation. We propose a “safety-priming” method to generate synthetic safety data and overcome safety bottlenecks. We also inject relevant knowledge into conversations by retrieving factual information from curated sources. UniWiz dataset consists of 17,638 quality-controlled conversations and 10,000 augmented preference data. Pretrained models fine-tuned on UniWiz show improvements across various metrics and outperform state-of-the-art instruction-tuned models trained on much larger datasets. more »

Award ID(s):: 2214070

PAR ID:: 10543965

Author(s) / Creator(s):: Das, Souvik; Srihari, Rohini

Publisher / Repository:: Association for Computational Linguistics

Date Published:: 2024-08-11

Page Range / eLocation ID:: 1749 to 1762

Format(s):: Medium: X

Location:: Bangkok, Thailand and virtual meeting

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.18653/v1/2024.findings-acl.102

More Like this