Researchers in areas as diverse as computer science and political science must increasingly navigate the possible risks of their research to society. However, the history of medical experiments on vulnerable individuals influenced many research ethics reviews to focus exclusively on risks to human subjects rather than risks to human society. We describe an Ethics and Society Review board (ESR), which fills this moral gap by facilitating ethical and societal reflection as a requirement to access grant funding: Researchers cannot receive grant funding from participating programs until the researchers complete the ESR process for their proposal. Researchers author an initial statement describing their proposed research’s risks to society, to subgroups within society, and globally, and commit to mitigation strategies for these risks. An interdisciplinary faculty panel iterates with the researchers to refine these risks and mitigation strategies. We describe a mixed-method evaluation of the ESR over 1 y, in partnership with an artificial intelligence grant program run by Stanford HAI. Surveys and interviews of researchers who interacted with the ESR found that 100% (95% CI: 87 to 100%) were willing to continue submitting future projects to the ESR, and 58% (95% CI: 37 to 77%) felt that it had influenced the design of their research project. The ESR panel most commonly identified issues of harms to minority groups, inclusion of diverse stakeholders in the research plan, dual use, and representation in datasets. These principles, paired with possible mitigation strategies, offer scaffolding for future research designs.
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
            Recent advances in the capacity of large language models to generate human-like text have resulted in their increased adoption in user-facing settings. In parallel, these improvements have prompted a heated discourse around the risks of societal harms they introduce, whether inadvertent or malicious. Several studies have explored these harms and called for their mitigation via development of safer, fairer models. Going beyond enumerating the risks of harms, this work provides a survey of practical methods for addressing potential threats and societal harms from language generation models. We draw on several prior works’ taxonomies of language model risks to present a structured overview of strategies for detecting and ameliorating different kinds of risks/harms of language generators. Bridging diverse strands of research, this survey aims to serve as a practical guide for both LM researchers and practitioners, with explanations of different strategies’ motivations, their limitations, and open problems for future research. 
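To make the detect-then-ameliorate pattern that such strategies share concrete, here is a minimal, hedged sketch in Python; the `harm_score` keyword blocklist and the `mitigate` helper are illustrative placeholders rather than methods from the survey, and a real deployment would substitute a trained toxicity or bias classifier:

```python
from typing import Callable, List

def harm_score(text: str) -> float:
    """Placeholder harm detector. A real pipeline would call a trained
    toxicity/bias classifier; a tiny keyword blocklist stands in here."""
    blocklist = {"slur_a", "slur_b", "threat_phrase"}  # illustrative tokens only
    tokens = text.lower().split()
    return sum(tok in blocklist for tok in tokens) / max(len(tokens), 1)

def mitigate(candidates: List[str],
             score_fn: Callable[[str], float] = harm_score,
             threshold: float = 0.05) -> str:
    """Detect-then-ameliorate: discard candidate generations whose harm score
    exceeds the threshold, then return the least harmful remaining one."""
    safe = [c for c in candidates if score_fn(c) <= threshold]
    pool = safe or candidates  # fall back to re-ranking rather than returning nothing
    return min(pool, key=score_fn)

# Usage: `candidates` would normally come from sampling a language model several times.
candidates = ["a polite reply", "a reply containing slur_a", "another polite reply"]
print(mitigate(candidates))
```

Output-side filtering of this kind is only one family of interventions; data-side and training-time mitigations follow the same broad logic of measuring a harm and intervening on it.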
- PAR ID: 10433138
- Date Published:
- Journal Name: European Chapter of the Association for Computational Linguistics
- Page Range / eLocation ID: 3299–3321
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Recent studies show that Natural Language Processing (NLP) technologies propagate societal biases about demographic groups associated with attributes such as gender, race, and nationality. To create interventions and mitigate these biases and associated harms, it is vital to be able to detect and measure such biases. While existing works propose bias evaluation and mitigation methods for various tasks, there remains a need to cohesively understand which biases and specific harms these methods measure, and how different measures compare with each other. To address this gap, this work presents a practical framework of harms and a series of questions that practitioners can answer to guide the development of bias measures. As a validation of our framework and documentation questions, we also present several case studies of how existing bias measures in NLP (both intrinsic measures of bias in representations and extrinsic measures of bias in downstream applications) can be aligned with different harms, and how our proposed documentation questions facilitate a more holistic understanding of what bias measures are measuring.
- Mounting evidence indicates that the artificial intelligence (AI) systems that rank our social media feeds bear nontrivial responsibility for amplifying partisan animosity: negative thoughts, feelings, and behaviors toward political out-groups. Can we design these AIs to consider democratic values such as mitigating partisan animosity as part of their objective functions? We introduce a method for translating established, vetted social scientific constructs into AI objective functions, which we term societal objective functions, and demonstrate the method with application to the political science construct of anti-democratic attitudes. Traditionally, we have lacked observable outcomes to use to train such models; however, the social sciences have developed survey instruments and qualitative codebooks for these constructs, and their precision facilitates translation into detailed prompts for large language models. We apply this method to create a democratic attitude model that estimates the extent to which a social media post promotes anti-democratic attitudes, and test this democratic attitude model across three studies. In Study 1, we first test the attitudinal and behavioral effectiveness of the intervention among US partisans (N=1,380) by manually annotating (alpha=.895) social media posts with anti-democratic attitude scores and testing several feed ranking conditions based on these scores. Removal (d=.20) and downranking of feeds (d=.25) reduced participants' partisan animosity without compromising their experience and engagement. In Study 2, we scale up the manual labels by creating the democratic attitude model, finding strong agreement with manual labels (rho=.75). Finally, in Study 3, we replicate Study 1 using the democratic attitude model instead of manual labels to test its attitudinal and behavioral impact (N=558), and again find that feed downranking using the societal objective function reduced partisan animosity (d=.25). This method presents a novel strategy for drawing on social science theory and methods to mitigate societal harms in social media AIs; a minimal sketch of this score-and-downrank pattern appears after this list.
- Stratospheric aerosol injection (SAI) and gene drive organisms (GDOs) have been proposed as technological responses to complex entrenched environmental challenges. They also share several characteristics of emerging risks, including extensive uncertainties, systemic interdependencies, and risk profiles intertwined with societal contexts. This Perspective conducts a comparative analysis of the two technologies and identifies ways in which their research and policy communities may learn from each other to inform future risk governance strategies. We find that SAI and GDOs share common features of aiming to improve or restore a public good, are characterized by numerous potential ecological, societal, and ethical risks associated with deep uncertainty, and are challenged by how best to coordinate the behavior of different actors. Meanwhile, SAI and GDOs differ in their temporal and spatial mode of deployment, spread, degree and type of reversibility, and potential for environmental monitoring. Based on this analysis, we find the field of SAI may learn from GDOs by enhancing its international collaborations for governance and oversight, while the field of GDOs may learn from SAI by investing in research focused on economics and decision-modeling. Additionally, given the relatively early development stages of SAI and GDOs, there may be ample opportunities to learn from risk governance efforts of other emerging technologies, including the need for improved monitoring and incorporating aspects of responsible innovation in research and any deployment.
- The overuse and expansion of artificial light at night (ALAN) has emerged from complex social, economic, and political factors, making it a societal problem that negatively impacts wildlife and people. We propose that a convergence research approach combining ecological forecasting with community engagement and public policy is needed to address this diverse societal problem. To begin this convergence research approach, we hosted a workshop to strengthen connections among key biodiversity-oriented ALAN stakeholders and to better understand how stakeholder groups function across the United States through facilitated discussions. We have prioritized the input of stakeholders early in our research design by including them in the formulation of a national survey on public perceptions surrounding ALAN, and received their input on existing ecological forecasting tools to improve those research products for their future use.
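As a companion to the societal objective functions abstract above, the following is a minimal sketch of the score-and-downrank pattern it describes. The `toy_attitude_score` stand-in, the post fields, and the `weight` parameter are assumptions for illustration, not the authors' implementation; their actual scorer prompts a large language model with codebook-derived instructions.

```python
from typing import Callable, Dict, List

Post = Dict[str, object]  # e.g. {"id": ..., "engagement": float, "text": str}

def downrank(feed: List[Post],
             attitude_score: Callable[[str], float],
             weight: float = 1.0) -> List[Post]:
    """Re-rank a feed so posts scoring high on an anti-democratic-attitude model
    (a societal objective function) are pushed down, while the original
    engagement-style ranking signal is otherwise preserved."""
    def combined(post: Post) -> float:
        return float(post["engagement"]) - weight * attitude_score(str(post["text"]))
    return sorted(feed, key=combined, reverse=True)

# Illustrative stand-in for the LLM-based scorer built from a qualitative codebook;
# a real scorer would prompt a language model and map its output to [0, 1].
def toy_attitude_score(text: str) -> float:
    return 1.0 if "out-group" in text.lower() else 0.0

feed = [
    {"id": 1, "engagement": 0.9, "text": "The out-group is a threat"},
    {"id": 2, "engagement": 0.7, "text": "Local park cleanup this weekend"},
]
print([p["id"] for p in downrank(feed, toy_attitude_score)])  # expected order: [2, 1]
```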