Why So Toxic?: Measuring and Triggering Toxic Behavior in Open-Domain Chatbots
- PAR ID: 10399974
- Date Published:
- Journal Name: Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security
- Page Range / eLocation ID: 2659 to 2673
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The Microcystis mobilome is a well-known but understudied component of this bloom-forming cyanobacterium. Through genomic and transcriptomic comparisons, we found five families of transposases that altered the expression of genes in the well-studied toxigenic type-strain, Microcystis aeruginosa PCC 7806, and a non-toxigenic genetic mutant, Microcystis aeruginosa PCC 7806 ΔmcyB. Since its creation in 1997, the ΔmcyB strain has been used in comparative physiology studies against the wildtype strain by research labs throughout the world. Some differences in gene expression between what were thought to be otherwise genetically identical strains have appeared due to insertion events in both intra- and intergenic regions. In our ΔmcyB isolate, a sulfate transporter gene cluster (sbp-cysTWA) showed differential expression from the wildtype, which may have been caused by the insertion of a miniature inverted repeat transposable element (MITE) in the sulfate-binding protein gene (sbp). Differences in growth in sulfate-limited media were also observed between the two isolates. This paper highlights how Microcystis strains continue to “evolve” in lab conditions and illustrates the importance of insertion sequences / transposable elements in shaping genomic and physiological differences between Microcystis strains thought otherwise identical. This study underscores the necessity of knowing the complete genetic background of isolates in comparative physiological experiments, to facilitate the correct conclusions (and caveats) from experiments.
- Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection. Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English). Our comprehensive experiments establish that existing methods are limited in their ability to prevent biased behavior in current toxicity detectors. We then propose an automatic, dialect-aware data correction method, as a proof-of-concept. Despite the use of synthetic labels, this method reduces dialectal associations with toxicity. Overall, our findings show that debiasing a model trained on biased toxic language data is not as effective as simply relabeling the data to remove existing biases.

