<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic Terms</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10463473</idno>
					<idno type="doi"></idno>
					<title level='j'>arXiv preprint arXiv:2205.02728.</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>M. Gavidia</author><author>P. Lee</author><author>A. Feldman</author><author>J. Peng</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Euphemisms have not received much attention in natural language processing, despite being an important element of polite and figurative language. Euphemisms prove to be a difficult topic, not only because they are subject to language change, but also because humans may not agree on what is a euphemism and what is not. Nevertheless, the first step to tackling the issue is to collect and analyze examples of euphemisms. We present a corpus of potentially euphemistic terms (PETs) along with example texts from the GloWbE corpus. Additionally, we present a subcorpus of texts where these PETs are not being used euphemistically, which may be useful for future applications. We also discuss the results of multiple analyses run on the corpus. Firstly, we find that sentiment analysis on the euphemistic texts supports the hypothesis that PETs generally decrease negative and offensive sentiment. Secondly, we observe cases of disagreement in an annotation task, where humans are asked to label PETs as euphemistic or not in a subset of our corpus text examples. We attribute the disagreement to a variety of potential reasons, including whether the PET was a commonly accepted term (CAT).]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Euphemisms are mild or indirect expressions used in place of more unpleasant or offensive ones. They can be used in generally polite conversation (e.g., passed away instead of died), or even as a way to hide the truth <ref type="bibr">(Rababah, 2014)</ref> by minimizing a threat and downplaying situations in order to create a favorable image <ref type="bibr">(Karam, 2011)</ref>; for example, saying armed conflict instead of war. For this paper, we consider a variety of euphemistic expressions that provide alternatives to more direct meanings. These alternatives allow us to avoid potential awkwardness or offensiveness when discussing sensitive or taboo topics such as death, sexual activity, employment, bodily functions, politics, physical/mental attributes, etc. Because of their figurative nature, euphemisms can be ambiguous, either because of human subjectivity or because euphemistic words and phrases may also be used literally (e.g., between jobs). As such, working with euphemisms in natural language processing is not straightforward. In this paper, we present an expanded list of common euphemisms, which we refer to as PETs, or Potentially Euphemistic Terms, as well as a corpus of euphemistic and literal usages of these PETs from web-based text data. To the best of our knowledge, there are no existing corpora of English sentences containing euphemisms. We hope these new resources help build upon current NLP applications involving euphemisms, particularly by providing the context differences needed to disambiguate these terms. The rest of the paper is organized as follows: Section 2 reviews previous work on the language of politeness and how it relates to euphemisms, as well as current computational approaches to euphemism recognition, detection and generation. 
Section 3 gives details on how we compiled our corpus, including information on our list of PETs, the source text, and sentence extraction and selection. In Section 4, we describe our corpus and provide examples of euphemistic and literal usages of PETs within it. Section 5 discusses experimental results of sentiment analysis with a RoBERTa-base model <ref type="bibr">(Liu et al., 2019;</ref><ref type="bibr">Barbieri et al., 2020)</ref>; we hypothesize that sentiment and offensiveness shift when PETs are compared with their literal meanings in the same context, since the use of euphemisms makes our speech less emotionally charged. We also describe an annotation task and offer possible explanations as to why euphemisms are ambiguous. Finally, Section 6 concludes with a discussion of future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Euphemisms and Politeness</head><p>Euphemisms are related to the language of politeness, e.g., <ref type="bibr">(Danescu-Niculescu-Mizil et al., 2013;</ref><ref type="bibr">Rababah, 2014)</ref>, which plays a role in applications involving dialogue and social interactions in different contexts, including political discourse and doctor-patient interactions. Politeness has been a central concern in pragmatic theory <ref type="bibr">(Grice, 1975;</ref><ref type="bibr">Leech, 1983;</ref><ref type="bibr">Lakoff, 1973;</ref><ref type="bibr">Lakoff, 1979;</ref><ref type="bibr">Brown et al., 1987)</ref> because we can learn about language, culture and society through the language of politeness. <ref type="bibr">Danescu-Niculescu-Mizil et al. (2013)</ref> propose a computational framework for identifying linguistic aspects of politeness and use their framework to study the relationship between politeness and social power. Politeness is learned within our communities and daily social interactions, so it is natural that when we communicate on the internet some of those features carry over, as can be seen in certain aspects of online social communication, including forums and message boards. This assumption drives our investigation into the use of euphemisms in web-based data as a politeness marker. <ref type="bibr">Madaan et al. (2020)</ref> introduce the task of politeness transfer, which involves converting non-polite sentences to polite sentences while preserving their meaning. <ref type="bibr">Madaan et al. (2020)</ref> adopt the data-driven approach to politeness proposed by <ref type="bibr">Danescu-Niculescu-Mizil et al. (2013)</ref>. Additionally, they create a corpus of 1.39 million instances automatically labeled for politeness. They propose a tag-and-generate pipeline that identifies stylistic attributes and generates a sentence in the target style while preserving most of the source content. 
While this work is concerned with style transfer rather than euphemisms, we find it relevant, especially for the euphemism generation task. Additionally, <ref type="bibr">Chaves and Gerosa (2021)</ref> discuss the growing popularity of chatbots and conduct a survey of eleven social characteristics a chatbot can have that benefit human-chatbot interactions; manners is one of them. They define manners as the ability of a chatbot to manifest polite behavior and conversational habits <ref type="bibr">(Chaves and Gerosa, 2021;</ref><ref type="bibr">Morrissey and Kirakowski, 2013)</ref>. The adoption of speech acts such as greetings, apologies, and closings <ref type="bibr">(Jain et al., 2018)</ref> and minimizing impositions <ref type="bibr">(Tallyn et al., 2018;</ref><ref type="bibr">Toxtli et al., 2018)</ref> are a few of the ways in which chatbots currently manifest politeness <ref type="bibr">(Chaves and Gerosa, 2021)</ref>. We find this relevant to future studies of euphemism generation as a way to manifest politeness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Previous Computational Work on Euphemisms</head><p>There is little computational work on recognizing and interpreting euphemisms. The most directly related work is by <ref type="bibr">Magu and Luo (2018)</ref>, <ref type="bibr">Felt and Riloff (2020)</ref>, Kapron-King and Xu (2021), <ref type="bibr">Zhu et al. (2021)</ref> and <ref type="bibr">Zhu and Bhat (2021)</ref>. <ref type="bibr">Felt and Riloff (2020)</ref> present the first effort to recognize x-phemisms, i.e., euphemisms and dysphemisms (derogatory terms), using NLP. They identify near-synonym phrases for three topics (FIRING, LYING, and STEALING) using a weakly supervised bootstrapping algorithm for semantic lexicon induction <ref type="bibr">(Thelen and Riloff, 2002)</ref>. Next, they classify phrases as euphemistic, dysphemistic, or neutral using lexical sentiment cues and contextual sentiment analysis. Additionally, they contribute a gold-standard dataset of human x-phemism judgements. <ref type="bibr">Thelen and Riloff (2002)</ref> show that sentiment connotation and affective polarity are useful for identifying x-phemisms, but not sufficient. While the performance of Felt and Riloff's (2020) system is relatively low and the range of topics is very narrow, this work is certainly inspiring further investigations. <ref type="bibr">Zhu et al. (2021)</ref> define two tasks: 1) euphemism detection (based on the input keywords, produce a list of candidate euphemisms) and 2) euphemism identification (take the list of candidate euphemisms produced in (1) and output an interpretation). They approach the task as an unsupervised fill-in-the-mask problem and use a masked language model twice: 1) to filter the masked sentences and 2) to generate the euphemism candidates from the masked sentences. For euphemism identification (i.e., interpretation), <ref type="bibr">Zhu et al. (2021)</ref> extract phrases from a base corpus, use word-embedding similarities to filter the phrases by their association with a seed list of euphemisms, and finally use the masked language model SpanBERT to rank the euphemistic candidates. Their system outperforms all the baselines, including <ref type="bibr">Felt and Riloff (2020)</ref>. The technical innovation of this work relies on the idea of self-supervision <ref type="bibr">(Zhu et al., 2021)</ref>, a form of unsupervised learning in which the data itself provides the supervision. While the approach appears promising, it has a number of limitations. Like Felt and Riloff's (2020) system, <ref type="bibr">Zhu et al. (2021)</ref> rely on a set of predefined terms (topics such as drugs, weapons, and sexuality). The system is not capable of discovering new contexts in which euphemisms are used. In addition, <ref type="bibr">Zhu et al. (2021)</ref> treat euphemisms as mere substitutions. In this respect, their work is similar to that of <ref type="bibr">Magu and Luo (2018)</ref>, who also treat code words as euphemisms. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Corpus Creation</head><p>Using a list of PETs, we extract sentences from a base corpus <ref type="bibr">(Davies and Fuchs, 2015)</ref> and manually annotate each as either euphemistic or non-euphemistic (literal). We then select 1,382 euphemistic sentences and 583 additional sentences in which some of the PETs were also found to be used literally. The PETs dataset and corpus are available at https://github.com/marsgav/euphemism_project.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Potentially Euphemistic Terms (PETs)</head><p>We compile a collection of 184 PETs from several sources, including euphemism dictionaries, English websites designed for second language learners, online articles highlighting some of the most common euphemisms, and our own linguistic knowledge of euphemisms <ref type="bibr">(Kapron-King and Xu, 2021;</ref><ref type="bibr">Rawson, 1981;</ref><ref type="bibr">Holder, 2008;</ref><ref type="bibr">EnglishClub, 2022;</ref><ref type="bibr">Jones, 2017;</ref><ref type="bibr">Silver, 2015;</ref><ref type="bibr">Woelfel, 2019;</ref><ref type="bibr">Gormandy White, 2022;</ref><ref type="bibr">OED, 1989;</ref><ref type="bibr">Hereema, 2020;</ref><ref type="bibr">O'Conner and Kellerman, 2012;</ref><ref type="bibr">Martin, 1991)</ref>. We chose these different sources to cover a variety of taboo topics and to keep up with common euphemisms as of January 2022. Euphemisms are constantly being created and discarded; the "euphemism treadmill" describes how euphemisms can become offensive over time and thus lose their euphemism status <ref type="bibr">(Pinker, 2003)</ref>. While a definitive list of euphemisms can never be created, we aim to cover a variety of euphemisms relating to death, sexual activity, employment, bodily functions, politics, physical/mental attributes, substances, and other miscellaneous taboo topics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">The GloWbE Corpus</head><p>The Corpus of Global Web-Based English (GloWbE) <ref type="bibr">(Davies and Fuchs, 2015)</ref> contains 1.9 billion words of text from twenty English-speaking countries. Its inclusion of 20 different dialects of English makes it a rich source for examining euphemisms, which are culturally and geographically specific. Because euphemisms vary in this way, our extracted sentences are derived from only a portion of the US English text contained within GloWbE.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Sentence Extraction and Selection</head><p>We use spaCy's PhraseMatcher <ref type="bibr">(Honnibal and Montani, 2017)</ref> to identify rows from our raw text data that contain terms from our predefined list. PhraseMatcher yielded over 5,500 rows of text containing some of our target PETs. Because every row contained a different amount of text, we preprocessed each row to include the sentence containing the target PET as well as 1-3 surrounding sentences for added context. We then manually annotated every row as either '1' (euphemistic) or '0' (non-euphemistic). Given the ambiguous nature of euphemisms, we obtained disproportionately more non-euphemistic texts than euphemistic ones. In an attempt to create a balanced corpus, we selected a maximum of 30 sentences for every PET found with PhraseMatcher that was used in a euphemistic sense. This yielded a total of 1,382 euphemistic sentences spanning 129 different PETs.</p><p>We follow the same methodology to select a maximum of 30 sentences for the PETs that were also found to be used in a non-euphemistic sense. This sub-corpus contains 583 non-euphemistic (literal) sentences spanning only 58 of the 129 total PETs. In other words, our corpus contains 71 PETs that were always found in a euphemistic sense and 58 PETs that were found with both a euphemistic and a literal sense. See Appendix A and B for both lists of PETs. We include this sub-corpus as it may contain valuable insights that can be gathered from the context differences surrounding the euphemistic and literal usages of PETs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Corpus Details</head><p>We combine our euphemistic and literal examples into one corpus of 1,965 total sentences. We label the 71 PETs that were always used euphemistically as "always euphs" and the 58 found with both usages as "sometimes euphs". The euphemistic and literal usages of sometimes euphs can be easily filtered within the corpus. The average word count per example is 65 words, and the average character count is 373. These details are summarized in Table <ref type="table">2</ref>. Additionally, we examine the taboo or sensitive topics that our euphemistic PETs cover; these are displayed in Table <ref type="table">3</ref> along with example PETs.</p></div>
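<div xmlns="http://www.tei-c.org/ns/1.0"><p>The context window and the per-PET cap used when selecting examples can be illustrated with a short Python sketch. This is not the authors' code: the row format (dicts with the hypothetical keys pet, is_euph and text), the use of random sampling when a PET has more than 30 matches, and the fixed seed are all assumptions. The source states only the 1-3 sentence context window and the cap of 30 examples. Sentence splitting and PhraseMatcher matching are assumed to have been done already.</p><ab><![CDATA[
```python
import random
from collections import defaultdict

MAX_PER_PET = 30  # cap on examples per PET, as stated in the paper


def context_window(sentences, target_idx, before=1, after=1):
    """Join the target sentence with up to `before`/`after` neighbouring sentences."""
    start = max(0, target_idx - before)
    end = min(len(sentences), target_idx + after + 1)
    return " ".join(sentences[start:end])


def select_examples(rows, max_per_pet=MAX_PER_PET, seed=0):
    """Keep at most `max_per_pet` annotated rows for each (PET, label) pair.

    Each row is a dict with the hypothetical keys "pet", "is_euph"
    (1 = euphemistic, 0 = literal) and "text".
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["pet"], row["is_euph"])].append(row)

    rng = random.Random(seed)  # assumption: random sampling when over the cap
    selected = []
    for key in sorted(groups):
        group = groups[key]
        if len(group) > max_per_pet:
            group = rng.sample(group, max_per_pet)
        selected.extend(group)
    return selected
```
]]></ab><p>Grouping by the (PET, label) pair rather than by PET alone applies the cap separately to the euphemistic and literal sub-corpora, so a frequent PET cannot dominate either one.</p></div>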
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">RoBERTa for Sentiment and Offensiveness Ratings</head><p>Since euphemisms are used with the aim of being polite, we hypothesize, like <ref type="bibr">Felt and Riloff (2020)</ref>, that the sentiment of a sentence containing a euphemism should generally be more positive and less offensive <ref type="bibr">(Bakhriddionova, 2021)</ref>.</p><p>To investigate this, we performed a sentiment analysis on our corpus, in which sentiment and offensiveness scores were computed for each text sample and then recomputed after substituting each PET with its literal meaning. An example substitution is shown below:</p><p>Original: Just from my personal observations, among low-income kids, those with a strong home life tend to do better.</p><p>Substituted: Just from my personal observations, among poor kids, those with a strong home life tend to do better.</p><p>The sentiment scores were computed using a RoBERTa-base model that was trained on tweets (which suits the informal text of our examples), fine-tuned for sentiment analysis and offensive language identification, and evaluated using the TweetEval framework <ref type="bibr">(Liu et al., 2019;</ref><ref type="bibr">Barbieri et al., 2020)</ref>. The scores before and after substitution are compared using relative change, since each score is the probability of a classification label rather than an absolute score, and should therefore be considered relative to that particular text. The results indicate that using a euphemism, as opposed to its literal meaning, affects sentiment scores. In particular, negative and offensive scores increase noticeably after substitution, which supports the assumption that euphemisms soften language <ref type="bibr">(Bakhriddionova, 2021)</ref>. Additionally, the sentiment scores were grouped by PET and averaged to show the average sentiment change per PET (see Appendix C). These results could be significant for future work involving euphemism detection using sentiment.</p><p>Table <ref type="table">4</ref>: Euphemistic and Literal Usages of PETs</p><p>Then, fill a glass or pop a top or load a bong or whatever one does, to get along these days.</p><p>In some ways, cultivating for &lt;weed&gt; control is almost a lost art. Herbicides seemed to work so well for so long that many farmers abandoned mechanical means of control.</p><p>No no no no. I'm in the same situation-&lt;disabled&gt;, chronic pain, artist, no "visible disability" (even when I'm in my chair), and nobody understands that it takes us longer to do *everything*. I'm honestly surprised you even humored your neighbor this far!</p><p>They claim there is no network or storage capability in these machines, clearly this is not true. These features may be &lt;disabled&gt; or only available to administrators who service the equipment, but in any event the TSA @ @ @ @ @ @ @ @ @ @ problems. As to the veterans out there who work for the TSA, I share your frustration</p><p>I would still donate food and clothing for people in need but at least I would know that it was my choice and it was being used for it's intended purpose. I applied for temporary assistance when I was &lt;between jobs&gt; for a month to support my family. We had no savings or income and we were denied because I had made too much money the previous year.</p><p>The more new people you meet, the more your chances of finding out about a great job increases. Then if you hear back from multiple places, you'll have choices and who wouldn't want to be able to choose &lt;between jobs&gt; rather than grasping at the first one that comes along.</p></div>
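<div xmlns="http://www.tei-c.org/ns/1.0"><p>The substitution and comparison steps can be sketched in Python as follows, assuming the classifier probabilities for each text have already been obtained before and after substitution (running the TweetEval RoBERTa models is not shown). The function names and the naive first-occurrence string replacement are illustrative assumptions, not the authors' implementation.</p><ab><![CDATA[
```python
from collections import defaultdict
from statistics import mean


def substitute(text, pet, literal):
    """Replace the first occurrence of a PET with its literal meaning (naive string match)."""
    if pet not in text:
        raise ValueError(f"{pet!r} not found in text")
    return text.replace(pet, literal, 1)


def relative_change(before, after):
    """Relative change of one label probability, e.g. P(negative), after substitution."""
    if before == 0:
        raise ValueError("relative change is undefined when the original score is 0")
    return (after - before) / before


def mean_change_per_pet(records):
    """Average relative change per PET from (pet, score_before, score_after) tuples."""
    changes = defaultdict(list)
    for pet, before, after in records:
        changes[pet].append(relative_change(before, after))
    return {pet: mean(values) for pet, values in changes.items()}
```
]]></ab><p>Using relative rather than absolute change means that a rise from 0.1 to 0.2 counts the same as a rise from 0.4 to 0.8, which matches the view that each probability should be read relative to its own text.</p></div>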
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Corpus Annotation Task</head><p>Because euphemisms can be interpreted differently, we asked language experts (graduate students of NLP at Montclair) to record their interpretations of our selected PETs in both euphemistic and literal contexts. In this final part of the paper, we analyze their interpretations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1.">Task Instructions</head><p>Annotators were given a sample of 500 sentences in which the target PET was enclosed in angle brackets (&lt; &gt;). Without being given the literal meanings, they were asked to follow our annotation scheme and enter a 1 if they considered the target PET in the sentence to be used euphemistically and a 0 if they considered it non-euphemistic. For every instance, they were also asked to provide their interpretation, so that we could evaluate whether these PETs were in fact common enough to evoke similar interpretations. A confidence score on a scale of 1-3 was also requested to measure how confident each annotator was in their interpretation. Appendix D includes the task instructions given to the annotators.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2.">Inter-rater Agreement</head><p>Recognizing and agreeing on whether a term is a euphemism can be challenging, given that euphemisms are ambiguous. We wanted to examine whether the inter-rater reliability scores between our own annotations and those of our language experts reflected this ambiguity. We computed observed agreement and Krippendorff's alpha to assess reliability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.3.">Observed Agreement</head><p>We examined the observed agreement between ourselves and each individual annotator, as well as between different pairs of annotators. Observed agreement is simply the proportion of items on which a pair of annotators assigned the same label.</p></div>
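<div xmlns="http://www.tei-c.org/ns/1.0"><p>Pairwise observed agreement can be computed with a few lines of Python. This sketch assumes each annotator's labels are stored as an equal-length list of 0/1 values over the same items. The function names are hypothetical, and averaging over all annotator pairs is one plausible way to obtain an average agreement score, not necessarily the authors' exact procedure.</p><ab><![CDATA[
```python
from itertools import combinations


def observed_agreement(a, b):
    """Fraction of items on which two annotators assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pairwise_agreement(annotations):
    """annotations maps annotator name -> list of 0/1 labels.

    Returns the agreement for every annotator pair and their unweighted mean.
    """
    scores = {
        (p, q): observed_agreement(annotations[p], annotations[q])
        for p, q in combinations(sorted(annotations), 2)
    }
    average = sum(scores.values()) / len(scores)
    return scores, average
```
]]></ab><p>Observed agreement does not correct for chance: with two labels and a skewed label distribution, two annotators can agree often simply by both choosing the majority label. This is why a chance-corrected coefficient is reported alongside it.</p></div>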
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.4.">Krippendorff's alpha</head><p>To test inter-rater reliability, we use a freely available macro written for SPSS and SAS to calculate Krippendorff's alpha <ref type="bibr">(Hayes and Krippendorff, 2007)</ref>. Krippendorff's alpha, described in <ref type="bibr">Krippendorff (2011)</ref>, is a reliability coefficient that measures the agreement among any number of annotators. Its general form is:</p><p>$$\alpha = 1 - \frac{D_o}{D_e}$$</p><p>where $D_o$ is the observed disagreement among the values assigned to the units of analysis and $D_e$ is the disagreement one would expect when the coding of units is attributable to chance rather than to the properties of these units. <ref type="bibr">Hayes and Krippendorff (2007)</ref> describe the two reference points of the alpha scale as 1.000 for perfect reliability and 0.000 for the absence of reliability, and note that these two points allow the index to be interpreted as the degree to which the data can be relied on in subsequent analyses. As illustrated by Table <ref type="table">6</ref>, analysis of our annotator sample shows an average observed agreement of 71.74% and an alpha of 0.415. Given that euphemisms are ambiguous by nature, we consider this score of 0.415 to be fair. Future work to build upon the corpus may take a consensus coding approach to better decide on labels.</p></div>
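<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers without SPSS or SAS, the following Python sketch computes Krippendorff's alpha for nominal labels such as the 0/1 euphemism labels, using the coincidence-matrix formulation. For nominal data, D_o is the share of pairable values that disagree, and D_e is the share of disagreement expected from the overall label totals. It is an independent illustration, not the macro the authors used, and it has not been checked against the reported 0.415.</p><ab><![CDATA[
```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists: each inner list holds the labels the annotators
    assigned to one item (missing labels are simply left out).
    """
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # items with fewer than two labels are not pairable
        # every ordered pair of labels from two different annotators
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1 / (m - 1)

    totals = Counter()
    for (c, _k), count in coincidences.items():
        totals[c] += count
    n = sum(totals.values())

    observed = sum(v for (c, k), v in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n * (n - 1))
    return 1 - (n - 1) * observed / expected
```
]]></ab><p>For example, with two annotators labeling four items as (0, 0), (1, 1), (0, 1) and (1, 1), the function returns 8/15, roughly 0.533, even though the observed agreement is 75%.</p></div>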
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.5.">Disagreement Examples</head><p>In the examples where annotators disagreed, we found their supplied interpretations to be particularly useful for identifying possible causes of disagreement, which we group into three cases.</p><p>1. Varying interpretations. Annotators sometimes differed significantly in what they took to be the meaning of a PET, even given the context. The PET "freedom fighter", for example, might be interpreted as "a person who fights for freedom" (literal) or as someone who "uses violence to achieve political goals" (euphemistic). PETs interpreted as having more emotionally charged meanings within the context generally received a euphemistic label.</p><p>2. The use of a commonly accepted term (CAT). Annotators tended to disagree when the PET in question was a commonly accepted term (CAT) in a particular domain (e.g., medical, journalism) or community (e.g., the disability community, LGBT+). As an example, the PET "venereal disease" can be seen as an alternative to "sexually transmitted disease" (a euphemistic usage) or simply as a CAT in the medical domain, in which case the usage is objective (a literal usage). Generally, CATs could be viewed as non-euphemistic because they are the "default" term, but also, in some contexts, as formalisms or categories used to avoid an impolite alternative or an undesired specification (a euphemistic usage). The identification of CATs as a source of ambiguity and disagreement in this task could be significant for euphemism research, as CATs can be identified fairly clearly and marked as needing special attention. In this sense, we consider CATs to be fuzzy PETs: depending on the hearer's interpretation, they may or may not be labeled as euphemistic.</p><p>3. Similar interpretations. Examples where the interpretations were nearly the same but the labels differed could indicate disagreement about something outside the context itself. 
Examples include texts with PETs like "slim" and "overweight", which sometimes received differing labels despite being unanimously interpreted as "skinny" and "fat". Annotators' judgments about the nuance of these terms, or about the speaker's intent, could have led to disagreement: if the speaker chose the nuance deliberately, the PET may be literal, but deciding this requires a judgment about the speaker's intent.</p><p>For these cases, there appears to be an inherent ambiguity in the classification task, which points to ambiguity in judgments about euphemisms as a whole. Factors such as varying interpretations, the use of CATs, and subjective judgments about speaker intent may all contribute to disagreement in human interpretations of PETs. More examples of each case can be found in Appendix E.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this paper, we described the creation of a new corpus of euphemistic and non-euphemistic usages of Potentially Euphemistic Terms (PETs). We performed two experiments: 1) sentiment analysis with a RoBERTa-base model to test our assumption that euphemisms are used to soften language, and 2) an annotation task, in which we observed cases of disagreement about whether PETs are used euphemistically. We hope these contributions help advance research in automatic euphemism detection, identification and generation for a variety of NLP applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A List of 71 PETs only used Euphemistically</head><p>Below are the 71 PETs that were found to be used only in the euphemistic sense. The number of examples per PET is also provided. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D Annotation Task Instructions</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Euphemism Annotation Task Instructions</head><p>A euphemism is a mild or indirect word or phrase that is used instead of one that is unpleasant or offensive (Merriam-Webster). For example, "pass away" is a euphemism for "die".</p><p>We use euphemisms when we talk about sensitive topics such as death, sex, employment, bodily functions, politics, etc. Sometimes, words and phrases that could be used euphemistically are used literally, too. For example, the word "dismissed" may sometimes be used as a euphemism for getting fired, but it may sometimes be used non-euphemistically. For this reason, each keyword used is one that we have deemed to be potentially euphemistic.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example of euphemistic vs. literal usage:</head><p>The suspect, identified as Neil Edwin Prescott in a court document obtained by ABC News, was being &lt;dismissed&gt; from his job.</p><p>vs.</p><p>An appeal must not be &lt;dismissed&gt; for informality of form or title of the notice of appeal, or for failure to name a party whose intent to appeal is otherwise clear from the notice.</p><p>For this task, you will read through the attached Excel files and decide whether the keyword contained within &lt; &gt; is being used euphemistically or not. For every instance, you will need to provide your interpretation as well as a confidence score between 1 and 3 indicating how sure you are of your assessment.</p><p>1. If the usage is not euphemistic, enter a 0 in the "is_euph" column.</p><p>2. If the usage is euphemistic, enter a 1 in the "is_euph" column.</p><p>3. Your interpretation should be what you understand the keyword in question to mean given the context.</p><p>4. On a scale of 1-3, rate how sure you are of your assessment: 1 = not sure, 2 = somewhat sure, 3 = very sure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Tips</head><p>&#8226; Keep in mind that your interpretation might be unpleasant/offensive. Additionally, it should be as direct and concise as possible. For example, "overweight" should be interpreted as something along the lines of "fat," rather than "a bit on the heavier side."</p><p>&#8226; Your interpretation should still make grammatical sense if substituted in the sentence. If this is not possible, simply provide your interpretation as best as you can.</p><p>&#8226; Some of the text is copyrighted, so you will notice that some rows may contain '@@@@@@@@@@'. We have gone through each one and feel that the meaning of each target keyword was kept intact despite the missing words being replaced by these symbols.</p></div>
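<div xmlns="http://www.tei-c.org/ns/1.0"><p>Before computing agreement, returned annotation sheets could be checked against the rules above. The following Python sketch validates one row represented as a dict. Only the is_euph column name comes from the instructions; interpretation and confidence are hypothetical column names, and reading the Excel files is not shown.</p><ab><![CDATA[
```python
def validate_row(row):
    """Return a list of problems with one annotation row (an empty list if valid).

    "is_euph" comes from the task instructions; "interpretation" and
    "confidence" are hypothetical column names.
    """
    problems = []
    if row.get("is_euph") not in (0, 1):
        problems.append("is_euph must be 0 (literal) or 1 (euphemistic)")
    if not str(row.get("interpretation", "")).strip():
        problems.append("interpretation is missing")
    if row.get("confidence") not in (1, 2, 3):
        problems.append("confidence must be 1, 2 or 3")
    return problems
```
]]></ab><p>Returning every problem, rather than stopping at the first one, makes it easier to send an annotator a complete list of fixes for a row.</p></div>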
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E Examples of annotator disagreement, organized by possible causes</head><p>Below are example texts where annotators differed in their labels and had significantly varying interpretations of the PET.</p><p>PET: freedom fighter. Text: "The Palestinian woman is unlike @ @ @ @ @ @ @ @ @ @ be the". Labels: annotator1 = 0, annotator2 = 0, annotator3 = 0, annotator4 = 1.</p></div></body>
		</text>
</TEI>
