<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>#WashTheHate: Understanding the Prevalence of Anti-Asian Prejudice on Twitter During the COVID-19 Pandemic</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10384701</idno>
					<idno type="doi"></idno>
					<title level='j'>The 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>B. Wheeler</author><author>S. Jung</author><author>M. C. Nardini-Barioni</author><author>M. Purohit</author><author>D. L. Hall</author><author>Y. N. Silva</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Prejudice and hate directed toward Asian individuals have increased in prevalence and salience during the COVID-19 pandemic, with notable rises in physical violence. Concurrently, as many governments enacted stay-at-home mandates, the spread of anti-Asian content increased in online spaces, including social media. In the present study, we investigated temporal and geographical patterns in social media content relevant to anti-Asian prejudice during the COVID-19 pandemic. Using the Twitter Data Collection API, we collected over 13 million tweets posted between January 30, 2020, and April 30, 2021, that contained negative (e.g., #kungflu) or positive (e.g., #stopAAPIhate) hashtags and keywords related to anti-Asian prejudice. In a series of descriptive analyses, we found differences in the frequency of negative and positive keywords based on geographic location. Using burst detection, we also identified distinct increases in negative and positive content in relation to key political tweets and events. These largely exploratory analyses shed light on the role of social media in the expression and proliferation of prejudice, as well as in positive responses to it online.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>On January 4, 2020, the World Health Organization (WHO) reported that it was monitoring an outbreak of a new virus in Wuhan, Hubei Province, China <ref type="bibr">[1]</ref>. At this time, public knowledge of and concern about the virus were limited. Less than one month later, however, on January 30, the WHO declared the outbreak of the disease, later named COVID-19, a public health emergency, bringing global attention to this widespread health concern <ref type="bibr">[1]</ref>, <ref type="bibr">[2]</ref>. The name "COVID-19" was chosen according to WHO's "Best Practices for the Naming of New Human Infectious Diseases," which recommends avoiding any cultural, social, regional, or ethnic associations when naming a disease <ref type="bibr">[3]</ref>. Despite these recommendations, given the origins of the virus, COVID-19 was frequently referred to in the media as the "Chinese virus," the "Wuhan virus," and the "Asian virus" <ref type="bibr">[4]</ref>-<ref type="bibr">[7]</ref>. While some have argued that this terminology is not inherently racist given the virus's origin, anti-Asian prejudice did notably increase in prevalence and salience during this time <ref type="bibr">[8]</ref>, <ref type="bibr">[9]</ref>. For example, police reports in the U.S. involving anti-Asian hate and physical violence against Asian Americans and Pacific Islanders (AAPI) increased 145% in 2020 compared to previous years <ref type="bibr">[10]</ref>, and Stop AAPI Hate, a non-profit organization dedicated to reducing anti-Asian prejudice, reported 2,583 incidents of anti-Asian prejudice between March 18, 2020 and August 5, 2020 <ref type="bibr">[11]</ref>.</p><p>Increases in anti-Asian prejudice have also been observed in online spaces, including social media. The Anti-Defamation League, for instance, reported an 85% increase in anti-Asian discrimination online <ref type="bibr">[12]</ref>. 
To illustrate, during the first months of the pandemic, 72,000 posts on Instagram contained the hashtag #WuhanVirus, while another 10,000 contained the hashtag #KungFlu <ref type="bibr">[13]</ref>. Notably, tweets posted by President Trump and other political leaders used the phrase "Chinese Virus" <ref type="bibr">[8]</ref>. The role of these tweets in promoting the continued use of the term is perhaps reflected by the finding that 18% of tweets using anti-Asian hashtags referred to Trump in some capacity <ref type="bibr">[8]</ref>, <ref type="bibr">[14]</ref>. In fact, recent research by Kim and Kesari <ref type="bibr">[15]</ref> identified marked increases in anti-Asian terminology after President Trump first started using similar language. Interestingly, counter-hate (i.e., positive language intended to combat hateful messages) that drew connections between anti-Asian terminology and xenophobia and prejudice also increased during this time <ref type="bibr">[15]</ref>.</p><p>The goal of the present study was to investigate temporal and geographical patterns in social media content relevant to anti-Asian prejudice and positive (i.e., counter-hate) messages during the COVID-19 pandemic. Whereas other research has explored anti-Asian prejudice in online spaces during COVID-19 <ref type="bibr">[8]</ref>, <ref type="bibr">[14]</ref>-<ref type="bibr">[20]</ref>, the present study extends this work by (1) covering a substantially longer time frame than those previously studied (i.e., 15 months); (2) considering both negative (i.e., anti-Asian) and positive hashtags during this period; (3) employing temporal analyses involving burst detection; and (4) integrating findings from data obtained using keyword searches as well as the 1% general sample stream on Twitter.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. DATA</head><p>All data were collected according to Twitter data collection guidelines and using the proper API access provided to researchers <ref type="bibr">[21]</ref>, <ref type="bibr">[22]</ref>. In the following sections, we refer to anti-Asian content as "negative" and counter-hate content as "positive."</p><p>Archive Dataset. Using the Twitter Data Collection API (<ref type="url">https://developer.twitter.com/en/docs/twitter-api</ref>), we queried tweets containing negative and positive hashtags and keywords related to anti-Asian prejudice from January 30, 2020 to April 30, 2021. The start date corresponds to the date on which the World Health Organization declared the spread of COVID-19 a public health emergency, and the end date falls just before the start of AAPI Heritage Month in May 2021, when positive AAPI messages might increase independently of COVID-19. We used 12 specific negative hashtags/keywords as indicators of anti-Asian prejudice (#batsoup, #chinavirus, #gobacktochina, #chinesevirus, #chineseplague, #gook, #chinaliedpeopledied, #kungflu, #wuflu, #chingchong, #makechinapay, #ccpvirus) and 5 specific positive hashtags/keywords (#hateisavirus, #Iamnotavirus, #racismisavirus, #washthehate, #stopasianhate). These were chosen based on the relevant literature <ref type="bibr">[19]</ref>, news publications <ref type="bibr">[23]</ref>-<ref type="bibr">[25]</ref>, and social media posts discussing anti-Asian attitudes during the beginning of the pandemic. The total sample consisted of 13,008,053 tweets from 3,298,940 distinct users.</p><p>1% Dataset. The 1% sample stream dataset was generated from Twitter's sample stream endpoint<ref type="foot">2</ref>, which provides access to a roughly 1% random sample of publicly available tweets in real time. 
This dataset was compiled from tweets gathered over a 24-hour period (August 1-2, 2021) to estimate the volume of activity that a 1% sample of the Twitter platform generates in one day. In total, 4,093,933 tweets from 2,956,806 distinct users were collected in this sample.</p><p>Geographic Location Labeling. During the collection of both datasets, a filter was applied to collect a list of users who had made geolocation data publicly available through their location setting. To perform descriptive analyses based on geographic location, we devised a geolocation labeling strategy similar to that of Jiang and colleagues <ref type="bibr">[26]</ref>. This strategy was necessary because fewer than 0.5% of the tweets in our dataset had geo place information available. For our analysis, we used state-level granularity for tweets originating in the U.S. and country-level granularity for tweets originating in other countries, based on self-reported user profile locations. Using a fuzzy text matching algorithm <ref type="bibr">[27]</ref>, pre-processed user-reported locations were matched against a set of predetermined locations inside and outside of the U.S. The similarity between user-reported and predetermined locations was computed using the edit distance metric. The similarity threshold for accepting a matching pair of locations was set to 80%, based on a validation analysis in which an external human annotator manually verified a random sample of labeled locations against country names and U.S. state names. In terms of precision, TP/(TP + FP), the geolocation labeling strategy achieved a positive predictive value of 99.8% for U.S. locations and between 89.8% and 100% for the other countries that, together with the U.S., account for 90% of the collected data. The set of predetermined locations inside the U.S. consisted of state names and state abbreviations. The set of predetermined locations outside the U.S. 
was built using the top 20 countries with the most Twitter users as of July 2020<ref type="foot">3</ref> and their 5 most populous cities<ref type="foot">4</ref>. To avoid ambiguity, only country abbreviations that did not overlap with a U.S. state abbreviation were included. Additionally, we used a list of ambiguous locations, built throughout the testing process, to adjust the geolocation labeling by removing ambiguous matches. (An example of an ambiguous match is the token 'valencia,' which can refer to a city in Spain and a town in California.)</p></div>
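<div xmlns="http://www.tei-c.org/ns/1.0" type="example"><p>The fuzzy geolocation labeling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fuzzy matching library cited in the study is not reproduced here, and the normalized Levenshtein similarity formula, the comma-based tokenization, and the tiny hypothetical lookup tables (US_STATES, OTHER_LOCATIONS, AMBIGUOUS) are assumptions made for illustration. Only the 80% threshold, the state/country granularity, and the removal of ambiguous matches follow the description in the text.</p><ab type="code"><![CDATA[
```python
import re

# Hypothetical, heavily abbreviated lookup tables (illustration only).
# Keys are normalized location strings; values are the labels we assign.
US_STATES = {
    "california": "CA", "ca": "CA",
    "new york": "NY", "ny": "NY",
    "tennessee": "TN", "tn": "TN",
}
# Countries plus some of their most populous cities. Country abbreviations
# that collide with U.S. state abbreviations would be left out of this table.
OTHER_LOCATIONS = {
    "india": "IND", "mumbai": "IND",
    "thailand": "TH", "bangkok": "TH",
    "spain": "ESP", "valencia": "ESP",
}
# Tokens that could refer to several places (e.g., Spain vs. California).
AMBIGUOUS = {"valencia"}
THRESHOLD = 80.0  # minimum similarity (0-100) needed to accept a match


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Rescale edit distance to a 0-100 score (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def label_location(raw: str):
    """Return a U.S. state or country label for a profile location, or None."""
    # Pre-processing: lowercase, keep letters/spaces/commas, split on commas.
    text = re.sub(r"[^a-z ,]", "", raw.lower())
    parts = [p.strip() for p in text.split(",") if p.strip()]
    best_label, best_score = None, 0.0
    for part in parts:
        if part in AMBIGUOUS:  # drop known ambiguous matches
            continue
        for table in (US_STATES, OTHER_LOCATIONS):
            for name, label in table.items():
                score = similarity(part, name)
                if score >= THRESHOLD and score > best_score:
                    best_label, best_score = label, score
    return best_label
```
]]></ab><p>Under these assumptions, label_location("Tennesee") returns "TN" because the misspelling is a single insertion away from "tennessee" (a similarity of about 88.9), whereas label_location("Valencia") returns None because the token is on the ambiguous list. A fuller version would load the complete tables of state names, state abbreviations, countries, and cities described above.</p></div>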
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. RESULTS</head><p>We performed a series of exploratory descriptive analyses to investigate negative and positive keyword use and how it varied over time and by geographic location.<ref type="foot">5</ref></p><p>Descriptive Analyses. As shown in Fig. <ref type="figure">1</ref>, the use of negative keywords was low before March 2020. Within that month, however, negative keyword use increased sharply and reached its peak frequency. Positive keywords, in contrast, were considerably less frequent throughout most of the timeline but culminated in major spikes in late February 2021. As shown in Fig. <ref type="figure">2</ref>, globally, 4,521,457 distinct tweets contained at least one of the 12 negative keywords, with the U.S. and India producing the largest shares of this content (USA = 233,705 tweets; IND = 228,621 tweets). A total of 6,660,469 distinct tweets contained at least one of the 5 positive keywords, with the U.S. again producing the most positive content, followed by Thailand (USA = 263,827 tweets; TH = 82,696 tweets). In the U.S., New York, California, and Florida were the largest producers of negative content (NY = 37,986 tweets; CA = 30,139 tweets; FL = 27,076 tweets). Notably, however, New York and California also produced the most content containing positive keywords (NY = 48,595 tweets; CA = 45,886 tweets). Fig. <ref type="figure">3</ref> depicts usage trends of negative and positive keywords specifically in the U.S. over time. These trends are similar to those found globally, with sharp spikes of negative activity following President Trump's first use of the term "Chinese Virus" in March 2020. This figure also illustrates that, although more U.S. tweets contained positive keywords than negative keywords, the positive tweets were mainly generated between February 2021 and April 2021. 
That is, they coincided with a prominent event in the U.S.: the Atlanta-area spa shootings, which resulted in the deaths of multiple individuals of Asian descent <ref type="bibr">[28]</ref>. There was minimal use of these keywords during the early months of the pandemic. Among negative tweets produced in the U.S., the most frequently used negative hashtag was "ccpvirus," followed by "chinavirus" and "chinesevirus" (see Fig. <ref type="figure">4</ref>). The most frequently used positive hashtags in the U.S. were "stopasianhate" and "hateisavirus."</p><p>Analysis Using the 1% Dataset. The goal of this analysis was to normalize tweet frequencies by the amount of overall Twitter activity in each U.S. state. To this end, for each state, an initial ratio was computed separately for positive and negative tweets by dividing the number of such tweets with a valid geolocation in the archive dataset by the number of tweets from that state in the 1% dataset. Next, the average initial ratio was computed across all states (4.277 for positive keywords and 3.786 for negative keywords). The final ratio for each state was then obtained by dividing its initial ratio by the corresponding average ratio, so that a final ratio of 1 indicates activity at the cross-state average. The final ratios for negative and positive keywords are reported in Fig. <ref type="figure">5</ref>. As depicted, Tennessee had the highest ratio of negative keywords (201% higher than the average), followed by Alaska (151% higher than the average). Washington DC, on the other hand, had the highest ratio of positive keywords (211% higher than the average), followed by Washington state (203% higher than the average) and New York (148% higher than the average).</p><p>Burst Analysis. We used Kleinberg's burst detection algorithm <ref type="bibr">[29]</ref> to identify bursts of heightened negative and positive keyword use across time. This approach identifies bursts of activity in a series of events by modeling the transitions between two states: baseline and bursty. 
Bursty states are associated with periods of time when an event (e.g., a negative or positive tweet) is unusually frequent. The approach uses two main parameters, s and gamma, which affect different aspects of the way the algorithm detects bursts.</p><p>&#8226; s: This parameter sets how much more frequent events must be in the bursty state than in the baseline state (the expected rate in the bursty state is s times the baseline rate). Higher values of s therefore require stronger increases in activity to detect a burst.</p><p>&#8226; gamma: This parameter controls the cost of switching from the baseline state to the bursty state. Higher values of gamma make state changes more costly, so only more sustained or pronounced increases are labeled as bursts.</p><p>Multiple combinations of s and gamma were assessed iteratively to tune the sensitivity of burst detection. Because s must be greater than 1, sensitivity could not be increased indefinitely by lowering s; we therefore tested steadily decreasing values of gamma, ranging from the default of 1 down to 0. Many of these combinations either produced an all-or-nothing result (i.e., all of the data were labeled as a burst, or none of the activity was considered a burst) or yielded only inconsequential bursts. Based on this testing, we selected s = 1.1 and gamma = 0.0, as these values provided the most informative visual output. We performed separate burst analyses (with the same parameters) for the datasets with negative and positive keywords. Specifically, we used the burst detection algorithm to identify bursts in discrete bundles of events, where each bundle was defined as the set of negative or positive tweets posted in a single day. For this analysis, we considered the tweets in the U.S. based on the geolocation labeling strategy previously described. To facilitate the processing of large frequency values using the Python Burst Detection library<ref type="foot">6</ref>, we applied a logarithmic transformation before feeding the data to the algorithm. 
The output of this burst analysis step was a set of date ranges for the identified bursts.</p><p>Negative Keyword Use. The dates identified in the burst analysis were annotated on a timeline with events corresponding to the use of anti-Asian terminology (e.g., "Chinese virus," "China virus") on President Trump's Twitter account, key political events, and COVID-19 milestones. In total, 8 bursts of activity were identified (labeled A through H in Fig. <ref type="figure">6</ref>).</p><p>Events were drawn from a window extending from 2 days before the start to 2 days after the end of each date range identified by the burst analysis. Events for Bursts A through F correspond primarily to tweets posted from Trump's Twitter account <ref type="bibr">[30]</ref>. Events for Bursts G and H were taken from news media coverage of significant events <ref type="bibr">[31]</ref>, <ref type="bibr">[32]</ref> that occurred at the time, as well as from the CDC's COVID-19 pandemic timeline <ref type="bibr">[33]</ref>.</p><p>Positive Keyword Use. For positive keywords, 3 bursts of activity were identified: March 17, 2020 to June 16, 2020; June 19, 2020 to June 30, 2020; and February 2, 2021 to April 4, 2021 (Fig. <ref type="figure">7</ref>). These bursts in positive keyword use followed increases in in-person hate and physical violence toward Asian Americans. For example, from March to June 2020, the Federal Bureau of Investigation reported increases in crimes directed toward Asian Americans (<ref type="url">https://crime-data-explorer.fr.cloud.gov/pages/explorer/crime/hate-crime</ref>). Further, the burst of positive activity beginning February 2, 2021 coincided with a marked increase in physical violence against Asian individuals. For example, within this time frame, the highly publicized Atlanta-area spa shootings occurred, in which Asian women were targeted and 8 people were killed <ref type="bibr">[28]</ref>. 
There were also several reports of individuals of Asian descent being verbally and physically assaulted in public, resulting in serious injury or death <ref type="bibr">[34]</ref>, <ref type="bibr">[35]</ref>. These bursts of prosocial, counter-hate messages could be interpreted as a protective response aimed at raising awareness, emerging as protests, rallies, and non-profit organizations were developed to fight this hostility <ref type="bibr">[36]</ref>-<ref type="bibr">[38]</ref>.</p></div>
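<div xmlns="http://www.tei-c.org/ns/1.0" type="example"><p>The state-level normalization in the 1% dataset analysis reduces to two divisions per state: each state's initial ratio (archive-dataset keyword tweets divided by 1%-dataset tweets) is divided by the average initial ratio across states, reported above as 4.277 for positive and 3.786 for negative keywords. The following minimal Python sketch reproduces only that arithmetic; the function name normalize_by_activity and the example counts are hypothetical and are not the study's data.</p><ab type="code"><![CDATA[
```python
def normalize_by_activity(keyword_counts, baseline_counts):
    """Normalize per-state keyword counts by overall per-state activity.

    keyword_counts:  state -> geolocated keyword tweets (archive dataset)
    baseline_counts: state -> tweets from that state in the 1% sample
    Returns state -> final ratio, where 1.0 is the cross-state average.
    """
    # Step 1: initial ratio per state; states with no tweets in the
    # 1% sample are skipped to avoid dividing by zero.
    initial = {
        state: count / baseline_counts[state]
        for state, count in keyword_counts.items()
        if baseline_counts.get(state)
    }
    # Step 2: average initial ratio across the included states.
    mean_ratio = sum(initial.values()) / len(initial)
    # Step 3: final ratio = initial ratio / average initial ratio.
    return {state: ratio / mean_ratio for state, ratio in initial.items()}


# Hypothetical counts for three states (not the study's data).
negative = {"TN": 300, "AK": 20, "CA": 500}
sample_1pct = {"TN": 50, "AK": 10, "CA": 250}
print(normalize_by_activity(negative, sample_1pct))
```
]]></ab><p>With these hypothetical counts, the initial ratios are 6, 2, and 2 (mean 10/3), so the final ratios are about 1.8 for Tennessee and about 0.6 for Alaska and California. By construction, the final ratios average to 1 across the included states.</p></div>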
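<div xmlns="http://www.tei-c.org/ns/1.0" type="example"><p>To make the roles of s and gamma more concrete, the sketch below implements a two-state version of Kleinberg's model for batched arrivals, decoded with the Viterbi algorithm over daily bundles. It is written from the general description of the algorithm, not from the Python Burst Detection library used in the study, and should not be read as that library's API. The function names kleinberg_two_state and burst_ranges are hypothetical, as is the assumption that r[t] counts keyword tweets on day t and d[t] counts all tweets in a reference set on that day, since the text does not specify the denominator. The bursty proportion is assumed to be s times the baseline proportion, and entering the bursty state is assumed to cost gamma times ln(n) for n daily batches.</p><ab type="code"><![CDATA[
```python
import math


def kleinberg_two_state(r, d, s=1.1, gamma=0.0):
    """Label each batch 0 (baseline) or 1 (bursty) by Viterbi decoding.

    r[t]:  target events in batch t (e.g., keyword tweets on day t)
    d[t]:  all events in batch t (r[t] <= d[t])
    s:     bursty proportion = s * baseline proportion (requires s > 1)
    gamma: scales the cost gamma * ln(n) of entering the bursty state
    """
    if s <= 1:
        raise ValueError("s must be greater than 1")
    n = len(r)
    p0 = sum(r) / sum(d)               # baseline proportion of target events
    if not 0 < p0 < 1:
        raise ValueError("baseline proportion must lie strictly between 0 and 1")
    probs = (p0, min(s * p0, 0.9999))  # bursty proportion, capped below 1
    up_cost = gamma * math.log(n)      # moving back down to baseline is free

    def cost(state, rt, dt):
        """Negative log-likelihood of rt target events out of dt (binomial)."""
        p = probs[state]
        log_comb = (math.lgamma(dt + 1) - math.lgamma(rt + 1)
                    - math.lgamma(dt - rt + 1))
        return -(log_comb + rt * math.log(p) + (dt - rt) * math.log(1 - p))

    # The automaton starts in the baseline state, so starting bursty pays up_cost.
    best = [cost(0, r[0], d[0]), up_cost + cost(1, r[0], d[0])]
    back = []  # back[t - 1][state] = best predecessor of `state` at batch t
    for t in range(1, n):
        from0_to0, from1_to0 = best[0], best[1]
        from0_to1, from1_to1 = best[0] + up_cost, best[1]
        back.append((0 if from0_to0 <= from1_to0 else 1,
                     0 if from0_to1 < from1_to1 else 1))
        best = [min(from0_to0, from1_to0) + cost(0, r[t], d[t]),
                min(from0_to1, from1_to1) + cost(1, r[t], d[t])]

    # Trace the minimum-cost state sequence backwards.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for preds in reversed(back):
        state = preds[state]
        states.append(state)
    return states[::-1]


def burst_ranges(states):
    """Collapse a 0/1 state sequence into inclusive (start, end) index ranges."""
    ranges, start = [], None
    for t, state in enumerate(states):
        if state == 1 and start is None:
            start = t
        elif state == 0 and start is not None:
            ranges.append((start, t - 1))
            start = None
    if start is not None:
        ranges.append((start, len(states) - 1))
    return ranges


# Hypothetical daily counts with a three-day spike on days 5-7.
r = [10] * 5 + [60] * 3 + [10] * 5
d = [100] * 13
print(burst_ranges(kleinberg_two_state(r, d, s=1.1, gamma=0.0)))
```
]]></ab><p>Running the example prints [(5, 7)], a single burst covering the three spike days. With gamma = 0.0, as selected in the study, entering the bursty state is free, so each day is labeled with whichever state better explains its counts; raising gamma suppresses short or weak bursts. The logarithmic transformation described above is not included in this sketch.</p></div>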
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. DISCUSSION AND CONCLUDING REMARKS</head><p>The present study investigated temporal and geographic trends in anti-Asian prejudice and counter-hate messages on Twitter in the 15 months after the World Health Organization declared COVID-19 a public health emergency. Consistent with other recent research, our findings indicate that the increased prevalence of anti-Asian prejudice during early stages of the pandemic was a global phenomenon <ref type="bibr">[39]</ref>. However, our findings also revealed geographic differences in the frequency of negative (anti-Asian) and positive (counter-hate) content generated by Twitter users within the U.S. For instance, New York, California, and Florida were the largest producers of negative keywords overall in our archive dataset (based on our query of over 13 million tweets). A complementary analysis of a random sample of approximately 1% of publicly available tweets collected over a single 24-hour period, however, yielded additional insights. After normalizing by this one-day sample, the states with the highest ratio of negative keywords to overall Twitter activity were Tennessee and Alaska. Likewise, whereas New York and California were the largest producers of positive keywords in our archive dataset, Washington DC had the highest ratio of positive keywords after normalization, followed by Washington state and New York. The large volume of positive content generated by users in New York, in particular, is notable given the comparatively large share of reported anti-AAPI incidents in this state. That is, data from Stop AAPI Hate <ref type="bibr">[40]</ref> indicate that out of 9,081 reported incidents of anti-Asian hate (e.g., physical violence, online harassment, civil rights violations) in the U.S. from March 2020 to June 2021, roughly 15% occurred in New York. 
The dynamic ways in which prejudice manifests itself in face-to-face interactions and online spaces, and the role of social media in conveying messages of support and solidarity in response to acts of racial animosity, warrant further empirical attention.</p><p>Using burst analysis, we identified several distinct surges (i.e., bursts) in the frequency of both anti-Asian and counter-hate keywords on Twitter. Examining these bursts in relation to relevant content generated by President Trump on Twitter, political events, and key milestones in the COVID-19 timeline helps contextualize these temporal findings and underscores the extent to which social media can both reflect and influence anti-Asian sentiment. Crucially, our results are largely consistent with previous research indicating, broadly, that President Trump's use of politically incorrect terminology when discussing political events has led to increases in White nationalist ideals and racism <ref type="bibr">[41]</ref>, as well as with the finding that bursts of negative activity occurred after President Trump started using anti-Asian rhetoric in his tweets, speeches, and interviews during the pandemic <ref type="bibr">[8]</ref>. Furthermore, the complexity of the prejudice fueled by and evident throughout the pandemic is perhaps illustrated by the political connotation of some of the anti-Asian keywords. For example, "ccpvirus" (a reference to the Chinese Communist Party) likely stemmed from news reports that the party withheld information about COVID-19 during the early months of the pandemic <ref type="bibr">[42]</ref>.</p><p>Finally, our findings also suggest that positive online activity may act as a protective response, bringing heightened awareness to anti-Asian prejudice through "hashtag activism." 
It remains unclear, however, whether surges in the use of positive keywords (e.g., #hateisavirus, #stopasianhate) led to a measurable reduction in verbal and physical attacks against AAPI individuals; notably, similar campaigns aimed at reducing violence have lost momentum over time <ref type="bibr">[43]</ref>. Nonetheless, we hope that our efforts to expand on recent research in this area will contribute to a deeper understanding of how prejudice and hatred, as well as empathy and counter-hate, proliferate online during global crises.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0"><p>https://developer.twitter.com/en/docs/twitter-api/tweets/volume-streams</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1"><p>https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2"><p>https://worldpopulationreview.com/world-cities</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3"><p>In the subsequent sections, we use the term "keyword" to refer to both hashtags and keywords (e.g., "chinavirus" and "China virus").</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4"><p>https://pypi.org/project/burst_detection</p></note>
		</body>
		</text>
</TEI>
