<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>American Sign Language Video Anonymization to Support Online Participation of Deaf and Hard of Hearing Users</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>10/17/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10345322</idno>
					<idno type="doi">10.1145/3441852.3471200</idno>
					<title level='j'>Proceedings of ASSETS '21: The 23rd International ACM SIGACCESS Conference on Computers and Accessibility</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue">Article 22</biblScope>					

					<author>Sooyeon Lee</author><author>Abraham Glasser</author><author>Becca Dingman</author><author>Zhaoyang Xia</author><author>Dimitris Metaxas</author><author>Carol Neidle</author><author>Matt Huenerfauth</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Without a commonly accepted writing system for American Sign Language (ASL), Deaf or Hard of Hearing (DHH) ASL signers who wish to express opinions or ask questions online must post a video of their signing, if they prefer not to use written English, a language in which they may feel less proficient. Since the face conveys essential linguistic meaning, the face cannot simply be removed from the video in order to preserve anonymity. Thus, DHH ASL signers cannot easily discuss sensitive, personal, or controversial topics in their primary language, limiting engagement in online debate or inquiries about health or legal issues. We explored several recent attempts to address this problem through development of “face swap” technologies to automatically disguise the face in videos while preserving essential facial expressions and natural human appearance. We presented several prototypes to DHH ASL signers (N=16) and examined their interests in and requirements for such technology. After viewing transformed videos of other signers and of themselves, participants evaluated the understandability, naturalness of appearance, and degree of anonymity protection of these technologies. Our study revealed users’ perception of key trade-offs among these three dimensions, factors that contribute to each, and their views on transformation options enabled by this technology, for use in various contexts. Our findings guide future designers of this technology and inform selection of applications and design features.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION AND MOTIVATION</head><p>Our research evaluates Deaf and Hard of Hearing (DHH) users' interest, preferences, and concerns in relation to a prototype for anonymization of sign language video communications, to provide guidance for designers of this technology. While there are many sign languages, our work focuses on American Sign Language (ASL), used by over 500,000 people in the U.S. <ref type="bibr">[34]</ref>. Although often used in countries in which English is spoken, ASL is a language distinct from English, produced by movements of the face, head, hands, and torso <ref type="bibr">[4,</ref><ref type="bibr">11,</ref><ref type="bibr">35,</ref><ref type="bibr">47,</ref><ref type="bibr">55]</ref>.</p><p>Being able to communicate anonymously in one's preferred language is essential for participating in a variety of social, professional, and societal contexts. Some prior work <ref type="bibr">[3,</ref><ref type="bibr">33]</ref> has focused on techniques to hide the face of a user for privacy protection in circumstances where this may be important. For instance, Internet users may visit discussion boards to ask questions about sensitive topics; individuals may express dissenting political or religious views that could subject them to persecution; or essential professional activities like academic peer-review may require anonymity.</p><p>While it is relatively straightforward for users of written languages to engage in anonymous written communications online, such options have not been available for users of sign languages. 
These languages generally lack a written form in common use among the language community, and therefore video-based communication, which reveals the face, is necessary.</p><p>While users of spoken language can hide their face on online video-sharing platforms <ref type="bibr">[17,</ref><ref type="bibr">24,</ref><ref type="bibr">46,</ref><ref type="bibr">52,</ref><ref type="bibr">59]</ref>, this option is not available to ASL users, as the face conveys essential linguistic information <ref type="bibr">[4,</ref><ref type="bibr">11,</ref><ref type="bibr">27,</ref><ref type="bibr">35,</ref><ref type="bibr">55]</ref>. Barriers to private communication in one's primary language limit online debate or enquiries, e.g., in relation to sensitive topics, such as reproductive health, domestic abuse, or substance abuse, which prior research has revealed to have higher prevalence in the DHH community <ref type="bibr">[5,</ref><ref type="bibr">42]</ref>. Anonymizing the face, while retaining the key linguistic information it conveys, would also enable peer review of academic publications in sign language, conformity in appearance when multiple individuals contribute to a composite video or collection (e.g., entries in a video ASL dictionary), and privacy protection when users contribute videos to ASL datasets for AI research; these applications are discussed in <ref type="bibr">[7,</ref><ref type="bibr">32]</ref>.</p><p>Over the past decade, real-time tools for face transformations have become popular among consumers, e.g., to make someone appear to be wearing makeup <ref type="bibr">[25]</ref> or to overlay a cute virtual animal mask <ref type="bibr">[61]</ref>. More recently, AI technologies for real-time face transformation (sophisticated technologies that preserve facial expressions) have matured and become available to non-technical users for producing realistic videos in which a synthetically generated human face in a video is driven by the face of another person. 
As compared to earlier face-filter technologies (simplistic technologies that do not preserve facial expressions), these advancements enable new applications for DHH ASL users, as it is now possible to replace the face while preserving detailed facial expressions and head movements.</p><p>In this research, we conducted an interview study to evaluate prototype face-disguise technology (a generic term for technologies that obscure the face) applied to videos of human ASL signers, influenced by recent image-to-video technology <ref type="bibr">[44,</ref><ref type="bibr">45,</ref><ref type="bibr">50,</ref><ref type="bibr">60]</ref>, for replacing the face in a video with a new face from a given photograph, preserving facial expressions and head movements. In one prototype variation, the torso of the human remains in the video, and in another, the torso is hidden to disguise the clothing and body for further obscuring the identity of the signer. For comparison, we also evaluated a simpler face-filter with a virtual cartoon-like tiger mask, previously evaluated in <ref type="bibr">[7]</ref>. In a 70-minute appointment, participants: (1) viewed disguised videos and attempted to identify the person in the original video from a line-up of photos, (2) viewed original and disguised videos processed by prototype variations, and provided subjective feedback about each, and (3) viewed videos of themselves transformed by this technology. 
In a semi-structured interview, participants discussed their views of the technology, preferences among appearance options, factors affecting acceptability, potential uses, and concerns.</p><p>The contributions of this work are empirical and include: (1) The first evaluation with DHH ASL users of modern face-transformation technology, capable of preserving ASL linguistic facial expressions, revealing its effectiveness at preserving anonymity; (2) Quantitative and qualitative evaluation of understandability, naturalness, and anonymity preservation, to compare prototypes varying in their appearance transformations; (3) Evidence of users' views on the acceptability of this technology, its potential uses, and their concerns; (4) Identification of users' perceived trade-offs among understandability, naturalness, and anonymity protection, with design considerations from our analysis; and (5) Evidence of ways in which preservation and transformation of identity relate to users' acceptance of this technology.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">PRIOR WORK</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Existing Methods of Conveying ASL Anonymously</head><p>While researchers have acknowledged the importance of enabling deaf signers to communicate anonymously online <ref type="bibr">[15,</ref><ref type="bibr">16]</ref>, most prior efforts to address this problem have aimed to produce artificial writing systems for sign language or to create tools to allow deaf signers to create their own animations of a virtual human signing their message. Despite efforts to invent sign language writing systems, e.g., <ref type="bibr">[2,</ref><ref type="bibr">39,</ref><ref type="bibr">48]</ref> or related technologies <ref type="bibr">[8]</ref>, no writing system has yet gained widespread popularity within the DHH community. Thus, written communication in ASL is not practical for enabling signers to communicate without revealing their identity.</p><p>Other work seeks to enable users to create synthetic animations of sign languages, which could, in principle, produce anonymous messages. Prior sign language animation research has largely focused on machine-translation contexts <ref type="bibr">[6]</ref>, but some work examines how to enable users to script the movements of virtual humans to perform sign language, e.g., <ref type="bibr">[13,</ref><ref type="bibr">21]</ref>. Unfortunately, existing tools are not yet sufficiently expressive to produce clear virtual animation, nor are the tools and techniques for building novel animated messages likely to become simple enough for use by non-experts, despite recent efforts <ref type="bibr">[1,</ref><ref type="bibr">56]</ref>. In summary, despite work on writing systems and avatar technologies, no existing approaches yet provide a satisfactory solution to the challenge of anonymous communication in sign language.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Accessibility of Written/Spoken and Sign Language Online Content Creation</head><p>Prior work has examined DHH users' interests, current practices, and barriers, in relation to producing content to share online, e.g., <ref type="bibr">[10,</ref><ref type="bibr">14,</ref><ref type="bibr">15,</ref><ref type="bibr">23]</ref> or in the context of social media interaction <ref type="bibr">[32]</ref>. When privacy is a concern, DHH users must currently use written English to prepare online messages or content. Given the diversity in written-language literacy levels among DHH individuals <ref type="bibr">[53]</ref> and the preference of many DHH users for communication in ASL, DHH users face barriers to online participation <ref type="bibr">[32]</ref>, if they wish to preserve their anonymity during interactions. This is an inequitable situation, as hearing individuals can express themselves online much more easily, in written or spoken form (assuming that their voice is not recognizable and their face is disguised).</p><p>Prior work has revealed particular challenges for users who prefer to produce content in sign language, as they must create and post a video of themselves, with their faces and physical appearance visible to whoever watches the video. Recent research <ref type="bibr">[32]</ref> has highlighted challenges that DHH ASL signers face in participating in social media sites by recording and sharing ASL video. As reported in <ref type="bibr">[32]</ref>, the need to hold the phone with one hand (e.g., while standing) in order to record themselves leaves only one hand for signing, which is not ideal, because signing in ASL normally requires two hands. Adding text captions to videos to enable them to be understood by individuals who do not know ASL is also time-consuming. The authors provided potential solutions for these challenges, such as incorporating automatic captioning into social media platforms. 
While <ref type="bibr">[32]</ref> focused on barriers to communication on social media platforms, our work focuses on preserving DHH individuals' privacy in video communication. In summary, prior work has revealed that there is strong interest among DHH users for technologies that could facilitate ASL-based communication online, especially in a manner that is privacy preserving; yet existing technologies are not providing an adequate solution to this challenge.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Video de-identification for privacy in video sharing sites</head><p>Some recent work has investigated face-disguise technology for motivating ASL signers to feel comfortable sharing videos in public ASL datasets for research <ref type="bibr">[7]</ref>. This study is of particular interest, in that participants were asked questions about their interest in and impressions of face-disguise technology, albeit within the specific context of contributing to a dataset. Participants were able to see their own video transformed through some simple face-filter technology, including a filter that overlaid a cartoon tiger face on top of the signer's face without preservation of any facial expressions, aside from the degree to which the mouth opens. Participants were more willing to share their video publicly with filters mitigating privacy concerns, yet they were dissatisfied with the fact that the filters did not preserve facial expressions.</p><p>In the video/photo sharing context, trade-offs between the utility of the anonymized video/photo and privacy protection have also been investigated <ref type="bibr">[18]</ref><ref type="bibr">[19]</ref><ref type="bibr">[20]</ref><ref type="bibr">[29]</ref>. Prior work has studied how the level of obfuscation from various image filtering techniques (e.g., blurring, pixelization, masking) affects the viewer's experience and the utility of the video/image for specific tasks, e.g., patient training video in a clinical setting <ref type="bibr">[18]</ref>. As found in prior work <ref type="bibr">[7]</ref>, obfuscation from some common privacy enhancing techniques does not satisfy ASL signers because facial expressions are not preserved. 
Prior research suggests that providing adequate privacy protection for various contexts and uses requires careful selection of the relationship between the level (ranging from no recognition to full recognition) and the types (e.g., blurring, masking, face disguise) of anonymization. Focusing specifically on DHH signers, our study differs from prior work in two ways: (a) We investigate more advanced face-transformation technologies capable of preserving facial expressions; and (b) We investigate these technologies for preserving privacy in ASL videos for a wider variety of uses and contexts, e.g., participation on social media platforms.</p><p>In recent years, there has been tremendous progress in technologies for analyzing and synthesizing video of human faces, e.g., <ref type="bibr">[3,</ref><ref type="bibr">40,</ref><ref type="bibr">49,</ref><ref type="bibr">50,</ref><ref type="bibr">60]</ref>, with new applications in smart home technologies <ref type="bibr">[54]</ref>, health <ref type="bibr">[12,</ref><ref type="bibr">22]</ref>, and other fields.</p><p>Another key application of this technology has been for de-identifying videos in order to preserve privacy, e.g., <ref type="bibr">[3,</ref><ref type="bibr">33]</ref>.</p><p>While most work has focused on technical details and performance of this technology, some researchers have conducted research with human participants to understand their interests in or concerns about this technology. Advances in this technology have led to recent public awareness of "deep fake" technologies for producing seemingly realistic videos of humans, in which the movement of the face is based on the performance of a human in an original video. 
The ease of creating videos that impersonate someone, making it appear that they are saying or doing things that they had never said or done, has raised significant ethical concerns <ref type="bibr">[28,</ref><ref type="bibr">43]</ref>.</p><p>Given the complex face and head movements used in ASL for a variety of linguistic purposes, e.g., involving subtle movements of the eyebrows or head <ref type="bibr">[4,</ref><ref type="bibr">11,</ref><ref type="bibr">35]</ref>, there has been a question as to whether the resulting video would sufficiently preserve these key linguistic elements of the performance. Some researchers have begun to design face-disguise technology with a particular focus on preserving such elements of the performance <ref type="bibr">[44,</ref><ref type="bibr">45,</ref><ref type="bibr">50,</ref><ref type="bibr">60]</ref>, necessary for applying this technology to sign language videos. However, there is a need for empirical research with DHH ASL signers, to understand the performance of this technology, as well as users' impressions and judgments of its suitability for the task of anonymizing ASL videos to be shared online.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">RESEARCH GOALS AND METHODS</head><p>Emerging face-transformation technology has the potential to create realistic videos with new faces; yet prior work has revealed ethical concerns with the use of such technology. While some research has examined DHH users' interest in simple face-filter technologies for specific contexts, no prior study with DHH users has investigated state-of-the-art face-disguise technology capable of preserving facial expressions and natural human appearance for sign language video.</p><p>As these new technological capabilities emerge, it is important to understand DHH users' interest in and impressions of this technology for protecting anonymity, including users' views of various dimensions of system performance, e.g., understandability and naturalness of appearance. The goal of this research is to guide the development of ASL-optimized face technology and inform designers of future applications for these users.</p><p>We conducted an interview-based study with 16 DHH individuals who reported using ASL on a daily basis; each participated in a 70-minute Zoom teleconference meeting with a DHH ASL-signing researcher. In this IRB-approved study, the participants were shown examples of videos of ASL signing processed by prototype face-transformation technology (section 3.1). Prior to transformation, some of these videos had been of the participant, submitted to us in advance of the appointment, and some were of other ASL signers from a public research dataset of ASL signing.</p><p>The interview was conducted entirely in ASL, while the researchers typed notes in English. Participants were asked a mixture of open- and closed-ended questions about their subjective impression of the videos, especially in regard to how well they preserve anonymity, their understandability, and other factors, as described in section 3.2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Anonymization Technology Prototypes</head><p>In this study, we compare multiple prototype technologies for disguising the face of an ASL signer. We refer to our first prototype as tiger-face, a simple video filter technology, similar to those used in Snapchat, in which a 3D mask is virtually overlaid on the face in the video. Our rationale for selecting this prototype is three-fold: (a) It reflects the state-of-the-art of consumer-grade face technologies popular during the 2010s; (b) The specific filter, the open-source tiger-face filter from Jeeliz <ref type="bibr">[26]</ref>, was used in a prior study that had examined DHH users' interests in using filters to hide their identity <ref type="bibr">[7]</ref>; and (c) It also provides a baseline point-of-comparison for participants, to determine whether the more computationally intensive facial-expression-preserving transformations were useful. The filter detects the human's face and overlays an animated tiger avatar head, which emits blue bubbles from its mouth, triggered whenever the human's mouth opens. Participants in that prior study commented on the limitations of this filter, which does not preserve any other facial expression details, e.g., eyebrow movements, despite these being linguistically important in ASL. We included tiger-face in our study as a baseline for comparison, reflective of the prior state-of-the-art for available face-disguise technologies.</p><p>Our second prototype, with-torso, is based on recent work on image-to-video transformation and video editing, to enable the replacement of the underlying facial geometry, while preserving the linguistically significant facial expressions <ref type="bibr">[44,</ref><ref type="bibr">45,</ref><ref type="bibr">50,</ref><ref type="bibr">60]</ref>. The rationale for including this transformation in our study was that it reflects a state-of-the-art facial image animation and transformation technology. 
This specific technology was selected because of its ability to animate face images based on image-to-video transformation, to enable the replacement of the underlying facial geometry by editing the latent facial representations <ref type="bibr">[51,</ref><ref type="bibr">57]</ref>. The torso and background of the signer are not touched or modified in any way. Colloquially, we may refer to the face of the signer being "swapped" with a different human face, based on an input photograph of the desired "target face." However, the resulting output video actually appears as a blend of the facial structure of the original signer and that of the individual pictured in the "target face," resulting in a novel composite face that mimics the head movements and facial expressions of the original signer. Sample images of the output of this transformation are shown in Figure <ref type="figure">2</ref>, and the electronic supplementary files provided with this paper include video samples.</p><p>The third prototype, without-torso, is identical to the with-torso prototype, except that the signer's torso and the background are both replaced by a flat gray color, as shown in Figure <ref type="figure">2</ref>. The rationale for including this transformation is that identity may be revealed not only by the face, but also by body appearance, clothing, or background, especially if the person viewing the video is familiar with the person in the video.</p><p>For both with-torso and without-torso, the resulting output can be varied, by selection of different "target faces," and throughout our study we displayed videos based on a variety of target faces, selected from the Chicago Faces Dataset <ref type="bibr">[30,</ref><ref type="bibr">31]</ref>. 
We took into account the gender and race/ethnicity of the person in the original video, and we selected target faces of other people with corresponding demographic characteristics, with variation in age, hair style, and hair color.</p><p>The rationale for selecting these variations was that they reflect common options for the selection of video-game avatars or personalized emojis on social media, and several pilot interviews with DHH ASL signers prior to our study revealed their interest in such options. More details about the transformations used in the separate phases of this study are described below. Figure <ref type="figure">2</ref> shows screenshots from a few videos and their transformations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Study Design</head><p>The 70-minute appointment was divided into three phases, one for each of three different activities.</p><p>During each phase, the participant viewed the videos and then answered semi-structured interview questions. In the first phase, we evaluated face-disguise technology from the perspective of participants seeing a disguised video of other people. In the next phase, the understandability, naturalness, and anonymity protection of the transformed videos were assessed, with participants viewing a variety of face-disguise options. (Prior to the main study, we had conducted pilot interview studies with DHH participants to ask them about their interest in technologies for disguising the face, and this had suggested that understandability, naturalness, and anonymity may be key issues for users, which helped us in finalizing the design of our interview questions for this phase.) In the final phase, participants saw themselves disguised, and they commented on the acceptability of the transformed videos and shared other concerns.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Phase 1 of the Study</head><p>The first phase focused on evaluating how effectively videos had been disguised by the with-torso and without-torso software; participants were asked to attempt to identify the original person in the video. The source videos used in this phase of the study were from the Boston University American Sign Language Linguistic Research Project <ref type="bibr">[36]</ref><ref type="bibr">[37]</ref><ref type="bibr">[38]</ref>.</p><p>To produce a variety of videos, we selected two videos of a male signer and two of a female signer from this dataset; in each video, the signer produces 1-2 ASL sentences. Next, we processed the videos using each of the two prototypes, with-torso and without-torso, using two different "target faces" for each (two male target faces for the male signer, and two female target faces for the female signer). Overall, this yielded 16 disguised output videos.</p><p>Each participant viewed one disguised video of the male signer, and one disguised video of the female signer. One video was processed using the with-torso prototype, and the other, using the without-torso prototype. The order in which these stimuli were shown to participants, and the assignment of prototype-condition to each gender, were counterbalanced via Latin square. After viewing each video, participants were shown a line-up of six different faces, one of which was the true face of the ASL signer in the anonymized videos. The order in which these line-up faces were shown to the participants was also counterbalanced via Latin square. Figure <ref type="figure">1</ref> shows example line-up photos for both the male and female faces. After participants guessed which face was the original person in the video, they indicated their agreement with the Likert item: "It was very difficult to guess the original signer." 
Phase 1 concluded with questions about participants' opinions of the videos and their difficulty in guessing the signer, including whether seeing the original signer's body and background made it easier to guess the original signer's face.</p></div>
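<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Latin-square counterbalancing described for Phase 1 can be illustrated with a minimal sketch. The paper does not state which Latin-square construction was used, so the cyclic (rotation-based) square below is an assumption, and the photo labels and participant IDs are hypothetical placeholders rather than study materials.</p>

```python
def cyclic_latin_square(items):
    """Return an n x n Latin square: row r is `items` rotated left by r."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(participants, items):
    """Cycle through the square's rows so each ordering is used about equally often."""
    square = cyclic_latin_square(items)
    return {p: square[i % len(square)] for i, p in enumerate(participants)}


# Hypothetical labels for the six line-up photos and the 16 participants.
lineup = ["face_A", "face_B", "face_C", "face_D", "face_E", "face_F"]
participants = [f"P{i}" for i in range(1, 17)]
orders = assign_orders(participants, lineup)

for pid in participants[:3]:
    print(pid, orders[pid])
```

<p>In a square built this way, every photo appears exactly once in each serial position across the six rows, so position effects are spread evenly over the line-up. With 16 participants and six rows, the rows cannot be used equally often; how the study handled that remainder is not specified in the text.</p></div>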
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Phase 2 of the Study</head><p>The second phase focused on the understandability, naturalness, and anonymity-protection of videos from all three prototypes, including with-torso and without-torso videos based on a variety of target faces, as well as the tiger-face prototype. In this phase, each participant viewed a total of 34 videos, half based on a source video from a male signer from <ref type="bibr">[36]</ref><ref type="bibr">[37]</ref><ref type="bibr">[38]</ref>, and half from a female signer from the same dataset. For each signer, participants were shown an original, unmodified video, followed by 16 transformed videos associated with that source video. The 16 transformed videos consisted of several sets, each of which focused on one appearance characteristic that varied within each set:</p><p>&#8226; age (3 videos; based on a young, middle, and older-aged target face),</p><p>&#8226; artificially colored hair (3 videos; blue, pink, and green colored hair),</p><p>&#8226; natural-colored hair (3 videos; light, medium, and dark shades),</p><p>&#8226; with-torso (2 videos with the torso visible; all the others had the torso removed), and</p><p>&#8226; tiger-face (1 video shown with an animated cartoon tiger face, as used in <ref type="bibr">[7]</ref>).</p><p>The order of these sets was counterbalanced between participants, and whether male or female videos were shown first was also counterbalanced. After the first with-torso video was shown, the researcher on the video call interrupted the participant to ask the participant to indicate agreement with each of three Likert items, "This video was completely understandable," "This video was very natural in appearance," and "This video disguised the identity of the original signer completely." Similarly, as soon as the first without-torso video was shown, and immediately after the tiger-face was shown, the participant was asked these same three questions. 
After the participant viewed all videos in this phase, semi-structured open-ended interview questions were asked about the overall understandability, naturalness, and anonymity-protection of the transformations.</p><p>Figure 2 (caption fragment): ... transformed to (e). Samples include: (e) with-torso, (f-g) without-torso, and (h) tiger-face. Source videos (a-c) are from <ref type="bibr">[36,</ref><ref type="bibr">38]</ref>, and (d) illustrates the type of videos participants submitted (blocked here for anonymity). Videos are in electronic supplementary files.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Phase 3 of the Study</head><p>In the third phase, participants saw a video of themselves transformed using all three prototypes so that we could evaluate their view of how acceptable this technology is for disguising their own videos. While the tiger-face prototype could run in real time, the with-torso and without-torso prototypes required additional processing time.</p><p>Thus, prior to the appointment, we asked participants to submit a video of themselves signing a short ASL passage.</p><p>Because of limitations in the anonymization prototype and in order to ensure good-quality output, participants were instructed to make sure they had good lighting and a plain background, and they were asked to pull shoulder-length or longer hair back in a ponytail. Participants were also asked to remove any glasses, headgear, and hand jewelry. Lastly, participants were asked to sign in a manner that avoids having their hands obstruct their face, as the prototype system is not robust to face occlusions. For this reason, signers were given an ASL gloss script for a specific passage to perform that excluded signs in which the hands would come close to the face, while also requiring the grammatical use of several facial expressions in ASL: "BOOK, I BUY. TODAY, YOU BORROW. BOOK, READ YOU? BOOK WHERE?" During phase 3 of the appointment, participants viewed 13 transformed videos, based on the video they had submitted.</p><p>Six were with-torso, with another six without-torso, using the same set of target faces. The target-face set was matched to the participants' self-reported gender and apparent race/ethnicity in their submitted videos. 
The 13th video was a live demo website with the tiger-face effect, which participants were instructed on how to use.</p><p>After viewing all videos, participants responded to open-ended questions about their perception of and preference among the videos, whether they thought the quality of these videos was good enough for them to consider using software like this, and whether it would be helpful for them to have software that could anonymize videos. Finally, participants were asked what situations they would or would not use this software for, whether they thought it would be acceptable for other people to use software like this, and whether they had concerns about software like this.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6">Participants</head><p>Via social-media postings, we recruited 16 DHH adults who use ASL on a daily basis; 12 indicated that ASL was their primary language. Four participants had used ASL since birth, 6 learned ASL by age 5, and 6 learned ASL during their late teens (with all in this latter group having used ASL for at least 8 years). Participants' ages ranged from 19 to 47 years old (median 27.5). Eight self-identified as male, 1 as non-binary, and 7 as female. Participants' education levels varied: 1 had some undergraduate education, 1 had an associate's degree, 10 had a bachelor's degree, and 4 had a master's degree.</p><p>Eight self-identified as Caucasian, 1 as Black, 3 as Asian, 1 as Vietnamese, 1 as Latino, 1 as Asian &amp; Hispanic, and 1 as Spanish &amp; Native American. A demographics table appears in the electronic supplementary files.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.7">Data Analysis</head><p>All data collected from the three phases of the study were analyzed with both quantitative and qualitative approaches. We conducted statistical analysis with Friedman tests on the quantitative data, and we performed an iterative thematic analysis <ref type="bibr">[9]</ref> on our qualitative data, employing both deductive and inductive approaches. We manually developed a deductive coding framework with the main topics of our interview questions. In the framework, we aggregated all the data and iteratively performed open coding using colors. Then codes were generated from the color-coded data and organized into categories. Finally, main and sub-themes were identified and developed using a bottom-up approach. We went through the same process with the data from all three phases of the study.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">FINDINGS</head><p>To investigate the usefulness of the anonymized ASL video, we compared three prototypes (with-torso, without-torso, tiger-face) along three evaluation dimensions: understandability, naturalness, and anonymity. During the study we collected some quantitative data, e.g., participants' Likert responses to questions in phase 2 about each of these dimensions. Our quantitative analysis consisted of conducting Friedman tests, which indicated a statistically significant difference in understandability and naturalness among the three types of transformations, but no statistically significant difference for anonymity protection. Following up with pairwise Wilcoxon signed-rank tests, significant differences among the types of transformations were identified. In our qualitative analysis, we found that the participants overall perceived the video transformation as interesting and useful. However, we observed differing perspectives among participants in regard to how they compared these three prototypes along the three dimensions, as well as how this affected their overall views on ASL video anonymization and its value. We present the details of the findings in the following sections.</p></div>
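<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the omnibus test named above, the following minimal Python sketch computes a tie-corrected Friedman statistic for three related conditions and applies a Bonferroni adjustment to a list of pairwise p-values. It is not the analysis code used in this study, which the text does not describe; the function names (average_ranks, friedman_three_conditions, bonferroni) and the ratings in the usage lines are hypothetical and invented for illustration only. The sketch assumes exactly three conditions, so that the chi-square approximation has 2 degrees of freedom and its upper-tail probability reduces to exp(-Q/2); it does not implement the Wilcoxon signed-rank test itself.</p>
<p><![CDATA[
```python
import math


def average_ranks(values):
    """Rank values 1..k in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_three_conditions(rows):
    """Friedman test for exactly three related conditions.

    rows: one (a, b, c) triple of ratings per participant.
    Returns (statistic, p_value) using the tie-corrected statistic and the
    chi-square approximation with 2 degrees of freedom.
    """
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        row = list(row)
        if len(row) != k:
            raise ValueError("each row must hold exactly three ratings")
        ranks = average_ranks(row)
        for j in range(k):
            rank_sums[j] += ranks[j]
        # Accumulate t^3 - t for every group of t tied ratings in this row.
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * (k ** 3 - k))
    if correction == 0:
        raise ValueError("every row is fully tied; the statistic is undefined")
    stat /= correction
    p_value = math.exp(-stat / 2.0)  # chi-square upper tail for 2 df
    return stat, p_value


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Invented 5-point Likert ratings for three conditions (NOT study data).
    ratings = [(5, 4, 2), (5, 5, 3), (4, 4, 1), (5, 3, 2)]
    q, p = friedman_three_conditions(ratings)
    print(f"Friedman Q = {q:.3f}, p = {p:.4f}")
    # Invented pairwise p-values, adjusted for three comparisons.
    print(bonferroni([0.004, 0.03, 0.20]))
```
]]></p>
<p>The p-value in this sketch relies on the chi-square approximation; for small samples, an exact computation or an established statistics library may be preferable.</p></div>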
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Understandability</head><p>4.1.1 Quantitative Analysis. Figure <ref type="figure">3</ref> displays participants' responses during phase 2 of the study to the Likert item "This video was completely understandable." Analysis with a Friedman test revealed that the type of video transformation had a significant main effect on understandability (p &lt; .05). Overall, 81% of respondents strongly agreed with this statement in regard to the with-torso videos, 62% of respondents strongly agreed in regard to without-torso videos, but only 25% agreed in regard to tiger-face videos. Post hoc pairwise analysis with a Wilcoxon signed-rank test and Bonferroni correction revealed that participants believed the with-torso videos were more understandable than the tiger-face videos (p &lt; 0.01). However, no significant difference was observed between with-torso and without-torso, nor between without-torso and tiger-face. Overall, these quantitative findings indicate that ASL signers believed that the modern 3D face transformation videos with a torso displayed (with-torso) were more understandable than the simple mask-overlay videos (tiger-face), when viewing videos of ASL.</p><p>4.1.2 Qualitative Analysis. The overall feedback in regard to the understandability of the anonymized videos of all three prototypes was generally positive, which aligned with the quantitative findings presented above. Most participants indicated that the transformed videos were clear and conveyed the same information as the original videos. Among the ASSETS '21, October 18-22, 2021, Virtual Event, USA Lee et al.</p><p>"Knowing the person and seeing their torso and background would make it easier to identify them because the more you hang out with the person you know their body language and how they sign."
P14 agreed that the without-torso videos had the greatest anonymity protection: "Without torso is the best, sometimes you can identify people by the body shape, etc, but without seeing the body it is very difficult to guess despite that it might be harder to understand or not natural." Some participants believed that the tiger-face videos were most effective at disguising the face, which is simply blocked, without any facial expressions revealed. However, P13 explained that there are trade-offs between the ability of some prototypes to disguise the face or to disguise the body. As P13 explained, "Without torso is the best. It covers the face and also hides the body language. You can't look at the body shape, size, etc. For tiger face, it hides the face the best but it doesn't hide the body at all. Without-torso has the best balance at hiding body but keeping facial expression." While participants agreed that without-torso videos were most effective at preserving anonymity, all participants commented that they would prefer to view a video with a torso because of naturalness and understandability, as discussed previously. Several commented that it would be useful if this technology could make modifications to the body of the signer instead of removing it, e.g., suggesting that the tool could change the signer's clothing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Preferences for Transforming Specific Characteristics</head><p>Throughout the study, participants viewed disguised videos of both themselves and other people, with a variety of characteristics transformed, e.g., age, hair color. Many participants indicated their preferences for video transformations that closely matched their own traits, such as race, age, hair, and skin color. P6's comment conveys this clearly: "What I liked was all the faces using the same race or traits as me...I liked the faces that looked similar to my face." Participants also emphasized the importance of having the transformed video match their own age. For instance, in response to a question about which of their own transformed videos was their favorite, P7 answered: "4th video. It was similar age, and looked the most natural to me." P7 went on to emphasize the importance of the age feature for the natural appearance of the video: "I would use a similar age, but I don't care about the other features as long as it doesn't look way off or too unnatural." In the same vein, P1 expressed unhappiness with a transformed video with an older looking face, indicating that it was the least favorite video, and commenting, "I didn't like that you made me old, I didn't like the age change." What mattered most to participants was whether changing specific characteristics reduced the naturalness of the resulting video; participants generally disliked transformations that resulted in artificial-looking hair color or the tiger face. In fact, all but one participant disliked having their hair transformed into bright colors. For instance, P2 commented, "I didn't like using the different hair colors like purple hair was strange," and P9 added, "it was funny to see my hair color look different."
Similarly, all but one participant disliked having their video transformed into the tiger face, with participants commenting on its artificial appearance and oversized head.</p><p>To a lesser degree, participants preferred transformations of characteristics that supported understandability. For instance, P12 noticed that some transformations preserved facial expression more clearly than others: "it was easier to understand the younger faces than the older faces because I could see their mouth move." Participants also mentioned that some transformations led to a distracting result, which interfered with their visual focus and thus their understanding.</p><p>For example, P13 said, "with tiger face, it was not very clear because it kind of blocked the signing, the face was big, and was distracting." For the without-torso version, P14 commented, "It was distracting to have no body". P4 disliked brightly colored hair, explaining: "it seemed distracting for me."</p><p>Participants' preference among most transformations did not depend upon whether it was applied to videos of themselves or of others, with one exception: the removal of the torso from a video. Before participants viewed their own transformed videos, all participants favored the with-torso videos, commenting on the natural appearance.</p><p>Participants tended to retain this preference until they viewed their own transformed videos during phase 3, at which time half of the participants switched their preference from with-torso to without-torso, because they were worried that identifiable characteristics were visible in their with-torso videos.</p><p>In fact, upon seeing transformed videos of themselves, some participants not only became interested in the without-torso feature, but they also wondered how they could strategically transform as many demographic and appearance characteristics as possible, to protect their identity.
For instance, P13 suggested: "for improving anonymization, I would use a neutral color skin on the arms, neck, etc. And doesn't have to match gender, you could use neutral gender or opposite gender instead of having to match." However, some participants noted that using this technology to change the skin color of one's face could produce offensive or insensitive results, with P9 musing, "There could be a few issues with race..." Finally, participants believed that the appropriateness of specific appearance transformations would depend upon the context of use, as some situations required more anonymity or seriousness. P8 said, "Doesn't matter to me which appearance, it's more about how serious I want to be when hiding my anonymity. If I wanted to hide, as is, I would pick without torso, doesn't really matter what hair color/age." P5 indicated that "If its formal, then it needs to look real/natural.</p><p>Suppose Biden was presenting with a funny tiger face then I would be more resistant to watching while if it was comedian using it then I would understand. I think context is important."</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Potential Uses</head><p>Participants identified a variety of possible use cases for ASL video anonymization technology. In particular, nearly all participants agreed that the technology would be useful for safely expressing personal views on sensitive or confidential topics. P3 was interested in using it "to avoid being targeted, want it to be anonymous. Some people might want to share important information but don't want to tie it to their identity." P7 wanted to use this technology to "post videos where I say things that I don't want associated with my identity. For example, political, abuse reports, protests, etc." P10 was interested in using this tool to "share my personal experience or feelings and I didn't want people to know who I am." P2 explained they would use it for a "sticky question. If I was telling a powerful, heavy topic but wanted my identity hidden then I would use this. Mostly for sensitive topics." The ASL sign STICKY, used by P2 in their response, translates to the English concepts of awkward or embarrassing.</p><p>Participants also identified uses of this technology on social media, especially when they needed to share information that may be re-shared beyond their own immediate personal network, and when ASL video would be more effective than text. For instance, P13 discussed sharing ASL lessons anonymously: "I would use it for posting videos that strangers have access to, teaching ASL without revealing my face." P11 discussed social media contexts in which protection of privacy is especially important, e.g., "social media, OnlyFans, anonymous groups, etc." Participants also discussed uses for this technology in personal social media contexts during fun or casual interaction with people they know. P8 was interested in "entertainment with friends and family, like the gaming community."
Similarly, P9 indicated that the transformed videos themselves may be entertaining or fun to share, explaining, "I would also use it for entertainment...with friends, assuming they would not share it publicly." Finally, participants described contexts in which they would not use this technology, at times disagreeing with uses suggested by other participants. For example, several participants saw no use for this technology when interacting online with family or friends. As P13 explained, "I would not want to use this if I was just talking with friends or people I know and trust." P14 agreed and extended this to fellow students: "If I was signing on my social media or with friends or the body, while preserving its location and movement. As previously mentioned, some participants suggested virtually changing the body appearance or the clothes of the signer.</p><p>Our findings revealed that participants viewed the without-torso and tiger-face prototypes as being relatively similar in their degree of anonymity protection, which was striking given that these two tended to occlude or omit opposite portions of the signer's body. That is, the tiger-face blocked the signer's face, whereas the without-torso videos omitted the signer's torso while conveying the facial expression information on a transformed face. Our qualitative findings revealed that participants judged the without-torso videos as more understandable; thus, occlusion of the face led to a relatively greater reduction in understandability, for a similar anonymity improvement. Recent work by Bragg et al.
<ref type="bibr">[7]</ref> investigated ASL video anonymization within the context of motivating users to contribute videos to public research datasets; their participants had used the same tiger-face filter and had similar concerns about the negative effect on understandability of the absence of facial expressions.</p><p>For designers creating face-transformation applications, sensitivity to this understandability vs. anonymity trade-off is essential. While it would be ideal for the underlying transformation technology to achieve both high understandability and high anonymity (perhaps as further advances in face and body modification technology are created), in the meantime designers might consider offering users choices in transformation options that vary along this trade-off axis. For evaluation of these applications in studies, it is important for both properties to be measured, in relation to intended use cases, to avoid optimizing for one at the expense of the other.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Naturalness vs. Anonymity: Design Considerations</head><p>Participants indicated that it was important for videos to appear natural; however, our analysis revealed that there was a trade-off between naturalness and anonymity protection. Unanimously, our participants indicated that the with-torso videos were the most natural, yet these videos had weaker anonymity protection, as details of the signer's body and background were visible. In contrast, our qualitative analysis revealed that participants believed the without-torso and tiger-face videos were better at protecting anonymity, yet both of these had much lower levels of naturalness, due to the unfamiliar appearance of the torso being cut out of the video or the artificial animal face.</p><p>For individuals interested in disguising themselves, a decision must be made about where on this naturalness vs.</p><p>anonymity trade-off the user would prefer for their video to be. This decision may depend upon the context of use, and designers creating face-disguise applications may wish to provide users with options that vary along this axis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Understandability vs. Naturalness: Design Considerations</head><p>Whereas the discussion above identified trade-offs between naturalness vs. anonymity and understandability vs.</p><p>anonymity, our findings revealed a complementary relationship between understandability and naturalness. Participants discussed how improvements in naturalness led to increased understandability, explaining that unnatural appearance could be distracting, which would draw attention away from the message. For designers of transformation technologies for face disguise applications, this relationship is important to consider when making improvements to the technology. In efforts to achieve increases in the understandability of the resulting video, it is important to ensure a baseline level of naturalness, to avoid interfering with the viewer's ability to focus on the message.</p><p>While there are relationships among these factors, the signer's intended usage of this technology is likely to influence how these factors are prioritized. Before seeing transformed videos of themselves, participants focused on the perspective of people viewing videos of other people who have been disguised, and understandability was seen as being of greater importance so that the message could be understood. After seeing videos of themselves transformed, Recent work has investigated applications of body-swap illusions in virtual reality <ref type="bibr">[41]</ref>, with users' new appearance leading to changes in behavior <ref type="bibr">[58]</ref>. Our study did not examine whether signers might change their signing content or style if they were to see their own face transformed in real time; future work is needed to investigate this.</p><p>Finally, the with-torso and without-torso prototypes in our study were based on modern face transformation technologies, of which the state of the art is rapidly advancing.
Future research is needed to understand users' perspectives of these technologies as they improve over time. In fact, our work should inform the work of future designers of such technology and of researchers creating the underlying disguise technologies, as we discussed in section 5. In particular, our research has motivated future work on technology for disguising not only the face of a signer but also their body (to better protect anonymity) while also preserving body location and movement, which contribute to the understandability and naturalness of the resulting video.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">ACKNOWLEDGMENTS</head><p>This material is based upon work supported by the National Science Foundation under award No. 2040638.</p></div></body>
		</text>
</TEI>
