<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>I Don't Know Why You Need My Data: A Case Study of Popular Social Media Privacy Policies</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>04/14/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10353978</idno>
					<idno type="doi">10.1145/3508398.3519359</idno>
					<title level='j'>ACM CODASPY 2022</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>E. Miller</author><author>R. Rahman Md</author><author>M. Hossain</author><author>A. Ali-Gombe</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Data privacy, a critical human right, is gaining importance as new technologies are developed and old ones evolve. On mobile platforms such as Android, data privacy regulations require developers to communicate data access requests using privacy policy statements (PPS). This case study cross-examines the PPS of two popular social media (SM) apps, Facebook and Twitter, for language ambiguity, sensitive data requests, and whether the statements tally with the data requests made in the Manifest file. Subsequently, we conduct a comparative analysis of the PPS of these two apps to examine trends that may constitute a threat to user data privacy.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>As of 2021, there are approximately 2.8 billion Android device users in the world and 2.56 million apps available to download via the Google Play store; the most popular of these applications are social media apps. In the United States, 82% of the population has a social networking profile <ref type="bibr">[3]</ref>. With so many people using social media applications, user privacy has become an ever-increasing concern <ref type="bibr">[1]</ref>. Regulations such as the European General Data Protection Regulation (GDPR) <ref type="bibr">[4]</ref> and the California Consumer Privacy Act (CCPA) <ref type="bibr">[2]</ref> have been put in place to address these privacy concerns and to guarantee that users give informed consent to social media apps that request the use of their data. These regulations mandate that a data request must be made unambiguous. More importantly, the type of data, the reason for the request, and in some cases, the purpose limitation must be stated and approved by the user ahead of time. The permission model is a dedicated system in the Android framework that ensures users explicitly grant access to their personal or device data.
Unfortunately, in its current design, this model does not address why the data is requested, where it is sent, or with whom it will be shared. While an improved version of this model designed to address the permission intent <ref type="bibr">[5]</ref> has been proposed in the literature, it has not yet been adopted into the Android system. Thus, to comply with the stated regulations, developers often combine this permission model with a privacy policy statement (PPS). However, given the lack of standardization in PPS, many developers have resorted to exploiting these contracts, using vague and ambiguous legal jargon to request data access and to declare usage and sharing limitations. <note place="foot">Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). CODASPY '22, April 24-27, 2022, Baltimore, MD, USA. &#169; 2022 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-9220-4/22/04. <ref type="url">https://doi.org/10.1145/3508398.3519359</ref></note></p><p>Thus, the fundamental goal of our research is to determine how comprehensible various social media privacy policies are. To evaluate this, we investigate the vagueness and language ambiguity of the PPS of the Facebook and Twitter apps. Our study examines: (1) whether these apps clearly and unambiguously ask for user permission in the PPS, and how sensitive the requested data is; (2) whether the data requested in the PPS tallies with the explicit data requests made during execution; and (3) how the PPS of the two apps compare, in order to identify trends in vagueness and sensitivity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Methodology</head><p>In this research, we used a case study methodology to examine the PPS and permission lists of Facebook and Twitter, taken directly from Google Play. Using a four-step process, we manually examined every statement in the PPS of our target apps. (I) Language Extraction: The first step is to manually read through the PPS for Facebook and Twitter and look for user data request statements, called candidate statements. We define candidate statements as statements that contain three primary elements: i) a focused verb representing the data access action, e.g., transfer or obtain; ii) a noun that identifies the type of data being accessed; and iii) a description of how the app will use the specified data type. An example of a data request statement is "We use your location data to recommend restaurants near you." In this example, the requesting verb is "use", which shows that the app is accessing user information. The data type is "location data", while "to recommend restaurants near you" describes how the app plans to use the data. All candidate statements from each PPS are manually extracted, deconstructed, and recorded in the Results_Table 1 using this verb-data-purpose mapping technique. (II) Data Clustering: We use data clustering to organize the Results_Table from task 1 and group synonymous data types.</p><p>&#8226; 2.9% of statements did not ask for user permission and were vague. &#8226; 2% of statements fit all three categories: vague, sensitive, and did not ask for user permission.</p><p>Twitter: For Twitter, a total of 107 privacy statements were reviewed. From these statements, we found that:</p><p>&#8226; 22.43% of statements were flagged as vague (54% of these because of the possible usage and 46% because of the data type). &#8226; 60% of statements involved sensitive data types (PII and non-PII). &#8226; 8.4% of statements did not ask for permission via the Android permission model.</p><p>Similar to Facebook, we analyzed the overlap of all three categories, as shown in the center of Figure <ref type="figure">2</ref>. The results indicate that:</p><p>&#8226; 3% of statements were both vague and strictly involved sensitive user data. &#8226; 7% of statements did not ask for user permission and involved sensitive user data. &#8226; 0% of statements did not ask for user permission and were vague. &#8226; 0% of statements fit all three categories: vague, sensitive, and did not ask for user permission.</p><p>Unknown Sensitivity and Permissions: It is important to note that, among statements flagged as "vague and sensitive" or "vague and does not ask permission," we included only those for which the sensitivity and permissions could be confirmed. For instance, in a statement such as "We collect your information to personalize our services for you," the identified data type is "your information." In this case, we do not know for certain what information is being collected. Therefore, we cannot determine whether this information is sensitive or whether the app asks permission for this data. Such statements are flagged as "vague" but cannot be flagged as "sensitive" or "does not ask permission." Thus, we put these statements in a separate category, Unknown Sensitivity and Permissions.</p></div>
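<div xmlns="http://www.tei-c.org/ns/1.0"><p>The verb-data-purpose mapping and the category flags used in the tallies above can be sketched in Python. This is only a minimal illustration under stated assumptions, not the tooling used in the study, which was carried out manually: the CandidateStatement record, its field names, the tally function, and the four sample statements are all hypothetical. A statement whose data type is too vague to classify carries None for sensitivity and permission, mirroring the Unknown Sensitivity and Permissions category.</p>
<ab><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CandidateStatement:
    """One deconstructed privacy-policy statement (hypothetical schema)."""
    verb: str        # data access action, e.g. "use"
    data_type: str   # noun naming the data, e.g. "location data"
    purpose: str     # stated usage of the data
    vague: bool
    sensitive: Optional[bool]        # None when it cannot be determined
    asks_permission: Optional[bool]  # None when it cannot be determined


def tally(statements):
    """Return the percentage of statements in each category of interest."""
    total = len(statements)

    def pct(predicate):
        hits = sum(1 for s in statements if predicate(s))
        return round(100 * hits / total, 1)

    return {
        "vague": pct(lambda s: s.vague),
        "sensitive": pct(lambda s: s.sensitive is True),
        "no_permission": pct(lambda s: s.asks_permission is False),
        "vague_and_sensitive": pct(lambda s: s.vague and s.sensitive is True),
        "unknown": pct(lambda s: s.vague and s.sensitive is None),
    }


# Made-up sample statements, for illustration only.
sample = [
    CandidateStatement("use", "location data",
                       "recommend restaurants near you", False, True, True),
    CandidateStatement("collect", "your information",
                       "personalize our services", True, None, None),
    CandidateStatement("share", "IP address",
                       "improve our products", True, True, False),
    CandidateStatement("store", "display language",
                       "show the interface", False, False, True),
]

result = tally(sample)
print(result)
```
]]></ab>
<p>In this made-up sample, half of the statements are vague and one falls into the unknown category, so that statement is counted in neither the sensitive nor the no-permission tally.</p></div>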
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Comparative Analysis</head><p>Examining the results of Facebook and Twitter side by side, we found that both apps requested a large amount of sensitive data, such as payment information, IP address, location, account information/password, and contact information. The results indicate that Twitter requested sensitive data in 64 statements compared to Facebook's 80 statements. Moreover, the Facebook PPS has a higher percentage of vague statements (50%) than the Twitter PPS (22%). These vague statements correspond to more than half of all the candidate statements examined for Facebook (&#8776;103), with more than 42 data ambiguity statements. In contrast, Twitter recorded 22 vague statements, with about 10 data ambiguity statements. It is also important to note that, for both apps, usage ambiguity accounts for the larger share of vague statements (60% and 54%), indicating that apps seldom provide reasons for data requests to the user. Additionally, comparing the number of vague statements requesting sensitive data, we found that Facebook (10%) is again higher than Twitter (3%). This percentage shows that more than 20 of all the candidate statements for Facebook both requested sensitive data and were vague in specifying why the data is requested (ambiguity of usage). On the other hand, Twitter has seven statements that did not ask for user permission and involved sensitive data, compared to Facebook's four. Another notable distinction between these two apps is that 2% of the statements in Facebook's PPS fall into all three categories. Twitter's PPS, on the other hand, did not contain any such statements. Finally, we found that about 22% of all the candidate statements analyzed for Facebook fall into the unknown sensitivity and permissions category. For Twitter, roughly 13% fall into this category.
As a result, the true percentages of "sensitive and vague" and "vague and do not ask permission" statements are equal to or higher than the percentages we reported.</p><p>Thus, our findings from this study indicate that Facebook's privacy policy statement has more ambiguous statements that lack clarity in terms of both data requests and usage. The candidate statements, especially those that fall into the two- and three-category overlaps, need to be carefully reviewed by the developers. Future Work: This study is limited to exploring the PPS of only two apps. Although these are the two most popular SM apps, they are not representative of the whole population of SM apps. Thus, we plan to extend this research to include more SM applications as part of future work. In addition, we plan to manually generate a large corpus of deconstructed PPS that will enable us to leverage NLP for the automated detection of ambiguity and policy vagueness.</p></div>
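<div xmlns="http://www.tei-c.org/ns/1.0"><p>The observation in the comparative analysis that the reported overlap percentages are lower bounds can be made concrete with a little arithmetic. The sketch below assumes, as the text states, that all percentages are shares of an app's candidate statements and that every statement in the unknown category is vague; the function name overlap_bounds is hypothetical, and the inputs are the approximate figures quoted above. If no unknown statement involves sensitive data, the reported share stands; if all of them do, the share rises by the size of the unknown category.</p>
<ab><![CDATA[
```python
def overlap_bounds(reported_pct, unknown_pct):
    """Bounds on the true share of statements that are vague and sensitive.

    reported_pct: share confirmed as both vague and sensitive.
    unknown_pct:  share that is vague but of unknown sensitivity.
    """
    lower = reported_pct                # no unknown statement is sensitive
    upper = reported_pct + unknown_pct  # every unknown statement is sensitive
    return lower, upper


# Approximate percentages quoted in the comparative analysis.
facebook = overlap_bounds(10, 22)
twitter = overlap_bounds(3, 13)
print("Facebook:", facebook)
print("Twitter:", twitter)
```
]]></ab>
<p>Under these assumptions, the share of vague statements that involve sensitive data lies between 10% and 32% for Facebook and between 3% and 16% for Twitter.</p></div>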
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>This study explored the level of ambiguity and sensitivity in the PPS of two SM apps, and whether the PPS tally with the runtime permission requests for user data. Our results showed that a significant portion of the analyzed PPS statements for Twitter and Facebook request "very sensitive" user data. We also demonstrated that a large share of the statements analyzed (about half for Facebook and about 22% for Twitter) have some form of data or usage ambiguity. Of those analyzed statements, a notable share (10% for Facebook and 3% for Twitter) falls into the intersection of vagueness (usage ambiguity) and sensitive data, indicating that users are not given the clear information that informed consent requires, which poses a potential threat to their privacy. Finally, an important finding in this study is that both apps have a substantial number of statements that fall into the unknown sensitivity and permissions category, which warrants further investigation.</p></div>		</body>
		</text>
</TEI>
