skip to main content


Title: Finding a Choice in a Haystack: Automatic Extraction of Opt-Out Statements from Privacy Policy Text
Website privacy policies sometimes provide users the option to opt-out of certain collections and uses of their personal data. Unfortunately, many privacy policies bury these instructions deep in their text, and few web users have the time or skill necessary to discover them. We describe a method for the automated detection of opt-out choices in privacy policy text and their presentation to users through a web browser extension. We describe the creation of two corpora of opt-out choices, which enable the training of classifiers to identify opt-outs in privacy policies. Our overall approach for extracting and classifying opt-out choices combines heuristics to identify commonly found opt-out hyperlinks with supervised machine learning to automatically identify less conspicuous instances. Our approach achieves a precision of 0.93 and a recall of 0.9. We introduce Opt-Out Easy, a web browser extension designed to present available opt-out choices to users as they browse the web. We evaluate the usability of our browser extension with a user study. We also present results of a large-scale analysis of opt-outs found in the text of thousands of the most popular websites.  more » « less
Award ID(s):
1914486
NSF-PAR ID:
10169862
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ;
Date Published:
Journal Name:
WWW '20: Proceedings of the Web Conference 2020
Page Range / eLocation ID:
1943 to 1954
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mazurek, Michelle ; Sher, Micah (Ed.)

    Web tracking by ad networks and other data-driven businesses is often privacy-invasive. Privacy laws, such as the California Consumer Privacy Act, aim to give people more control over their data. In particular, they provide a right to opt out from web tracking via privacy preference signals, notably Global Privacy Control (GPC). GPC holds the promise of enabling people to exercise their opt out rights on the web. Broad adoption of GPC hinges on its usability. In a usability survey we find that 94% of the participants would turn on GPC indicating a need for such efficient and effective opt out mechanism. 81% of the participants in our survey also have a correct understanding of what GPC does ensuring that their intent is accurately represented by their choice. The effectiveness of GPC is dependent on whether websites' GPC compliance can be enforced. A site's GPC compliance can be analyzed based on privacy flags, such as the US Privacy String, which is used on many sites to indicate the opt out status of a web user. Leveraging the US Privacy String for GPC purposes we implement a proof-of-concept browser extension that successfully and correctly analyzes sites' GPC compliance at a rate of 89%. We further implement a web crawler for our browser extension demonstrating that our analysis approach is scalable. We find that many sites do not respect GPC opt out signals despite being legally obligated to do so. Only 54/464 (12%) sites with a US Privacy String opt out users after having received a GPC signal.

     
    more » « less
  2. null (Ed.)
    Increasingly, icons are being proposed to concisely convey privacy-related information and choices to users. However, complex privacy concepts can be difficult to communicate. We investigate which icons effectively signal the presence of privacy choices. In a series of user studies, we designed and evaluated icons and accompanying textual descriptions (link texts) conveying choice, opting-out, and sale of personal information — the latter an opt-out mandated by the California Consumer Privacy Act (CCPA). We identified icon-link text pairings that conveyed the presence of privacy choices without creating misconceptions, with a blue stylized toggle icon paired with “Privacy Options” performing best. The two CCPA-mandated link texts (“Do Not Sell My Personal Information” and “Do Not Sell My Info”) accurately communicated the presence of do-not-sell opt-outs with most icons. Our results provide insights for the design of privacy choice indicators and highlight the necessity of incorporating user testing into policy making. 
    more » « less
  3. Shafiq, Zubair ; Sherr, Micah (Ed.)
    The California Consumer Privacy Act and other privacy laws give people a right to opt out of the sale and sharing of personal information. In combination with privacy preference signals, especially, Global Privacy Control (GPC), such rights have the potential to empower people to assert control over their data. However, many laws prohibit opt out settings being turned on by default. The resulting usability challenges for people to exercise their rights motivate generalizable active privacy choice — an interface design principle to make opt out settings usable without defaults. It is based on the idea of generalizing one individual opt out choice towards a larger set of choices. For example, people may apply an opt out choice on one site towards a larger set of sites. We explore generalizable active privacy choice in the context of GPC. We design and implement nine privacy choice schemes in a browser extension and explore them in a usability study with 410 participants. We find that generalizability features tend to decrease opt out utility slightly. However, at the same time, they increase opt out efficiency and make opting out less disruptive, which was more important to most participants. For the least disruptive scheme, selecting website categories to opt out from, 98% of participants expressed not feeling disrupted, a 40% point increase over the baseline schemes. 83% of participants understood the meaning of GPC. They also made their opt out choices with intent and, thus, in a legally relevant manner. To help people exercise their opt out rights via GPC our results support the adoption of a generalizable active privacy choice interface in web browsers. 
    more » « less
  4. null (Ed.)
    Increasingly, icons are being proposed to concisely convey privacyrelated information and choices to users. However, complex privacy concepts can be difcult to communicate. We investigate which icons efectively signal the presence of privacy choices. In a series of user studies, we designed and evaluated icons and accompanying textual descriptions (link texts) conveying choice, opting-out, and sale of personal information — the latter an opt-out mandated by the California Consumer Privacy Act (CCPA). We identifed icon-link text pairings that conveyed the presence of privacy choices without creating misconceptions, with a blue stylized toggle icon paired with “Privacy Options” performing best. The two CCPA-mandated link texts (“Do Not Sell My Personal Information” and “Do Not Sell My Info”) accurately communicated the presence of do-notsell opt-outs with most icons. Our results provide insights for the design of privacy choice indicators and highlight the necessity of incorporating user testing into policy making. 
    more » « less
  5. Development of a comprehensive legal privacy framework in the United States should be based on identification of the common deficiencies of privacy policies. We attempt to delineate deficiencies by critically analyzing the privacy policies of mobile apps, application suites, social networks, Internet Service Providers, and Internet-of-Things devices. Whereas many studies have examined readability of privacy policies, few have specifically identified the information that should be provided in privacy policies but is not. Privacy legislation invariably starts a definition of personally identifiable information. We find that privacy policies’ definitions of personally identifiable information are far too restrictive, excluding information that does not itself identify a person but which can be used to reasonably identify a person, and excluding information paired with a device identifier which can be reasonably linked to a person. Legislation should define personally identifiable information to include such information, and should differentiate between information paired with a name versus information paired with a device identifier. Privacy legislation often excludes anonymous and de-identified information from notice and choice requirements. We find that privacy policies’ descriptions of anonymous and de-identified information are far too broad, including information paired with advertising identifiers. Computer science has repeatedly demonstrated that such information is reasonably linkable. Legislation should define these categories of information to align with technological abilities. Legislation should also not exempt de-identified information from notice requirements, to increase transparency. Privacy legislation relies heavily on notice requirements. We find that, because privacy policies’ disclosures of the uses of personal information are disconnected from their disclosures about the types of personal information collected, we are often unable to determine which types of information are used for which purposes. Often, we cannot determine whether location or web browsing history is used solely for functional purposes or also for advertising. Legislation should require the disclosure of the purposes for each type of personal information collected. We also find that, because privacy policies disclosures of sharing of personal information are disconnected from their disclosures about the types of personal information collected, we are often unable to determine which types of information are shared. Legislation should require the disclosure of the types of personal information shared. Finally, privacy legislation relies heavily on user choice. We find that free services often require the collection and sharing of personal information. As a result, users often have no choices. We find that whereas some paid services afford users a wide variety of choices, paid services in less competitive sectors often afford users few choices over use and sharing of personal information for purposes unrelated to the service. As a result, users are often unable to dictate which types of information they wish to allow to be shared, and which types they wish to allow to be used for advertising. Legislation should differentiate between take-it-or-leave it, opt-out, and opt-in approaches based on the type of use and on whether the information is shared. Congress should consider whether user choices should be affected by the presence of market power. 
    more » « less