Abstract Over half of all visits to websites now take place in a mobile browser, yet the majority of web privacy studies take the vantage point of desktop browsers, use emulated mobile browsers, or focus on just a single mobile browser instead. In this paper, we present a comprehensive web-tracking measurement study on mobile browsers and privacy-focused mobile browsers. Our study leverages a new web measurement infrastructure, OmniCrawl, which we develop to drive browsers on desktop computers and smartphones located on two continents. We capture web tracking measurements using 42 different non-emulated browsers simultaneously. We find that the third-party advertising and tracking ecosystem of mobile browsers is more similar to that of desktop browsers than previous findings suggested. We study privacy-focused browsers and find their protections differ significantly and in general are less for lower-ranked sites. Our findings also show that common methodological choices made by web measurement studies, such as the use of emulated mobile browsers and Selenium, can lead to website behavior that deviates from what actual users experience.
more »
« less
A Comparative Measurement Study of Web Tracking on Mobile and Desktop Environments
Abstract Web measurement is a powerful approach to studying various tracking practices that may compromise the privacy of millions of users. Researchers have built several measurement frameworks and performed a few studies to measure web tracking on the desktop environment. However, little is known about web tracking on the mobile environment, and no tool is readily available for performing a comparative measurement study on mobile and desktop environments. In this work, we built a framework called WTPatrol that allows us and other researchers to perform web tracking measurement on both mobile and desktop environments. Using WTPatrol, we performed the first comparative measurement study of web tracking on 23,310 websites that have both mobile version and desktop version web-pages. We conducted an in-depth comparison of the web tracking practices of those websites between mobile and desktop environments from two perspectives: web tracking based on JavaScript APIs and web tracking based on HTTP cookies. Overall, we found that mobile web tracking has its unique characteristics especially due to mobile-specific trackers, and it has become increasingly as prevalent as desktop web tracking. However, the potential impact of mobile web tracking is more severe than that of desktop web tracking because a user may use a mobile device frequently in different places and be continuously tracked. We further gave some suggestions to web users, developers, and researchers to defend against web tracking.
more »
« less
- Award ID(s):
- 1936968
- PAR ID:
- 10175641
- Date Published:
- Journal Name:
- Proceedings on Privacy Enhancing Technologies
- Volume:
- 2020
- Issue:
- 2
- ISSN:
- 2299-0984
- Page Range / eLocation ID:
- 24 to 44
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Online trackers are invasive as they track our digital footprints, many of which are sensitive in nature, and when aggregated over time, they can help infer intricate details about our lifestyles and habits. Although much research has been conducted to understand the effectiveness of existing countermeasures for the desktop platform, little is known about how mobile browsers have evolved to handle online trackers. With mobile devices now generating more web traffic than their desktop counterparts, we fill this research gap through a large-scale comparative analysis of mobile web browsers. We crawl 10K valid websites from the Tranco list on real mobile devices. Our data collection process covers both popular generic browsers (e.g., Chrome, Firefox, and Safari) as well as privacy-focused browsers (e.g., Brave, Duck Duck Go, and Firefox-Focus). We use dynamic analysis of runtime execution traces and static analysis of source codes to highlight the tracking behavior of invasive fingerprinters. We also find evidence of tailored content being served to different browsers. In particular, we note that Firefox Focus sees altered script code, whereas Brave and Duck Duck Go have highly similar content. To test the privacy protection of browsers, we measure the responses of each browser in blocking trackers and advertisers and note the strengths and weaknesses of privacy browsers. To establish ground truth, we use well-known block lists, including EasyList, EasyPrivacy, Disconnect and WhoTracksMe and find that Brave generally blocks the highest number of content that should be blocked as per these lists. Focus performs better against social trackers, and Duck Duck Go restricts third-party trackers that perform email-based tracking.more » « less
-
Web forms are one of the primary ways to collect personal information online, yet they are relatively under-studied. Unlike web tracking, data collection through web forms is explicit and contextualized. Users (i) are asked to input specific personal information types, and (ii) know the specific context (i.e., on which website and for what purpose). For web forms to be trusted by users, they must meet the common sense standards of appropriate data collection practices within a particular context (i.e., privacy norms). In this paper, we extract the privacy norms embedded within web forms through a measurement study. First, we build a specialized crawler to discover web forms on websites. We run it on 11,500 popular websites, and we create a dataset of 293K web forms. Second, to process data of this scale, we develop a cost-efficient way to annotate web forms with form types and personal information types, using text classifiers trained with assistance of large language models (LLMs). Third, by analyzing the annotated dataset, we reveal common patterns of data collection practices. We find that (i) these patterns are explained by functional necessities and legal obligations, thus reflecting privacy norms, and that (ii) deviations from the observed norms often signal unnecessary data collection. In addition, we analyze the privacy policies that accompany web forms. We show that, despite their wide adoption and use, there is a disconnect between privacy policy disclosures and the observed privacy norms.more » « less
-
Advertisements have become commonplace on modern websites. While ads are typically designed for visual consumption, it is unclear how they affect blind users who interact with the ads using a screen reader. Existing research studies on non-visual web interaction predominantly focus on general web browsing; the specific impact of extraneous ad content on blind users’ experience remains largely unexplored. To fill this gap, we conducted an interview study with 18 blind participants; we found that blind users are often deceived by ads that contextually blend in with the surrounding web page content. While ad blockers can address this problem via a blanket filtering operation, many websites are increasingly denying access if an ad blocker is active. Moreover, ad blockers often do not filter out internal ads injected by the websites themselves. Therefore, we devised an algorithm to automatically identify contextually deceptive ads on a web page. Specifically, we built a detection model that leverages a multi-modal combination of handcrafted and automatically extracted features to determine if a particular ad is contextually deceptive. Evaluations of the model on a representative test dataset and ‘in-the-wild’ random websites yielded F1 scores of 0.86 and 0.88, respectively.more » « less
-
null (Ed.)Illicit website owners frequently rely on traffic distribution systems (TDSs) operated by less-than-scrupulous advertising networks to acquire user traffic. While researchers have described a number of case studies on various TDSs or the businesses they serve, we still lack an understanding of how users are differentiated in these ecosystems, how different illicit activities frequently leverage the same advertisement networks and, subsequently, the same malicious advertisers. We design ODIN (Observatory of Dynamic Illicit ad Networks), the first system to study cloaking, user differentiation and business integration at the same time in four different types of traffic sources: typosquatting, copyright-infringing movie streaming, ad-based URL shortening, and illicit online pharmacy websites. ODIN performed 874,494 scrapes over two months (June 19, 2019–August 24, 2019), posing as six different types of users (e.g., mobile, desktop, and crawler) and accumulating over 2TB of data. We observed 81% more malicious pages compared to using only the best performing crawl profile by itself. Three of the traffic sources we study redirect users to the same traffic broker domain names up to 44% of the time and all of them often expose users to the same malicious advertisers. Our experiments show that novel cloaking techniques could decrease by half the number of malicious pages observed. Worryingly, popular blacklists do not just suffer from the lack of coverage and delayed detection, but miss the vast majority of malicious pages targeting mobile users. We use these findings to design a classifier, which can make precise predictions about the likelihood of a user being redirected to a malicious advertiser.more » « less
An official website of the United States government

