Today’s mobile apps employ third-party advertising and tracking (A&T) libraries, which may pose a threat to privacy. State-of-the-art detects and blocks outgoing A&T HTTP/S requests by using manually curated filter lists (e.g. EasyList), and recently, using machine learning approaches. The major bottleneck of both filter lists and classifiers is that they rely on experts and the community to inspect traffic and manually create filter list rules that can then be used to block traffic or label ground truth datasets. We propose NoMoATS – a system that removes this bottleneck by reducing the daunting task of manually creating filter rules, to the much easier and scalable task of labeling A&T libraries. Our system leverages stack trace analysis to automatically label which network requests are generated by A&T libraries. Using NoMoATS, we collect and label a new mobile traffic dataset. We use this dataset to train decision tree classifiers, which can be applied in real-time on the mobile device and achieve an average F-score of 93%. We show that both our automatic labeling and our classifiers discover thousands of requests destined to hundreds of different hosts, previously undetected by popular filter lists. To the best of our knowledge, our system is the first to (1) automatically label which mobile network requests are engaged in A&T, while requiring to only manually label libraries to their purpose and (2) apply on-device machine learning classifiers that operate at the granularity of URLs, can inspect connections across all apps, and detect not only ads, but also tracking.
more »
« less
NoMoAds: Effective and Efficient Cross-App Mobile Ad-Blocking (The Andreas Pfitzmann Best Student Paper Award)
Although advertising is a popular strategy for mobile app monetization, it is often desirable to block ads in order to improve usability, performance, privacy, and security. In this paper, we propose NoMoAds to block ads served by any app on a mobile device. NoMoAds leverages the network interface as a universal vantage point: it can intercept, inspect, and block outgoing packets from all apps on a mobile device. NoMoAds extracts features from packet headers and/or payload to train machine learning classifiers for detecting ad requests. To evaluate NoMoAds, we collect and label a new dataset using both EasyList and manually created rules. We show that NoMoAds is effective: it achieves an F-score of up to 97.8% and performs well when deployed in the wild. Furthermore, NoMoAds is able to detect mobile ads that are missed by EasyList (more than one-third of ads in our dataset). We also show that NoMoAds is efficient: it performs ad classification on a per-packet basis in real-time. To the best of our knowledge, NoMoAds is the first mobile ad-blocker to effectively and efficiently block ads served across all apps using a machine learning approach.
more »
« less
- Award ID(s):
- 1649372
- PAR ID:
- 10077006
- Date Published:
- Journal Name:
- Proceedings of the Privacy Enhancing Technologies Symposium (PETS)
- Volume:
- 2018
- Issue:
- 4
- Page Range / eLocation ID:
- 125–140
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)Machine learning-based malware detection systems are often vulnerable to evasion attacks, in which a malware developer manipulates their malicious software such that it is misclassified as benign. Such software hides some properties of the real class or adopts some properties of a different class by applying small perturbations. A special case of evasive malware hides by repackaging a bonafide benign mobile app to contain malware in addition to the original functionality of the app, thus retaining most of the benign properties of the original app. We present a novel malware detection system based on metamorphic testing principles that can detect such benign-seeming malware apps. We apply metamorphic testing to the feature representation of the mobile app, rather than to the app itself. That is, the source input is the original feature vector for the app and the derived input is that vector with selected features removed. If the app was originally classified benign, and is indeed benign, the output for the source and derived inputs should be the same class, i.e., benign, but if they differ, then the app is exposed as (likely) malware. Malware apps originally classified as malware should retain that classification, since only features prevalent in benign apps are removed. This approach enables the machine learning model to classify repackaged malware with reasonably few false negatives and false positives. Our training pipeline is simpler than many existing ML-based malware detection methods, as the network is trained end-to-end to jointly learn appropriate features and to perform classification. We pre-trained our classifier model on 3 million apps collected from the widely-used AndroZoo dataset. 1 We perform an extensive study on other publicly available datasets to show our approach’s effectiveness in detecting repackaged malware with more than 94% accuracy, 0.98 precision, 0.95 recall, and 0.96 F1 score.more » « less
-
Mobile apps nowadays are often packaged with third-party ad libraries to monetize user data. Many mobile ad networks exploit these mobile apps to extract sensitive real-time geographical data about the users for location-based targeted advertising. However, the massive collection of sensitive information by the ad networks has raised serious privacy concerns. Unfortunately, the extent and granularity of private data collection of the location-based ad networks remain obscure. In this work, we present a mobile tracking measurement study to characterize the severity and significance of location-based private data collection in mobile ad networks, by using an automated fine-grained data collection instrument running across different geographical areas. We perform extensive threat assessments for different ad networks using 1,100 popular apps running across 10 different cities. This study discovers that the number of location-based ads tend to be positively correlated with the population density of locations, ad networks' data collection behaviors differ across different locations, and most ad networks are capable of collecting precise location data. Detailed analysis further reveals the significant impact of geolocation on the tracking behavior of targeted ads, and a noteworthy security concern for advertising organizations to aggregate different types of private user data across multiple apps for a better targeted ad experience.more » « less
-
Abstract Small‐to‐medium businesses are always seeking affordable ways to advertise their products and services securely. With the emergence of mobile technology, it is possible than ever to implement innovative Location‐Based Advertising (LBS) systems using smartphones that preserve the privacy of mobile users. In this paper, we present a prototype implementation of such systems by developing a distributed privacy‐preserving system, which has parts executing on smartphones as a mobile app, as well as a web‐based application hosted on the cloud. The mobile app leverages Google Maps libraries to enhance the user experience in using the app. Mobile users can use the app to commute to their daily destinations while viewing relevant ads such as job openings in their neighborhood, discounts on favorite meals, etc. We developed a client‐server privacy architecture that anonymizes the mobile user trajectories using a bounded perturbation strategy. A multi‐modal sensing approach is proposed for modeling the context switching of the developed LBS system, which we represent as a Finite State Machine model. The multi‐modal sensing approach can reduce the power consumed by mobile devices by automatically detecting sensing mode changes to avoid unnecessary sensing. The developed LBS system is organized into two parts: the business side and the user side. First, the business side allows business owners to create new ads by providing the ad details, Geo‐location, photos, and any other instructions. Second, the user side allows mobile users to navigate through the map to see ads while walking, driving, bicycling, or quietly sitting in their offices. Experimental results are presented to demonstrate the scalability and performance of the mobile side. Our experimental evaluation demonstrates that the mobile app incurs low processing overhead and consequently has a small energy footprint.more » « less
-
Adblocking relies on filter lists, which are manually curated and maintained by a community of filter list authors. Filter list curation is a laborious process that does not scale well to a large number of sites or over time. In this paper, we introduce AutoFR, a reinforcement learning framework to fully automate the process of filter rule creation and evaluation for sites of interest. We design an algorithm based on multi-arm bandits to generate filter rules that block ads while controlling the trade-off between blocking ads and avoiding visual breakage. We test AutoFR on thousands of sites and we show that it is efficient: it takes only a few minutes to generate filter rules for a site of interest. AutoFR is effective: it generates filter rules that can block 86% of the ads, as compared to 87% by EasyList, while achieving comparable visual breakage. Furthermore, AutoFR generates filter rules that generalize well to new sites. We envision that AutoFR can assist the adblocking community in filter rule generation at scale.more » « less
An official website of the United States government

