skip to main content


This content will become publicly available on September 28, 2024

Title: A Tale of Two Communities: Privacy of Third Party App Users in Crowdsourcing - The Case of Receipt Transcription

Mobile and web apps are increasingly relying on the data generated or provided by users such as from their uploaded documents and images. Unfortunately, those apps may raise significant user privacy concerns. Specifically, to train or adapt their models for accurately processing huge amounts of data continuously collected from millions of app users, app or service providers have widely adopted the approach of crowdsourcing for recruiting crowd workers to manually annotate or transcribe the sampled ever-changing user data. However, when users' data are uploaded through apps and then become widely accessible to hundreds of thousands of anonymous crowd workers, many human-in-the-loop related privacy questions arise concerning both the app user community and the crowd worker community. In this paper, we propose to investigate the privacy risks brought by this significant trend of large-scale crowd-powered processing of app users' data generated in their daily activities. We consider the representative case of receipt scanning apps that have millions of users, and focus on the corresponding receipt transcription tasks that appear popularly on crowdsourcing platforms. We design and conduct an app user survey study (n=108) to explore how app users perceive privacy in the context of using receipt scanning apps. We also design and conduct a crowd worker survey study (n=102) to explore crowd workers' experiences on receipt and other types of transcription tasks as well as their attitudes towards such tasks. Overall, we found that most app users and crowd workers expressed strong concerns about the potential privacy risks to receipt owners, and they also had a very high level of agreement with the need for protecting receipt owners' privacy. Our work provides insights on app users' potential privacy risks in crowdsourcing, and highlights the need and challenges for protecting third party users' privacy on crowdsourcing platforms. We have responsibly disclosed our findings to the related crowdsourcing platform and app providers.

 
more » « less
Award ID(s):
2246143
NSF-PAR ID:
10488757
Author(s) / Creator(s):
; ;
Publisher / Repository:
Association for Computing Machinery
Date Published:
Journal Name:
Proceedings of the ACM on Human-Computer Interaction
Volume:
7
Issue:
CSCW2
ISSN:
2573-0142
Page Range / eLocation ID:
1 to 43
Subject(s) / Keyword(s):
["Privacy","App User","Crowdsourcing","Receipt Transcription"]
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Residential proxy has emerged as a service gaining popularity recently, in which proxy providers relay their customers’ network traffic through millions of proxy peers under their control. We find that many of these proxy peers are mobile devices, whose role in the proxy network can have significant security implications since mobile devices tend to be privacy and resource-sensitive. However, little effort has been made so far to understand the extent of their involvement, not to mention how these devices are recruited by the proxy network and what security and privacy risks they may pose. In this paper, we report the first measurement study on the mobile proxy ecosystem. Our study was made possible by a novel measurement infrastructure, which enabled us to identify proxy providers, to discover proxy SDKs (software development kits), to detect Android proxy apps built upon the proxy SDKs, to harvest proxy IP addresses, and to understand proxy traffic. The information collected through this infrastructure has brought to us new understandings of this ecosystem and important security discoveries. More specifically, 4 proxy providers were found to offer app developers mobile proxy SDKs as a competitive app monetization channel, with $50K per month per 1M MAU (monthly active users). 1,701 Android APKs (belonging to 963 Android apps) turn out to have integrated those proxy SDKs, with most of them available on Google Play with at least 300M installations in total. Furthermore, 48.43% of these APKs are flagged by at least 5 anti-virus engines as malicious, which could explain why 86.60% of the 963 Android apps have been removed from Google Play by Oct 2019. Besides, while these apps display user consent dialogs on traffic relay, our user study indicates that the user consent texts are quite confusing. We even discover a proxy SDK that stealthily relays traffic without showing any notifications. We also captured 625K cellular proxy IPs, along with a set of suspicious activities observed in proxy traffic such as ads fraud. We have reported our findings to affected parties, offered suggestions, and proposed the methodologies to detect proxy apps and proxy traffic. 
    more » « less
  2. Witnessing the blooming adoption of push notifications on mobile devices, this new message delivery paradigm has become pervasive in diverse applications. Accompanying with its broad adoption, the potential security risks and privacy exposure issues raise public concerns regarding its great social impacts. This paper conducts the first attempt to exploit the mobile notification ecosystem. By dissecting its structural elements and implementation process, a comprehensive vulnerability analysis is conducted towards the complete flow of mobile notification from platform enrollment to messaging. Meanwhile, for privacy exposure, we first examine the implementation of privacy policy compliance by proposing a three-level inspection approach to guide our analysis. Then, our top-down methods from documentation analysis, application network traffic study, to static analysis expose the illicit data collection behaviors in released applications. In addition, we uncover the potential privacy inference resulted from the notification monitoring. To support our analysis, we conduct empirical studies on 12 most popular notification platforms and perform static analysis over 30,000+ applications. We discover: 1) six platforms either provide ambiguous KEY naming rules or offer vulnerable messaging APIs; 2) privacy policy compliance implementations are either stagnated at the documentation stages (8 of 12 platforms) or never implemented in apps, resulting in billions of users suffering from privacy exposure; and 3) some apps can stealthily monitor notification messages delivering to other apps, potentially incurring user privacy inference risks. Our study raises the urgent demand for better regulations of mobile notification deployment. 
    more » « less
  3. Online app search optimization (ASO) platforms that provide bulk installs and fake reviews for paying app developers in order to fraudulently boost their search rank in app stores, were shown to employ diverse and complex strategies that successfully evade state-of-the-art detection methods. In this paper we introduce RacketStore, a platform to collect data from Android devices of participating ASO providers and regular users, on their interactions with apps which they install from the Google Play Store. We present measurements from a study of 943 installs of RacketStore on 803 unique devices controlled by ASO providers and regular users, that consists of 58,362,249 data snapshots collected from these devices, the 12,341 apps installed on them and their 110,511,637 Google Play reviews. We reveal significant differences between ASO providers and regular users in terms of the number and types of user accounts registered on their devices, the number of apps they review, and the intervals between the installation times of apps and their review times. We leverage these insights to introduce features that model the usage of apps and devices, and show that they can train supervised learning algorithms to detect paid app installs and fake reviews with an F1-measure of 99.72% (AUC above 0.99), and detect devices controlled by ASO providers with an F1-measure of 95.29% (AUC = 0.95). We discuss the costs associated with evading detection by our classifiers and also the potential for app stores to use our approach to detect ASO work with privacy. 
    more » « less
  4. In recent years, mobile apps have become the infrastructure of many popular Internet services. It is now fairly common that a mobile app serves a large number of users across the globe. Different from web- based services whose important program logic is mostly placed on remote servers, many mobile apps require complicated client-side code to perform tasks that are critical to the businesses. The code of mobile apps can be easily accessed by any party after the software is installed on a rooted or jailbroken device. By examining the code, skilled reverse engineers can learn various knowledge about the design and implementation of an app. Real-world cases have shown that the disclosed critical information allows malicious parties to abuse or exploit the app-provided services for unrightful profits, leading to significant financial losses for app vendors. One of the most viable mitigations against malicious reverse engineering is to obfuscate the software before release. Despite that security by obscurity is typically considered to be an unsound protection methodology, software obfuscation can indeed increase the cost of reverse engineering, thus delivering practical merits for protecting mobile apps. In this paper, we share our experience of applying obfuscation to multiple commercial iOS apps, each of which has millions of users. We discuss the necessity of adopting obfuscation for protecting modern mobile business, the challenges of software obfuscation on the iOS platform, and our efforts in overcoming these obstacles. Our report can benefit many stakeholders in the iOS ecosystem, including developers, security service providers, and Apple as the administrator of the ecosystem. 
    more » « less
  5. A new type of malicious crowdsourcing (a.k.a., crowdturfing) clients, mobile apps with hidden crowdturfing user interface (UI), is increasingly being utilized by miscreants to coordinate crowdturfing workers and publish mobile-based crowdturfing tasks (e.g., app ranking manipulation) even on the strictly controlled Apple App Store. These apps hide their crowdturfing content behind innocent-looking UIs to bypass app vetting and infiltrate the app store. To the best of our knowledge, little has been done so far to understand this new abusive service, in terms of its scope, impact and techniques, not to mention any effort to identify such stealthy crowdturfing apps on a large scale, particularly on the Apple platform. In this paper, we report the first measurement study on iOS apps with hidden crowdturfing UIs. Our findings bring to light the mobile-based crowdturfing ecosystem (e.g., app promotion for worker recruitment, campaign identification) and the underground developer's tricks (e.g., scheme, logic bomb) for evading app vetting. 
    more » « less