Search for: All records

Award ID contains: 1817248


  1. Identifying privacy-sensitive data leaks by mobile applications has been a topic of great research interest for the past decade. Technically, such data flows are not “leaks” if they are disclosed in a privacy policy. To address this limitation in automated analysis, recent work has combined program analysis of applications with analysis of privacy policies to determine flow-to-policy consistency, and hence violations thereof. However, this prior work has a fundamental weakness: it does not differentiate the entity (e.g., first-party vs. third-party) receiving the privacy-sensitive data. In this paper, we propose POLICHECK, which formalizes and implements an entity-sensitive flow-to-policy consistency model. We use POLICHECK to study 13,796 applications and their privacy policies and find that up to 42.4% of applications either incorrectly disclose or omit disclosing their privacy-sensitive data flows. Our results also demonstrate the significance of considering entities: without considering the entity, prior approaches would falsely classify up to 38.4% of applications as having privacy-sensitive data flows consistent with their privacy policies. These false classifications include data flows to third parties that are omitted (e.g., the policy states only the first party collects the data type), incorrect (e.g., the policy states the third party does not collect the data type), and ambiguous (e.g., the policy has conflicting statements about the data type collection). By defining a novel automated, entity-sensitive flow-to-policy consistency analysis, POLICHECK provides the highest-precision method to date to determine whether applications properly disclose their privacy-sensitive behaviors.
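     A minimal sketch of the entity-sensitive flow-to-policy consistency check described above, written in Python. The data shapes, labels, and example values are illustrative assumptions for exposition, not POLICHECK's actual implementation.

        # Classify one observed data flow against a privacy policy,
        # distinguishing which entity receives the data. Labels mirror the
        # categories named in the abstract (omitted/incorrect/ambiguous).
        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Flow:
            data_type: str   # e.g., "location"
            entity: str      # receiving party, e.g., "AdNetwork" or "first-party"

        @dataclass(frozen=True)
        class PolicyStatement:
            entity: str      # party the statement is about
            data_type: str
            collects: bool   # True = "collects", False = "does not collect"

        def classify(flow, policy):
            relevant = [s for s in policy
                        if s.entity == flow.entity and s.data_type == flow.data_type]
            if not relevant:
                return "omitted"      # policy never mentions this entity/data pair
            positive = any(s.collects for s in relevant)
            negative = any(not s.collects for s in relevant)
            if positive and negative:
                return "ambiguous"    # conflicting statements about the same flow
            if negative:
                return "incorrect"    # policy denies a collection that occurs
            return "consistent"       # the flow is disclosed

        policy = [PolicyStatement("first-party", "location", True),
                  PolicyStatement("AdNetwork", "location", False)]
        print(classify(Flow("location", "AdNetwork"), policy))  # -> incorrect

     An entity-insensitive checker would match the first-party statement here and report the flow as disclosed, which is exactly the misclassification the entity-sensitive model avoids.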
  2. It is commonly assumed that “free” mobile apps come at the cost of consumer privacy and that paying for apps could offer consumers protection from behavioral advertising and long-term tracking. This work empirically evaluates the validity of this assumption by comparing the privacy practices of free apps and their paid premium versions, while also gauging consumer expectations surrounding free and paid apps. We use both static and dynamic analysis to examine 5,877 pairs of free Android apps and their paid counterparts for differences in data collection practices and privacy policies between pairs. To understand user expectations for paid apps, we conducted a 998-participant online survey and found that consumers expect paid apps to have better security and privacy behaviors. However, there is no clear evidence that paying for an app will actually guarantee protection from extensive data collection in practice. Among pairs whose free version had at least one third-party library or at least one dangerous permission, respectively, we discovered that 45% of the paid versions reused all of the same third-party libraries as their free versions, and 74% of the paid versions had all of the dangerous permissions held by the free app. Likewise, our dynamic analysis revealed that 32% of the paid apps exhibit all of the same data collection and transmission behaviors as their free counterparts. Finally, we found that 40% of apps did not have a privacy policy link in the Google Play Store, and that among the pairs that did, only 3.7% had policies that differed between the free and paid versions.
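     A minimal sketch of the pair-comparison step described above, assuming static analysis has already produced each app's third-party library set and dangerous-permission set (the values below are made up):

        # Check whether a paid app reuses all of its free counterpart's
        # third-party libraries and holds all of its dangerous permissions.
        def reuses_all(free_items, paid_items):
            # Only meaningful when the free app has at least one item,
            # matching the abstract's conditioning.
            return bool(free_items) and free_items <= paid_items

        free_app = {"libs": {"admob", "firebase-analytics"},
                    "dangerous": {"ACCESS_FINE_LOCATION", "READ_PHONE_STATE"}}
        paid_app = {"libs": {"admob", "firebase-analytics"},
                    "dangerous": {"ACCESS_FINE_LOCATION"}}

        print(reuses_all(free_app["libs"], paid_app["libs"]))            # True
        print(reuses_all(free_app["dangerous"], paid_app["dangerous"]))  # False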
  3. Modern smartphone platforms implement permission-based models to protect access to sensitive data and system resources. However, apps can circumvent the permission model and gain access to protected data without user consent by using both covert and side channels. Side channels present in the implementation of the permission system allow apps to access protected data and system resources without permission, whereas covert channels enable communication between two colluding apps, so that one app can share its permission-protected data with another app lacking those permissions. Both pose threats to user privacy. In this work, we make use of our infrastructure that runs hundreds of thousands of apps in an instrumented environment. This testing environment includes mechanisms to monitor apps' runtime behaviour and network traffic. We look for evidence of side and covert channels being used in practice by searching for sensitive data being sent over the network that the sending app did not have permission to access. We then reverse engineer the apps and third-party libraries responsible for this behaviour to determine how the unauthorized access occurred. We also use software fingerprinting methods to measure the static prevalence of the technique that we discover among other apps in our corpus. Using this testing environment and method, we uncovered a number of side and covert channels in active use by hundreds of popular apps and third-party SDKs to obtain unauthorized access to both unique identifiers and geolocation data. We have responsibly disclosed our findings to Google and have received a bug bounty for our work.
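     A minimal sketch of the detection step described above: cross-reference observed network transmissions of permission-protected data with the permissions each app actually holds, and flag mismatches as potential side- or covert-channel use. The permission mapping and observations are illustrative assumptions:

        # Which Android permission guards each sensitive data type
        # (illustrative mapping only).
        GUARDING_PERMISSION = {
            "imei": "READ_PHONE_STATE",
            "geolocation": "ACCESS_FINE_LOCATION",
        }

        def flag_unauthorized(transmissions, granted):
            """Yield (app, data_type) pairs sent without the guarding permission."""
            for app, data_type in transmissions:
                needed = GUARDING_PERMISSION.get(data_type)
                if needed and needed not in granted.get(app, set()):
                    yield app, data_type

        observed = [("com.example.game", "geolocation"),
                    ("com.example.news", "imei")]
        granted = {"com.example.game": set(),                 # no location permission
                   "com.example.news": {"READ_PHONE_STATE"}}

        for app, data in flag_unauthorized(observed, granted):
            print(f"{app} sent {data} without the required permission")

     Any flagged pair becomes a candidate for reverse engineering, since the data must have reached the app through some channel other than the permission system.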
  4. It is commonly assumed that the availability of “free” mobile apps comes at the cost of consumer privacy, and that paying for apps could offer consumers protection from behavioral advertising and long-term tracking. This work empirically evaluates the validity of this assumption by investigating the degree to which “free” apps and their paid premium versions differ in their bundled code, their declared permissions, and their data collection behaviors and privacy practices. We compare pairs of free and paid apps using a combination of static and dynamic analysis. We also examine the differences in the privacy policies within pairs. We rely on static analysis to determine the requested permissions and third-party SDKs in each app; we use dynamic analysis to detect sensitive data sent to remote services, observed at the network-traffic level; and we compare text versions of privacy policies to identify differences in the disclosure of data collection behaviors. In total, we analyzed 1,505 pairs of free Android apps and their paid counterparts, with free apps randomly drawn from the Google Play Store’s category-level top charts. Our results show that, over our corpus of free and paid pairs, there is no clear evidence that paying for an app will guarantee protection from extensive data collection. Specifically, 48% of the paid versions reused all of the same third-party libraries as their free versions, while 56% of the paid versions inherited all of the free versions’ Android permissions to access sensitive device resources (when considering free apps that include at least one third-party library and request at least one Android permission). Additionally, our dynamic analysis reveals that 38% of the paid apps exhibit all of the same data collection and transmission behaviors as their free counterparts. Our exploration of privacy policies reveals that only 45% of the pairs provide a privacy policy of some sort, and less than 1% of the pairs overall have policies that differ between free and paid versions.
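     A minimal sketch of the policy-comparison step mentioned above: normalize the two policy texts and diff them to surface disclosure differences within a pair. This is an illustrative approach, not the authors' exact pipeline:

        import difflib
        import re

        def normalize(policy_text):
            """Lowercase, collapse whitespace, and split into rough sentences."""
            text = re.sub(r"\s+", " ", policy_text.lower()).strip()
            return [s for s in re.split(r"(?<=[.!?]) ", text) if s]

        def policy_diff(free_policy, paid_policy):
            """Return unified-diff lines between the normalized policies."""
            return list(difflib.unified_diff(normalize(free_policy),
                                             normalize(paid_policy),
                                             fromfile="free", tofile="paid",
                                             lineterm=""))

        free = "We share your location with advertising partners. We use cookies."
        paid = "We use cookies."
        print("\n".join(policy_diff(free, paid)))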
  5. The dominant privacy framework of the information age relies on notions of “notice and consent.” That is, service providers will disclose, often through privacy policies, their data collection practices, and users can then consent to their terms. However, it is unlikely that most users comprehend these disclosures, which is due in no small part to ambiguous, deceptive, and misleading statements. By comparing actual collection and sharing practices to disclosures in privacy policies, we demonstrate the scope of the problem. Through analysis of 68,051 apps from the Google Play Store, their corresponding privacy policies, and observed data transmissions, we investigated the potential misrepresentations of apps in the Designed For Families (DFF) program, inconsistencies in disclosures regarding third-party data sharing, as well as contradictory disclosures about secure data transmissions. We find that of the 8,030 DFF apps (i.e., apps directed at children), 9.1% claim that their apps are not directed at children, while 30.6% claim to have no knowledge that the received data comes from children. In addition, we observe that 10.5% of 68,051 apps share personal identifiers with third-party service providers, yet do not declare any in their privacy policies, and only 22.2% of the apps explicitly name third parties. This ultimately makes it not only difficult, but in most cases impossible, for users to establish where their personal data is being processed. Furthermore, we find that 9,424 apps do not use TLS when transmitting personal identifiers, yet 28.4% of these apps claim to take measures to secure data transfer. Ultimately, these divergences between disclosures and actual app behaviors illustrate the ridiculousness of the notice and consent framework. 
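     A minimal sketch of one contradiction check from the study above: flag apps that transmit personal identifiers without TLS while their policies claim to secure data in transit. The field names and example apps are assumptions for illustration:

        from dataclasses import dataclass

        @dataclass
        class AppObservation:
            package: str
            sends_id_without_tls: bool     # from network-traffic analysis
            claims_secure_transfer: bool   # from privacy-policy analysis

        def contradictions(observations):
            """Apps whose observed behavior contradicts their policy claims."""
            return [o.package for o in observations
                    if o.sends_id_without_tls and o.claims_secure_transfer]

        apps = [AppObservation("com.example.kids", True, True),    # contradiction
                AppObservation("com.example.notes", True, False),  # insecure but honest
                AppObservation("com.example.chat", False, True)]   # consistent

        print(contradictions(apps))  # -> ['com.example.kids']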