

Title: DAISY: Dynamic-Analysis-Induced Source Discovery for Sensitive Data
Mobile apps are widely used and often process users’ sensitive data. Many taint analysis tools have been applied to analyze sensitive information flows and report data leaks in apps. These tools require a list of sources (where sensitive data is accessed) as input, and researchers have constructed such lists for the Android platform by identifying Android API methods that allow access to sensitive data. However, app developers may also define their own methods, or use third-party libraries’ methods, for accessing data. Such source methods are difficult to collect because they are specific to individual apps, and the many third-party libraries available on the market evolve over time. To address this problem, we propose DAISY, a Dynamic-Analysis-Induced Source discoverY approach for identifying methods that return sensitive information from apps and third-party libraries. Trained on an automatically labeled data set of methods and their calling contexts, DAISY identifies sensitive methods in unseen apps. We evaluated DAISY on real-world apps, and the results show that it achieves an overall precision of 77.9% when reporting its most confident results. Most of the identified sources and leaks cannot be detected by existing techniques.
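A minimal sketch of the automatic-labeling idea the abstract describes: during a dynamic run on a test device whose sensitive values are known in advance, any method whose observed return value contains one of those values is labeled as a candidate source, and the labeled (method, calling context) rows become training data for the classifier. The trace format, the SENSITIVE_VALUES map, and the method names below are illustrative assumptions, not DAISY's actual implementation.

    # Hypothetical sketch of dynamic-analysis-induced source labeling.
    # Assumes a runtime trace of (method signature, calling context, return value)
    # tuples collected while exercising the app on an instrumented device.
    SENSITIVE_VALUES = {  # ground-truth values for the test device (assumed)
        "device_id": "358240051111110",
        "email": "testuser@example.com",
        "latitude": "37.4219983",
    }

    def label_trace(trace):
        """Label a traced method as a candidate source if its return value
        contains a known sensitive value; the labeled rows can then be used to
        train a classifier over method signatures and calling contexts."""
        labeled = []
        for signature, calling_context, return_value in trace:
            text = str(return_value)
            is_source = any(v in text for v in SENSITIVE_VALUES.values())
            labeled.append((signature, calling_context, int(is_source)))
        return labeled

    # An app-defined getter that wraps a platform API would be labeled as a
    # source even though it is not itself an Android API method.
    trace = [
        ("com.example.Profile.getUserEmail()", "LoginActivity.onSubmit",
         "testuser@example.com"),
        ("com.example.Util.formatDate()", "MainActivity.onCreate", "2023-01-05"),
    ]
    print(label_trace(trace))
    # [('com.example.Profile.getUserEmail()', 'LoginActivity.onSubmit', 1),
    #  ('com.example.Util.formatDate()', 'MainActivity.onCreate', 0)]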
Award ID(s):
2007718 1846467 2221843 1948244 1736209
PAR ID:
10397875
Author(s) / Creator(s):
Date Published:
Journal Name:
ACM Transactions on Software Engineering and Methodology
ISSN:
1049-331X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. There has been a proliferation of mobile apps in the Medical and Health&Fitness categories. These apps have a wide audience, from medical providers, to patients, to end users who want to track their fitness goals. The low barrier to entry on mobile app stores raises questions about the diligence and competence of the developers who publish these apps, especially regarding the practices they use for user data collection, processing, and storage. To help understand the nature of the data that is collected, how it is processed, and where it is sent, we developed a tool named PIT (Personal Information Tracker) and made it available as open source. We used PIT to perform a multi-faceted study on 2,832 Android apps: 2,211 Medical apps and 621 Health&Fitness apps. We first define Personal Information (PI) as 17 different groups of sensitive information, e.g., the user’s identity, address, financial information, medical history, or anthropometric data. PIT first extracts the elements in the app’s User Interface (UI) where this information is collected. The collected information could be processed by the app’s own code or by third-party code; our approach disambiguates between the two. Next, PIT tracks, via static analysis, where the information is “leaked”, i.e., where it escapes the scope of the app, either locally on the phone or remotely via the network. Then, we conduct a link analysis that examines the URLs an app connects with, to understand the origin and destination of the data that apps collect and process. We found that most apps leak 1–5 PI items (email, credit card, phone number, address, and name being the most frequent). Leak destinations include the network (25%), local databases (37%), logs (23%), and files or I/O (15%). While Medical apps have more leaks overall, as they collect data on medical history, surprisingly, Health&Fitness apps also collect, and leak, medical data. We also found that leaks due to third-party code (e.g., code for ads, analytics, or user engagement) are much more numerous (2x–12x) than leaks due to the app’s own code. Finally, our link analysis shows that most apps access 20–80 URLs (typically third-party URLs and Cloud APIs), though some apps could access more than 1,000 URLs.
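    A rough Python sketch (not PIT's actual code) of two steps described above, under assumed inputs: statically detected leaks are bucketed by destination category, and the URLs an app contacts are split into first-party and third-party using the developer's domain. The sink-API prefixes and the domain heuristic are illustrative assumptions.

      # Hypothetical sketch: bucket detected leaks by destination category and
      # split the URLs an app contacts into first-party vs. third-party.
      from collections import Counter
      from urllib.parse import urlparse

      SINK_CATEGORIES = {  # illustrative sink-API prefixes
          "java.net.": "network",
          "okhttp3.": "network",
          "android.database.sqlite.": "local database",
          "android.util.Log.": "logs",
          "java.io.": "files/IO",
      }

      def categorize_leaks(leaks):
          """leaks: iterable of (pi_item, sink_api) pairs found by static analysis."""
          counts = Counter()
          for _pi_item, sink_api in leaks:
              for prefix, category in SINK_CATEGORIES.items():
                  if sink_api.startswith(prefix):
                      counts[category] += 1
                      break
          return counts

      def split_urls(urls, developer_domain):
          """Very rough first-party/third-party split based on the developer's domain."""
          first, third = [], []
          for url in urls:
              host = urlparse(url).netloc
              (first if host.endswith(developer_domain) else third).append(url)
          return first, third

      leaks = [("email", "okhttp3.Call.execute"), ("name", "android.util.Log.d")]
      print(categorize_leaks(leaks))  # Counter({'network': 1, 'logs': 1})
      print(split_urls(["https://api.example.com/u", "https://ads.tracker.net/p"],
                       "example.com"))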
  2. It is commonly assumed that the availability of “free” mobile apps comes at the cost of consumer privacy, and that paying for apps could offer consumers protection from behavioral advertising and long-term tracking. This work empirically evaluates the validity of this assumption by investigating the degree to which “free” apps and their paid premium versions differ in their bundled code, their declared permissions, and their data collection behaviors and privacy practices. We compare pairs of free and paid apps using a combination of static and dynamic analysis. We also examine the differences in the privacy policies within pairs. We rely on static analysis to determine the requested permissions and third-party SDKs in each app; we use dynamic analysis to detect sensitive data collected by remote services at the network traffic level; and we compare text versions of privacy policies to identify differences in the disclosure of data collection behaviors. In total, we analyzed 1,505 pairs of free Android apps and their paid counterparts, with free apps randomly drawn from the Google Play Store’s category-level top charts. Our results show that over our corpus of free and paid pairs, there is no clear evidence that paying for an app will guarantee protection from extensive data collection. Specifically, 48% of the paid versions reused all of the same third-party libraries as their free versions, while 56% of the paid versions inherited all of the free versions’ Android permissions to access sensitive device resources (when considering free apps that include at least one third-party library and request at least one Android permission). Additionally, our dynamic analysis reveals that 38% of the paid apps exhibit all of the same data collection and transmission behaviors as their free counterparts. Our exploration of privacy policies reveals that only 45% of the pairs provide a privacy policy of some sort, and less than 1% of the pairs overall have policies that differ between free and paid versions. 
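    The pairwise comparison described above can be sketched as follows (an illustrative approximation, not the study's code): for each free/paid pair, check whether the paid version retains all of the free version's third-party libraries and all of its Android permissions, applying the same eligibility filter the abstract mentions. The input format is an assumption.

      # Hypothetical sketch of comparing free apps with their paid counterparts.
      def compare_pairs(pairs):
          """pairs: list of dicts with 'free'/'paid' keys, each mapping to
          {'permissions': set, 'libraries': set} extracted by static analysis."""
          same_libs = same_perms = eligible = 0
          for pair in pairs:
              free, paid = pair["free"], pair["paid"]
              if not free["libraries"] or not free["permissions"]:
                  continue  # mirror the eligibility filter noted in the abstract
              eligible += 1
              same_libs += free["libraries"] <= paid["libraries"]      # subset test
              same_perms += free["permissions"] <= paid["permissions"]
          return {"reused all libraries": same_libs / eligible,
                  "inherited all permissions": same_perms / eligible}

      pairs = [{
          "free": {"permissions": {"INTERNET", "ACCESS_FINE_LOCATION"},
                   "libraries": {"com.ads.sdk"}},
          "paid": {"permissions": {"INTERNET", "ACCESS_FINE_LOCATION"},
                   "libraries": set()},
      }]
      print(compare_pairs(pairs))
      # {'reused all libraries': 0.0, 'inherited all permissions': 1.0}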
  3. Android’s flexible communication model allows interactions among third-party apps, but it also leads to inter-app security vulnerabilities. Specifically, malicious apps can eavesdrop on interactions between other apps or exploit the functionality of those apps, which can expose a user’s sensitive information to attackers. While the state-of-the-art tools have focused on detecting inter-app vulnerabilities in Android, they neither accurately analyze realistically large numbers of apps nor effectively deliver the identified issues to users. This paper presents SEALANT, a novel tool that combines static analysis and visualization techniques that, together, enable accurate identification of inter-app vulnerabilities as well as their systematic visualization. SEALANT statically analyzes architectural information of a given set of apps, infers vulnerable communication channels where inter-app attacks can be launched, and visualizes the identified information in a compositional representation. SEALANT has been demonstrated to accurately identify inter-app vulnerabilities from hundreds of real-world Android apps and to effectively deliver the identified information to users. 
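    One way to picture the channel inference described above (a simplified sketch, not SEALANT's implementation): an implicit Intent action sent by one app that matches an exported component's intent filter in another app is flagged as a channel where interception or exploitation could occur. The app names, actions, and input format are assumptions.

      # Hypothetical sketch of inferring vulnerable inter-app communication channels.
      def find_channels(apps):
          """apps: dict app_name -> {'sends': [action, ...],
                                     'filters': [(component, action, exported), ...]}"""
          channels = []
          for sender, info in apps.items():
              for action in info["sends"]:
                  for receiver, rinfo in apps.items():
                      if receiver == sender:
                          continue
                      for component, filter_action, exported in rinfo["filters"]:
                          if exported and filter_action == action:
                              channels.append((sender, action, receiver, component))
          return channels

      apps = {
          "BankingApp": {"sends": ["com.example.SHARE_BALANCE"], "filters": []},
          "WidgetApp": {"sends": [],
                        "filters": [("BalanceReceiver",
                                     "com.example.SHARE_BALANCE", True)]},
      }
      for channel in find_channels(apps):
          print("Potentially vulnerable channel:", channel)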
  4. In an era marked by ubiquitous reliance on mobile applications for nearly every need, the opacity of apps’ behavior poses significant threats to their users’ privacy. Although major data protection regulations require apps to disclose their data practices transparently, previous studies have pointed out difficulties in doing so. To further delve into this issue, this article describes an automated method to capture data-sharing practices in Android apps and assess their proper disclosure according to the EU General Data Protection Regulation. We applied the method to 9,000 random Android apps, unveiling an uncomfortable reality: over 80% of Android applications that transfer personal data off device potentially fail to meet GDPR transparency requirements. We further investigate the role of third-party libraries, shedding light on the source of this problem and pointing towards measures to address it. 
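    At its core, the transparency check described above compares the observed recipients of off-device personal-data transfers against the recipients an app's privacy policy discloses; the sketch below illustrates that comparison under an assumed input format and is not the article's actual pipeline.

      # Hypothetical sketch of a GDPR-style transparency check.
      def check_transparency(observed_recipients, disclosed_recipients):
          """Both arguments are sets of recipient domains: the first observed
          receiving personal data off device, the second named in the policy."""
          undisclosed = observed_recipients - disclosed_recipients
          return {"compliant": not undisclosed,
                  "undisclosed recipients": sorted(undisclosed)}

      print(check_transparency({"analytics.vendor.com", "ads.vendor.net"},
                               {"analytics.vendor.com"}))
      # {'compliant': False, 'undisclosed recipients': ['ads.vendor.net']}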
  5. The Internet of Things is growing rapidly, with many connected devices now available to consumers. With this growth, the IoT apps that manage these devices from smartphones raise significant security concerns. Typically, these apps are secured via sensitive credentials, such as an email address and password, that need to be validated through specific servers, and thus require permission to access the Internet. Unfortunately, even when the developers of these apps are well-intentioned, such apps can be non-trivial to secure so as to guarantee that users’ credentials do not leak to unauthorized servers on the Internet. For example, if the app relies on third-party libraries, as many do, those libraries can potentially capture and leak sensitive credentials. Bugs in the applications can also result in exploitable vulnerabilities that leak credentials. This paper presents our work in progress on a prototype that enables developers to control how information flows within the app, from sensitive UI data to specific servers. We extend FlowFence to enforce fine-grained information flow policies on sensitive UI data.
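    A minimal sketch of the kind of fine-grained policy the prototype aims to enforce (an illustration of the idea, not FlowFence's API): data captured from specific credential fields in the UI may flow only to an allowlisted authentication server, and any other destination is denied. The field names and server host are assumptions.

      # Hypothetical policy sketch: credential fields may only reach allowlisted hosts.
      POLICY = {
          "login_email_field": {"auth.example-iot.com"},
          "login_password_field": {"auth.example-iot.com"},
      }

      def allow_flow(ui_source, destination_host):
          """Permit a flow only if the policy allows data from ui_source to reach
          destination_host; sources not covered by the policy are unrestricted."""
          allowed = POLICY.get(ui_source)
          return allowed is None or destination_host in allowed

      print(allow_flow("login_password_field", "auth.example-iot.com"))  # True
      print(allow_flow("login_password_field", "ads.thirdparty.net"))    # False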