Title: Not Your Father's Big Data (Lighting Talk)
Embedded database libraries provide developers with a common and convenient data persistence layer. They have spread to many systems, including interactive devices like smartphones, appearing in all major mobile systems. Their performance affects the response times and resource consumption of millions of phone apps and billions of phone users. It is thus critical that we better understand how they work, so they can be used more efficiently, and so developers can make faster libraries. Mobile databases differ significantly from server-class storage in terms of platform, usage, and measurement. Phones are multi-tenant, end-user devices that the database must share with other apps. Contrary to traditional database design goals, workloads on phones are single-app, bursty, and rarely saturate the CPU. We argue that mobile storage design should refocus on what matters on the mobile platform: latency and energy. As accurate performance measurement tools are necessary for the evaluation of good database design, this uncovers another issue: traditional database benchmarking methods produce misleading results when applied to mobile devices, because they evaluate performance at saturation. Development of databases and measurements specifically designed for the mobile platform is necessary to optimize the user experience of the most common database usage in the world.
Award ID(s):
1617586
NSF-PAR ID:
10175110
Author(s) / Creator(s):
Date Published:
Journal Name:
Conference on Innovative Data Systems Research (CIDR)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This work presents the first-ever detailed and large-scale measurement analysis of storage consumption behavior of applications (apps) on smart mobile devices. We start by carrying out a five-year longitudinal static analysis of millions of Android apps to study the increase in their sizes over time and identify various sources of app storage consumption. Our study reveals that mobile apps have evolved as large monolithic packages that are packed with features to monetize/engage users and optimized for performance at the cost of redundant storage consumption. We also carry out a mobile storage usage study with 140 Android participants. We built and deployed a lightweight context-aware storage tracing tool, called cosmos, on each participant's device. Leveraging the traces from our user study, we show that only a small fraction of apps/features are actively used and that usage is correlated with user context. Our findings suggest a high degree of app feature bloat and unused functionality, which leads to inefficient use of storage. Furthermore, we found that apps are not constrained by storage quota limits, and developers freely abuse persistent storage by frequently caching data, creating debug logs and user analytics, and downloading advertisements as needed. Finally, drawing upon our findings, we discuss the need for efficient mobile storage management, and propose an elastic storage design to reclaim storage space when unused. We further identify research challenges and quantify expected storage savings from such a design. We believe our findings will be valuable to the storage research community as well as mobile app developers.
  2. Use of mobile phones today has become pervasive throughout society. A common use of a phone involves calling another person using VoIP apps. However, the OSes on mobile devices are prone to compromise, creating a risk for users who want to have private conversations when calling someone. Mobile devices today provide a hardware-protected mode called a trusted execution environment (TEE) to protect users from a compromised OS. In this paper, we propose a design to allow a user to make a secure end-to-end protected VoIP call from a compromised mobile phone. We implemented our design, TruzCall, using Android OS and the TrustZone TEE running OP-TEE OS. We built a prototype using the TrustZone-enabled Hikey development board and tested our design using the open source VoIP app Linphone. Our testing utilizes a simulation-based environment that allows a Hikey board to use a real phone for audio hardware.
  3. Residential proxy has emerged as a service gaining popularity recently, in which proxy providers relay their customers’ network traffic through millions of proxy peers under their control. We find that many of these proxy peers are mobile devices, whose role in the proxy network can have significant security implications since mobile devices tend to be privacy and resource-sensitive. However, little effort has been made so far to understand the extent of their involvement, not to mention how these devices are recruited by the proxy network and what security and privacy risks they may pose. In this paper, we report the first measurement study on the mobile proxy ecosystem. Our study was made possible by a novel measurement infrastructure, which enabled us to identify proxy providers, to discover proxy SDKs (software development kits), to detect Android proxy apps built upon the proxy SDKs, to harvest proxy IP addresses, and to understand proxy traffic. The information collected through this infrastructure has given us a new understanding of this ecosystem and led to important security discoveries. More specifically, 4 proxy providers were found to offer app developers mobile proxy SDKs as a competitive app monetization channel, with $50K per month per 1M MAU (monthly active users). 1,701 Android APKs (belonging to 963 Android apps) turn out to have integrated those proxy SDKs, with most of them available on Google Play with at least 300M installations in total. Furthermore, 48.43% of these APKs are flagged by at least 5 anti-virus engines as malicious, which could explain why 86.60% of the 963 Android apps have been removed from Google Play by Oct 2019. Besides, while these apps display user consent dialogs on traffic relay, our user study indicates that the user consent texts are quite confusing. We even discover a proxy SDK that stealthily relays traffic without showing any notifications. We also captured 625K cellular proxy IPs, along with a set of suspicious activities observed in proxy traffic, such as ad fraud. We have reported our findings to affected parties, offered suggestions, and proposed methodologies to detect proxy apps and proxy traffic.
  4. The majority of sensitive and personal user data is stored in different Database Management Systems (DBMS). For example, Oracle is frequently used to store corporate data, MySQL serves as the back-end storage for most webstores, and SQLite stores personal data such as SMS messages on a phone or browser bookmarks. Each DBMS manages its own storage (within the operating system); thus, databases require their own set of forensic tools. While database carving solutions have been built by multiple research groups, forensic investigators today still lack the tools necessary to analyze DBMS forensic artifacts. The unique nature of database storage and the resulting forensic artifacts require established standards for artifact storage and viewing mechanisms in order for such advanced analysis tools to be developed. In this paper, we present 1) a standard storage format, Database Forensic File Format (DB3F), for database forensic tool output that follows the guidelines established by other (file system) forensic tools, and 2) a view and search toolkit, Database Forensic Toolkit (DF-Toolkit), that enables the analysis of data stored in our database forensic format. Using our prototype implementation, we demonstrate that our toolkit follows the state-of-the-art design used by current forensic tools and offers easy-to-interpret database artifact search capabilities.
  5. Today there is no effective support for device-wide question answering on mobile devices. State-of-the-art QA models are deep learning behemoths designed for the cloud, which run extremely slowly and require more memory than is available on phones. We present DeQA, a suite of latency and memory optimizations that adapts existing QA systems to run completely locally on mobile phones. Specifically, we design two latency optimizations that (1) stop processing documents if further processing cannot improve answer quality, and (2) identify computation that does not depend on the question and move it offline. These optimizations do not depend on the QA model internals and can be applied to several existing QA models. DeQA also implements a set of memory optimizations by (i) loading partial indexes in memory, (ii) working with smaller units of data, and (iii) replacing in-memory lookups with a key-value database. We use DeQA to port three state-of-the-art QA systems to the mobile device and evaluate over three datasets. The first is a large-scale SQuAD dataset defined over a Wikipedia collection. We also create two on-device QA datasets, one over a publicly available email data collection and the other using a cross-app data collection we obtain from two users. Our evaluations show that DeQA can run QA models with only a few hundred MBs of memory and provides at least a 13x speedup on average on the mobile phone across all three datasets.