skip to main content


Title: Heimdall: A Privacy-Respecting Implicit Preference Collection Framework
Many of the everyday decisions a user makes rely on the suggestions of online recommendation systems. These systems amass implicit (e.g., location, purchase history, browsing history) and explicit (e.g., reviews, ratings) feedback from multiple users, produce a general consensus, and provide suggestions based on that consensus. However, due to privacy concerns, users are uncomfortable with implicit data collection, thus requiring recommendation systems to be overly dependent on explicit feedback. Unfortunately, users do not frequently provide explicit feedback. This hampers the ability of recommendation systems to provide high-quality suggestions. We introduce Heimdall, the first privacy-respecting implicit preference collection framework that enables recommendation systems to extract user preferences from their activities in a privacy respect- ing manner. The key insight is to enable recommendation systems to run a collector on a user’s device and precisely control the information a collector transmits to the recommendation system back- end. Heimdall introduces immutable blobs as a mechanism to guarantee this property. We implemented Heimdall on the Android plat- form and wrote three example collectors to enhance recommendation systems with implicit feedback. Our performance results suggest that the overhead of immutable blobs is minimal, and a user study of 166 participants indicates that privacy concerns are significantly less when collectors record only specific information—a property that Heimdall enables.  more » « less
Award ID(s):
1318722
NSF-PAR ID:
10080655
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys)
Page Range / eLocation ID:
453 to 463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The emergence of mobile apps (e.g., location-based services, geo-social networks, ride-sharing) led to the collection of vast amounts of trajectory data that greatly benefit the understanding of individual mobility. One problem of particular interest is next-location prediction, which facilitates location-based advertising, point-of-interest recommendation, traffic optimization,etc. However, using individual trajectories to build prediction models introduces serious privacy concerns, since exact whereabouts of users can disclose sensitive information such as their health status or lifestyle choices. Several research efforts focused on privacy-preserving next-location prediction, but they have serious limitations: some use outdated privacy models (e.g., k-anonymity), while others employ learning models with limited expressivity (e.g., matrix factorization). More recent approaches(e.g., DP-SGD) integrate the powerful differential privacy model with neural networks, but they provide only generic and difficult-to-tune methods that do not perform well on location data, which is inherently skewed and sparse.We propose a technique that builds upon DP-SGD, but adapts it for the requirements of next-location prediction. We focus on user-level privacy, a strong privacy guarantee that protects users regardless of how much data they contribute. Central to our approach is the use of the skip-gram model, and its negative sampling technique. Our work is the first to propose differentially-private learning with skip-grams. In addition, we devise data grouping techniques within the skip-gram framework that pool together trajectories from multiple users in order to accelerate learning and improve model accuracy. Experiments conducted on real datasets demonstrate that our approach significantly boosts prediction accuracy compared to existing DP-SGD techniques. 
    more » « less
  2. The emergence of mobile apps (e.g., location-based services,geo-social networks, ride-sharing) led to the collection of vast amounts of trajectory data that greatly benefit the understanding of individual mobility. One problem of particular interest is next-location prediction, which facilitates location-based advertising, point-of-interest recommendation, traffic optimization,etc. However, using individual trajectories to build prediction models introduces serious privacy concerns, since exact whereabouts of users can disclose sensitive information such as their health status or lifestyle choices. Several research efforts focused on privacy-preserving next-location prediction, but they have serious limitations: some use outdated privacy models (e.g., k-anonymity), while others employ learning models with limited expressivity (e.g., matrix factorization). More recent approaches(e.g., DP-SGD) integrate the powerful differential privacy model with neural networks, but they provide only generic and difficult-to-tune methods that do not perform well on location data, which is inherently skewed and sparse.We propose a technique that builds upon DP-SGD, but adapts it for the requirements of next-location prediction. We focus on user-level privacy, a strong privacy guarantee that protects users regardless of how much data they contribute. Central toour approach is the use of the skip-gram model, and its negative sampling technique. Our work is the first to propose differentially-private learning with skip-grams. In addition, we devise data grouping techniques within the skip-gram framework that pool together trajectories from multiple users in order to acceleratelearning and improve model accuracy. Experiments conducted on real datasets demonstrate that our approach significantly boosts prediction accuracy compared to existing DP-SGD techniques. 
    more » « less
  3. null (Ed.)
    The success of cross-domain recommender systems in capturing user interests across multiple domains has recently brought much attention to them. These recommender systems aim to improve the quality of suggestions and defy the cold-start problem by transferring information from one (or more) source domain(s) to a target domain. However, most cross-domain recommenders ignore the sequential information in user history. They only rely on an aggregate or snapshot of user feedback in the past. Most importantly, they do not explicitly model how users transition from one domain to another domain as users continue to interact with different item domains. In this paper, we argue that between-domain transitions in user sequences are useful in improving recommendation quality, dealing with the cold-start problem, and revealing interesting aspects of how user interests transform from one domain to another. We propose TransCrossCF, transition-based cross-domain collaborative filtering, that can capture both within and between domain transitions of user feedback sequences while understanding the relationship between different item types in different domains. Specifically, we model each purchase of a user as a transition from his/her previous item to the next one, under the effect of item domains and user preferences. Our intensive experiments demonstrate that TransCrossCF outperforms the state-of-the-art methods in recommendation task on three real-world datasets, both in the cold-start and hot-start scenarios. Moreover, according to our context analysis evaluations, the between-domain relations captured by TransCrossCF are interpretable and intuitive. 
    more » « less
  4. null (Ed.)
    Amazon's voice-based assistant, Alexa, enables users to directly interact with various web services through natural language dialogues. It provides developers with the option to create third-party applications (known as Skills) to run on top of Alexa. While such applications ease users' interaction with smart devices and bolster a number of additional services, they also raise security and privacy concerns due to the personal setting they operate in. This paper aims to perform a systematic analysis of the Alexa skill ecosystem. We perform the first large-scale analysis of Alexa skills, obtained from seven different skill stores totaling to 90,194 unique skills. Our analysis reveals several limitations that exist in the current skill vetting process. We show that not only can a malicious user publish a skill under any arbitrary developer/company name, but she can also make backend code changes after approval to coax users into revealing unwanted information. We, next, formalize the different skill-squatting techniques and evaluate the efficacy of such techniques. We find that while certain approaches are more favorable than others, there is no substantial abuse of skill squatting in the real world. Lastly, we study the prevalence of privacy policies across different categories of skill, and more importantly the policy content of skills that use the Alexa permission model to access sensitive user data. We find that around 23.3% of such skills do not fully disclose the data types associated with the permissions requested. We conclude by providing some suggestions for strengthening the overall ecosystem, and thereby enhance transparency for end-users. 
    more » « less
  5. Implicit feedback is widely used in collaborative filtering methods for recommendation. It is well known that implicit feedback contains a large number of values that are missing not at random (MNAR); and the missing data is a mixture of negative and unknown feedback, making it difficult to learn users’ negative preferences. Recent studies modeled exposure, a latent missingness variable which indicates whether an item is exposed to a user, to give each missing entry a confidence of being negative feedback. However, these studies use static models and ignore the information in temporal dependencies among items, which seems to be an essential underlying factor to subsequent missingness. To model and exploit the dynamics of missingness, we propose a latent variable named “user intent” to govern the temporal changes of item missingness, and a hidden Markov model to represent such a process. The resulting framework captures the dynamic item missingness and incorporate it into matrix factorization (MF) for recommendation. We also explore two types of constraints to achieve a more compact and interpretable representation of user intents. Experiments on real-world datasets demonstrate the superiority of our method against state-of-the-art recommender systems. 
    more » « less