Title: Enabling Privacy Policies for mHealth Studies
Pervasive sensing has enabled continuous monitoring of users' physiological state through mobile and wearable devices, making large-scale user studies possible, such as those found in mHealth. However, current mHealth studies offer users limited ability to express privacy preferences for the data they share across the multiple entities involved in a research study. In this work, we present mPolicy, a privacy policy language that lets study participants express the context-aware and data-handling policies needed for mHealth. In addition, we provide a privacy-adaptive policy creation mechanism for byproduct data (such as motion inferences). Lastly, we create a software library called privLib that implements parsing, enforcement, and policy creation on byproduct data for mPolicy. We evaluate the latency overhead of these operations and discuss future improvements for scaling to realistic mHealth scenarios.
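As a rough illustration of the three privLib operations named above (parsing, enforcement, and policy creation for byproduct data), the Python sketch below uses a hypothetical JSON policy format. It is not mPolicy's actual syntax or privLib's API: the `Policy` fields, the functions `parse_policy`, `enforce`, and `derive_byproduct_policy`, and the rule that a byproduct such as a motion inference inherits the intersection of its source policies' permissions are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass


# Hypothetical policy representation for illustration only; NOT mPolicy's real syntax.
@dataclass(frozen=True)
class Policy:
    data_type: str
    allowed_entities: frozenset[str]  # study entities allowed to receive the data
    allowed_contexts: frozenset[str]  # contexts (e.g. "home") in which sharing is allowed
    retention_days: int               # data-handling rule: maximum retention by a recipient


def parse_policy(text: str) -> Policy:
    """Parse a (hypothetical) JSON policy document into a Policy object."""
    raw = json.loads(text)
    return Policy(
        data_type=raw["data_type"],
        allowed_entities=frozenset(raw["allowed_entities"]),
        allowed_contexts=frozenset(raw["allowed_contexts"]),
        retention_days=int(raw["retention_days"]),
    )


def enforce(policy: Policy, data_type: str, recipient: str, context: str) -> bool:
    """Allow a release only if the data type, recipient, and current context all match."""
    return (
        data_type == policy.data_type
        and recipient in policy.allowed_entities
        and context in policy.allowed_contexts
    )


def derive_byproduct_policy(byproduct_type: str, sources: list[Policy]) -> Policy:
    """Assumed rule: a byproduct inherits the most restrictive combination of its sources."""
    if not sources:
        raise ValueError("a byproduct needs at least one source policy")
    return Policy(
        data_type=byproduct_type,
        allowed_entities=frozenset.intersection(*(p.allowed_entities for p in sources)),
        allowed_contexts=frozenset.intersection(*(p.allowed_contexts for p in sources)),
        retention_days=min(p.retention_days for p in sources),
    )


# Example: a motion inference derived from two raw sensor streams.
accel = parse_policy('{"data_type": "accelerometer", "allowed_entities": ["study_server", "clinician"], '
                     '"allowed_contexts": ["home", "work"], "retention_days": 90}')
gyro = parse_policy('{"data_type": "gyroscope", "allowed_entities": ["study_server"], '
                    '"allowed_contexts": ["home"], "retention_days": 30}')
motion = derive_byproduct_policy("motion_inference", [accel, gyro])
print(enforce(motion, "motion_inference", "clinician", "home"))     # False
print(enforce(motion, "motion_inference", "study_server", "home"))  # True
```

Taking the intersection is only one conservative choice: it guarantees the inference is never shared more widely than any raw stream it was computed from, at the cost of possibly being stricter than a participant intended.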
Award ID(s):
1636916 1640813 1822935
Publication Date:
Journal Name:
2019 IEEE International Conference on Big Data (Big Data)
Page Range or eLocation-ID:
4045 to 4054
Sponsoring Org:
National Science Foundation
More Like this
  1. Background The use of wearables facilitates data collection at a previously unobtainable scale, enabling the construction of complex predictive models with the potential to improve health. However, the highly personal nature of these data requires strong privacy protection against data breaches and the use of data in a way that users do not intend. One method to protect user privacy while taking advantage of sharing data across users is federated learning, a technique that allows a machine learning model to be trained using data from all users while only storing a user's data on that user's device. By keeping data on users' devices, federated learning protects users' private data from data leaks and breaches on the researcher's central server and provides users with more control over how and when their data are used. However, there are few rigorous studies on the effectiveness of federated learning in the mobile health (mHealth) domain. Objective We review federated learning and assess whether it can be useful in the mHealth field, especially for addressing common mHealth challenges such as privacy concerns and user heterogeneity. The aims of this study are to describe federated learning in an mHealth context, apply a simulation of federated learning to an mHealth data set, and compare the performance of federated learning with the performance of other predictive models. Methods We applied a simulation of federated learning to predict the affective state of 15 subjects using physiological and motion data collected from a chest-worn device for approximately 36 minutes. We compared the results from this federated model with those from a centralized or server model and with the results from training individual models for each subject. 
Results In a 3-class classification problem using physiological and motion data to predict whether the subject was undertaking a neutral, amusing, or stressful task, the federated model achieved 92.8% accuracy on average, the server model achieved 93.2% accuracy on average, and the individual model achieved 90.2% accuracy on average. Conclusions Our findings support the potential for using federated learning in mHealth. The results showed that the federated model performed better than a model trained separately on each individual and nearly as well as the server model. As federated learning offers more privacy than a server model, it may be a valuable option for designing sensitive data collection methods.
  2. Website privacy policies sometimes give users the option to opt out of certain collections and uses of their personal data. Unfortunately, many privacy policies bury these instructions deep in their text, and few web users have the time or skill necessary to find them. We describe a method for the automated detection of opt-out choices in privacy policy text and their presentation to users through a web browser extension. We describe the creation of two corpora of opt-out choices, which enable the training of classifiers to identify opt-outs in privacy policies. Our overall approach for extracting and classifying opt-out choices combines heuristics that identify commonly found opt-out hyperlinks with supervised machine learning that automatically identifies less conspicuous instances. Our approach achieves a precision of 0.93 and a recall of 0.9. We introduce Opt-Out Easy, a web browser extension designed to present available opt-out choices to users as they browse the web. We evaluate the usability of our browser extension with a user study. We also present results of a large-scale analysis of opt-outs found in the text of thousands of the most popular websites.
  3. This paper presents the results of an interview study with twelve TikTok users exploring user awareness, perception, and experiences with the app's algorithm in the context of privacy. The social media entertainment app TikTok collects user data to tailor individualized video feeds based on users' engagement with the presented content; this data collection is governed by a complex and overly long privacy policy. Our results show that participants generally know very little about the actual privacy regulations, a gap they justify by pointing to the benefit of receiving free, entertaining content. However, participants experienced privacy-related downsides when algorithmically curated video content increasingly adapted to their biography, interests, or location, and they in turn realized how detailed the personal data that TikTok had access to were. This illustrates the tradeoff users have to make between allowing TikTok to access their personal data and having favorable video consumption experiences on the app.
  4. Background Mobile health (mHealth) methods often rely on active input from participants, for example, in the form of self-report questionnaires delivered via web or smartphone, to measure health and behavioral indicators and deliver interventions in everyday life settings. For short-term studies or interventions, these techniques are deployed intensively, causing nontrivial participant burden. For cases where the goal is long-term maintenance, limited infrastructure exists to balance information needs with participant constraints. Yet the increasing precision of passive sensors such as wearable physiology monitors, smartphone-based location history, and internet-of-things devices, in combination with statistical feature selection and adaptive interventions, has begun to make this possible. Objective In this paper, we introduce Wear-IT, a smartphone app and cloud framework intended to begin addressing current limitations by allowing researchers to leverage commodity electronics and real-time decision making to optimize the amount of useful data collected while minimizing participant burden. Methods The Wear-IT framework uses real-time decision making to find better tradeoffs between the utility of data collected and the burden placed on participants. Wear-IT integrates a variety of consumer-grade sensors and provides adaptive, personalized, and low-burden monitoring and intervention. Proof of concept examples are illustrated using artificial data. The results of qualitative interviews with users are provided. Results Participants provided positive feedback about the ease of use of studies conducted using the Wear-IT framework. Users were positive about their overall experience with the framework and its utility for balancing burden, and expressed excitement about future studies that real-time processing will enable. 
Conclusions The Wear-IT framework uses a combination of passive monitoring, real-time processing, and adaptive assessment and intervention to provide a balance between high-quality data collection and low participant burden. The framework presents an opportunity to deploy adaptive assessment and intervention designs that use real-time processing and provides a platform to study and overcome the challenges of long-term mHealth intervention.
  5. Organisations disclose their privacy practices by posting privacy policies on their websites. Even though internet users often care about their digital privacy, they usually do not read privacy policies, since understanding them requires a significant investment of time and effort. Natural language processing has been used to create experimental tools to interpret privacy policies, but there has been a lack of large privacy policy corpora to facilitate the creation of large-scale semi-supervised and unsupervised models to interpret and simplify privacy policies. Thus, we present the PrivaSeer Corpus of 1,005,380 English language website privacy policies collected from the web. The number of unique websites represented in PrivaSeer is about ten times larger than the next largest public collection of web privacy policies, and it surpasses the aggregate of unique websites represented in all other publicly available privacy policy corpora combined. We describe a corpus creation pipeline with stages that include a web crawler, language detection, document classification, duplicate and near-duplicate removal, and content extraction. We employ an unsupervised topic modelling approach to investigate the contents of policy documents in the corpus and discuss the distribution of topics in privacy policies at web scale. We further investigate the relationship between privacy policy domain PageRanks and text features of the privacy policies. Finally, we use the corpus to pretrain PrivBERT, a transformer-based privacy policy language model, and obtain state-of-the-art results on the data practice classification and question answering tasks.
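The federated learning idea in item 1 (train one model from all users while each user's data stays on that user's device) can be sketched with federated averaging (FedAvg), a widely used algorithm. The abstract does not say which federated algorithm or model the authors simulated, so FedAvg, the linear least-squares model, the learning rate, and the synthetic three-client data below are all assumptions for illustration.

```python
import numpy as np


def local_update(w, X, y, lr=0.1, epochs=5):
    """Run a few epochs of full-batch gradient descent on one client's data (least squares)."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def federated_averaging(clients, dim, rounds=50):
    """Server loop: broadcast w, collect locally trained weights, average them by sample count."""
    w = np.zeros(dim)
    total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        # Only the weight vectors come back to the server; each (X, y) stays with its client.
        updates = [local_update(w, X, y) for X, y in clients]
        w = sum(len(y) / total * w_k for (_, y), w_k in zip(clients, updates))
    return w


# Synthetic example: three simulated participants with different amounts of data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (30, 50, 20):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    clients.append((X, y))

w = federated_averaging(clients, dim=2)
print(np.round(w, 2))  # should land close to true_w
```

Weighting each client's update by its sample count is the standard FedAvg choice; with heterogeneous users (a concern raised in item 1) the local optima differ, and the averaged model can drift from any single user's best fit.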
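The duplicate and near-duplicate removal stage in item 5's pipeline could, for example, compare word shingles using Jaccard similarity. The abstract does not describe PrivaSeer's actual method; the shingle size `k=5`, the 0.7 threshold, and the function names here are hypothetical.

```python
def shingles(text: str, k: int = 5) -> set[str]:
    """Return the set of k-word shingles after lowercasing and collapsing whitespace."""
    words = text.lower().split()
    # Documents shorter than k words become a single shingle.
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| (two empty sets count as identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dedupe(docs: list[str], threshold: float = 0.7, k: int = 5) -> list[str]:
    """Keep a document only if it is below the similarity threshold to every kept document."""
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc, k)
        if all(jaccard(s, t) < threshold for t in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


docs = [
    "We collect your email address and use it to send you account notifications.",
    "We collect your email address and use it to send you account notifications!",
    "Cookies help us remember your preferences between visits.",
]
result = dedupe(docs)
print(len(result))  # 2: the second document differs only in its final word
```

This version compares every new document against every kept one, which is quadratic; at the scale of a million policies, an approximate index such as MinHash with locality-sensitive hashing is the usual replacement.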