skip to main content


Title: Privacy, Permissions, and the Health App Ecosystem: A Stack Overflow Exploration
Abstract: Health data is considered to be sensitive and personal; both governments and software platforms have enacted specific measures to protect it. Consumer apps that collect health data are becoming more popular, but raise new privacy concerns as they collect unnecessary data, share it with third parties, and track users. However, developers of these apps are not necessarily knowingly endangering users’ privacy; some may simply face challenges working with health features. To scope these challenges, we qualitatively analyzed 269 privacy-related posts on Stack Overflow by developers of health apps for Android- and iOS-based systems. We found that health-specific access control structures (e.g., enhanced requirements for permissions and authentication) underlie several privacy-related challenges developers face. The specific nature of problems often differed between the platforms, for example additional verification steps for Android developers, or confusing feedback about incorrectly formulated permission scopes for iOS. Developers also face problems introduced by third-party libraries. Official documentation plays a key part in understanding privacy requirements, but in some cases, may itself cause confusion. We discuss implications of our findings and propose ways to improve developers’ experience of working with health-related features -- and consequently to improve the privacy of their apps’ end users.  more » « less
Award ID(s):
2055772
NSF-PAR ID:
10398420
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 2022 European Symposium on Usable Security (EuroUSEC '22))
Page Range / eLocation ID:
117 to 130
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Background Mobile health technology has demonstrated the ability of smartphone apps and sensors to collect data pertaining to patient activity, behavior, and cognition. It also offers the opportunity to understand how everyday passive mobile metrics such as battery life and screen time relate to mental health outcomes through continuous sensing. Impulsivity is an underlying factor in numerous physical and mental health problems. However, few studies have been designed to help us understand how mobile sensors and self-report data can improve our understanding of impulsive behavior. Objective The objective of this study was to explore the feasibility of using mobile sensor data to detect and monitor self-reported state impulsivity and impulsive behavior passively via a cross-platform mobile sensing application. Methods We enrolled 26 participants who were part of a larger study of impulsivity to take part in a real-world, continuous mobile sensing study over 21 days on both Apple operating system (iOS) and Android platforms. The mobile sensing system (mPulse) collected data from call logs, battery charging, and screen checking. To validate the model, we used mobile sensing features to predict common self-reported impulsivity traits, objective mobile behavioral and cognitive measures, and ecological momentary assessment (EMA) of state impulsivity and constructs related to impulsive behavior (ie, risk-taking, attention, and affect). Results Overall, the findings suggested that passive measures of mobile phone use such as call logs, battery charging, and screen checking can predict different facets of trait and state impulsivity and impulsive behavior. For impulsivity traits, the models significantly explained variance in sensation seeking, planning, and lack of perseverance traits but failed to explain motor, urgency, lack of premeditation, and attention traits. Passive sensing features from call logs, battery charging, and screen checking were particularly useful in explaining and predicting trait-based sensation seeking. On a daily level, the model successfully predicted objective behavioral measures such as present bias in delay discounting tasks, commission and omission errors in a cognitive attention task, and total gains in a risk-taking task. Our models also predicted daily EMA questions on positivity, stress, productivity, healthiness, and emotion and affect. Perhaps most intriguingly, the model failed to predict daily EMA designed to measure previous-day impulsivity using face-valid questions. Conclusions The study demonstrated the potential for developing trait and state impulsivity phenotypes and detecting impulsive behavior from everyday mobile phone sensors. Limitations of the current research and suggestions for building more precise passive sensing models are discussed. Trial Registration ClinicalTrials.gov NCT03006653; https://clinicaltrials.gov/ct2/show/NCT03006653 
    more » « less
  2. The European General Data Protection Regulation (GDPR) mandates a data controller (e.g., an app developer) to provide all information specified in Articles (Arts.) 13 and 14 to data subjects (e.g., app users) regarding how their data are being processed and what are their rights. While some studies have started to detect the fulfillment of GDPR requirements in a privacy policy, their exploration only focused on a subset of mandatory GDPR requirements. In this paper, our goal is to explore the state of GDPR-completeness violations in mobile apps' privacy policies. To achieve our goal, we design the PolicyChecker framework by taking a rule and semantic role based approach. PolicyChecker automatically detects completeness violations in privacy policies based not only on all mandatory GDPR requirements but also on all if-applicable GDPR requirements that will become mandatory under specific conditions. Using PolicyChecker, we conduct the first large-scale GDPR-completeness violation study on 205,973 privacy policies of Android apps in the UK Google Play store. PolicyChecker identified 163,068 (79.2%) privacy policies containing data collection statements; therefore, such policies are regulated by GDPR requirements. However, the majority (99.3%) of them failed to achieve the GDPR-completeness with at least one unsatisfied requirement; 98.1% of them had at least one unsatisfied mandatory requirement, while 73.0% of them had at least one unsatisfied if-applicable requirement logic chain. We conjecture that controllers' lack of understanding of some GDPR requirements and their poor practices in composing a privacy policy can be the potential major causes behind the GDPR-completeness violations. We further discuss recommendations for app developers to improve the completeness of their apps' privacy policies to provide a more transparent personal data processing environment to users. 
    more » « less
  3. The transparency and privacy behavior of mobile browsers has remained widely unexplored by the research community. In fact, as opposed to regular Android apps, mobile browsers may present contradicting privacy behaviors. On the one end, they can have access to (and can expose) a unique combination of sensitive user data, from users’ browsing history to permission-protected personally identifiable information (PII) such as unique identifiers and geolocation. However, on the other end, they also are in a unique position to protect users’ privacy by limiting data sharing with other parties by implementing ad-blocking features. In this paper, we perform a comparative and empirical analysis on how hundreds of Android web browsers protect or expose user data during browsing sessions. To this end, we collect the largest dataset of Android browsers to date, from the Google Play Store and four Chinese app stores. Then, we developed a novel analysis pipeline that combines static and dynamic analysis methods to find a wide range of privacy-enhancing (e.g., ad-blocking) and privacy-harming behaviors (e.g., sending browsing histories to third parties, not validating TLS certificates, and exposing PII---including non-resettable identifiers---to third parties) across browsers. We find that various popular apps on both Google Play and Chinese stores have these privacy-harming behaviors, including apps that claim to be privacy-enhancing in their descriptions. Overall, our study not only provides new insights into important yet overlooked considerations for browsers’ adoption and transparency, but also that automatic app analysis systems (e.g., sandboxes) need context-specific analysis to reveal such privacy behaviors. 
    more » « less
  4. Online app search optimization (ASO) platforms that provide bulk installs and fake reviews for paying app developers in order to fraudulently boost their search rank in app stores, were shown to employ diverse and complex strategies that successfully evade state-of-the-art detection methods. In this paper we introduce RacketStore, a platform to collect data from Android devices of participating ASO providers and regular users, on their interactions with apps which they install from the Google Play Store. We present measurements from a study of 943 installs of RacketStore on 803 unique devices controlled by ASO providers and regular users, that consists of 58,362,249 data snapshots collected from these devices, the 12,341 apps installed on them and their 110,511,637 Google Play reviews. We reveal significant differences between ASO providers and regular users in terms of the number and types of user accounts registered on their devices, the number of apps they review, and the intervals between the installation times of apps and their review times. We leverage these insights to introduce features that model the usage of apps and devices, and show that they can train supervised learning algorithms to detect paid app installs and fake reviews with an F1-measure of 99.72% (AUC above 0.99), and detect devices controlled by ASO providers with an F1-measure of 95.29% (AUC = 0.95). We discuss the costs associated with evading detection by our classifiers and also the potential for app stores to use our approach to detect ASO work with privacy. 
    more » « less
  5. It takes great effort to manually or semi-automatically convert free-text phenotype narratives (e.g., morphological descriptions in taxonomic works) to a computable format before they can be used in large-scale analyses. We argue that neither a manual curation approach nor an information extraction approach based on machine learning is a sustainable solution to produce computable phenotypic data that are FAIR (Findable, Accessible, Interoperable, Reusable) (Wilkinson et al. 2016). This is because these approaches do not scale to all biodiversity, and they do not stop the publication of free-text phenotypes that would need post-publication curation. In addition, both manual and machine learning approaches face great challenges: the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other) in manual curation, and keywords to ontology concept translation in automated information extraction, make it difficult for either approach to produce data that are truly FAIR. Our empirical studies show that inter-curator variation in translating phenotype characters to Entity-Quality statements (Mabee et al. 2007) is as high as 40% even within a single project. With this level of variation, curated data integrated from multiple curation projects may still not be FAIR. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardized vocabularies (ontologies). We argue that the authors describing characters are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of the descriptions from the moment of publication. In this presentation, we will introduce the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists, which consists of three components: a web-based, ontology-aware software application called 'Character Recorder,' which features a spreadsheet as the data entry platform and provides authors with the flexibility of using their preferred terminology in recording characters for a set of specimens (this application also facilitates semantic clarity and consistency across species descriptions); a set of services that produce RDF graph data, collects terms added by authors, detects potential conflicts between terms, dispatches conflicts to the third component and updates the ontology with resolutions; and an Android mobile application, 'Conflict Resolver,' which displays ontological conflicts and accepts solutions proposed by multiple experts. a web-based, ontology-aware software application called 'Character Recorder,' which features a spreadsheet as the data entry platform and provides authors with the flexibility of using their preferred terminology in recording characters for a set of specimens (this application also facilitates semantic clarity and consistency across species descriptions); a set of services that produce RDF graph data, collects terms added by authors, detects potential conflicts between terms, dispatches conflicts to the third component and updates the ontology with resolutions; and an Android mobile application, 'Conflict Resolver,' which displays ontological conflicts and accepts solutions proposed by multiple experts. Fig. 1 shows the system diagram of the platform. The presentation will consist of: a report on the findings from a recent survey of 90+ participants on the need for a tool like Character Recorder; a methods section that describes how we provide semantics to an existing vocabulary of quantitative characters through a set of properties that explain where and how a measurement (e.g., length of perigynium beak) is taken. We also report on how a custom color palette of RGB values obtained from real specimens or high-quality specimen images, can be used to help authors choose standardized color descriptions for plant specimens; and a software demonstration, where we show how Character Recorder and Conflict Resolver can work together to construct both human-readable descriptions and RDF graphs using morphological data derived from species in the plant genus Carex (sedges). The key difference of this system from other ontology-aware systems is that authors can directly add needed terms to the ontology as they wish and can update their data according to ontology updates. a report on the findings from a recent survey of 90+ participants on the need for a tool like Character Recorder; a methods section that describes how we provide semantics to an existing vocabulary of quantitative characters through a set of properties that explain where and how a measurement (e.g., length of perigynium beak) is taken. We also report on how a custom color palette of RGB values obtained from real specimens or high-quality specimen images, can be used to help authors choose standardized color descriptions for plant specimens; and a software demonstration, where we show how Character Recorder and Conflict Resolver can work together to construct both human-readable descriptions and RDF graphs using morphological data derived from species in the plant genus Carex (sedges). The key difference of this system from other ontology-aware systems is that authors can directly add needed terms to the ontology as they wish and can update their data according to ontology updates. The software modules currently incorporated in Character Recorder and Conflict Resolver have undergone formal usability studies. We are actively recruiting Carex experts to participate in a 3-day usability study of the entire system of the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists. Participants will use the platform to record 100 characters about one Carex species. In addition to usability data, we will collect the terms that participants submit to the underlying ontology and the data related to conflict resolution. Such data allow us to examine the types and the quantities of logical conflicts that may result from the terms added by the users and to use Discrete Event Simulation models to understand if and how term additions and conflict resolutions converge. We look forward to a discussion on how the tools (Character Recorder is online at http://shark.sbs.arizona.edu/chrecorder/public) described in our presentation can contribute to producing and publishing FAIR data in taxonomic studies. 
    more » « less