Healthcare applications on voice personal assistant systems (e.g., Amazon Alexa) have shown great promise for delivering personalized health services via a conversational interface. However, concerns have also been raised about privacy, safety, and service quality. In this paper, we propose VerHealth to systematically assess health-related applications on Alexa for how well they comply with existing privacy and safety policies. VerHealth contains a static module and a machine-learning-based dynamic module that can trigger and detect violation behaviors hidden deep in interaction threads. We use VerHealth to analyze 813 health-related applications on Alexa by sending over 855,000 probing questions and analyzing 863,000 responses. We also consult three medical school students (domain experts) to confirm and assess the potential violations. We show that violations are quite common: e.g., 86.36% of applications omit disclaimers when providing medical information, and 30.23% store users' physical or mental health data without approval. The domain experts believe that the applications' medical suggestions are often factually correct but of poor relevance, and that in over half of the cases the applications should have asked more questions before providing suggestions. Finally, we use our results to discuss possible directions for improvement.
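The disclaimer check mentioned above could be approximated with a simple keyword heuristic. The sketch below is purely illustrative: the paper's actual detector is machine-learning based, and the patterns here are invented examples, not VerHealth's rules.

```python
import re

# Hypothetical phrases that commonly signal a medical disclaimer.
DISCLAIMER_PATTERNS = [
    r"not\s+(a\s+substitute|medical\s+advice)",
    r"consult\s+(a|your)\s+(doctor|physician|healthcare)",
    r"for\s+informational\s+purposes\s+only",
]

def has_disclaimer(response: str) -> bool:
    """Flag whether a skill's response text contains any disclaimer phrase."""
    text = response.lower()
    return any(re.search(p, text) for p in DISCLAIMER_PATTERNS)

has_disclaimer("Take ibuprofen every 6 hours.")                     # False
has_disclaimer("This is not medical advice; consult your doctor.")  # True
```

A real pipeline would pair such a check with the probing questions described in the paper, flagging responses that give medical information without any matching disclaimer.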
This content will become publicly available on May 2, 2026
Expanding Perspectives on Data Privacy: Insights from Rural Togo
Passively collected big data sources are increasingly used to inform critical development policy decisions in low- and middle-income countries. While prior work highlights how such approaches may reveal sensitive information, enable surveillance, and centralize power, less is known about the corresponding privacy concerns, hopes, and fears of the people directly impacted by these policies, people sometimes referred to as experiential experts. To understand the perspectives of experiential experts, we conducted semi-structured interviews with people living in rural villages in Togo shortly after an entirely digital cash transfer program was launched that used machine learning and mobile phone metadata to determine program eligibility. This paper documents participants' privacy concerns surrounding the introduction of big data approaches in development policy. We find that the privacy concerns of our experiential experts differ from those raised by privacy and development domain experts. To facilitate a more robust and constructive account of privacy, we discuss implications for policies and designs that take seriously the privacy concerns raised by both experiential experts and domain experts.
- Award ID(s): 1942702
- PAR ID: 10596456
- Publisher / Repository: ACM Digital Library
- Date Published:
- Journal Name: Proceedings of the ACM on Human-Computer Interaction
- Volume: 9
- Issue: 2
- ISSN: 2573-0142
- Page Range / eLocation ID: 1 to 29
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The eruption of big data, with the increasing collection and processing of vast volumes and varieties of data, has led to breakthrough discoveries and innovation in science, engineering, medicine, commerce, criminal justice, and national security that would not have been possible in the past. While there are many benefits to the collection and usage of big data, there are also growing concerns among the general public about what personal information is collected and how it is used. In addition to legal policies and regulations, technological tools and statistical strategies exist to promote and safeguard individual privacy while releasing and sharing useful population-level information. In this overview, I introduce some of these approaches, as well as the existing challenges and opportunities in statistical data privacy research and applications, to better meet the practical needs of privacy protection and information sharing.
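One widely used statistical strategy of the kind this overview surveys is differential privacy. The following is a minimal sketch of the Laplace mechanism for a counting query; it is a generic textbook example, not code from the overview itself.

```python
import math
import random

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with Laplace noise of scale sensitivity/epsilon,
    drawn via inverse-CDF sampling from Uniform(-0.5, 0.5)."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# Example: privately release a count over a small population.
ages = [34, 51, 29, 62, 47, 38]
true_count = sum(1 for a in ages if a >= 40)  # counting queries have sensitivity 1
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```

Smaller values of epsilon give stronger privacy at the cost of noisier released statistics, which is exactly the privacy/utility trade-off the overview discusses.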
-
Identifying privacy-sensitive data leaks by mobile applications has been a topic of great research interest for the past decade. Technically, such data flows are not “leaks” if they are disclosed in a privacy policy. To address this limitation in automated analysis, recent work has combined program analysis of applications with analysis of privacy policies to determine the flow-to-policy consistency, and hence violations thereof. However, this prior work has a fundamental weakness: it does not differentiate the entity (e.g., first-party vs. third-party) receiving the privacy-sensitive data. In this paper, we propose POLICHECK, which formalizes and implements an entity-sensitive flow-to-policy consistency model. We use POLICHECK to study 13,796 applications and their privacy policies and find that up to 42.4% of applications either incorrectly disclose or omit disclosing their privacy-sensitive data flows. Our results also demonstrate the significance of considering entities: without considering entity, prior approaches would falsely classify up to 38.4% of applications as having privacy-sensitive data flows consistent with their privacy policies. These false classifications include data flows to third-parties that are omitted (e.g., the policy states only the first-party collects the data type), incorrect (e.g., the policy states the third-party does not collect the data type), and ambiguous (e.g., the policy has conflicting statements about the data type collection). By defining a novel automated, entity-sensitive flow-to-policy consistency analysis, POLICHECK provides the highest-precision method to date to determine if applications properly disclose their privacy-sensitive behaviors.
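The entity-sensitive consistency model can be illustrated with a toy classifier. The `Flow`/`PolicyStatement` types and category names below are simplified assumptions for exposition, not POLICHECK's actual implementation; the key point is that statements must match on both entity and data type.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    data_type: str   # e.g., "location"
    entity: str      # e.g., "first_party" or a third-party name

class PolicyStatement(NamedTuple):
    entity: str
    data_type: str
    collects: bool   # True = "collects", False = "does not collect"

def classify_flow(flow: Flow, policy: list) -> str:
    """Entity-sensitive consistency: match statements on BOTH entity and data type."""
    matches = [s for s in policy
               if s.entity == flow.entity and s.data_type == flow.data_type]
    if not matches:
        return "omitted"      # policy never mentions this entity/data-type pair
    if all(s.collects for s in matches):
        return "consistent"
    if all(not s.collects for s in matches):
        return "incorrect"    # policy denies a flow that actually occurs
    return "ambiguous"        # conflicting statements about the same pair

policy = [
    PolicyStatement("first_party", "location", True),
    PolicyStatement("ad_network", "location", False),
]
classify_flow(Flow("location", "first_party"), policy)  # "consistent"
classify_flow(Flow("location", "ad_network"), policy)   # "incorrect"
classify_flow(Flow("email", "ad_network"), policy)      # "omitted"
```

An entity-insensitive analysis would match the first-party statement to the ad-network flow and wrongly report it as consistent, which is the failure mode the paper quantifies.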
-
The HandyTech's Coming Between 1 and 4: Privacy Opportunities and Challenges for the IoT Handyperson
Smart homes are gaining popularity due to their convenience and efficiency, both of which come at the expense of increased complexity of Internet of Things (IoT) devices. Due to the number and heterogeneity of IoT devices, technologically inexperienced or time-burdened residents are unlikely to manage the setup and maintenance of IoT apps and devices. We highlight the need for a "HandyTech": a technically skilled contractor who can set up, repair, debug, monitor, and troubleshoot home IoT systems. In this paper, we consider the potential privacy challenges posed by the HandyTech, who has the ability to access IoT devices and private data. We do so in the context of single- and multi-user smart homes, including rental units, condominiums, and temporary guests or workers. We examine the privacy harms that can arise when a HandyTech has legitimate access to information but uses it in unintended ways. By providing insights for the development of privacy control policies and measures in home IoT environments in the presence of the HandyTech, we capture the privacy concerns raised by other visitors to the home, including temporary residents, part-time workers, etc. This helps lay a foundation for the broad set of privacy concerns raised by home IoT systems.
-
Organisations disclose their privacy practices by posting privacy policies on their websites. Even though internet users often care about their digital privacy, they usually do not read privacy policies, since understanding them requires a significant investment of time and effort. Natural language processing has been used to create experimental tools to interpret privacy policies, but there has been a lack of large privacy policy corpora to facilitate the creation of large-scale semi-supervised and unsupervised models to interpret and simplify privacy policies. Thus, we present the PrivaSeer Corpus of 1,005,380 English-language website privacy policies collected from the web. The number of unique websites represented in PrivaSeer is about ten times larger than the next largest public collection of web privacy policies, and it surpasses the aggregate of unique websites represented in all other publicly available privacy policy corpora combined. We describe a corpus creation pipeline with stages that include a web crawler, language detection, document classification, duplicate and near-duplicate removal, and content extraction. We employ an unsupervised topic modelling approach to investigate the contents of policy documents in the corpus and discuss the distribution of topics in privacy policies at web scale. We further investigate the relationship between privacy policy domain PageRanks and text features of the privacy policies. Finally, we use the corpus to pretrain PrivBERT, a transformer-based privacy policy language model, and obtain state-of-the-art results on the data practice classification and question answering tasks.
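A near-duplicate removal stage like the one in the pipeline can be sketched with character shingles and Jaccard similarity. This is a simplified stand-in for illustration; the abstract does not specify which deduplication technique PrivaSeer actually uses, and the threshold below is an arbitrary choice.

```python
def shingles(text: str, k: int = 5) -> set:
    """Character k-shingles of a whitespace-normalized, lowercased document."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

def dedupe(docs: list, threshold: float = 0.8) -> list:
    """Greedy near-duplicate removal: keep a document only if it is not
    too similar to any document already kept."""
    kept, kept_shingles = [], []
    for d in docs:
        s = shingles(d)
        if all(jaccard(s, t) < threshold for t in kept_shingles):
            kept.append(d)
            kept_shingles.append(s)
    return kept

docs = [
    "We collect your email address and location data for advertising.",
    "We collect  your email address and location data for advertising.",
    "This site uses cookies for analytics and nothing else.",
]
kept = dedupe(docs)  # the second policy collapses into the first
```

At the scale of a million documents, pairwise comparison would be replaced by an approximate scheme such as MinHash with locality-sensitive hashing, but the similarity notion is the same.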
