skip to main content


Title: A Data-Driven Approach to Developing IoT Privacy-Setting Interfaces
User testing is often used to inform the development of user interfaces (UIs). But what if an interface needs to be developed for a system that does not yet exist? In that case, existing datasets can provide valuable input for UI development. We apply a data-driven approach to the development of a privacy-setting interface for Internet-of-Things (IoT) devices. Applying machine learning techniques to an existing dataset of users' sharing preferences in IoT scenarios, we develop a set of "smart" default profiles. Our resulting interface asks users to choose among these profiles, which capture their preferences with an accuracy of 82%---a 14% improvement over a naive default setting and a 12% improvement over a single smart default setting for all users.  more » « less
Award ID(s):
1640664 1640527
NSF-PAR ID:
10061086
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
23rd International Conference on Intelligent User Interfaces
Page Range / eLocation ID:
165 to 176
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fitness trackers are undoubtedly gaining in popularity. As fitness-related data are persistently captured, stored, and processed by these devices, the need to ensure users’ privacy is becoming increasingly urgent. In this paper, we apply a data-driven approach to the development of privacy-setting recommendations for fitness devices. We first present a fitness data privacy model that we defined to represent users’ privacy preferences in a way that is unambiguous, compliant with the European Union’s General Data Protection Regulation (GDPR), and able to represent both the user and the third party preferences. Our crowdsourced dataset is collected using current scenarios in the fitness domain and used to identify privacy profiles by applying machine learning techniques. We then examine different personal tracking data and user traits which can potentially drive the recommendation of privacy profiles to the users. Finally, a set of privacy-setting recommendation strategies with different guidance styles are designed based on the resulting profiles. Interestingly, our results show several semantic relationships among users’ traits, characteristics, and attitudes that are useful in providing privacy recommendations. Even though several works exist on privacy preference modeling, this paper makes a contribution in modeling privacy preferences for data sharing and processing in the IoT and fitness domain, with specific attention to GDPR compliance. Moreover, the identification of well-identified clusters of preferences and predictors of such clusters is a relevant contribution for user profiling and for the design of interactive recommendation strategies that aim to balance users’ control over their privacy permissions and the simplicity of setting these permissions. 
    more » « less
  2. Emerging smart home platforms, which interface with a variety of physical devices and support third-party application development, currently use permission models inspired by smartphone operating systems—the permission to access operations are separated by the device which performs them instead of their functionality. Unfortunately, this leads to two issues: (1) apps that do not require access to all of the granted device operations have overprivileged access to them, (2) apps might pose a higher risk to users than needed because physical device operations are fundamentally risk-asymmetric — “door.unlock” provides access to burglars, and “door.lock” can potentially lead to getting locked out. Overprivileged apps with access to mixed-risk operations only increase the potential for damage. We present Tyche, a secure development methodology that leverages the risk-asymmetry in physical device operations to limit the risk that apps pose to smart home users, without increasing the user’s decision overhead. Tyche introduces the notion of risk-based permissions for IoT systems. When using risk-based permissions, device operations are grouped into units of similar risk, and users grant apps access to devices at that risk-based granularity. Starting from a set of permissions derived from the popular Samsung SmartThings platform, we conduct a user study involving domain-experts and Mechanical Turk users to compute a relative ranking of risks associated with device operations. We find that user assessment of risk closely matches that of domain experts. Using this insight, we define risk-based groupings of device operations, and apply it to existing SmartThings apps. We show that existing apps can reduce access to high-risk operations by 60% while remaining operable. 
    more » « less
  3. Collecting, storing, and providing access to Internet of Things (IoT) data are fundamental tasks to many smart city projects. However, developing and integrating IoT systems is still a significant barrier to entry. In this work, we share insights on the development of cloud data storage and visualization tools for IoT smart city applications using flood warning as an example application. The developed system incorporates scalable, autonomous, and inexpensive features that allow users to monitor real-time environmental conditions, and to create threshold-based alert notifications. Built in Amazon Web Services (AWS), the system leverages serverless technology for sensor data backup, a relational database for data management, and a graphical user interface (GUI) for data visualizations and alerts. A RESTful API allows for easy integration with web-based development environments, such as Jupyter notebooks, for advanced data analysis. The system can ingest data from LoRaWAN sensors deployed using The Things Network (TTN). A cost analysis can support users’ planning and decision-making when deploying the system for different use cases. A proof-of-concept demonstration of the system was built with river and weather sensors deployed in a flood prone suburban watershed in the city of Charlottesville, Virginia. 
    more » « less
  4. With the acceleration of ICT technologies and the Internet of Things (IoT) paradigm, smart residential environments , also known as smart homes are becoming increasingly common. These environments have significant potential for the development of intelligent energy management systems, and have therefore attracted significant attention from both academia and industry. An enabling building block for these systems is the ability of obtaining energy consumption at the appliance-level. This information is usually inferred from electric signals data (e.g., current) collected by a smart meter or a smart outlet, a problem known as appliance recognition . Several previous approaches for appliance recognition have proposed load disaggregation techniques for smart meter data. However, these approaches are often very inaccurate for low consumption and multi-state appliances. Recently, Machine Learning (ML) techniques have been proposed for appliance recognition. These approaches are mainly based on passive MLs, thus requiring pre-labeled data to be trained. This makes such approaches unable to rapidly adapt to the constantly changing availability and heterogeneity of appliances on the market. In a home setting scenario, it is natural to consider the involvement of users in the labeling process, as appliances’ electric signatures are collected. This type of learning falls into the category of Stream-based Active Learning (SAL). SAL has been mainly investigated assuming the presence of an expert , always available and willing to label the collected samples. Nevertheless, a home user may lack such availability, and in general present a more erratic and user-dependent behavior. In this paper, we develop a SAL algorithm, called K -Active-Neighbors (KAN), for the problem of household appliance recognition. Differently from previous approaches, KAN jointly learns the user behavior and the appliance signatures. KAN dynamically adjusts the querying strategy to increase accuracy by considering the user availability as well as the quality of the collected signatures. Such quality is defined as a combination of informativeness , representativeness , and confidence score of the signature compared to the current knowledge. To test KAN versus state-of-the-art approaches, we use real appliance data collected by a low-cost Arduino-based smart outlet as well as the ECO smart home dataset. Furthermore, we use a real dataset to model user behavior. Results show that KAN is able to achieve high accuracy with minimal data, i.e., signatures of short length and collected at low frequency. 
    more » « less
  5. Abstract Abstract: Users trust IoT apps to control and automate their smart devices. These apps necessarily have access to sensitive data to implement their functionality. However, users lack visibility into how their sensitive data is used, and often blindly trust the app developers. In this paper, we present IoTWATcH, a dynamic analysis tool that uncovers the privacy risks of IoT apps in real-time. We have designed and built IoTWATcH through a comprehensive IoT privacy survey addressing the privacy needs of users. IoTWATCH operates in four phases: (a) it provides users with an interface to specify their privacy preferences at app install time, (b) it adds extra logic to an app’s source code to collect both IoT data and their recipients at runtime, (c) it uses Natural Language Processing (NLP) techniques to construct a model that classifies IoT app data into intuitive privacy labels, and (d) it informs the users when their preferences do not match the privacy labels, exposing sensitive data leaks to users. We implemented and evaluated IoTWATcH on real IoT applications. Specifically, we analyzed 540 IoT apps to train the NLP model and evaluate its effectiveness. IoTWATcH yields an average 94.25% accuracy in classifying IoT app data into privacy labels with only 105 ms additional latency to an app’s execution. 
    more » « less