<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>ChatterHub: Privacy Invasion via Smart Home Hub</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10298285</idno>
					<idno type="doi"></idno>
					<title level='j'>Proceedings of the 2021 IEEE Conference on Smart Computing (SmartComp)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Omid Setayeshfar</author><author>Karthika Subramani</author><author>Xingzi Yuan</author><author>Raunak Dey</author><author>Dezhi Hong</author><author>Kyu Hyung Lee</author><author>In kee Kim</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Smart-home devices promise to make users’ lives more convenient. However, at the same time, such devices increase the possibility of breaching users’ privacy as they are tightly connected to the users’ daily lives and activities. To address privacy invasion through smart-home devices, we present ChatterHub. This novel approach accurately identifies smart-home devices’ activities with minimal monitoring of encrypted traffic in the home network. ChatterHub targets devices that can only connect to the Internet through a centralized smart-home hub (e.g., Samsung SmartThings) using Zigbee or Z-Wave. Specifically, ChatterHub passively eavesdrops on encrypted network traffic from the hub and leverages machine learning techniques to classify events and states of smart-home devices. Using ChatterHub, an adversary can identify smart-home devices’ specific activities without prior knowledge of the target smart home (e.g., list of deployed devices, types of communication protocols). We evaluated the accuracy and efficiency of ChatterHub in three real-world smart-home environments, and the evaluation results show that an attacker can successfully disclose smart-home devices’ behaviors with over 88% F1 score. We further demonstrate that ChatterHub successfully recognizes privacy-sensitive activities, including the opening and closing of a smart door lock and the turning on and off of a smart LED. Additionally, to mitigate the threats posed by ChatterHub, we introduce two approaches, packet padding and random sequence injection. These mitigation approaches can effectively prevent threats from ChatterHub with only 9.2MB of additional network traffic per day.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>The rapid growth of the Internet of Things (IoT) has connected a massive number of smart-home devices to the Internet, with an estimated average of 10 smart devices per home in 2020 <ref type="bibr">[1]</ref>. We expect the number of installed smart-home devices to reach 75 billion by 2025. Smart-home devices promise to make the user's daily life more convenient. According to a recent study <ref type="bibr">[2]</ref>, the main reason for purchasing smart devices is convenience, as users can easily control and monitor smart-home devices over the Internet. Most smart-home devices can be accessed via smart apps on smartphones or smart-home platforms, which report device events, e.g., "Front door unlocked at 13:52 by code A", "Motion detected in living room at 17:03".</p><p>However, this convenience comes at a cost. For example, an adversary with access to smart-home devices' state information (such as what is triggered or used and when) could acquire sensitive information about the users and their activities. These device states often reveal the users' activities in their living space, and the adversary can exploit them to commit further offenses, such as burglary and aggravated robbery. Indeed, cybercriminals are increasingly targeting smart-home devices <ref type="bibr">[3]</ref>. Recent studies <ref type="bibr">[4]</ref>, <ref type="bibr">[5]</ref> demonstrated privacy invasion problems present in smart-home devices. For example, Peek-a-Boo <ref type="bibr">[4]</ref> showed that attackers could identify smart-home devices' states and actions by passively listening to the wireless traffic around a smart home. Apthorpe et al. 
<ref type="bibr">[5]</ref> showed that an Internet Service Provider (ISP) could learn privacy-sensitive information from smart-home devices by analyzing traffic.</p><p>This work presents a novel method to attack smart homes, called ChatterHub, which enables an adversary to infer smart-home events and user activities by sniffing encrypted network traffic to/from a target home, even though the devices are hidden behind a smart-hub (e.g., Samsung SmartThings <ref type="bibr">[6]</ref>) and do not directly connect to the Internet. ChatterHub requires neither physical proximity to the target home nor prior knowledge of its setup (e.g., list or topology of smart-home devices), making attacks on smart homes more feasible.</p><p>The intuition behind ChatterHub is that users' activity routines in a smart home trigger smart devices, which manifests as distinct patterns in the network traffic, albeit encrypted; hence the users' activities and smart devices' events are discoverable and learnable. To infer smart-home devices' events, ChatterHub employs a classification model trained with traffic patterns of popular smart-home devices and hubs. The adversary can further train ChatterHub with their own devices by providing network packet traces and event logs to the training platform. ChatterHub automatically partitions the network trace with our novel segmentation algorithms and feeds the segmented traces (with event labels parsed from the event logs) into machine learning models to detect smart-home devices' events. This way, the attacker can infer the occupancy pattern of the home by analyzing the event timing and patterns.</p><p>We have evaluated the accuracy and effectiveness of ChatterHub on real-world testbed environments with a Samsung SmartThings hub and 14 smart-home devices. 
The results show that ChatterHub can successfully discover the capabilities and events of the devices (e.g., lock, switch, or motion) based on their encrypted traffic, and reveal users' daily routines by tracking devices' activity, including changes in a lock's state, a smart LED's state (i.e., on&#8594;off, off&#8594;on), and a multi-purpose sensor's states (i.e., detecting motion on doors or windows).</p><p>In summary, this paper makes the following contributions: 1) We explore a new adversarial approach against smart-home devices hidden behind a smart-hub, which could leak critical private information about users, including households' daily routines.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. ADVERSARY MODEL, ASSUMPTION, AND GOAL</head><p>We assume that an attacker only passively sniffs encrypted network packets from/to the target home. In this work, we consider three potential points at which the attacker can eavesdrop on network traffic. First, the attacker can gain access to the traffic from a compromised router. Second, the attacker can eavesdrop on the home router's uplink traffic. Third, the attacker can be a party that is already able to monitor the network traffic of the target home. In all of these scenarios, encryption remains the only form of protection for users' data. Our adversary model is therefore a passive attacker who collects encrypted network traffic (e.g., TLS/SSL). The attacker can only observe the size of each incoming and outgoing packet, the source and destination IPs, and timestamps. In addition, the attacker does not rely on decoding or interpreting the information inside traffic packets.</p><p>We also assume the attacker has access to a trained model or can collect their own data from a hub and the desired devices to train a model. However, the attacker does not require prior knowledge of the targeted smart home's topology or deployed devices. The Goal of the Adversary. Once the network packet traces from the target home are obtained, the adversary leverages a classification model, either provided by ChatterHub or trained on the attacker's own hub and devices. By doing so, the adversary can understand the pattern of network traffic generated by the smart-home devices of interest. We consider that the attacker can achieve the following goals (among others):</p><p>&#8226; Scout Attack. 
The attacker targets a range of IP addresses to find vulnerable home routers, similar to Mirai attack <ref type="bibr">[7]</ref>.</p><p>After gaining access to the routers, the attacker analyzes traffic either in the routers or through a virtual redirection The first type of device can directly connect to the access point. On the other hand, devices in the second category cannot connect to the Internet directly, so they require a smart-home hub to manage communications among devices. Additionally, since the second type of devices is hidden behind the hub, they are considered to be more secure against remote attackers <ref type="bibr">[10]</ref>.</p><p>A large body of work <ref type="bibr">[5]</ref>, <ref type="bibr">[11]</ref>- <ref type="bibr">[13]</ref> studied security and privacy of the first type of devices, while the security of home automation network devices (the second type of devices) has gained little attention. This work focuses on the second type for their high market share and diversity <ref type="bibr">[14]</ref>, <ref type="bibr">[15]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. SYSTEM DESIGN</head><p>Minimally intrusive monitoring is the most important goal of ChatterHub, as the adversary only requires access to the network traffic from/to the home. Obtaining access at this level is relatively simpler than using eavesdropping devices that have to be placed near the target devices <ref type="bibr">[5]</ref>, <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref>.</p><p>Fig. <ref type="figure">1</ref> illustrates an overview and the control flow of ChatterHub. In ChatterHub's training, all communication from the devices is transmitted through the hub. We capture this communication by 1) accessing the cloud backend logs and 2) monitoring the network traffic. The network traffic is passed to a segmentation module, which separates network traces into sequences associated with events.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Training Data Collection</head><p>We first collect network packets to/from our smart-home setup and smart devices' event logs, then label them for model training. Table <ref type="table">II</ref> (capabilities) shows the events associated with each capability. In our dataset, an event is represented as the combination of a capability and its event (e.g., switch-on, lock-unlocked). We used the following setups for model training. 1) Single device. We connect a single device to the hub and observe the network traffic generated. This is to understand the unique traffic patterns generated by each device. 2) Multiple devices. We connect multiple devices to the hub and monitor traffic concurrently generated by all devices; we use this data to train our model with a more realistic setup. For example, we observed that packets generated by multiple device events often overlapped each other. We connect not only smart-home devices to the hub, but also other home appliances (e.g., computers, tablets, smartphones) to the router to create more realistic traffic. 3) Only the hub. We also observe the network traffic from an isolated hub (with no other devices attached) to understand the hub's own behaviors (e.g., firmware update). We connect a laptop running Wireshark to the Samsung SmartThings hub through a bridged network to monitor the network traffic. We obtain event labels from the logs delivered through the hub. The Samsung SmartThings hub stores event logs (e.g., all events and commands sent to/by smart-home devices along with timestamps). We collect the logs regularly by using "Simple Event Logger" <ref type="bibr">[18]</ref> provided by the manufacturer. We have collected over 200,000 network packets from the smart-hub with over 60,000 event logs and use them for training the classification model in ChatterHub.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Trace Segmentation, Labeling, and Feature Extraction</head><p>We first design a method to filter out network packets that are not related to the SmartThings hub (i.e., packets generated by PCs or tablets), and then we perform packet segmentation. When devices connect to the Internet directly, traffic can be segmented based on each session, and each device's traffic can be separated based on its unique destination and source IP addresses. However, the challenge we encounter is that all communications go through the hub, and there is a lack of discerning parameters. Thus, it is not possible to partition packets based on the network flow information. On the other hand, the interval between the packet sequences of two events is relatively large compared to the interval between packets sent for a single event. Thus, we apply a segmentation method to divide the network traces into small bursts of packets. To segment the network flows into separate bursts, we first tried approaches from previous studies <ref type="bibr">[16]</ref>, <ref type="bibr">[19]</ref> that use a fixed threshold of 4.5 seconds to segment network packets into multiple bursts. Previous works show that 4.5 seconds is enough for a client and server to complete a packet exchange. However, we observed that the time gap between packet exchanges for a single event could last longer than 4.5 seconds. Fig. <ref type="figure">2</ref> shows a case where a fixed-threshold approach fails to separate the level change event from other events. Also, events of the hub (e.g., ping, status) can occur along with other device events within an interval shorter than 4.5 seconds.</p><p>As such, segmentation based on a fixed threshold often fails to correctly separate the device events from other packets (e.g., ping and status). 
Therefore, we develop a dynamic segmentation technique using change point detection <ref type="bibr">[20]</ref> to segment these packets into bursts correctly.</p></div>
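<div xmlns="http://www.tei-c.org/ns/1.0"><p>The fixed-threshold baseline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name segment_by_gap, the (timestamp, frame length) packet representation, and the toy trace are assumptions made for illustration; only the 4.5-second threshold comes from the text.</p>
<ab><![CDATA[
```python
from typing import List, Sequence, Tuple

# Hypothetical packet representation: (timestamp in seconds, frame length in bytes).
Packet = Tuple[float, int]


def segment_by_gap(packets: Sequence[Packet], threshold: float = 4.5) -> List[List[Packet]]:
    """Split a time-ordered trace into bursts wherever the idle gap exceeds `threshold`."""
    bursts: List[List[Packet]] = []
    for pkt in packets:
        # Continue the current burst if this packet follows the previous one closely enough.
        if bursts and pkt[0] - bursts[-1][-1][0] <= threshold:
            bursts[-1].append(pkt)
        else:
            bursts.append([pkt])
    return bursts


# Toy trace (made-up values): two idle gaps longer than 4.5 s yield three bursts.
trace = [(0.0, 120), (0.3, 560), (0.9, 120), (7.0, 98), (7.2, 310), (20.0, 120)]
print([len(b) for b in segment_by_gap(trace)])  # [3, 2, 1]
```
]]></ab>
<p>A single event whose packets are spread over more than 4.5 seconds would be split by this rule, and a hub ping arriving within 4.5 seconds of a device event would be merged into it, which is the failure mode that motivates the dynamic approach.</p></div>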
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dynamic Change Point Detection (CPD).</head><p>A change point is a temporal point at which the statistical properties of the preceding and subsequent time points differ. In our smart-home setup, the network packets for a single event are issued in short intervals compared to the intervals between two distinct events. Therefore, a change point occurs when a sequence of packets for a single event starts or ends. Since our logs are collected over a long time, multiple change points need to be identified to segment all events. CPD is an approach to find abrupt changes in time series <ref type="bibr">[20]</ref>. CPD can also be used for estimating the temporal point when the statistical properties of a sequence change <ref type="bibr">[21]</ref>. ChatterHub employs PELT (Pruned Exact Linear Time) <ref type="bibr">[21]</ref> because it is computationally efficient and outperforms other exact CPD search methods <ref type="bibr">[22]</ref>. We present detailed evaluation results of Dynamic CPD and fixed threshold segmentation algorithms on our dataset in &#167;IV-A.</p><p>Fig. <ref type="figure">2</ref>: Fixed Segmentation vs. Changepoint Segmentation</p><p>Labeling. After we segment the packets from network traces into different bursts, we obtain event labels from the hub's logs. We use timestamps to align the labels and the segmented trace. However, we observe that slight time differences between the generation of an event log and the packet capture can occur. Hence, we allow for a &#177;5-second difference between the two; then, we map the event to a specific burst of packets. We also observe special cases where a single user activity triggers multiple events in a device. For example, a "switch on" user event from the app triggers two events (switch-on and level-change). Therefore, a single burst of packets could be mapped to multiple events. 
We also observe that a number of segments are not associated with any labels (i.e., no logs from the hub and Comm-Server), so we label them as unknown.</p><p>To characterize the unknown packets, we further analyzed the source code of device handlers <ref type="bibr">[23]</ref> and found that the handler generates the event logs, and some of the handlers do not emit any logs. We found that most of the missing events are less important for the user (e.g., device refresh, device ping).</p></div>
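<div xmlns="http://www.tei-c.org/ns/1.0"><p>The labeling step above can be sketched as follows. This is a hedged illustration rather than the paper's code: the name label_bursts, the (start, end) burst representation, and the rule of assigning each event to the first burst whose window (widened by the tolerance) contains its timestamp are assumptions; the &#177;5-second tolerance, the multi-label bursts, and the unknown label come from the text.</p>
<ab><![CDATA[
```python
from typing import Dict, List, Tuple

Burst = Tuple[float, float]  # hypothetical: (start time, end time) of a burst, in seconds
Event = Tuple[float, str]    # hypothetical: (log timestamp, label such as "switch-on")


def label_bursts(bursts: List[Burst], events: List[Event],
                 tolerance: float = 5.0) -> Dict[int, List[str]]:
    """Attach logged events to bursts, allowing a +/- `tolerance` second clock difference."""
    labels: Dict[int, List[str]] = {i: [] for i in range(len(bursts))}
    for ts, name in events:
        for i, (start, end) in enumerate(bursts):
            if start - tolerance <= ts <= end + tolerance:
                labels[i].append(name)  # one burst may collect several events
                break                   # assumption: first matching burst wins
    for names in labels.values():
        if not names:
            names.append("unknown")     # no log entry aligned with this burst
    return labels


# Toy data (made-up values): one user action produces two events in the first burst.
bursts = [(10.0, 10.8), (40.0, 41.5), (90.0, 90.2)]
events = [(12.1, "switch-on"), (12.3, "level-change"), (39.0, "lock-unlocked")]
print(label_bursts(bursts, events))
# {0: ['switch-on', 'level-change'], 1: ['lock-unlocked'], 2: ['unknown']}
```
]]></ab></div>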
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Feature Extraction.</head><p>For feature extraction, we begin by forming a signature from the frame lengths of multiple packets in each segment. We then use this signature as the feature for our classifier. These signatures show a significant amount of collision across different classes. These collisions result from events happening within small intervals or events that happen together, e.g., when a user opens a door, both contact-open and status-open events occur concurrently.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Classification Models</head><p>We train classification models using extracted network features to classify smart-home network traffic into smart-home devices' capabilities and events. Given the dynamic nature of the data, we consider the following machine learning models: 1) Random Forest, 2) OneVsRest classifier, and 3) SEQ2SEQ. Random Forest (RF) Model. RF constructs an ensemble of decision trees by taking a random subset of the features to decide a node split in building each tree. We only use RF as a baseline to identify better algorithms because RF largely depends on the training data's completeness. OneVsRest Classifier. A key characteristic of our data is that a single traffic segment may contain data related to multiple capabilities that usually occur together or were activated in succession. Therefore, we need a multi-label classifier that can identify all the classes in a traffic segment. As a result, we follow the one-vs-rest strategy, which fits one classifier per class against all other classes. This method ensures that each classifier is independently optimized to identify features for the corresponding class. As this entails a large number of classifiers, we use XGBoost (Extreme Gradient Boosting). XGBoost is an ensemble method that applies gradient boosting on decision trees to boost the performance of the various models <ref type="bibr">[24]</ref>- <ref type="bibr">[26]</ref>. In this project, we use the XGBClassifier of the XGBoost library <ref type="bibr">[27]</ref> with its default parameters. We use CountVectorizer as the vectorizer, with an n-gram range from 1 to 4 so that the relationship between the packets in the sequence is maintained. The output of the vectorizer is directly fed into the XGBoost model. SEQ2SEQ Model. The sequence-to-sequence (SEQ2SEQ) model solves sequential problems. 
The input to SEQ2SEQ is a series of data units, and the output is also a sequence of data units <ref type="bibr">[28]</ref>. The SEQ2SEQ model has been applied to various problems in multiple disciplines. Specifically, SEQ2SEQ caught our attention because of its application in natural language translation, for which the input is usually a sentence and the output is a sentence in a different language. In our model, the input is a sequence of packet lengths, and the output is a sequence of events. We use sequences of capabilities and events as labels. It is worth noting that SEQ2SEQ is a model for translating natural languages, so the order of the sequence affects the result. We maintain the original order from the ground truth, even if one label appears multiple times. The SEQ2SEQ framework contains two main components: an encoder and a decoder. The encoder reads the input, and the decoder translates the encoder's output to a final sequence of outputs <ref type="bibr">[28]</ref>.</p></div>
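<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the n-gram features used with the OneVsRest classifier above more concrete, the following standard-library sketch counts all 1- to 4-grams over a frame-length signature. It only approximates what a count vectorizer with an n-gram range of 1 to 4 produces and is not the authors' pipeline; the function name ngram_counts, the space-joined token format, and the toy signature are assumptions.</p>
<ab><![CDATA[
```python
from collections import Counter
from typing import Sequence


def ngram_counts(frame_lengths: Sequence[int], n_min: int = 1, n_max: int = 4) -> Counter:
    """Count contiguous n-grams (n_min..n_max) over a sequence of frame lengths."""
    tokens = [str(length) for length in frame_lengths]
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            # Each n-gram keeps packet order, so local ordering information is preserved.
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Toy signature (made-up frame lengths) of one traffic segment.
feats = ngram_counts([120, 560, 120, 560])
print(feats["120 560"])     # 2
print(sum(feats.values()))  # 10  (4 unigrams + 3 bigrams + 2 trigrams + 1 four-gram)
```
]]></ab>
<p>Each distinct n-gram becomes one feature column, so two segments that share the same unigrams but differ in packet order still receive different feature vectors.</p></div>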
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. EVALUATION RESULTS</head><p>We evaluate ChatterHub with real-world smart-home environments. In the smart-home setup, we deploy a set of smart-home devices and other Internet-connected devices (e.g., laptops, smartphones), and then we connect them to the hub. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Network Trace Segmentation</head><p>We perform trace segmentation on the captured traffic to partition the overall traffic flow between the hub and cloud servers (e.g., Comm-Server) into a set of small bursts, which map to specific commands. Therefore, ChatterHub first needs to identify the IP address of the target hub, and then it performs network trace segmentation, which generates, one at a time, the set of packets related to a specific command/event from the devices. An accurate segmentation ensures that each packet burst contains a negligible amount of noise packets. Note that noise (or noisy) packets are unknown packets or packets for the hub's status report. The hub sends these packets to the cloud servers at random times. Identifying the IP address of the Hub. We monitor all network traffic from and to the target home router and identify the hub's IP using the pattern signature of "hub's ping" events. Although the hub keeps changing the IP address of the Comm-Server from time to time (usually over days), we can successfully identify the IP address of the Comm-Servers. Then, we can extract the necessary traffic between the hub and the Comm-Server (excluding the traffic from other devices in the home). Network Trace Segmentation. As we discussed in &#167;III-B, we develop a PELT-based Dynamic Change Point Detection (CPD) algorithm to segment the network traffic.</p><p>The PELT algorithm can be used with different cost functions, and it additionally takes a penalty value; both affect the segmentation results. We compare the two most dominant cost functions, least squared deviation (L2) and kernelized mean change with a radial basis function (RBF) kernel, by running their output through our baseline model. Fig. <ref type="figure">3</ref> shows the results of classification for different parameters. We use PELT with the RBF cost function and a penalty value of 0.2, which achieved the best F1-score and precision.</p></div>
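<div xmlns="http://www.tei-c.org/ns/1.0"><p>The interaction between cost function and penalty in PELT can be illustrated with a small pure-Python sketch. It uses the L2 cost (one of the two costs compared above), not the RBF cost that ChatterHub ultimately selects, and it is not the authors' implementation. The function name pelt_l2, the toy signal, and the penalty of 1.0 are assumptions chosen for readability; which traffic statistic serves as the input signal is not specified in the text.</p>
<ab><![CDATA[
```python
from typing import List, Sequence


def pelt_l2(signal: Sequence[float], penalty: float) -> List[int]:
    """Return segment end indices (last one is len(signal)) minimizing
    sum of per-segment L2 costs plus `penalty` per segment, using PELT pruning."""
    n = len(signal)
    if n == 0:
        return []
    # Prefix sums make the cost of any segment an O(1) computation.
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a: int, b: int) -> float:
        # Sum of squared deviations from the mean over signal[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)  # best[t]: optimal penalized cost of signal[:t]
    best[0] = -penalty
    prev = [0] * (n + 1)    # prev[t]: start index of the last segment ending at t
    candidates = [0]
    for t in range(1, n + 1):
        best[t], prev[t] = min((best[s] + cost(s, t) + penalty, s) for s in candidates)
        # Pruning: a start s that already costs more than best[t] can never win later.
        candidates = [s for s in candidates if best[s] + cost(s, t) <= best[t]]
        candidates.append(t)

    ends = []
    t = n
    while t > 0:  # walk back through the stored segment starts
        ends.append(t)
        t = prev[t]
    return ends[::-1]


# Toy signal (made-up values) with one abrupt level change after the fourth sample.
print(pelt_l2([0, 0, 0, 0, 10, 10, 10, 10], penalty=1.0))  # [4, 8]
```
]]></ab>
<p>A larger penalty makes each additional change point more expensive and therefore yields fewer, longer segments; a smaller penalty yields more, shorter ones.</p></div>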
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Evaluation in Smart-home Environments</head><p>To evaluate ChatterHub in the real world, we set up three smart-home environments at three homes along with other devices and record the network traffic from their home routers for a total of 10 days.</p><p>We train the classification models with data collected from the lab setting (explained in &#167;III-A) plus the data obtained from one of the three home configurations. We then test the model on data from the two remaining smart homes not used for training.</p><p>After the model training, we conduct two experiments: 1) an attacker tries to infer the capabilities of devices, and 2) an attacker tries to detect specific events of those capabilities. For example, in the first experiment, the attacker learns that a "switch" is present and used in the target home. In the second experiment, the attacker then infers whether a "switch on" or "switch off" has happened.</p><p>It is worth noting that when we test the classification models, we add more sensors and devices (e.g., water sensor), which do not exist in the training dataset, to the two test smart homes to cover the scenario where the attacker does not have a list of installed devices in the target home. Classification Accuracy: We generate the ground truth for two different sets of labels (capabilities and events) so that we can train our classification models on both data sets to classify capabilities and events separately. Table <ref type="table">III</ref> reports the classification accuracy (recall and F1-score) of events from each device, such as switch-on, switch-off, motion-active, and motion-inactive. If some devices in the target home have not been used in the model training, ChatterHub categorizes the events and capabilities belonging to these devices as unknown. 
This is also observed in our results for the water capability, as shown in Table <ref type="table">IV</ref>, where "0" for water sensor activities means the water sensor was not used during training.</p><p>Overall, our classifiers generate multi-label outputs, meaning that a single segment of traffic packets can be classified into more than one class. Thus, our models identify multiple activities happening concurrently without an explicit time gap in the transmission of the network packets. The classification results reported in Table <ref type="table">III</ref> are the accuracy of each class. To decide the models' overall performance, we calculate the micro-average score for F1 and recall <ref type="bibr">[29]</ref>, which takes into consideration the imbalanced class sizes. The micro-average calculates an F1-score across different classes by adding up their individual true positives, false positives, and false negatives:</p><p><formula notation="TeX">P_{micro} = \frac{\sum_i TP_i}{\sum_i (TP_i + FP_i)}, \quad R_{micro} = \frac{\sum_i TP_i}{\sum_i (TP_i + FN_i)}, \quad F1_{micro} = \frac{2 \, P_{micro} \, R_{micro}}{P_{micro} + R_{micro}}</formula>, where TP, TN, FP, and FN indicate true positive, true negative, false positive, and false negative, respectively, and FP_i denotes the number of false positives for the i-th class.</p><p>We report this result as Average in Table <ref type="table">III</ref>. However, since the focus of our system is to detect known device activities, we also calculate the micro-average for only the known classes and exclude unknown classes from the computation. The average results are reported as known-average.</p><p>Among the three classification models, RF (the baseline model) shows the lowest recall and precision in Table III. Overall, SEQ2SEQ gives us the highest F1-score (0.81) for known-average, compared to the XGBoost model's F1 result of 0.76. Although the XGBoost model has a higher individual F1-score for some of the capabilities and events, its average F1-score is lower because of more false positives, which result in a lower precision score. On the other hand, SEQ2SEQ shows higher precision, indicating that it identifies device events accurately. 
Based on this observation, SEQ2SEQ's limitation in identifying some activities lies in overlapped packet sequences. XGBoost, in contrast, is more resilient to such noise in the data <ref type="bibr">[30]</ref>. Hence, it shows higher accuracy in the presence of overlapped packets. However, XGBoost's misclassifications result from signature conflicts between multiple activities from the same device. </p></div>
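<div xmlns="http://www.tei-c.org/ns/1.0"><p>The micro-averaged scores discussed above, including the known-average variant that leaves out the unknown class, can be computed as in the following sketch. The function name micro_f1, the per-class (TP, FP, FN) tuple layout, and all counts are hypothetical values for illustration; they are not results from the paper.</p>
<ab><![CDATA[
```python
from typing import Dict, Iterable, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def micro_f1(per_class: Dict[str, Counts], exclude: Iterable[str] = ()) -> float:
    """Micro-averaged F1: pool TP/FP/FN over classes, then compute precision and recall."""
    skipped = set(exclude)
    kept = [c for name, c in per_class.items() if name not in skipped]
    tp = sum(c[0] for c in kept)
    fp = sum(c[1] for c in kept)
    fn = sum(c[2] for c in kept)
    if tp == 0:
        return 0.0  # no correct predictions: define the score as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-class counts (not taken from the paper's tables).
counts = {
    "switch-on": (8, 2, 2),
    "lock-unlocked": (9, 1, 1),
    "unknown": (3, 7, 5),
}
average = micro_f1(counts)                              # all classes pooled
known_average = micro_f1(counts, exclude=("unknown",))  # known classes only
print(round(average, 3), round(known_average, 3))       # 0.69 0.85
```
]]></ab>
<p>Because the counts are pooled before the ratio is taken, frequent classes weigh more than rare ones, which is why this average suits imbalanced class sizes.</p></div>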
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. In-Depth Analysis of Smart-Home Results</head><p>Our threat model is based on attackers' capabilities to monitor network traffic in a smart home to infer the smart-home devices' activities and the user's behavior. Therefore, while we aim to create a model that works best in all scenarios, we also demonstrate that our model is useful for attackers to correctly identify private information about the user and her home. In this section, we explain such cases in detail. Target Classification Model for Specific Devices. We discuss how an attacker can use a specific model to obtain more accurate information on a targeted device from its capabilities. In this case, we train our XGBoost model only to detect three</p><p>Identifying Recurring Patterns. Fig. <ref type="figure">6</ref> shows the activities of a smart lock at various times of day. We measured the device's activities for 5 consecutive days. The results show that at 11:00 and at 23:00, the lock had events at the same time on multiple days. Based on this observation, the attacker can infer the homeowner's daily schedule. Hence, with further analysis of such patterns, the classification results could reveal information on the smart-home devices and the users.</p><p>Another example is the switch-on/off events reported in Table III. The F1-scores of these events by XGBoost are 0.38 (switch-on) and 0.80 (switch-off). Although these F1-scores do not exceed 0.8, ChatterHub can still identify user actions with light switches (e.g., user turning lights on/off). Fig. <ref type="figure">5</ref> shows that ChatterHub correctly identifies 20 out of 25 events. ChatterHub has only three misclassifications (i.e., event on recognized as off, and vice versa) and two false detections (i.e., non-switch events recognized as switch events but part of the switch device itself). Further, the patterns of on/off events provide more confidence in the actual presence of a smart light in the home.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. DISCUSSIONS</head><p>Mitigation Approach. ChatterHub identifies the events of devices by monitoring encrypted packets, including the size of each packet, the order of packets in a sequence, and the timing of sequences. Packet padding <ref type="bibr">[31]</ref> is an intuitive and effective mitigation method against ChatterHub. It generates packets with identical lengths by adding additional bytes at each packet's end, i.e., padding. Packet padding can effectively hinder ChatterHub and other similar attack methods. We implement packet padding in our testing router, and it pads each packet in a sequence to 1KB. We evaluate the packet (space) overhead caused by the padding with three traces collected from our testbed. The result shows that this method generates only a negligible amount of additional traffic (on average, 9.2MB per day). Furthermore, we develop a random sequence insertion method that dilutes the effect of sequence timing by irregularly injecting 1KB packets into the network. When deploying both packet padding and random sequence insertion, an additional 10MB of network traffic is generated per day, and more than 80% collision is observed in the classification process; thus, the attacker will have a very low chance of learning patterns from the network traffic. Overlapped Packets. As we discussed in &#167;III-A, the overlapping of packet sequences is one of the major challenges to accurate classification. If the target home has a larger number of smart-home devices than our experiment setup, there will be more chances for packet sequences to overlap, implying that ChatterHub's classification results can be less accurate. However, our setup conservatively represents realistic smart-home setups, as we deploy many devices (14+) that repeatedly generate network traffic, so we believe adding more devices will have minimal impact on the classification accuracy. 
Another potential limitation is when the target home has multiple devices of the same type, ChatterHub cannot tell which one contributes to the detected capability. For example, if the target home has two identical smart lock devices installed on two separate doors, the attacker would be able to recognize all the lock activities but cannot distinguish one lock from the other.</p></div>
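<div xmlns="http://www.tei-c.org/ns/1.0"><p>The packet-padding mitigation described above can be sketched as a short overhead calculation. This is an illustration under stated assumptions, not the router implementation used in the paper: the names pad_lengths and padding_overhead are hypothetical, a 1KB block is taken to be 1024 bytes, and packets larger than one block are assumed to be rounded up to the next multiple of the block size, a case the text does not specify.</p>
<ab><![CDATA[
```python
from typing import List, Sequence


def pad_lengths(frame_lengths: Sequence[int], block: int = 1024) -> List[int]:
    """Round every frame length up to the next multiple of `block` bytes."""
    return [((n + block - 1) // block) * block for n in frame_lengths]


def padding_overhead(frame_lengths: Sequence[int], block: int = 1024) -> int:
    """Extra bytes sent on the wire once every packet is padded."""
    return sum(pad_lengths(frame_lengths, block)) - sum(frame_lengths)


# Toy burst (made-up frame lengths): after padding, the first three packets
# look identical to a passive observer who only sees sizes.
burst = [120, 560, 1024, 1300]
print(pad_lengths(burst))       # [1024, 1024, 1024, 2048]
print(padding_overhead(burst))  # 2116
```
]]></ab>
<p>Since padding removes most of the size information from a signature, bursts from different events collide much more often, at the cost of the extra bytes computed here.</p></div>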
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. RELATED WORK</head><p>Most research on fingerprinting smart-home devices from network traffic focuses on independent devices that directly connect to WiFi <ref type="bibr">[32]</ref>- <ref type="bibr">[37]</ref>, unlike our setup, where devices are connected to a central device (hub).</p><p>These works also require tapping into the local network for information on individual devices <ref type="bibr">[38]</ref>, <ref type="bibr">[39]</ref>, unlike our threat model, where network access can be acquired remotely.</p><p>Pingpong <ref type="bibr">[40]</ref> proposes packet-level network traffic analysis to identify activities of smart-home devices. Similar to ChatterHub, Pingpong analyzes packet-level traffic to create unique signatures for smart-home devices' activities. However, Pingpong only studies WiFi-connected devices, whereas our focus is smart-home devices hidden behind the hub, which increases the complexity of network traffic. We observe that many security-critical devices (e.g., smart lock, motion sensor, smart switch) are hidden behind the hub to be more secure.</p><p>HoMonit <ref type="bibr">[16]</ref> is a smart-home monitoring system that identifies misbehaving smart apps. It analyzes encrypted network traffic between the hub and smart-home devices to fingerprint each device. However, it requires tapping into the network between the hub and the devices. Similarly, Peek-a-Boo <ref type="bibr">[4]</ref> focuses on capturing the traffic between devices and a hub. In contrast, our work focuses on the communication between a hub and the cloud servers (e.g., Auth-Server and Comm-Server). Due to hub devices' inter-operability, fingerprinting devices by analyzing the hub and server communications becomes complicated and difficult, compared to the encrypted traffic analysis done in HoMonit. Also, Zhou et al. 
<ref type="bibr">[41]</ref> investigated potential security flaws in communications between smart-home devices and the cloud servers, but that work does not focus on identifying smart-home devices' activities inferred from encrypted traffic.</p><p>A number of works focus on security analysis and improvement for smart-home applications <ref type="bibr">[15]</ref>, <ref type="bibr">[42]</ref>- <ref type="bibr">[47]</ref>, discovering security vulnerabilities such as private data leakage, privilege abuse, and malicious activities. Other studies analyze the information flow among smart apps, the cloud backend, and IoT devices to discover vulnerabilities in the chain of information transfer <ref type="bibr">[11]</ref>, <ref type="bibr">[48]</ref>- <ref type="bibr">[50]</ref>.</p><p>While existing solutions mainly focus on preventing the leak of sensitive data in the context of smart apps, the cloud backend, and/or the smart-home platform, this work demonstrates that an adversary can still infer activities and states of smart-home devices by eavesdropping on encrypted network traffic.</p><p>Apthorpe et al. <ref type="bibr">[31]</ref> proposed an approach to mitigate network sniffing attacks in a smart-home environment and suggested routing the network traffic through VPNs and injecting fake packets to confuse attackers. Yoshigoe et al. <ref type="bibr">[51]</ref> proposed generating synthetic packets that prevent adversaries from fingerprinting smart-home devices. A more sophisticated method using differential privacy and adversarial machine learning has been suggested by <ref type="bibr">[52]</ref>. 
Many studies extract information from encrypted network traffic, including extracting video content <ref type="bibr">[48]</ref>, inferring demographic information <ref type="bibr">[53]</ref>, detecting packets generated by a specific application <ref type="bibr">[54]</ref>, measuring the quality of service <ref type="bibr">[55]</ref>, analyzing smart-home tenant behavior <ref type="bibr">[56]</ref>, and extracting location information <ref type="bibr">[57]</ref>. Other approaches <ref type="bibr">[58]</ref>, <ref type="bibr">[59]</ref> build multipurpose tools that facilitate the analysis of encrypted network traces. These works are orthogonal to ChatterHub.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VII. CONCLUSION</head><p>In this paper, we present ChatterHub, a novel attack method that can correctly identify smart-home devices' capabilities through passive sniffing of encrypted home network traffic. With ChatterHub, an attacker does not need any prior knowledge of the target home. Our evaluation results from three realistic smart-home environments show that the attacker can successfully recognize smart-home devices' capabilities from the encrypted network traffic. This, in turn, lets the attacker discover device behaviors, such as a door being locked or motion in a room. Such information can be used to reveal a household's daily routine. We also demonstrate two mitigation techniques, packet padding and random sequence injection, that can effectively protect the smart home from ChatterHub.</p></div>
		</body>
		</text>
</TEI>
