<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>LabelMerger: Learning Activities in Uncontrolled Environments</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>09/01/2019</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10141792</idno>
					<idno type="doi">10.1109/TransAI46475.2019.00019</idno>
					<title level='j'>2019 First International Conference on Transdisciplinary AI (TransAI)</title>

					<author>Seyed Iman Mirzadeh</author><author>Jessica Ardo</author><author>Ramin Fallahzadeh</author><author>Bryan Minor</author><author>Lorraine Evangelista</author><author>Diane Cook</author><author>Hassan Ghasemzadeh</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[While inferring human activities from sensors embedded in mobile devices using machine learning algorithms has been studied, current research relies primarily on sensor data that are collected in controlled settings often with healthy individuals. Currently, there exists a gap in research about how to design activity recognition models based on sensor data collected with chronically-ill individuals and in free-living environments. In this paper, we focus on a situation where free-living activity data are collected continuously, activity vocabulary (i.e., class labels) are not known as a priori, and sensor data are annotated by end-users through an active learning process. By analyzing sensor data collected in a clinical study involving patients with cardiovascular disease, we demonstrate significant challenges that arise while inferring physical activities in uncontrolled environments. In particular, we observe that activity labels that are distinct in syntax can refer to semantically-identical behaviors, resulting in a sparse label space. To construct a meaningful label space, we propose LabelMerger, a framework for restructuring the label space created through active learning in uncontrolled environments in preparation for training activity recognition models. LabelMerger combines the semantic meaning of activity labels with physical attributes of the activities (i.e., domain knowledge) to generate a flexible and meaningful representation of the labels. Specifically, our approach merges labels using both word embedding techniques from the natural language processing domain and activity intensity from the physical activity research. We show that the new representation of the sensor data obtained by LabelMerger results in more accurate activity recognition models compared to the case where original label space is used to learn recognition models.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Activity recognition is an active research area with the aim of automatically detecting physical activities performed by people in their daily living situations. The recognition of physical activities has become a task of significant interest within the field, in particular for medical and health-related applications such as behavioral medicine. One application of activity recognition in behavioral medicine is to design interventions for individuals with, or at risk for, diabetes, obesity, or heart disease, where the individuals are often required to follow a well-defined exercise regimen as part of their treatment <ref type="bibr">[1]</ref>.</p><p>For activity recognition models to be reliable, it is critical to collect labeled sensor data in end-user settings. The process involves utilizing an active learning approach where end-users provide annotations/labels for the sensor data through a user interface on their mobile device. However, labels provided by end-users in uncontrolled environments introduce unique challenges for learning reliable activity recognition models. Here we categorize those challenges into three broad groups:</p><p>&#8226; Spatial disparity: we recognize that different individuals can have different activity behaviors. When sensor data are labeled by end-users, the activity vocabulary constructed for one user can be different from that of another user. This inter-user (i.e., spatial) label disparity results in activity recognition models that cannot be used across different users. As a result, we need to construct an activity vocabulary for each user or aggregate labels gathered from a large group of users to account for cross-user behavior differences. 
&#8226; Temporal disparity: because we do not place any restrictions on the data collection and sensor annotation processes, users are not limited to expressing their activities according to a set of pre-defined labels. Therefore, a user can express the same activity differently at different times. This intra-user (i.e., temporal) disparity results in labels that are different in syntax but identical in semantics.</p><p>&#8226; Burden on user: we recognize that the process of data labeling is a burden on the user, in particular when the system is adopted by patients with chronic conditions. Therefore, it is important to develop activity recognition models using a small number of training instances labeled by users.</p><p>To deal with the challenges of label disparity, LabelMerger aims to restructure the label space of each user, or a group of users, by grouping labels that are semantically similar and are associated with activities of similar intensities (Fig. <ref type="figure">1</ref>: An example of a restructured label space in LabelMerger). An example of such a restructured label space is shown in Fig. <ref type="figure">1</ref>, where 14 labels expressed by users are aggregated into three groups, shown in green, blue, and red in the new label space. The labels shown in this figure represent a subset of the labels expressed by participants in our clinical study. For visualization, a dimensionality reduction technique (e.g., PCA <ref type="bibr">[2]</ref> or t-SNE <ref type="bibr">[3]</ref>) is used to illustrate the clusters in a 2D coordinate space. Because users use different expressions to describe their activity behavior, there exists a substantial amount of disparity in the data. As shown in Fig. <ref type="figure">1</ref>, users use words such as 'shop', 'buy', 'purchasing', 'at store', 'shopping', and 'buying' to express a particular activity behavior. Such label disparities not only occur across users but also exist within the same user at different times. 
Not addressing the problem of label disparity (i.e., treating each discrete label expressed by the user as a class label when training the machine learning algorithm) will result in an unnecessary increase in the number of classes and a decrease in the number of training instances within each class. This in turn will result in an activity recognition model that performs poorly because of the low-quality training data.</p></div>
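The 2D cluster visualization described above (Fig. 1) can be reproduced in outline: embed each user-expressed label as a vector and project onto the first two principal components. The following is a minimal sketch, not the paper's implementation; the random vectors stand in for pre-trained word embeddings, and `pca_2d` is a hypothetical helper.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)                       # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                          # (n_samples, 2) projection

# Random stand-ins for word embeddings of user-expressed labels (hypothetical).
rng = np.random.default_rng(0)
labels = ["shop", "buy", "purchasing", "at store", "shopping", "buying"]
vectors = rng.normal(size=(len(labels), 50))      # e.g., 50-d embeddings

coords = pca_2d(vectors)
print(coords.shape)  # (6, 2)
```

With real embeddings, semantically related labels such as 'shop' and 'buying' would fall close together in the projected plane, making the sparse, disparate label space visible at a glance.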
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. LABEL MERGER</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Problem Statement</head><p>Let D = {(x 1 , y 1 ), (x 2 , y 2 ), . . . , (x m , y m )} be the data collected through the process of active learning, where x i represents the i-th input sensor data instance and y i represents the activity label associated with x i . The labels y i are drawn from the set L user = {a 1 , a 2 , ..., a n } of n discrete activity labels expressed by the user. Our goal is to construct a compact and meaningful label space L merge from L user .</p><p>Having defined our input and desired output, we are interested in finding a mapping function &#934; : R n &#8594; R k that automatically transforms the noisy labels in L user into k groups, each consisting of similar activity labels. Therefore, by applying our mapping function &#934; to the input labels L user , we obtain k different groups of labels.</p><p>Since our machine learning task is activity recognition, a reasonable objective is to ensure that activities that reside in the same group in our final label space represent similar physical activities. This problem is naturally a clustering problem; however, we need to define appropriate features that quantify similarity/dissimilarity among the various activity labels expressed by the user.</p></div>
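As a toy illustration of the mapping &#934; (a sketch under stated assumptions, not the paper's implementation): once the k groups are found, relabeling the dataset reduces to a lookup from each noisy user label to its group index. The mapping `phi` below is hypothetical.

```python
from typing import Dict, List

def apply_mapping(phi: Dict[str, int], labels: List[str]) -> List[int]:
    """Relabel each noisy user label with its merged-group index."""
    return [phi[a] for a in labels]

# Hypothetical mapping learned by clustering L_user into k = 2 groups.
phi = {"shopping": 0, "buying": 0, "at store": 0, "jogging": 1, "running": 1}
print(apply_mapping(phi, ["buying", "running", "shopping"]))  # [0, 1, 0]
```

The clustering step that produces such a mapping is described in Section II-C.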
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Feature Design for Label Space</head><p>We propose to extract two broad sets of attributes in the label space. The first set captures the semantic meaning of the labels using word embeddings, while the second set incorporates physical attributes of human activities. Our feature vector uses word vectors to capture the meaning of each label as well as domain-specific measures such as the metabolic equivalent of task (MET) value associated with each activity. The use of semantic meaning is motivated by spatial and temporal disparities among labels acquired by different users and/or at different time frames.</p><p>To construct the label space feature vector, instead of using atomic symbols to represent each word, we use their vector representations, which is a common approach to overcome the limitations of atomic symbols. This approach utilizes a window-based method where we count the number of times each word appears within a window of a particular size centered around the word of interest. To this end, we use the GloVe algorithm <ref type="bibr">[4]</ref> and its available pre-trained vectors to convert words to vectors.</p><p>However, as depicted in Fig. <ref type="figure">2</ref>(a), the GloVe algorithm takes only the meaning of the labels into account and is not concerned with the physical meaning/attributes of each activity. For example, it can be observed that 'swimming' and 'watching' (or 'swimming' and 'relaxing') belong to the same group while they are very different in terms of physical attributes, activity intensity, and their impact on physical health.</p><p>To address the limitation of using only semantic meaning when defining features in the label space, we propose to utilize a general form of 'domain knowledge' features, which can be application-dependent. For example, when designing interventions for physical health, one may consider activity intensity as a measure of physical fitness and well-being. 
In contrast to a purely semantic grouping, for such health interventions activities such as 'reading', 'swimming', and 'watching' may need to be separated into different groups in the label space.</p><p>To incorporate the domain knowledge, we use a well-known measure of human physical activity, namely the MET (metabolic equivalent of task), as the sole feature in the domain-knowledge portion of the feature vector computed in the label space. One motivation for choosing the MET is that values have already been calculated for nearly all common activities in the Compendium of Physical Activities <ref type="bibr">[5]</ref>. However, the methodology presented in this research is general and can accommodate other application-dependent domain-knowledge features. </p></div>
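A minimal sketch of the combined feature vector described above, assuming a 4-dimensional embedding and a hypothetical MET value (real pre-trained GloVe vectors are 50- to 300-dimensional):

```python
import numpy as np

def label_features(embedding: np.ndarray, met: float, lam: float) -> np.ndarray:
    """Concatenate a label's word vector with its lambda-weighted MET value."""
    return np.concatenate([embedding, [lam * met]])

# Hypothetical 4-d embedding for 'swimming' and an illustrative MET value.
emb_swimming = np.array([0.2, -0.1, 0.7, 0.3])
fv = label_features(emb_swimming, met=6.0, lam=0.5)
print(fv.shape)  # (5,): four embedding dimensions plus one weighted MET feature
```

Scaling the MET term by &#955; controls how strongly activity intensity pulls labels apart relative to their semantic similarity.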
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Algorithm</head><p>Here, we introduce a formal procedure to transform the noisy labels L user expressed by the user into a target label set L merge in the new label space. For each activity label a i in L user , we perform the following tasks:</p><p>1) We obtain the equivalent word embedding of the activity label. 2) Because we might not have the MET value of the activity label in our MET database (e.g., there is no pre-defined MET value for 'at Walmart store'), we find the semantically closest activity in the database and use its MET value when computing the feature vector. In this study, we use cosine distance as a measure of similarity between two word vectors. Note that if we have the exact same activity in the MET database, the closest word will be the given label itself.</p><p>3) We add the MET value of the label to our feature vector.</p><p>However, we use the factor &#955; to control the importance of domain knowledge with respect to semantic meaning (i.e., word embeddings). A higher value of &#955; translates into a higher weight assigned to the domain knowledge (e.g., physical activity information) factor when constructing a clustering of the labels.</p><p>After constructing feature vectors for all the noisy labels in L user , we use k-Means to obtain k clusters in the label space. Algorithm 1 shows the LabelMerger algorithm.</p></div>
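The steps above can be sketched end to end. This is an illustrative sketch under stated assumptions, not the authors' code: toy 2-d vectors stand in for pre-trained GloVe embeddings, the MET table has only two entries, and a minimal deterministic k-means replaces a library implementation.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def kmeans(X, k, iters=50):
    """Minimal k-means; returns a cluster index per row of X.
    Deterministic init: evenly spaced rows as starting centers."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return assign

def label_merger(labels, emb, met_table, lam, k):
    """Steps 1-3: embed each label, attach the lambda-weighted MET value of
    the semantically closest activity in the MET table, then cluster."""
    feats = []
    for a in labels:
        v = emb[a]
        nearest = max(met_table, key=lambda w: cosine_sim(v, emb[w]))
        feats.append(np.concatenate([v, [lam * met_table[nearest]]]))
    return kmeans(np.array(feats), k)

# Toy embeddings standing in for pre-trained GloVe vectors (hypothetical).
emb = {
    "running":  np.array([0.9, 0.1]), "jogging":  np.array([0.8, 0.2]),
    "reading":  np.array([0.1, 0.9]), "watching": np.array([0.2, 0.8]),
}
met_table = {"running": 8.0, "reading": 1.3}   # illustrative MET values

groups = label_merger(list(emb), emb, met_table, lam=0.1, k=2)
print(groups)
```

Here 'jogging' inherits the MET value of 'running' (its nearest entry in the table), so the vigorous pair and the sedentary pair land in separate clusters.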
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. EXPERIMENTS AND RESULTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Data Collection</head><p>This study was reviewed and approved by the appropriate Institutional Review Boards. Participants were recruited from a single outpatient tertiary care clinic, as well as through word-of-mouth referrals. Participants were screened for study inclusion to ensure their eligibility. Each participant was trained on how to use the smartphone device and respond to activity prompts. They were asked to charge the phone each night. The researchers sent an activity prompt to each participant as a test and observed them demonstrate their ability to respond prior to beginning the data collection process. Participants were instructed to respond to as many prompts each day as possible, but to avoid responding or using the phone when driving or operating heavy machinery. They were also instructed how to add an activity to the list of activities in the Activity Learning application <ref type="bibr">[6]</ref>. Each participant was asked to provide labels in response to activity prompts for two weeks. The activity learning application was programmed to issue an activity prompt on the smartphone every 2 hours between 8:00am and 8:00pm daily. We used the data of the 13 participants who had completed data collection by the time this analysis was conducted.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Learning Activity Recognition Model</head><p>We assigned each acquired label to a 5-second window of the signal segment. From each signal segment, we extracted various statistical features from the gyroscope and accelerometer signals, which have been shown to be effective in identifying daily living activities <ref type="bibr">[7]</ref>, <ref type="bibr">[8]</ref>. This allowed us to form a training dataset. To learn an activity recognition model using this dataset, we split the data into 80% for training and 20% for testing. The classifiers used for classification were 'Random Forest' <ref type="bibr">[9]</ref>, 'Support Vector Machine' <ref type="bibr">[10]</ref>, and 'K-Nearest Neighbors' <ref type="bibr">[11]</ref> with K = 1 and K = 3.</p></div>
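The windowing, feature extraction, and train/test pipeline can be sketched as follows. This is a hedged illustration, not the study's code: the paper does not list its exact features or sampling rate, so the statistics (mean, std, min, max per axis), the 50 Hz rate, and the synthetic signals are assumptions, and a minimal KNN stands in for the classifiers cited above.

```python
import numpy as np

def window_features(seg):
    """Per-axis statistics of one 5-second window (rows: samples,
    columns: 3-axis accelerometer + 3-axis gyroscope)."""
    return np.concatenate([seg.mean(0), seg.std(0), seg.min(0), seg.max(0)])

def knn_predict(train_X, train_y, test_X, k=1):
    """Minimal K-nearest-neighbors classifier (majority vote)."""
    preds = []
    for x in test_X:
        idx = np.argsort(np.linalg.norm(train_X - x, axis=1))[:k]
        vals, counts = np.unique(train_y[idx], return_counts=True)
        preds.append(vals[counts.argmax()])
    return np.array(preds)

# Synthetic stand-ins for labeled 5-second windows (250 samples at 50 Hz).
rng = np.random.default_rng(1)
active = [rng.normal(1.0, 0.3, size=(250, 6)) for _ in range(10)]
sedentary = [rng.normal(0.0, 0.05, size=(250, 6)) for _ in range(10)]
X = np.array([window_features(s) for s in active + sedentary])
y = np.array([1] * 10 + [0] * 10)

# 80/20 train/test split, as in the study.
perm = rng.permutation(len(X))
train, test = perm[:16], perm[16:]
acc = (knn_predict(X[train], y[train], X[test], k=1) == y[test]).mean()
print(acc)
```

On these well-separated synthetic classes the 1-NN classifier is expected to score perfectly; real free-living data is far noisier.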
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Results</head><p>As shown in Table <ref type="table">I</ref>, increasing the number of clusters in label merging, which translates into an increased number of classes for activity recognition, makes the machine learning task more difficult. The hardest problem is the baseline approach, where we do not perform any label merging and learn an activity recognition model to classify activities according to the initial labels expressed by each participant.</p><p>We compared the performance of the baseline approach to that of scenarios where the number of clusters is less than the number of initial classes (due to label merging). For each participant, we evaluated all of the following scenarios and reported the best performance:</p><p>&#8226; Using different numbers of clusters (2, 3, and 4) in addition to the baseline.   </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. CONCLUSION</head><p>We introduced several challenges that arise when deploying human activity recognition in real-world settings. In particular, we discussed that activity labels that are distinct in syntax can refer to semantically-identical behaviors when data collection occurs in uncontrolled environments. We proposed LabelMerger to restructure the label space by combining semantic meaning of activity labels with physical attributes of the activities to generate a flexible and meaningful representation of the labels. We showed that this approach is promising in improving activity recognition accuracy while maintaining a meaningful representation of the labels.</p></div></body>
		</text>
</TEI>
