<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>A Computational Framework for Modeling Biobehavioral Rhythms from Mobile and Wearable Data Streams</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>06/30/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10354150</idno>
					<idno type="doi">10.1145/3510029</idno>
					<title level='j'>ACM Transactions on Intelligent Systems and Technology</title>
<idno>2157-6904</idno>
<biblScope unit="volume">13</biblScope>
<biblScope unit="issue">3</biblScope>					

					<author>Runze Yan</author><author>Xinwen Liu</author><author>Janine Dutcher</author><author>Michael Tumminia</author><author>Daniella Villalba</author><author>Sheldon Cohen</author><author>David Creswell</author><author>Kasey Creswell</author><author>Jennifer Mankoff</author><author>Anind Dey</author><author>Afsaneh Doryab</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[This paper presents a computational framework for modeling biobehavioral rhythms - the repeating cycles of physiological, psychological, social, and environmental events - from mobile and wearable data streams. The framework incorporates four main components: mobile data processing, rhythm discovery, rhythm modeling, and machine learning. We evaluate the framework with two case studies using datasets of smartphone, Fitbit, and OURA smart ring to evaluate the framework’s ability to (1) detect cyclic biobehavior, (2) model commonality and differences in rhythms of human participants in the sample datasets, and (3) predict their health and readiness status using models of biobehavioral rhythms. Our evaluation demonstrates the framework’s ability to generate new knowledge and findings through rigorous micro- and macro-level modeling of human rhythms from mobile and wearable data streams collected in the wild and using them to assess and predict different life and health outcomes.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>The term biobehavioral rhythms, introduced in <ref type="bibr">[19]</ref>, refers to the repeating cycles of physiological (e.g., heart rate and body temperature), psychological (e.g., mood), social (e.g., work events), and environmental (e.g., weather) events that affect the human body and life. Rooted in Chronobiology, "the scientific discipline that quantifies and explores the mechanisms of biological time structure and their relationship to the rhythmic manifestations in living matter" <ref type="bibr">[15]</ref>, the study of biobehavioral rhythms focuses on cyclic events observed in human data collected from personal, consumer-level mobile and wearable devices <ref type="bibr">[19]</ref>. Such devices can continuously track individuals' biobehavioral signals in daily life, outside the controlled lab settings that have been the standard for studying biological rhythms.</p><p>Numerous research studies have shown the importance of understanding rhythms and their effect on human life and wellbeing. For example, studies in <ref type="bibr">[19,</ref><ref type="bibr">28,</ref><ref type="bibr">30]</ref> demonstrate the association between long-term disruption in biological rhythms and health outcomes such as cancer, diabetes, and depression. Other studies have shown the impact of shift work on the quality of life of shift workers such as nurses and doctors <ref type="bibr">[33,</ref><ref type="bibr">37]</ref>. These studies, however, have often been limited to controlled settings designed to observe certain behaviors and effects. With passive sensing of physiological and behavioral signals from mobile and wearable devices, it is now possible to study human rhythms more broadly and holistically in the wild through the collection of biobehavioral data from different sources. This opportunity, however, introduces new challenges.
First, the longitudinal time series data collected from personal devices are massive, noisy, and incomplete, requiring careful processing to extract and preserve useful fine-grained knowledge at various levels of temporal granularity for further modeling. Second, because each data source (e.g., smartphone sensors) can capture different aspects of human rhythms (biological, behavioral, or both), each signal must be explored and incorporated to identify biological and behavioral indicators, at the micro and macro level, that may reveal cyclic behavior. This process can be exhausting and needs automation. Moreover, although the modeled rhythms by themselves can provide useful insights into human health and life, the sheer number of rhythm models generated from each source makes manual interpretation by researchers or experts difficult. A further computational step should therefore incorporate those models to provide insights into different health and lifestyle outcomes, both physical and mental.</p><p>We propose a computational framework to address the aforementioned challenges through a series of data processing and modeling steps. The framework first processes the raw sensor data collected from mobile and wearable devices to extract high-level features from those data streams. It then models biobehavioral rhythms for each sensor feature alone and in combination with other features to discover rhythmicity and other characteristics of cyclic behavior in the data. The biobehavioral rhythm models provide a series of characteristic features, which are further used to measure stability in biobehavioral rhythms and to predict different outcomes, such as health status, through a machine learning component. We evaluate the framework with two case studies.
The first study uses mobile and Fitbit data collected from 138 college students over a semester to test the framework's ability to detect rhythmicity in students' data in different time frames over the course of the semester and to measure the stability and variation of rhythms among students with different mental health status. We then use the models of the rhythms to classify the mental health status of students at the end of the semester. The second study uses physio-behavioral data from 11 volunteers who wore an OURA smart ring for 30 to 323 days. We test the framework's ability to detect long-term cycles in participants' biobehavioral data and to extract commonalities and differences in those cycles. We then use each person's significant cyclic periods in modeling individual rhythms and further predicting average daily readiness. Our research makes the following contributions:</p><p>(1) We introduce a computational framework for modeling biobehavioral rhythms to the mobile and ubiquitous computing community that provides the ability to a) flexibly process massive sensor data at different time granularities, making it possible to model and observe short- and long-term rhythmic behavior; b) identify variation and stability in individual and groups of time series data; and c) help observe the impact of cyclic biobehavioral parameters in revealing and predicting different outcomes (e.g., health).
(2) We demonstrate the framework's ability to generate new knowledge and findings via rigorous micro- and macro-level modeling of human rhythms from mobile and wearable data streams collected in the wild and using them to assess and predict different life and health outcomes.</p><p>In the following sections, we describe related work in the domain of mobile health and behavior modeling and discuss the motivation for modeling cyclic human behavior and its potential role in revealing health status. We then present our computational framework followed by case studies in modeling biobehavioral rhythms and exploring the role of those models in predicting mental health and readiness. We discuss the feasibility and flexibility of the framework in incorporating different analytic approaches and providing insights for building rhythm-aware technology.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">BACKGROUND AND RELATED WORK</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Biological Rhythms</head><p>The assessment of rhythmic phenomena in living organisms reveals the existence of events and behavior that repeat themselves in certain cycles and can be modeled with periodic functions <ref type="bibr">[15,</ref><ref type="bibr">54]</ref>. Each periodic function is specified by its average level, oscillation degree, and time of the oscillation optimum. Biological rhythms, including patterns of activity and rest or circadian rhythms, have been extensively studied in Chronobiology and medicine <ref type="bibr">[19,</ref><ref type="bibr">28,</ref><ref type="bibr">30]</ref>, mostly in controlled environmental settings.</p><p>Advancements in activity trackers have made it possible to study these phenomena outside the lab, and studies have demonstrated the reliability of such devices in capturing circadian disruptions, including sleep and physical and mental health conditions. For example, studies using research-grade actigraphy devices have shown differences in circadian rhythms among patients with bipolar disorder, ADHD, and schizophrenia <ref type="bibr">[50]</ref>. Other studies have used the same type of data to explore circadian disruption in cancer patients undergoing chemotherapy <ref type="bibr">[50]</ref>. Commercial devices such as Fitbits are now able to infer sleep duration and quality reasonably accurately. Two brief studies with healthy young adults have used activity data from Fitbit devices to quantify rest-activity rhythms and found that the rhythm measurements compared well with research-grade actigraphy <ref type="bibr">[5,</ref><ref type="bibr">38]</ref>. Studies in <ref type="bibr">[64]</ref> and <ref type="bibr">[42]</ref> have also explored the capability of personal tracking devices to measure sleep compared to gold standards such as polysomnography.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Behavior Modeling in the Wild via Mobile Sensing</head><p>The study of biobehavioral rhythms also relates to research in understanding human behavior from passive sensing data collected via smartphones and wearable devices. Only a few studies have used mobile data for understanding the circadian behavior of different chronotypes (e.g., <ref type="bibr">[1]</ref><ref type="bibr">[2]</ref><ref type="bibr">[3]</ref>). Abdullah et al. <ref type="bibr">[1]</ref> analyzed patterns of phone usage to demonstrate differences in the sleep behavior of early and late chronotypes. In a similar study using the same type of data, they showed the capability of using mobile data to explore daily cognition and alertness <ref type="bibr">[2,</ref><ref type="bibr">3]</ref> and found that body clock, sleep duration, and coffee intake impact alertness cycles.</p><p>Data from smartphones and wearable devices have been used extensively for modeling daily behavior patterns such as movement <ref type="bibr">[17]</ref>, sleep <ref type="bibr">[45]</ref>, and physical and social activities <ref type="bibr">[47]</ref> to understand their associations with health and wellbeing. For example, Medan et al. <ref type="bibr">[41]</ref> found that decreases in call, SMS messaging, Bluetooth-detected contacts, and location entropy (a measure of the popularity of various places) were associated with greater depression. Wang et al. <ref type="bibr">[63]</ref> monitored 48 students' behavior data for one semester and demonstrated significant correlations between data from smartphones and students' mental health and educational performance. In addition, Saeb et al. <ref type="bibr">[56]</ref> extracted features from GPS location and phone usage data and applied a correlation analysis to capture relationships between features and level of depression.
They found that circadian movement (regularity of the 24h cycle of GPS change), normalized entropy (mobility between favorite locations), location variance (GPS mobility independent of location), and the phone usage features (usage duration and usage frequency) were highly correlated with the depression score. Doryab et al. <ref type="bibr">[20]</ref> studied loneliness detection through data mining and machine learning modeling of students' behavior from smartphone and Fitbit data and showed different patterns of behavior related to loneliness, including less time spent off-campus and in different academic facilities as well as less socialization during evening hours on weekdays among students with a high level of loneliness.</p><p>Recent tools such as Rhythomic <ref type="bibr">[29]</ref> and ARGUS <ref type="bibr">[31]</ref> use visualization to analyze human behavior. Rhythomic is an open-source R framework tool for general modeling of human behavior, including circadian rhythms. ARGUS, on the other hand, focuses on visual modeling of deviations in circadian rhythms and measures their degree of irregularity. Through multiple visualization panes, the tool facilitates the understanding of behavioral rhythms. This work is related to our computational framework for modeling human rhythms. However, these tools assume and focus on circadian rhythms only, and they primarily enable understanding of rhythms through visualization, whereas our framework provides means for processing different data sources, extracting information from them, and discovering and modeling rhythms for each biobehavioral signal with periods other than 24 hours. To our knowledge, this is the first computational framework to extract and incorporate the parameters obtained from rhythm models in a machine learning pipeline to predict different outcomes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">COMPUTATIONAL FRAMEWORK FOR MODELING BIOBEHAVIORAL RHYTHMS</head><p>Our proposed framework (Figure <ref type="figure">1</ref>) incorporates data streams from mobile and wearable devices, including behavioral signals such as movement, audio, Bluetooth, WiFi, and GPS and logs of phone usage and communication (calls and messages); and biosignals such as heart rate, skin temperature, and galvanic skin response. These signals are processed, and granular features that characterize biobehavioral patterns such as activity, sleep, social communication, work, and movements are extracted. The data streams of biobehavioral sensor features are segmented into different time windows of interest and sent to a rhythm discovery component that applies periodic functions on each windowed stream of the sensor feature to detect their periodicity. The detected periods are then used to model the rhythmic function that represents the time series data stream for that sensor feature. The parameters generated by the rhythmic function are used in two ways. First, they are aggregated and further processed to characterize the stability or variation in rhythms over a certain time segment. Second, they are used as features in a machine learning pipeline to predict an outcome of interest (e.g., health status). The following sections provide details on the methods used in different components of the framework.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Time Series Segmentation</head><p>Windowing is one of the most frequently used methods for processing streams of data. A time series of length L is split into N segments based on a criterion such as time. Our framework supports different ways of segmenting the time series, including the widely used tumbling windows, which are a series of fixed-size, non-overlapping, contiguous time intervals. We call each segment a time window (tw), which is a time series of length l, where l = L/N.</p><p>We also add a second segmentation layer to the time series: in each round k (k = 1...N) and for each starting point s (s = 1...N), we combine a sequence of k consecutive time windows starting from time window s (tw_s) to generate a time series spanning k windows. We call these segments time chunks (tc), written tc_{ks} with the round k as the first index and the starting point s as the second. For example, in round k = 1, tc_{11} is a time chunk of length one starting at tw_1 and tc_{12} is a time chunk of length one starting at tw_2, whereas for k = 3, tc_{32} is a time chunk of length three starting at tw_2. Time chunks allow flexible modeling of rhythms in different time periods over the length of the time series. Figure <ref type="figure">2</ref> illustrates the time segmentation process.</p></div>
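<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two segmentation layers can be illustrated with a minimal Python sketch. The function names tumbling_windows and time_chunks are hypothetical and not part of the framework's published interface. The sketch also assumes that samples left over when L is not divisible by N are dropped, and that a chunk is only generated when all k windows fit inside the series.</p>
<p><![CDATA[
```python
def tumbling_windows(series, n_windows):
    """Split a series of length L into N contiguous, non-overlapping
    windows of length l = L // N (leftover samples are dropped)."""
    window_len = len(series) // n_windows
    if window_len == 0:
        raise ValueError("need at least one sample per window")
    return [series[i * window_len:(i + 1) * window_len]
            for i in range(n_windows)]


def time_chunks(windows):
    """Map (k, s) to the concatenation of k consecutive windows
    starting at window s (1-based, as in the text)."""
    n = len(windows)
    chunks = {}
    for k in range(1, n + 1):            # chunk length, in windows
        for s in range(1, n - k + 2):    # start window; chunk must fit
            merged = []
            for w in windows[s - 1:s - 1 + k]:
                merged.extend(w)
            chunks[(k, s)] = merged
    return chunks


series = list(range(12))                 # toy series, L = 12
windows = tumbling_windows(series, 4)    # N = 4, so l = 3
chunks = time_chunks(windows)
print(chunks[(1, 2)])                    # chunk of length one starting at tw_2
print(chunks[(3, 2)])                    # chunk of length three starting at tw_2
```
]]></p>
<p>Keying the chunks by (k, s) keeps the mapping to the tc notation explicit, so a rhythm model fitted on a chunk can be traced back to the round and starting window that produced it.</p></div>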
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Detection of Rhythmicity</head><p>One of the first steps in modeling biobehavioral rhythms is identifying rhythmicity in time series data. We use two main methods for detecting and observing cyclic behavior: Autocorrelation and Periodogram.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1">Autocorrelation.</head><p>Autocorrelation is a reliable analytical method for recognizing periodicities <ref type="bibr">[21]</ref>. It calculates the correlation coefficient between a time series and its lagged version to measure their similarity over consecutive time intervals. Formally, the autocorrelation function (ACF) between two values y_t and y_{t-k} in a time series y_t is defined as</p><p><formula notation="TeX">\mathrm{ACF}(k) = \mathrm{Corr}(y_t, y_{t-k}) = \frac{\mathrm{Cov}(y_t, y_{t-k})}{\sqrt{\mathrm{Var}(y_t)\,\mathrm{Var}(y_{t-k})}}, \quad k = 1, 2, \ldots</formula></p><p>where k is the time gap and is called the lag <ref type="bibr">[46]</ref>. In each iteration, the two time series are shifted by k points until one third of the data is parsed. If the time series is rhythmic, the coefficient values increase and decrease in regular intervals, and significant correlations indicate strong periodicity in data. The autocorrelation sequence of a periodic signal has the same cyclic characteristics as the signal itself. Thus, autocorrelation can help verify the presence of cycles and determine the periods. It has been empirically applied on various types of time series data from different fields and was shown to be dependable and exact in the tested situations <ref type="bibr">[48,</ref><ref type="bibr">57]</ref>.</p></div>
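<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of how autocorrelation can reveal a period, the following Python sketch computes the sample ACF up to a lag of one third of the series length and reports the first local maximum after lag 0. It uses the common biased estimator (lagged covariance divided by the full-series variance), which is an assumption of this sketch; the function names and the toy signal are hypothetical.</p>
<p><![CDATA[
```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation at one lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((y - mean) ** 2 for y in series)
    if var == 0:
        raise ValueError("constant series has undefined autocorrelation")
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var


def acf(series, max_lag=None):
    """ACF for lags 0..max_lag; defaults to one third of the data."""
    if max_lag is None:
        max_lag = len(series) // 3
    return [autocorrelation(series, k) for k in range(max_lag + 1)]


def first_peak_lag(coeffs):
    """First lag k >= 1 at which the ACF has a local maximum, or None."""
    for k in range(1, len(coeffs) - 1):
        if coeffs[k] > coeffs[k - 1] and coeffs[k] >= coeffs[k + 1]:
            return k
    return None


# Toy hourly signal with a 24-sample cycle over 10 days (illustrative only).
signal = [math.sin(2 * math.pi * t / 24) for t in range(240)]
coeffs = acf(signal)
print(first_peak_lag(coeffs))   # candidate period, in samples
```
]]></p>
<p>Searching for the first local maximum rather than the global maximum avoids picking lag 1, where any smooth signal is strongly correlated with itself. On noisy real-world data, a significance threshold on the coefficients would normally be applied as well.</p></div>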
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2">Periodogram.</head><p>A key step in the rhythm discovery process is the estimation of the length of the period for each rhythm. Many different techniques and algorithms for determining the period of a cycle have been developed, including Fourier-transform based methods such as Fast Fourier Transform <ref type="bibr">[6]</ref>, Non-Linear Least Squares <ref type="bibr">[60]</ref>, and Spectrum Resampling <ref type="bibr">[14]</ref>. Other frequently used methods are Enright and Lomb-Scargle periodograms <ref type="bibr">[24,</ref><ref type="bibr">40]</ref>, mFourfit <ref type="bibr">[23]</ref>, Maximum Entropy Spectral Analysis <ref type="bibr">[11]</ref>, and Chi-Square periodograms <ref type="bibr">[59]</ref>. All of these methods come with different assumptions and with different levels of complexity <ref type="bibr">[53]</ref>. For example, Spectrum Resampling has outperformed the usual Fourier approximation methods and has shown more robustness towards non-sinusoidal and noisy cycles <ref type="bibr">[66]</ref>. It has also been used to detect changes in period length, which allows for the estimation of variance in different periods, as frequently observed in practice. These functionalities, however, have made the algorithm slow and computationally expensive <ref type="bibr">[66]</ref>.</p><p>Arthur Schuster used Fourier analysis to evaluate periodicity in meteorological phenomena and introduced the term 'periodogram' <ref type="bibr">[58]</ref>. The method was first applied to the study of circadian rhythms in the early 1950s to quantify free-running rhythms of mice after blinding <ref type="bibr">[35]</ref>. Periodograms provide a measure of strength and regularity of the underlying rhythm through the estimation of the spectral density of a signal. For a time series y_t, t = 1, 2, ..., T, the spectral energy P_k of frequency k can be calculated as <ref type="bibr">[52]</ref>:</p><p><formula notation="TeX">P_k = \left(\frac{2}{T}\sum_{t=1}^{T} y_t \cos\frac{2\pi k t}{T}\right)^2 + \left(\frac{2}{T}\sum_{t=1}^{T} y_t \sin\frac{2\pi k t}{T}\right)^2</formula></p><p>The periodogram uses a Fourier Transform to convert a signal from the time domain to the frequency domain. A Fourier analysis is a method for expressing a function as a sum of periodic components and recovering the time series from those components. The dominant frequency corresponds to the periodicity in the pattern.</p></div>
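<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal Python sketch of period estimation with a periodogram follows. It assumes one common normalization, in which the cosine and sine sums are each scaled by 2/T before squaring, so that P_k equals the squared amplitude of a sinusoid at harmonic k. Removing the mean before the transform, the direct (non-FFT) summation, and the function names are choices made for this illustration only.</p>
<p><![CDATA[
```python
import math


def periodogram(series):
    """Spectral energy P_k for harmonics k = 1..T//2 (direct DFT sums)."""
    T = len(series)
    mean = sum(series) / T
    centered = [y - mean for y in series]     # assumption: remove the mean
    power = {}
    for k in range(1, T // 2 + 1):
        c = sum(y * math.cos(2 * math.pi * k * t / T)
                for t, y in enumerate(centered, start=1))
        s = sum(y * math.sin(2 * math.pi * k * t / T)
                for t, y in enumerate(centered, start=1))
        power[k] = (2 / T * c) ** 2 + (2 / T * s) ** 2
    return power


def dominant_period(series):
    """Period (in samples) of the harmonic with the largest energy."""
    power = periodogram(series)
    k_best = max(power, key=power.get)
    return len(series) / k_best


# Toy signal: a strong 24-sample cycle plus a weaker 12-sample cycle.
signal = [3 + 2 * math.cos(2 * math.pi * t / 24)
          + 0.5 * math.cos(2 * math.pi * t / 12)
          for t in range(1, 241)]
power = periodogram(signal)
period = dominant_period(signal)
print(period, power[10], power[20])
```
]]></p>
<p>The direct summation costs O(T^2) operations and is used here only for readability; an FFT gives the same energies in O(T log T). Harmonic k corresponds to a period of T/k samples, so the frequency resolution improves with the length of the time window being analyzed.</p></div>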
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Modeling Rhythms</head><p>The next step in our framework is modeling the rhythmic behavior of a time series, which is done via a periodic function. Each periodic function is specified by, among others, its period, average level (MESOR), oscillation degree (Amplitude), and time of the oscillation optimum (Phase) <ref type="bibr">[34]</ref>. The following rhythm parameters can be extracted from the model generated by the periodic function (Figure <ref type="figure">3</ref>) <ref type="bibr">[13,</ref><ref type="bibr">25,</ref><ref type="bibr">38]</ref>:</p><list rend="bulleted"><item>Fundamental period: Periodic sequences are usually made up of multiple periodic components. The fundamental period measures the duration of one overall cycle.</item><item>MESOR is the midline of the oscillatory function. When the sampling intervals are equal, the MESOR is equal to the mean value of all cyclic data points.</item><item>Amplitude (Amp) refers to the maximum value a single periodic component can reach. The amplitude of a symmetrical wave is half of its range of up and down oscillation.</item><item>Magnitude refers to the difference between the maximum value and the minimum value within a fundamental period. If a periodic sequence only contains one periodic component, amplitude equals half of the magnitude.</item><item>Acrophase (PHI) refers to the time distance between the defined reference time point and the first time point in a cycle where the peak occurs, within the period of a single periodic component.</item><item>Orthophase refers to the time distance between the defined reference time point and the first time point in a cycle where the peak occurs, within a fundamental period. When the time sequence only contains one periodic component, orthophase equals acrophase.</item><item>Bathyphase refers to the time distance between the defined reference time point and the first time point in a cycle where the trough occurs, within a fundamental period.</item><item>P-value (P) indicates the overall significance of the model fitted by a single period and comes from the F-test comparing the built model with the zero-amplitude model.</item><item>Percent rhythm (PR) is the R² of the model fitted by a single period, representing the proportion of overall variance accounted for by the fitted model.</item><item>Integrated p-value (IP) represents the significance of the model fitted with all periods.</item><item>Integrated percent rhythm (IPR) is the R² of the model fitted with all periods.</item><item>The longest cycle of the model (LCM) equals the least common multiple of all single periods.</item></list><p>The most fundamental method for modeling rhythms with known periods is Cosinor, a periodic regression function first developed by Halberg et al. <ref type="bibr">[32]</ref> that uses the least-squares method to fit one or several cosine curves, with or without polynomial terms, to a single time series. It uses the following cosine function to model the time series <ref type="bibr">[25]</ref>:</p><p><formula notation="TeX">y_i = M + \sum_{c \in C} A_c \cos(\omega_c t_i + \phi_c) + e_i</formula></p><p>where y_i is the observed value at time t_i; M represents the MESOR; t_i is the sampling time; C is the set of all periodic components; A_c, &#969;_c, and &#981;_c respectively represent the amplitude, frequency, and acrophase of each periodic component; and e_i is the error term. In addition to the parameters described above, Cosinor outputs the standard error (SE) for MESOR, amplitude, and acrophase, respectively.</p><p>The Cosinor models can be generated for one time series (single Cosinor, individual model) or for a group of time series (population-mean Cosinor, population model) through the aggregation of rhythm parameters obtained from single Cosinor. Cosinor models have been used to characterize circadian rhythms and compute relevant parameters with confidence limits. The model outputs the significance of the period; a P-value &#8804; 0.05 is taken as evidence that the assumed period is present in the data.
Our Cosinor framework allows for different periodic functions to be applied to the time series data using the detected periods from the previous step. We then use the rhythmic parameters measured by the Cosinor model in our machine learning pipeline as described in the next section.</p></div>
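<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the Cosinor fit concrete, the sketch below fits a single-component model y = M + A cos(wt + phi) with one known period by ordinary least squares. It relies on the standard linearization y = M + beta cos(wt) + gamma sin(wt), with A = sqrt(beta^2 + gamma^2) and phi = atan2(-gamma, beta). The function names, the returned dictionary keys, and the toy data are hypothetical, and the F-test P-value and standard errors are omitted for brevity.</p>
<p><![CDATA[
```python
import math


def solve_linear(a, b):
    """Solve a small dense system a x = b (Gaussian elimination
    with partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_cosinor(times, values, period):
    """Least-squares fit of y = M + A*cos(w*t + phi) for one known period."""
    w = 2 * math.pi / period
    rows = [[1.0, math.cos(w * t), math.sin(w * t)] for t in times]
    # Normal equations (X^T X) coef = X^T y of the linearized model.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, beta, gamma = solve_linear(xtx, xty)
    fitted = [mesor + beta * r[1] + gamma * r[2] for r in rows]
    mean_y = sum(values) / len(values)
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    return {
        "mesor": mesor,
        "amplitude": math.hypot(beta, gamma),
        "acrophase": math.atan2(-gamma, beta),   # phi, in radians
        "percent_rhythm": 1 - ss_res / ss_tot,   # R^2 of the fit
    }


# Toy data: three days of hourly samples peaking 4 hours after the reference.
times = list(range(72))
values = [60 + 10 * math.cos(2 * math.pi * t / 24 - math.pi / 3)
          for t in times]
fit = fit_cosinor(times, values, period=24)
peak_hour = (-fit["acrophase"] / (2 * math.pi / 24)) % 24
print(fit, peak_hour)
```
]]></p>
<p>A multi-component model adds one cosine and one sine column per additional period to the same design matrix, so the same solver applies. The resulting parameters per sensor feature are the kind of values that would then be passed on as rhythm features.</p></div>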
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Machine Learning Method</head><p>The machine learning component of the framework uses the parameters obtained from modeling the rhythm of each sensor feature to generate datasets for training and testing of an outcome of interest, e.g., health. The pipeline processes and handles missing values both in sensor and rhythm features across different time windows, selects important rhythm features as part of the training process, and builds machine learning models for the prediction of the outcome. The following sections describe the details of each step.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.1">Handling Missing Values.</head><p>Given the streams of data from multiple sources, the framework handles missing data for each sensor stream and each time window. We remove any sensor feature whose percentage of missing data is greater than a threshold (e.g., 30%). For the remaining sensor features, we perform nearest-neighbor linear interpolation <ref type="bibr">[8]</ref> to fill in missing values. For example, if there are three missing data points between 10 and 50, then those three missing points are filled with 20, 30, and 40, respectively. Given that the first and last data points cannot be imputed using this method, we remove the sensor feature if the first or the last data point in the time window is missing.</p><p>We apply the same process for handling missing rhythmic features in consecutive time windows. For each rhythmic feature, we fill the value of the missing time window with nearest-neighbor linear interpolation. Let v_i be the value of the feature in time window tw_i. If v_1 and v_5, the values of the feature in time windows tw_1 and tw_5, are present and v_2, v_3, and v_4, the feature values of tw_2, tw_3, and tw_4, are missing, then diff = (v_5 - v_1)/4 and the missing values are filled as v_2 = v_1 + diff, v_3 = v_1 + 2 diff, and v_4 = v_1 + 3 diff.</p><p>For each missing time window, if none of the time windows before it has a value, or none of the time windows after it has a value, then this time window is not filled. After imputation, we remove any rhythmic feature whose percentage of missing values exceeds a threshold (e.g., 30%). Algorithm 1 describes the process in more detail.</p></div>
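<div xmlns="http://www.tei-c.org/ns/1.0"><p>The imputation of rhythmic features across consecutive time windows can be sketched in Python as follows. This is an illustrative reading of the description above and not a transcription of Algorithm 1: the function names, the use of None to mark a missing value, and the feature names are all hypothetical.</p>
<p><![CDATA[
```python
def interpolate_missing(values):
    """Fill interior runs of None linearly between the nearest present
    neighbours; leading and trailing gaps are left unfilled."""
    filled = list(values)
    present = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(present, present[1:]):
        gap = right - left
        if gap > 1:
            step = (values[right] - values[left]) / gap
            for offset in range(1, gap):
                filled[left + offset] = values[left] + offset * step
    return filled


def missing_fraction(values):
    """Share of entries that are still missing."""
    return sum(v is None for v in values) / len(values)


def impute_features(features, threshold=0.3):
    """Impute each feature, then drop those still missing too often.
    `features` maps a feature name to one value per time window."""
    kept = {}
    for name, values in features.items():
        filled = interpolate_missing(values)
        if missing_fraction(filled) <= threshold:
            kept[name] = filled
    return kept


# Hypothetical rhythm features over five consecutive time windows.
features = {
    "steps_mesor": [10, None, None, None, 50],
    "sleep_amplitude": [None, 1.0, None, 3.0, None],
}
imputed = impute_features(features)
print(imputed)
```
]]></p>
<p>In the example, the first feature reproduces the 20, 30, 40 fill from the text, while the second keeps its unfilled first and last windows and is removed because 40% of its values remain missing after imputation.</p></div>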
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.2">Feature Selection.</head><p>As mentioned in previous sections, for each type of sensor feature, a single-period or a multi-frequency Cosinor model is generated, which outputs a list of rhythm parameters. These parameters are entered into the training process for building machine learning models.</p><p>Let M be the number of sensors (s_1...s_M), FN_i be the number of features for sensor i, and RN_j the corresponding number of rhythmic features for feature j in sensor i. The resulting feature space will be of size M * FN * RN, which is high-dimensional compared to the relatively few data samples for training. As such, a reduction in the number of features is necessary. The framework allows for the integration of different feature selection methods such as Lasso, Randomized Logistic Regression (RLR), and Information Gain (IG) in the machine learning component.</p><p>Lasso is a linear regression model penalized with the L1 norm to fit the coefficients <ref type="bibr">[10]</ref>. The Lasso regression prefers solutions with fewer non-zero coefficients, effectively reducing the number of features on which the solution depends. Through cross-validation, the Lasso regression can output the importance level for each feature in the training dataset. We use a threshold value of 1e-5 to select features with Lasso, which is the default threshold in the scikit-learn library of Python <ref type="bibr">[49]</ref>. Features with importance greater than or equal to the threshold are kept, and the rest are discarded.</p><p>Randomized Logistic Regression was developed for stability selection of features. The basic idea behind stability selection is to use a base feature selection algorithm like logistic regression to find out which features are important in bootstrap samples of the original dataset <ref type="bibr">[43]</ref>.
The results on each bootstrap sample are then aggregated to compute a stability score for each feature in the data. Finally, features whose stability score exceeds a threshold are selected. We use 0.25, the default threshold value in the scikit-learn library <ref type="bibr">[49]</ref>.</p><p>Information Gain (also referred to as Mutual Information in feature selection) measures the dependence between the features and the dependent variable (predicted outcome) <ref type="bibr">[36]</ref>. Mutual information is always larger than or equal to zero, where the larger the value, the greater the relationship between the two variables. If the calculated result is zero, then the variables are independent. We set our algorithm to select the 10 features with the highest information gain (10 is the default value in the scikit-learn library <ref type="bibr">[49]</ref>).</p></div>
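<div xmlns="http://www.tei-c.org/ns/1.0"><p>The information gain criterion can be illustrated with a small, self-contained Python sketch that scores discrete features by their mutual information with the outcome and keeps the top k. It is not the scikit-learn implementation used in the framework. It assumes the rhythm parameters have already been discretized (here into binary indicators), and the feature names and data are made up for the example.</p>
<p><![CDATA[
```python
import math
from collections import Counter


def mutual_information(feature, target):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    px = Counter(feature)
    py = Counter(target)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * math.log(count * n / (px[x] * py[y]))
    return mi


def select_top_k(features, target, k):
    """Return the k feature names with the highest mutual information."""
    scores = {name: mutual_information(col, target)
              for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical binarized rhythm features for eight participants.
target = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "steps_amp_high":   [0, 0, 1, 1, 0, 1, 0, 1],   # identical to target
    "sleep_mesor_high": [0, 1, 0, 1, 0, 0, 1, 1],   # independent of target
    "calls_phase_late": [0, 1, 0, 1, 0, 1, 0, 1],   # weakly related
}
selected, scores = select_top_k(features, target, k=2)
print(selected, scores)
```
]]></p>
<p>A feature that duplicates a balanced binary outcome reaches the maximum score of ln 2, and an independent feature scores zero, which matches the interpretation of mutual information given above. For continuous rhythm parameters, an estimator designed for continuous inputs would be needed instead of the counting approach shown here.</p></div>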
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.3">Model Building and Validation.</head><p>The step for building machine learning models using rhythm features of k consecutive time windows and for a population of D data samples is flexible in the framework and can incorporate different supervised and unsupervised machine learning methods such as regression, classification, and clustering. In the current version of the framework, we implement three classification methods: Logistic Regression (LR), Random Forest (RF), and Gradient Boosting (GB). The choice of algorithms is based on our empirical evidence of their performance on this type of data. Logistic regression <ref type="bibr">[44]</ref> uses the logistic function to build a classifier. Random Forest and Gradient Boosting are two branches of ensemble learning <ref type="bibr">[16]</ref> which use the idea of bagging and boosting <ref type="bibr">[9]</ref>, respectively. Their common feature is to use the decision tree as the basic classifier and to get a robust model by combining multiple weak models. Bagging is short for bootstrap aggregation. Bootstrapping is a repeated random sampling method with replacement <ref type="bibr">[27]</ref>. In boosting, the training set of each iteration is unchanged, but the weight of samples is changed. At each iteration, the training samples with high error rates are given higher weights, so they get more attention in the next round of training.</p><p>We built two types of machine learning models: single sensor models and multiple sensor models. The single sensor model was built with rhythmic features extracted from a single sensor feature alone to better understand the contribution of each sensor feature to prediction. The multiple sensor model, on the other hand, was used to evaluate the combined power of multiple sensor features.
We used a baseline of the majority class to measure the classifiers' performance in predicting the outcome. Again, the flexibility of the framework allows for the incorporation of different baseline measures. Both feature selection and model building are done within a cross-validation setting, e.g., leave-one-sample-out <ref type="bibr">[65]</ref>. The machine learning component can compute the basic performance measures of accuracy, precision, recall, F1, and MCC scores to evaluate the algorithms' performance. From those measures, we choose the results above baseline for each combination of feature selection and learning algorithm to further explore the prediction outcomes.</p></div>
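<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the performance measures named above and the majority-class baseline can be computed for a binary outcome as in the following Python sketch. The function names and the example labels are hypothetical. Returning 0.0 when a denominator is zero is a convention chosen for this sketch so that the baseline, which never predicts the minority class, still yields defined scores.</p>
<p><![CDATA[
```python
import math
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """True/false positives and negatives for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1, and MCC (0.0 when undefined)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def majority_baseline(y_true):
    """Predict the most frequent class for every sample."""
    label, _ = Counter(y_true).most_common(1)[0]
    return [label] * len(y_true)


# Hypothetical labels for ten held-out samples (1 = outcome present).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
model_scores = binary_metrics(y_true, y_pred)
baseline_scores = binary_metrics(y_true, majority_baseline(y_true))
print(model_scores)
print(baseline_scores)
```
]]></p>
<p>The example shows why accuracy alone can be misleading on imbalanced outcomes: the majority baseline reaches 0.7 accuracy but an MCC of zero, whereas the hypothetical classifier improves on both.</p></div>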
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">EVALUATION</head><p>To demonstrate the capability of our framework in building rhythm models from micro- and macro-level sensor features and utilizing them in prediction tasks, we present two different cases. The first case utilizes data from smartphones and Fitbit to explore the relationship between biobehavioral rhythms and mental health status. The second case investigates long-term biobehavioral rhythms in data from the OURA smart ring and their ability to predict readiness. We choose different analysis approaches to showcase the flexibility of the framework in handling different types of data and measuring various outcomes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Case 1: Classification of Mental Health via Rhythm Models Using Data from Smartphone and Fitbit</head><p>We utilized a dataset of smartphone, Fitbit, and survey data collected from 138 first-year undergraduate students at an American university who were recruited for a health and well-being research study. The dataset was previously used in <ref type="bibr">[20]</ref> to detect loneliness among college students. Smartphone data was collected through the AWARE framework <ref type="bibr">[26]</ref> and included calls, messages, screen usage, Bluetooth, Wi-Fi, audio, and location. In addition, a Fitbit Flex2 wearable fitness tracker tracked steps, distances, calories burned, and sleep, and survey questions gathered information about physical and mental health, including loneliness and depression. The survey data was collected at the beginning and at the end of the semester.</p><p>Our analysis was performed in two steps. First, we explored the potential of modeling and detecting rhythmicity in passively collected data from students' mobile and wearable data streams. Then, we used the built rhythm models to extract features that were fed into machine learning models to explore the relationship between students' biobehavioral rhythms and their mental health. We aimed to answer the following questions:</p><p>(1) Can we observe rhythmicity in students' biobehavioral data over the course of the semester? If so, are those rhythms consistent throughout the semester or do they change during different periods? (2) Do we observe any difference in biobehavioral rhythms among students with different health status? If so, do healthy students have more stable rhythms? (3) How accurately can models of biobehavioral rhythms predict mental health status? 
(4) What are the most important characteristics and rhythmic features that reveal change in health status?</p><p>Note that our framework provides the ability to generate a large number of observations on the micro (sensor feature) and macro (sensor) levels, but in this paper, we only focus on observations related to our analysis questions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.1">Sensor Data Processing.</head><p>The dataset collected from smartphones and Fitbits consisted of time series data from multiple sensors, including Bluetooth, calls, SMS, Wi-Fi, location, phone usage, steps, and sleep. We grouped this time series data into hourly bins and processed it following the approach in <ref type="bibr">[18]</ref> to extract features related to mobility and activity patterns, communication and social interaction, and sleep. Examples of such features include travel distance, sleep efficiency, and movement intensity. We then split the semester data into tumbling cyclic time windows of 14 days (two weeks), a length chosen based on empirical evaluation of different time window lengths. The university semester in the studied population was roughly 16 weeks long, which could be divided into eight time windows of two weeks, except for the last time window that contained only ten days of data (Figure <ref type="figure">4</ref>). We built a model of rhythm for each student and for each time window.</p><p>We handled missing sensor data on a per-participant, per-time-window basis. For each participant and each time window, we removed sensor features with more than 30% missing data. For the remaining sensor features, we performed nearest-neighbor linear interpolation as described previously to fill in missing values.</p></div>
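<div xmlns="http://www.tei-c.org/ns/1.0"><p>The windowing and missing-data rules above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: missing hourly bins are represented as None, and "nearest-neighbor linear interpolation" is read here as linear interpolation between the nearest observed bins on each side, with leading and trailing gaps taking the nearest observed value. The helper names (tumbling_windows, missing_fraction, interpolate_linear, clean_window) are hypothetical.</p>
<p><![CDATA[
```python
HOURS_PER_WINDOW = 14 * 24   # two-week tumbling window of hourly bins
MAX_MISSING = 0.30           # a feature is dropped above this missing share


def tumbling_windows(series, size=HOURS_PER_WINDOW):
    """Split an hourly series into consecutive, non-overlapping windows.

    The last window may be shorter than the others.
    """
    return [series[i:i + size] for i in range(0, len(series), size)]


def missing_fraction(window):
    """Share of bins in a window that hold no value (None)."""
    return sum(v is None for v in window) / len(window)


def interpolate_linear(window):
    """Fill gaps linearly between the nearest observed neighbors.

    Assumption: gaps at the start or end copy the nearest observed value.
    """
    known = [i for i, v in enumerate(window) if v is not None]
    filled = list(window)
    if not known:
        return filled
    for i, v in enumerate(window):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = window[right]
        elif right is None:
            filled[i] = window[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = window[left] + w * (window[right] - window[left])
    return filled


def clean_window(window):
    """Return the interpolated window, or None if too much is missing."""
    if missing_fraction(window) > MAX_MISSING:
        return None
    return interpolate_linear(window)


if __name__ == "__main__":
    # Seven full windows plus a ten-day remainder, as in the semester layout.
    hours = list(range((7 * 14 + 10) * 24))
    print([len(w) for w in tumbling_windows(hours)])
```
]]></p>
<p>In this reading, clean_window would be applied once per participant, per sensor feature, and per time window before any rhythm model is fitted.</p></div>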
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2">Ground Truth Measures for Loneliness and Depression.</head><p>In our evaluation, we focused on two mental health outcomes, namely depression and loneliness. These two measures were chosen because of their longitudinal aspect, i.e., lasting for at least a few weeks, which enables the investigation of 1) how biobehavioral rhythms of students with mental health conditions would differ from other students, and 2) how accurately the state of those mental health conditions could be predicted from extracted rhythms.</p><p>Loneliness data was collected using the UCLA Loneliness Scale, a well-validated and commonly used measure of general feelings of loneliness <ref type="bibr">[55]</ref>. The questionnaire contains 20 questions about feeling lonely and isolated using a scale of 1 (never) to 4 (always). The total loneliness scores range from 20 to 80, with higher scores indicating higher levels of loneliness. As there is no standard cutoff for loneliness scores in the literature, we followed the same approach as in <ref type="bibr">[20]</ref> to divide the UCLA scores into two categories where scores of 40 and below were categorized as 'low loneliness', and scores above 40 were categorized as 'high loneliness'.</p><p>Depression was assessed using the Beck Depression Inventory-II (BDI-II) <ref type="bibr">[4,</ref><ref type="bibr">22]</ref>, a widely used psychometric test for measuring the severity of depressive symptoms that has been validated for college students <ref type="bibr">[22]</ref>. The BDI-II contains 21 questions, with each answer being scored on a scale of 0-3 where higher scores indicate more severe depressive symptoms. For college students, the cut-offs on this scale are 0-13 (no or minimal depression), 14-19 (mild depression), 20-28 (moderate depression), and 29-63 (severe depression) <ref type="bibr">[22]</ref>. 
For simplicity and to be consistent with the loneliness categorization, we divided these scores into two categories where the BDI-II scores &lt;14 were labeled as 'not having depression' and all BDI-II scores &gt;= 14 were labeled as 'having depression'.</p><p>Our machine learning pipeline used these loneliness and depression categories as ground truth labels to classify students' depression and loneliness levels using rhythmic features. Each student filled out the surveys both at the beginning (Pre) and the end of the semester (Post). To capture relationships between biobehavioral rhythms and changes in students' mental health, we categorized students into five groups according to the survey measures for depression and loneliness. For simplicity of representation, we further label the low loneliness and no depression categories as 1, and the high loneliness and having depression categories as 2. The five mental health categories are as follows:</p><p>&#8226; All students &#8226; Pre1_Post1: not having a mental health condition in both pre-semester and post-semester surveys &#8226; Pre1_Post2: not having a mental health condition in the pre-semester survey, but having it in the post-semester survey &#8226; Pre2_Post2: having a mental health condition in both surveys &#8226; Pre2_Post1: having a mental health condition in the pre-semester survey, but not in the post-semester survey</p><p>The following sections describe our observations and findings. To distinguish the mental health groups in the two conditions, we prefix the group name with L for loneliness (e.g., L_Pre1_Post2) and with D for depression (e.g., D_Pre1_Post2).</p></div>
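<div xmlns="http://www.tei-c.org/ns/1.0"><p>The labeling rules of this section can be written down compactly. The following Python sketch encodes the cut-offs stated above (UCLA scores of 40 and below versus above 40, and BDI-II scores below 14 versus 14 and above) and builds the group names used in the remainder of the evaluation. The function names are hypothetical and the range checks are an added safeguard, not something the text prescribes.</p>
<p><![CDATA[
```python
def loneliness_label(ucla_score):
    """1 = low loneliness (total of 40 or below), 2 = high loneliness."""
    if not 20 <= ucla_score <= 80:
        raise ValueError("UCLA Loneliness Scale totals range from 20 to 80")
    return 1 if ucla_score <= 40 else 2


def depression_label(bdi_score):
    """1 = not having depression (BDI-II below 14), 2 = having depression."""
    if not 0 <= bdi_score <= 63:
        raise ValueError("BDI-II totals range from 0 to 63")
    return 1 if bdi_score < 14 else 2


def change_group(prefix, pre_label, post_label):
    """Group name such as L_Pre1_Post2; prefix is 'L' or 'D'."""
    if prefix not in ("L", "D"):
        raise ValueError("prefix must be 'L' (loneliness) or 'D' (depression)")
    return f"{prefix}_Pre{pre_label}_Post{post_label}"


if __name__ == "__main__":
    # Made-up survey totals for one student (pre- and post-semester).
    print(change_group("L", loneliness_label(35), loneliness_label(52)))
    print(change_group("D", depression_label(20), depression_label(9)))
```
]]></p>
<p>With the made-up totals in the example, the student would fall into L_Pre1_Post2 for loneliness and D_Pre2_Post1 for depression.</p></div>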
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.3">Detection of Rhythmicity and Regularity in Student Data.</head><p>To investigate whether rhythmicity exists in data collected from students' smartphones and Fitbits (Question 1) and whether students' rhythms remain stable throughout the semester (Question 2), we used Autocorrelation and the Fourier Periodogram to model students' rhythms in each time window for each sensor feature.</p><p>We first applied Autocorrelation to a sleep feature; the result suggests that students with high loneliness have less stable sleep rhythms. Figure <ref type="figure">5</ref> shows the correlogram of the number of restless sleep bouts in two students from different groups, one with low loneliness throughout the semester and the other with high loneliness at the end of the semester. The figure visually depicts differences in the rhythms of these two students, where the correlogram belonging to the student with high loneliness shows a less stable rhythm towards the end of the time series. To further quantify such differences in cyclic rhythms of students, we applied the Periodogram to (1) detect dominant periods in students' data, and (2) measure variability in those periods among students with different health statuses.</p><p>To identify the dominant periods, the Fourier periodogram is used to detect all significant periods for each sensor feature. The results of the periodogram show that the most dominant cyclic periods in each time window are 24 and 12 hours for all sensor features. For example, for the sleep duration feature in the depression category, this trend is consistent in all students regardless of the mental health condition: on average, 97.6% and 69.6% of students have 24 and 12 hours, respectively, as dominant periods in their data across time windows (Tables <ref type="table">1</ref> and <ref type="table">2</ref>). The percentages, however, have a declining trend starting from TW4 (around midterms) towards the end of the semester. 
This trend can be expected because of the increase in students' workload that causes irregularity in sleep duration. The lowest percentages across all time windows (46.3% on average) are observed in the 12-hour period of students in group D_Pre2_Post2, i.e., students who were depressed throughout the semester. In particular, there is no 12-hour period observed for this group in TW1 (the first two weeks) and TW8 (the last two weeks). The 12-hour or half-day period relates to diurnal/nocturnal activities, and this trend may be indicative of higher irregularity in sleep behavior among students with depression throughout the semester, especially at the beginning and towards the end of the semester. Our observations are consistent with other studies. The study in <ref type="bibr">[51]</ref>, with 138 participants, observed that older adults with depression have a lower sleep regularity index. The study in <ref type="bibr">[62]</ref>, conducted with male college students, observed that irregular sleepers showed more negative moods, including depression.</p><p>Notes to Tables 1 and 2: N is the number of students in the group. P1 is the most dominant period (i.e., the percentage of students that have this period is highest among all periods). The percentage in parentheses is the percentage of students that have the period. P2 and P3 are the second and third dominant periods.</p><p>We picked the sleep duration feature to further analyze changes in periodicity in students who started the semester with normal health status but developed depression or loneliness towards the end (D_Pre1_Post2 or L_Pre1_Post2). 
Table <ref type="table">2</ref> shows that the dominant periods of 24 and 12 hours are preserved for the sleep duration feature in all time windows for both loneliness and depression groups. While the same declining trend towards the end of the semester exists for both loneliness and depression groups, a sharper slope is observed for the 12-hour period. The lowest percentages of students in this group with 24- and 12-hour periods are in time windows 4 and 5, with 73% in the loneliness category (24-hour), 91% in the depression category (24-hour), 53% in the loneliness category (12-hour), and 57% in the depression category (12-hour). Given that time windows 4 and 5 intersect with midterm and spring break, these observations point to changes in sleep patterns among students whose mental health worsens over the semester.</p><p>The third dominant periods for sleep duration across all time windows include 312-hour (13 days), 156-hour (6.5 days), and 78-hour (3.25 days). This is an interesting observation as these numbers are multiples of the 78-hour period. In other words, it seems the sleep duration of roughly one third of the population in these groups follows a weekly pattern that may be imposed by class schedules. Overall and across all sensor features, we observe the 24-hour period as the dominant period for over 52% of the student population, with the highest percentages belonging to steps (95%), calories (92%), Wi-Fi (83%), and sleep (68%). Table <ref type="table">3</ref> presents the overall percentages for each sensor. Calories and steps relate to physical activity. The high percentage of students with 24-hour cycles in these two sensor categories is indicative of regular daily exercise and movement. While there is a low percentage of students with regularity in their cyclic location patterns and visited places (Location Map features), it seems a large number of students have regular daily patterns of using Wi-Fi. 
This pattern could be expected given that the first-year students live in dorms and are mostly on campus. Interestingly, a low percentage of students seem to have regular cyclic patterns of phone usage (Screen, 36%; Call &amp; Messages, 18%; Battery, 13%). While phone use, especially battery charging, is expected to be cyclic (e.g., charging the phone at night), these observations present the possibility of different phone use behavior among students.</p><p>To measure the variability of the dominant periods among students with different health statuses, we look at the percentage of participants in each mental health group that had 24 hours as one of their dominant rhythms for each time chunk. This helps observe the extent to which students preserved their normal circadian rhythm over the semester. Recall that a time chunk consists of k consecutive time windows; with eight two-week time windows in the dataset, there were 36 different time chunks in total. In each time chunk, a participant had 24-hour as a dominant rhythm if and only if this participant had 24-hour as a dominant rhythm in all time windows in that time chunk. Figure <ref type="figure">6</ref> shows the percentage of participants with 24-hour as the dominant rhythm (y-axis) in each mental health group for each time chunk of length 3 (x-axis). We chose one representative feature from each sensor stream, i.e., Bluetooth (abbreviated as blue in the figure), location (loc), sleep (slp), calories (calor), screen, and steps for further analysis. As shown in Figure <ref type="figure">6</ref>, the trend in the percentage of 24-hour rhythms varies considerably across mental health groups and across time chunks in each sub-figure. To understand the significance of these variations, we 1) applied K-W ANOVA (Kruskal-Wallis one-way analysis of variance) <ref type="bibr">[12]</ref> to test the differences in trends across mental health groups, and 2) calculated the variance in the percentage of 24-hour rhythms for each mental health group across time chunks. For loneliness, the trends for all features show significant differences among mental health groups (the average/median of p-value across sensor features is 0.02/0.03). For depression, mental health groups have more similar trends.</p><p>Note to Table 4: We first calculated the variance per mental health group in each sensor feature shown in Figure <ref type="figure">6</ref>, and then averaged these variance values across sensor features of loneliness or depression. The aggregated variance can represent the stability of rhythms of each mental health group.</p><p>In contrast to Bluetooth, calorie, and step features that have significant differences in their trends (p-values of 0.05, 0.001, and 0.001), location, sleep, and screen features do not show any significant differences (p-values 0.94, 0.26, and 0.67). This is visually demonstrated in Figure <ref type="figure">6</ref>, e.g., the trend for location is similar for all four depression groups. We also calculated the average variance for each mental health group across sensor features. As shown in Table <ref type="table">4</ref>, for loneliness, most changes in the 24-hour rhythms were observed in the group with high loneliness at the beginning and low loneliness at the end of the semester (L_Pre2_Post1), whereas for depression, the group with depression throughout the semester (D_Pre2_Post2) had the largest fluctuations. For loneliness, the group with low loneliness at the beginning and high loneliness at the end of the semester (L_Pre1_Post2) shows an overall higher percentage of 24-hour rhythms for features of sleep, location, and Bluetooth across time windows. 
The opposite group with high loneliness at the beginning and low loneliness at the end of the semester (L_Pre2_Post1) shows a lower percentage of 24-hour rhythms for features of calories and steps but higher percentages for screen features. The Bluetooth feature in the top left of Figure <ref type="figure">6</ref>(a), which represents the cyclic patterns of the scanned devices belonging to the person, is a proxy of social isolation, i.e., the person not being around other people (and their devices) and being mostly by themselves. Starting from TW3 (week 3, 4, and 5), the percentage of students with a regular daily cycle for this feature in the L_Pre1_Post2 and L_Pre2_Post1 groups sharply increases and decreases, respectively. In other words, while more students with low loneliness at the beginning and high loneliness at the end of the semester start having a regular social isolation pattern on a daily basis towards the end of the semester, fewer students in the opposite group with high loneliness at the beginning and low loneliness at the end of the semester experience this trend. A very similar pattern is observed for another socially relevant feature, namely the length of stay in significant locations. The trend is relatively stable and slightly decreasing in students with no loneliness, which reflects the stability of behavior in this group. For sleep, steps, and calorie burn, we observe an almost counterintuitive opposite cyclic behavior among the L_Pre1_Post2 and L_Pre2_Post1 groups. It seems more students with loneliness toward the end of the semester engage in regular physical activities, as reflected by the calories and steps features, and have more regular sleep duration cycles. A relatively similar behavior is observed for the burned calories feature in depression groups (Figure <ref type="figure">6</ref> top right). 
While regularity in physical activities slightly increases in students with depression (D_Pre2_Post2), it appears to decrease in students with no depression (D_Pre1_Post1) across time windows. While existing studies, e.g., <ref type="bibr">[7,</ref><ref type="bibr">20,</ref><ref type="bibr">61]</ref> point to negative associations of physical activities and mental health, we believe the increase in regular physical activities towards the end of the semester may be a coping attempt by students with mental health problems.</p><p>Trends, however, generally look different for depression groups in Figure <ref type="figure">6</ref>(b). All groups except D_Pre2_Post1 had similar percentages of regular 24- and 12-hour periods for Bluetooth, location, and screen across time windows. While the group with no depression at the beginning and with depression at the end of the semester (D_Pre1_Post2) shows the highest percentage of normal 24-hour rhythms for features of calories and steps across all time windows, the group that was depressed throughout the semester (D_Pre2_Post2) shows the lowest percentages for steps, sleep, and calories. In particular, the regularity of sleep in these students seems to decline drastically across time windows. Although expected, this sharp trend is a valuable observation for further exploration of relationships between change in sleep cycles and depression status. The previous study in <ref type="bibr">[51]</ref> also observed that sleep irregularity is indicative of depression, but to our knowledge no existing study has analyzed the relationship between change in sleep cycles and change in depression status. Our observations provide new findings and insights that call for further and more rigorous investigations.</p></div>
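<div xmlns="http://www.tei-c.org/ns/1.0"><p>The time-chunk analysis used in this section can be sketched as follows. The Python snippet enumerates every run of k consecutive time windows, which yields the 36 chunks mentioned above for eight windows, and applies the stated rule that a participant counts as having a 24-hour rhythm in a chunk only if 24 hours is dominant in every window of that chunk. The data layout (a dictionary mapping a window index to a set of dominant periods in hours) and the function names are assumptions made for illustration.</p>
<p><![CDATA[
```python
def time_chunks(n_windows):
    """All runs of k consecutive windows, for k = 1 .. n_windows.

    Each chunk is a tuple of 1-based window indices.
    """
    chunks = []
    for k in range(1, n_windows + 1):
        for start in range(1, n_windows - k + 2):
            chunks.append(tuple(range(start, start + k)))
    return chunks


def has_period_in_chunk(dominant_periods, chunk, period=24):
    """True only if `period` is dominant in every window of the chunk.

    dominant_periods maps window index to a set of dominant periods (hours).
    """
    return all(period in dominant_periods.get(w, set()) for w in chunk)


def percent_with_period(participants, chunk, period=24):
    """Percentage of participants that keep `period` across the chunk."""
    hits = sum(has_period_in_chunk(p, chunk, period) for p in participants)
    return 100.0 * hits / len(participants)


if __name__ == "__main__":
    chunks = time_chunks(8)
    print(len(chunks))                               # number of chunks
    print(sum(1 for c in chunks if len(c) == 3))     # chunks of length 3
    # Two made-up participants over the first three windows.
    a = {1: {24, 12}, 2: {24}, 3: {24, 12}}
    b = {1: {24}, 2: {12}, 3: {24}}
    print(percent_with_period([a, b], (1, 2, 3)))
```
]]></p>
<p>The count follows from 8 + 7 + ... + 1 = 36. Restricting the list to chunks of length 3 gives the six positions that would appear on the x-axis of a plot such as Figure 6.</p></div>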
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.4">Prediction of Mental Health Status with Rhythmic Features.</head><p>The third and fourth questions in our analysis relate to the feasibility of using biobehavioral rhythm parameters to predict students' mental health status. In our framework, we utilize dominant periods that were detected using the Fourier Periodogram described in Section 4.1.3 to build Cosinor models of biobehavioral data. This process generates rhythmic features that are fed into the machine learning process to classify post-semester loneliness and depression categories (low loneliness vs. high loneliness and no depression vs. with depression) of the students. We build two types of datasets, one with single sensors only and one with multiple sensors. In the following paragraphs, we evaluate the performance of single sensor modeling and multiple sensor modeling to find out what types of sensor features and rhythmic features contribute most to the prediction.</p><p>For Single Sensor datasets, we use the rhythmic features of each sensor feature separately, i.e., for each sensor feature and each time window (with time windows of two weeks), we take the rhythmic features of this sensor feature and time window to form the input dataset. We remove datasets with more than 30% missing instances (80 training instances) as we consider them too small to generate a reliable and generalizable model. For Multiple Sensors datasets, we select the sensor features that provide accuracy above baseline in models built with single sensors. For both approaches, we use the majority class ratio, i.e., the percentage of labels in the most frequent category, as the comparison baseline. 
We then repeat the same process we followed for single sensor datasets, but this time for the combination of sensor features, i.e., for each combination of sensor features and each time window, we take the rhythmic features of the selected sensor features in that time window to form the input datasets. Other than the difference in the input dataset, the machine learning pipeline is the same for the two types of datasets.</p><p>Given the imbalanced datasets for both health conditions, i.e., the different number of samples in the two classes (e.g., 59% of samples in category 1 vs. 41% in category 2 of depression), accuracy alone is not adequate for performance evaluation and needs to be accompanied by other measures such as F1. For every combination of time window and sensor, the F1 score is used to select the model with the best performance. We build models with single sensor and multiple sensors datasets for both mental health conditions. The results of all combinations are shown in Figures <ref type="figure">7</ref> and <ref type="figure">8</ref>. The heatmaps use the depth of color to represent the F1 score. Given the large number of features, we only report results with accuracy above the baseline (majority class percentage). Through the single sensor modeling, we can judge which type of sensor is most effective in predicting mental health. Overall, we find that the models with multiple sensors improve the prediction performance. A summary of the results is given in Table <ref type="table">5</ref>. Single Sensor Modeling. The F1 scores of machine learning models with single sensor features are shown in Figure <ref type="figure">7</ref>. Overall, the models for loneliness prediction obtain higher F1 scores than depression models (Table <ref type="table">5</ref>), which may be due to more sparsity in depression datasets. 
Rhythm parameters obtained from Cosinor models built for features related to Bluetooth, calories, location, sleep, and steps perform better in predicting both loneliness and depression levels. Although the best model to classify post-semester loneliness is built using Gradient Boosting on rhythm parameters of calorie data from TW1 to TW3 with an F1 score of 0.76, more models built on rhythms of location and locationMap provide high performance. The best model for post-semester depression, with an F1 score of 0.7, is also built using Gradient Boosting but on the locationMap data from TW3 to TW5. Compared to other sensors, models using rhythmic parameters from locationMap features show better performance for predicting post-semester depression (six out of the ten models with the highest F1 scores use locationMap features). Although the F1 scores of models with a single time window are generally lower than those of models with multiple time windows, there are some exceptions in the heatmaps of both loneliness and depression. For example, the loneliness model using sleep features in TW1 achieves an F1 score of 0.75, and the F1 score of the depression model using sleep features in TW5 equals 0.68. Interestingly and somewhat counter-intuitively, across all sensors, the majority of models (avg. 57.5% for single sensors and 53.5% for multiple sensors) using early semester time windows (TW1 to TW4) appear to have higher F1 scores for post-semester loneliness and depression prediction than late semester time windows. We believe this observation provides initial evidence for the possibility of early detection of mental health status via monitoring of changes in biobehavioral rhythms.</p><p>Multiple Sensor Modeling. We do the same analysis for the combination of sensor features. From Figure <ref type="figure">8</ref>, we observe that the combination of multiple sensor features contributes to the improvement of the F1 score. 
For example, the combinations related to steps, sleep, location, calorie, and Bluetooth yield better results. For predicting loneliness, the best model is built with Logistic Regression, which uses the Bluetooth and steps data from TW5 to TW8 and obtains an F1 score of 0.91. For predicting depression, the best model is obtained from Logistic Regression using the rhythm parameters from Bluetooth, calorie, location, screen, and steps features. The model only uses TW6 to predict depression with an F1 score of 0.89. As in single sensor modeling, the best model predicting depression has a lower F1 score than the best model predicting loneliness, which may be due to sparsity in sensor data.</p><p>Table <ref type="table">5</ref> summarizes the mean and max of F1 scores for models built with each combination of the feature selection and machine learning methods. In single sensor modeling, the combinations of Logistic Regression with Lasso and Randomized Logistic Regression perform best for predicting loneliness with the mean and max F1 score of 0.7 and 0.76, respectively. The combination of Gradient Boosting and Information Gain provides the highest F1 score for the prediction of depression.</p><p>Note to Table 5: The bold values are either the largest mean F1 score or the largest maximum F1 score.</p><p>For the multiple sensor modeling, we observe that the maximum F1 scores of predicting loneliness and depression are 0.91 and 0.89, which are obtained from the combination of Logistic Regression and Lasso. Overall, for the majority of approaches, the combination of Gradient Boosting and Information Gain provides the best performance. This combination should be further evaluated with other similar datasets to replicate and confirm its superior performance over other algorithm combinations.</p><p>Dominant rhythm parameters that predict mental health. 
We count the frequency of rhythmic features selected by machine learning models to measure the contribution of each rhythm parameter in predicting mental health. Orthophase and Magnitude appeared on top of the list as the most frequently selected parameters. Although we used three feature selection methods in our evaluation, we observed that the Information Gain method provided a more reliable and complete list of features during the training. Table <ref type="table">6</ref> shows the rhythm features that are selected most frequently by Information Gain during depression prediction for each sensor feature in each time window. The vertical dominant feature (VDominant) is the most commonly selected feature for most of the sensors at a given time window, and the horizontal dominant feature (HDominant) is the most commonly selected feature in most time windows for a given sensor.</p><p>The overall dominant feature (the feature at the bottom right corner in bold font) is the most commonly selected feature for all sensors and time windows. If two features are the most commonly selected features for the same number of sensors/time windows, we break the tie by taking the feature with a higher frequency. Overall, Orthophase is selected most frequently for all sensors and time windows. Magnitude comes in second. Given that Phase and Magnitude reflect duration and intensity of biobehavioral features, frequent selection of these parameters suggests an important relationship with mental health status.</p><p>In addition to the main rhythmic features, i.e., Mesor, Amplitude/Magnitude, and Ortho/Bathyphase, we observe frequent selection of features related to the fit of Cosinor models including the significance level of the fit (P), Standard Errors (SE) and Percent Rhythm (PR and IPR), i.e. the proportion of the overall variance accounted for by the fitted model. Higher levels of these parameters reflect higher variation in data. 
Therefore, frequent selection of these parameters indicates the power of regularity/irregularity of biobehavioral rhythms in predicting mental health status.</p></div>
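<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the rhythm parameters discussed in this section more concrete, the following Python sketch fits a single-component Cosinor model, y(t) = M + A cos(2 pi t / period + phi), by ordinary least squares and reports the MESOR, amplitude, acrophase, time of the fitted peak, and percent rhythm. This is a simplified illustration under stated assumptions: the framework may use multi-component models (for which Magnitude, Orthophase, and Bathyphase are the corresponding quantities), and the function names fit_cosinor and solve_3x3 are hypothetical. Standard errors and significance tests are omitted.</p>
<p><![CDATA[
```python
import math


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    n = 3
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_cosinor(times, values, period=24.0):
    """Least-squares fit of y = M + beta*cos(wt) + gamma*sin(wt)."""
    omega = 2 * math.pi / period
    rows = [(1.0, math.cos(omega * t), math.sin(omega * t)) for t in times]
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, beta, gamma = solve_3x3(xtx, xty)
    amplitude = math.hypot(beta, gamma)
    acrophase = math.atan2(-gamma, beta)          # phi in radians
    peak_time = (-acrophase / omega) % period     # hours after reference
    fitted = [mesor + beta * r[1] + gamma * r[2] for r in rows]
    rss = sum((y - f) ** 2 for y, f in zip(values, fitted))
    mean_y = sum(values) / len(values)
    tss = sum((y - mean_y) ** 2 for y in values)
    percent_rhythm = 100.0 * (1 - rss / tss) if tss else 0.0
    return {"mesor": mesor, "amplitude": amplitude, "acrophase": acrophase,
            "peak_time": peak_time, "percent_rhythm": percent_rhythm}


if __name__ == "__main__":
    # Synthetic two-week hourly signal with a known 24-hour rhythm.
    hours = list(range(14 * 24))
    signal = [10 + 3 * math.cos(2 * math.pi * t / 24 - 1.0) for t in hours]
    print(fit_cosinor(hours, signal, period=24.0))
```
]]></p>
<p>On the noise-free synthetic signal the fit recovers the generating parameters up to rounding error. On real sensor features, the percent rhythm and the residual error would be lower or higher depending on how regular the behavior is, which is the property the fit-related features above are meant to capture.</p></div>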
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.5">Comparison with Models Built Without Rhythm Parameters.</head><p>To better understand the capability of our framework in utilizing rhythmic features to predict an outcome, we compare the prediction performance of the models with rhythm modeling against the models without rhythm modeling. Specifically, we select the best performing sensor feature in each time window, run exactly the same machine learning pipeline on the raw feature data without rhythm modeling, and compute the F1 score. Table <ref type="table">7</ref> shows that the pipeline with rhythm modeling outperforms the one without by a large margin on most of the features. This observation holds for both loneliness and depression predictions.</p><p>Table <ref type="table">10</ref> lists the best RMSE achieved by single sensor models along with the most frequently selected features. Among single sensor models, the model built with the rhythmic features of sleep data, with an RMSE of 4.08, is a stronger predictor of readiness than the others. In comparison, the combination of sleep, calories, and steps obtains an RMSE of 3.54, the lowest RMSE among all multiple sensor models, as shown in Table <ref type="table">11</ref>. This combination considers both the activity of the human body during the day (calories) and the sleep quality at night (sleep). These observations are expected and confirm the impact of both sleep and physical activity on the body's daily functioning. Interestingly but not surprisingly, the frequently selected features across all sensors are standard errors of the rhythm parameters (i.e., PHI SE, MESOR SE, and Amp SE) as well as percent rhythm (PR), all of which are indicative of variation in the actual data. MESOR SE is the most dominant feature among both single and multiple sensor models. 
These results suggest that the level of variability, and potentially irregularity, in biobehavior may be most predictive of fluctuations in readiness.</p><p>Tables 10 and 11 also summarize the RMSE for models using each combination of feature selection and machine learning methods. The Gradient Boosting model with Lasso regression achieves the best performance for both single sensor and multiple sensor modeling, with an RMSE of 3.54. Using the same prediction model, Information Gain performs better in single sensor modeling, and the results are reversed in multiple sensor modeling.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">DISCUSSION</head><p>In the Introduction section, we identified several challenges in processing and modeling biobehavioral time series data from mobile and wearable devices that motivated the development of our novel computational framework. These challenges include 1) automated handling and processing of massive multimodal sensor data, 2) granular and fine-grained exploration of all signals to We presented two case studies using different datasets, sensors, populations, and prediction tasks to demonstrate the capabilities of our proposed computational framework in addressing the aforementioned challenges. Both cases demonstrated the ability of the framework to automatically process longitudinal multimodal mobile sensor data; extract fine-grained features; detect periodicity in the data and use it to study rhythm stability and variation over time; build micro-rhythm models for each biobehavioral feature; and use those models with different analytic approaches to predict various health outcomes. We were able to build a large number of prediction models for both single sensors and different combinations of sensors and to compare the results. We observed that combining multiple sensor features improved the prediction results. We also showed that the models built with rhythmic features outperform models built with the raw sensor features, further demonstrating the feasibility of using biobehavioral rhythms in prediction tasks.</p><p>Although our primary goal was to showcase the capabilities and flexibility of the framework, our analyses provided interesting and novel observations, some of which can be used as initial evidence for further investigation. For example, although we used different datasets and population groups in cases 1 and 2, we observed near-weekly sleep cycles in both populations. 
We also observed a drastic decline in sleep duration cycles of depressed students throughout the semester. Even though existing research has repeatedly shown relationships between sleep and mental health, we believe our observation is unique in identifying relationships between changes in cyclic patterns of sleep and mental health status. Our micro machine learning models of sensor features provided evidence that changes in biobehavioral rhythms in the early weeks of the semester were predictive of post-semester depression and loneliness. This finding suggests that monitoring biobehavioral rhythms may serve as a useful tool for early prediction of changes in mental health status. We also observed that the rhythmic parameters of Phase and Magnitude, which reflect the duration and intensity of biobehavioral features, as well as parameters related to variability in the cyclic time series models (e.g., SEs and PR), were frequently selected in the machine learning process, indicating the power of the intensity, duration, and regularity/irregularity of biobehavioral rhythms in the prediction of health outcomes. Since there is no comparable study in biobehavioral rhythms for the prediction of health and wellness, we only compared our observations with the closest studies of loneliness and depression. We hope our initial findings open the way for more studies that use our framework to replicate the results.</p><p>The central theme of this paper was introducing the computational framework and its main functionality. However, the framework can be adapted and extended to include more functionalities and features. 
Possible advancements include 1) adding more data sources such as weather, environment, work schedules, and social engagements to draw a more holistic picture of biobehavioral rhythms in individuals and groups of people, 2) adding a more comprehensive set of periodic functions and methods with diverse characteristics that make it possible to uncover different cyclic aspects in data, 3) developing novel methods for measuring the stability of rhythms, and 4) advancing the machine learning component to incorporate a comprehensive selection of analytic methods that further enhances the capabilities of the framework for predictive modeling of cyclic biobehavior.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Limitations and Future Work</head><p>While the proposed computational framework is easily extendable to various types of data, the current implementation has a few constraints. First, the input sensor signals should be equidistantly sampled for the rhythm modeling methods to work. Second, the input sensor signals need to have significant cyclic patterns. Finally, the length of input sensor signals should be substantially longer than the expected periods to ensure stable modeling. For the current implementation, we also limited our periodic functions to Autocorrelation, Periodogram, and Cosinor. In future work, we hope to build an ensemble system incorporating different types of rhythm detection algorithms and design a voting algorithm for aggregating the outputs of period detection algorithms. For example, the period detected most frequently across the detection algorithms will be treated as the dominant period. We also plan to extend the framework by adding and evaluating novel methods to quantify the collective stability of individual and group rhythms.</p><p>For handling missing values, we used nearest-neighbor linear interpolation as one of the fundamental missing data imputation methods. However, we acknowledge that missing data is a daunting issue in sensor data processing, and the strategy for handling missing data requires careful consideration. For example, each sensor stream may have a certain distribution pattern that requires a different handling method. In cases of large continuous missing blocks (e.g., 7 or 10 straight days), interpolation can result in smoothed distributions that do not reflect the actual data and lead to misinterpretations of the built models. In our case studies, we set a threshold of 30% to eliminate features with large blocks of continuous missing values and to avoid the above-mentioned problem. The threshold was decided based on our calculation of the length of missing blocks. 
While this strategy can be useful for many types of data, it may not be optimal for all.</p><p>Finally, although we presented two cases to demonstrate the capability of the framework in modeling different types of data, more evaluations are needed to verify its generalizability.</p></div>
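<div xmlns="http://www.tei-c.org/ns/1.0"><p>The voting scheme proposed above for aggregating the outputs of period detection algorithms can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's implementation: the function name <code>dominant_period</code>, the detector names in the example, the one-hour rounding resolution, and the tie-breaking rule are all hypothetical choices introduced here.</p>
<p><code lang="python">
```python
from collections import Counter


def dominant_period(detected, resolution=1.0):
    """Return the period (in hours) reported by the most detectors.

    detected: dict mapping a detector name to its detected period in hours,
              or None when that detector found no significant period.
    resolution: bin width in hours used to treat nearby periods as equal
                (an assumption of this sketch, not a value from the paper).
    """
    # Round each reported period to the nearest bin so that, for example,
    # 23.9 h and 24.2 h count as votes for the same period.
    votes = Counter(
        round(period / resolution) * resolution
        for period in detected.values()
        if period is not None
    )
    if not votes:
        return None
    top = max(votes.values())
    # Tie-breaking rule (assumption): prefer the shortest tied period.
    return min(p for p, count in votes.items() if count == top)


# Hypothetical detector outputs for one sensor feature.
example = {"autocorrelation": 24.2, "periodogram": 23.9, "cosinor": 168.0}
result = dominant_period(example)
```
</code></p>
<p>In this example, two of the three hypothetical detectors agree on a period of roughly 24 hours, so that period is returned as dominant. Rounding before voting matters because different algorithms rarely report exactly the same value for the same underlying cycle.</p></div>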
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">CONCLUSION</head><p>We designed and presented a computational framework for modeling biobehavioral rhythms from mobile and wearable data streams that rigorously processes sensor streams, detects periodicity in data, models rhythms from that data, and uses the cyclic model parameters to predict an outcome. Our evaluation of the framework using two different case studies showed that, in addition to detecting rhythmicity, the framework can reliably discover various periods of different lengths in data, extract cyclic biobehavioral characteristics through exhaustive modeling of rhythms for each sensor feature, and provide the ability to use different combinations of sensors and data features to predict an outcome. The machine learning analyses for predicting mental health and readiness demonstrated the ability of our framework to process massive numbers of data streams to build and analyze micro-rhythmic models for each sensor feature and combinations of features, and highlighted dominant rhythmic features for prediction of the outcome of interest. The case studies also provided novel findings that were not observed in similar studies. These results show the feasibility of our computational modeling framework for studying different outcomes and extracting new knowledge through modeling biobehavioral rhythms. Further evaluations can verify the generalizability of the framework.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>ACM Transactions on Intelligent Systems and Technology, Vol. 13, No. 3, Article 47. Publication date: March 2022.</p></note>
		</body>
		</text>
</TEI>
