<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>"Reading Between the Heat": Co-Teaching Body Thermal Signatures for Non-intrusive Stress Detection</title></titleStmt>
			<publicationStmt>
				<publisher>ACM</publisher>
				<date when="2023-10-15">10/15/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10481694</idno>
					<idno type="doi"></idno>
					<title level='j'>Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies</title>
<idno type="issn">2474-9567</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Yi Xiao</author><author>Harshit Sharma</author><author>Zhongyang Zhang</author><author>Dessa Bergen-Cico</author><author>Tauhidur Rahman</author><author>Asif Salekin</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Stress impacts our physical and mental health as well as our social life. A passive and contactless indoor stress monitoring system can unlock numerous important applications such as workplace productivity assessment, smart homes, and personalized mental health monitoring. While the thermal signatures of a user's body captured by a thermal camera can provide important information about the "fight-or-flight" response of the sympathetic and parasympathetic nervous systems, relying solely on thermal imaging to train a stress prediction model often leads to overfitting and, consequently, suboptimal performance. This paper addresses this challenge by introducing ThermaStrain, a novel co-teaching framework that achieves high stress-prediction performance by transferring knowledge from the wearable modality to the contactless thermal modality. During training, ThermaStrain incorporates a wearable electrodermal activity (EDA) sensor to guide the generation of stress-indicative representations from thermal videos, emulating the representations obtained from the wearable EDA sensor. During testing, only thermal sensing is used, and stress-indicative patterns from thermal data and emulated EDA representations are extracted to improve stress assessment. The study collected a comprehensive dataset with thermal video and EDA data under various stress conditions and distances. ThermaStrain achieves an F1 score of 0.8293 in binary stress classification, outperforming the thermal-only baseline by over 9%. Extensive evaluations highlight ThermaStrain's effectiveness in recognizing stress-indicative attributes, its adaptability across distances and stress scenarios, its real-time executability on edge platforms, its applicability to multi-individual sensing, its ability to function under limited visibility and in unfamiliar conditions, and the advantages of its co-teaching approach. These evaluations validate ThermaStrain's fidelity and its potential for enhancing stress assessment.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Stress is an intense emotional phenomenon that can be triggered by external stressors or stimuli <ref type="bibr">[36,</ref><ref type="bibr">53,</ref><ref type="bibr">71]</ref>. It elicits spontaneous physiological responses governed by our autonomic nervous system's "fight or flight" response <ref type="bibr">[12]</ref>. These responses may include changes in skin temperature <ref type="bibr">[26,</ref><ref type="bibr">106]</ref>, skin conductance <ref type="bibr">[76]</ref>, and other indicators. Chronic stress poses significant risks to both physical and mental health, emphasizing the importance of monitoring and managing stress <ref type="bibr">[89,</ref><ref type="bibr">94]</ref>.</p><p>Smart wearables have been explored for stress sensing <ref type="bibr">[9,</ref><ref type="bibr">79,</ref><ref type="bibr">84]</ref>. However, such wearable-based solutions typically require close proximity to the user's body or skin to capture different physiological parameters (e.g., electrodermal activity (EDA), skin temperature), which can be burdensome and invasive for users. Furthermore, their sensing scope is limited to the wearer, restricting their applicability in indoor environments with multiple occupants. Passive and contactless indoor stress monitoring, on the other hand, offers the potential to unlock numerous applications that are challenging to achieve through EDA or other wearable-based solutions. For instance, passive and contactless stress monitoring for elderly dementia patients <ref type="bibr">[28,</ref><ref type="bibr">59,</ref><ref type="bibr">98]</ref> or employees in smart workplaces <ref type="bibr">[5,</ref><ref type="bibr">64,</ref><ref type="bibr">66,</ref><ref type="bibr">74]</ref> can provide valuable insights. 
These approaches can facilitate feedback on stress, including bio-feedback <ref type="bibr">[117]</ref>, interventions for stress management, enhancements in well-being, and customization of user experiences based on stress-related data <ref type="bibr">[65]</ref>. Further discussion on specific application scenarios for contactless passive stress sensing is provided in Section 8.1.</p><p>To address the challenge, several RGB camera-based stress sensing systems <ref type="bibr">[30,</ref><ref type="bibr">32,</ref><ref type="bibr">118]</ref> have been developed; however, their efficacy depends on lighting conditions and is fraught with privacy concerns <ref type="bibr">[35,</ref><ref type="bibr">83]</ref>. Similarly, remote photoplethysmography (PPG) based on RGB cameras fails to work well under varying lighting conditions <ref type="bibr">[19]</ref>. Balancing accuracy, privacy, and adaptability to environmental variations remains a significant challenge for developing stress sensing systems.</p><p>Infrared thermography, which utilizes thermal cameras for stress sensing, can offer a viable solution. Thermal cameras can capture changes in skin temperature <ref type="bibr">[47]</ref> and heart rate through facial skin blood flow <ref type="bibr">[56]</ref>, which are indicative of physiological stress responses <ref type="bibr">[1,</ref><ref type="bibr">90]</ref>. Unlike RGB cameras, thermal cameras are robust to different lighting conditions <ref type="bibr">[3]</ref>. Prior works have shown promising human sensing results using thermal imaging in poor lighting and even at night <ref type="bibr">[27,</ref><ref type="bibr">55,</ref><ref type="bibr">83]</ref>. 
Additionally, thermal videos/imaging are typically considered more privacy-preserving than RGB imaging, preventing the inadvertent exposure of environmental/contextual information such as personal items, addresses, displayed documents, and content within photo frames <ref type="bibr">[18,</ref><ref type="bibr">35,</ref><ref type="bibr">83]</ref>. These attributes enhance the appeal of thermal cameras for stress sensing.</p><p>Several studies <ref type="bibr">[3,</ref><ref type="bibr">23,</ref><ref type="bibr">87]</ref> have explored the use of thermal sensing to assess stress. However, the efficacy of stress detection achieved solely through thermal sensing is lower than that achieved with other single modalities such as EEG <ref type="bibr">[107]</ref>, ECG <ref type="bibr">[51]</ref>, and PPG <ref type="bibr">[41]</ref>. Recognizing this limitation, recent studies <ref type="bibr">[20,</ref><ref type="bibr">31,</ref><ref type="bibr">119]</ref> have focused on multi-modality approaches to increase the efficacy of thermal stress sensing. These approaches typically require combining thermal data with other physiological signals from different modalities, such as EEG, ECG, or PPG, which increases computational cost and user burden, and thus limits scalability.</p><p>This paper explores "whether a system that utilizes stress-indicative physiological signals (from wearable sensors) during model development but relies solely on thermal sensing during evaluation or deployment can outperform uni-modal thermal camera-based stress sensing approaches." (Uni-modal approaches use only thermal information in all model development and evaluation phases; see Section 3.2.1.)</p><p>To address the above-mentioned question, we introduce ThermaStrain, a first-of-its-kind end-to-end co-teaching framework that enhances the efficacy of infrared thermography-based stress sensing. 
By incorporating electrodermal activity (EDA) sensing (collected from a wearable), a reliable method for measuring the human stress response in real-time <ref type="bibr">[4,</ref><ref type="bibr">49]</ref>, in the model training phase, ThermaStrain improves the accuracy of stress detection. During training, the model utilizes EDA sensing to generate a stress-indicative latent representation from thermal videos, emulating the stress-indicative signal patterns obtained from a real wearable EDA sensor. During test/evaluation time, when only thermal sensing is available, the stress-indicative information extracted from the thermal videos using the emulated EDA representation is used for stress detection. By integrating EDA sensing and learning to extract EDA-guided stress-indicative information from thermal videos, ThermaStrain offers an accurate and non-intrusive solution. It is a pioneering approach that addresses the research question while maintaining the simplicity and effectiveness of thermal sensing in stress assessment.</p><p>The main contributions of this work are:</p><p>&#8226; The paper presents ThermaStrain, a novel co-teaching-based solution that surpasses existing uni-modal and co-teaching baselines in thermal stress sensing (Sections 6.3 and 6.5). What sets ThermaStrain apart is its ability to achieve superior performance (Section 6.8) using only thermal sensing during evaluation/testing, thus maintaining the non-intrusive appeal of thermal sensing. The ThermaStrain solution opens up new possibilities for enhancing ubiquitous computing applications where non-intrusive stress assessment plays a crucial role in promoting well-being and optimizing experiences, such as workplace productivity assessment, smart home systems, and personalized and passive mental health monitoring, including depression <ref type="bibr">[38]</ref>. 
&#8226; To our knowledge, no existing publicly available dataset contains variable-distance, full- or partial-body thermal camera data and physiological parameters under different stress conditions. To overcome this limitation, we collected a comprehensive dataset consisting of infrared thermography sensing (thermal video data) and electrodermal activity (EDA) physiological parameter sensing data. The dataset was gathered from 32 individuals who performed four distinct stress-inducing tasks, each associated with different stressors. Importantly, data was collected from varying distances of 5-11 feet. This dataset's unique characteristics, including the diverse set of stressors and variable distances, allow the paper to develop and evaluate models that are capable of generalizing to different distances (Section 6.7) and stress situations (Section 6.4). De-identified data will be made public. By addressing the gap in available datasets, this work enables more robust and applicable research in thermal stress sensing. &#8226; The paper presents thorough evaluations and discussions (Sections 7 and 8) that delve into the benefits of co-teaching (i.e., ThermaStrain's approach) in developing an improved stress-sensing solution. Section 7's evaluations encompass various aspects, including understanding how co-teaching facilitates better solution development and the effectiveness of ThermaStrain in extracting stress-indicative information from thermal frames. Furthermore, Section 8 delves deeply into the potential applications and challenges of deploying ThermaStrain in real-world scenarios. This encompasses scenarios with multiple individuals, limited visibility, and conditions unseen during training, such as camera angles, distances, postures, stress conditions, and backgrounds, as well as ethical concerns. The evaluations establish the fidelity of ThermaStrain to the co-teaching paradigm and validate its ability to enhance stress sensing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">MOTIVATION AND VISION MODEL OF CO-TEACHING-BASED THERMAL STRESS SENSING</head><p>This section discusses the vision model of thermal sensing and the motivation for the presented co-teaching solution.</p><p>Vision Model of the Infrared Thermography Sensing. While tracking the thermal signatures of a target object, the thermal energy reaching the thermal camera sensors is formulated by Kylili et al. <ref type="bibr">[63]</ref> as: <formula notation="TeX">I_{TOT} = I_{EM} + I_{REF} + I_{ATM}</formula></p><p>Here I_EM is the energy emitted by the object, I_REF is the energy reflected by the surroundings and intercepted by the object, and I_ATM is a term that accounts for atmospheric influence due to the attenuation of thermal radiation.</p><p>The camera determines I_REF and I_ATM during calibration. This allows the sensor to measure the amount of thermal energy emitted by the target object, which is the human body in this paper's scope. Thermal-based Stress Sensing. When facing a stressful situation, our autonomic nervous system (ANS) changes the blood perfusion to the skin surface, which changes the skin temperature <ref type="bibr">[26,</ref><ref type="bibr">106]</ref>. Prior works <ref type="bibr">[86,</ref><ref type="bibr">106]</ref> have demonstrated that human body thermal signatures, i.e., skin temperature, particularly from the face and neck regions <ref type="bibr">[106]</ref>, provide insight into human physiology under stressful conditions, which the thermal camera captures through I_EM.</p><p>EDA-based Stress Sensing. Prior studies in psychophysiology have shown that electrodermal activity (EDA), or skin conductance, is a gold standard for measuring the human stress response in real-time <ref type="bibr">[4,</ref><ref type="bibr">49]</ref>. 
While experiencing a stressful situation, skin conductance levels increase due to the activation of the eccrine sweat glands <ref type="bibr">[49,</ref><ref type="bibr">61]</ref>.</p><p>Limitation: Latency in Thermal Sensing. Thermal sensing can help model the stress response but cannot outperform EDA-based assessment, since skin conductance provides a rapid response profile (with a delay of 1-3 seconds from stimulus onset) <ref type="bibr">[10]</ref>. In contrast, thermal responses have a relatively higher latency of 4-5 seconds from stimulus onset <ref type="bibr">[80,</ref><ref type="bibr">106]</ref>.</p><p>Co-teaching Goal. The co-teaching goal is to learn the rapid patterns emerging through skin conductance, i.e., the activation of eccrine sweat glands, through the thermal energy measurement from the human body, I_EM. The eccrine sweat glands are composed of a single tubular structure, and the volume of liquid in the tubular part increases when the glands are activated <ref type="bibr">[44]</ref>. Studies <ref type="bibr">[81,</ref><ref type="bibr">92]</ref> have shown that an increase in sweating, i.e., water on the skin surface, causes infrared thermography (i.e., the thermal camera) to perceive a lower temperature than a contact thermal sensor (i.e., the actual skin temperature). This means there exists a latent thermal signature of skin conductance. The co-teaching goal of the ThermaStrain approach is to teach the thermal sensing modality to extract such latent patterns under the guidance of the EDA modality, resulting in better stress assessment performance. 
Our presented end-to-end co-teaching approach effectively teaches such patterns, resulting in higher stress assessment performance from thermal sensing alone during testing or evaluation.</p><p>Thermal imaging has shown promising results for physiology-based affective state and stress detection in recent years <ref type="bibr">[3,</ref><ref type="bibr">20,</ref><ref type="bibr">23,</ref><ref type="bibr">87,</ref><ref type="bibr">96,</ref><ref type="bibr">119]</ref>. The ThermaStrain solution builds upon two strands of prior work: (1) thermal imaging for understanding electrodermal activity (EDA) responses, and (2) stress detection using thermal responses. These are discussed below:</p></div>
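To make the co-teaching objective concrete, here is a minimal numpy sketch of a co-teaching-style training loss: a stress-classification term plus an emulation term that pulls the thermal branch's representation toward the EDA teacher's. All names, shapes, and the weighting value are illustrative assumptions, not the paper's actual architecture (which is described in later sections).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings: the EDA "teacher" representation and the thermal
# encoder's emulated EDA representation (names and shapes are illustrative).
eda_embedding = rng.normal(size=(8, 16))       # from the wearable EDA branch
emulated_embedding = rng.normal(size=(8, 16))  # predicted from thermal video

# Stress logits from the thermal branch and binary stress labels.
logits = rng.normal(size=(8,))
labels = rng.integers(0, 2, size=(8,))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Binary cross-entropy classification loss on the thermal branch.
p = sigmoid(logits)
cls_loss = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

# Emulation (distillation) loss: push the thermal branch's representation
# toward the EDA teacher's representation.
emu_loss = np.mean((emulated_embedding - eda_embedding) ** 2)

lam = 0.5  # illustrative weighting; a real value would be tuned
total_loss = cls_loss + lam * emu_loss
print(round(float(total_loss), 4))
```

At test time only the thermal branch runs; the emulation term exists solely to shape its representation during training.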
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Thermal Imaging for Measuring EDA Response</head><p>Recent works <ref type="bibr">[19,</ref><ref type="bibr">61,</ref><ref type="bibr">86]</ref> showed that thermal imaging of skin areas with a high density of sweat glands, like the palm or the perinasal region <ref type="bibr">[19]</ref>, can be used to monitor the EDA response, i.e., the activation of the sweat glands. The activation of the sweat glands leads to a change in skin temperature <ref type="bibr">[3,</ref><ref type="bibr">19]</ref>, which is captured using the thermal camera. A recent study <ref type="bibr">[86]</ref> found high correlations between the galvanic skin response (GSR) extracted from electrodermal activity (EDA) and the thermal signals extracted from the finger and perinasal regions (r = 0.94 and r = 0.96, respectively). Another work <ref type="bibr">[61]</ref> studied the active pores on the skin surface using high-resolution thermal imaging and found a high correlation (r = 0.7) between the pore activation index measured using the thermal images and the skin conductance response measured at the finger. These works show the potential of thermal imaging for studying human physiological processes. Our work leverages the correlations between thermal responses and skin conductance, i.e., EDA, that these prior works have established.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Thermal Imaging for Stress Detection</head><p>Studies have utilized different physiological modalities like PPG <ref type="bibr">[20,</ref><ref type="bibr">41]</ref>, ECG <ref type="bibr">[51,</ref><ref type="bibr">119]</ref>, RGB image or video data <ref type="bibr">[112,</ref><ref type="bibr">119]</ref>, EDA <ref type="bibr">[102,</ref><ref type="bibr">123]</ref>, and thermal imaging <ref type="bibr">[20,</ref><ref type="bibr">23,</ref><ref type="bibr">87,</ref><ref type="bibr">119]</ref> to detect human stress. Recently, researchers have successfully used a combination of thermal imaging and different physiological sensors to detect individuals' affective state <ref type="bibr">[3,</ref><ref type="bibr">96]</ref>, cognitive load <ref type="bibr">[1]</ref>, stress <ref type="bibr">[20,</ref><ref type="bibr">23,</ref><ref type="bibr">87,</ref><ref type="bibr">119]</ref>, and even deception <ref type="bibr">[85]</ref>.</p><p>This section reviews the prior literature on human stress detection, with a particular focus on thermal-imaging-based studies. These works can be divided into three strands based on their methodology.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1">Uni-Modal Approaches to Stress Detection:</head><p>Uni-modal approaches in the literature use a single physiological modality, such as EEG <ref type="bibr">[107]</ref>, ECG <ref type="bibr">[51]</ref>, or PPG <ref type="bibr">[41]</ref>, to detect human stress. Within the uni-modal thermal stress sensing scope, works like Cross et al. <ref type="bibr">[23]</ref> used only thermal imaging to track regions of interest in the facial area, detecting human stress with an LDA classifier at 89.3% accuracy. Another work <ref type="bibr">[87]</ref> used a thermal-imaging-based approach that extracts thermal maps corresponding to the facial, neck, and shoulder regions to detect positive and negative affective states with 90% accuracy. That approach relied on statistical descriptors, namely the average, minimum, maximum, standard deviation, and the difference between the minimum and maximum temperature, computed for the face, neck, and shoulder regions in the thermal image frames. However, these approaches require the thermal camera to be up close to an individual's face and hence have limited practical use in non-intrusive stress monitoring.</p></div>
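As an illustration, such region-level statistical descriptors are straightforward to compute. The sketch below assumes a body region is available as a 2-D temperature array; the function name and dictionary keys are ours, not those of the cited work.

```python
import numpy as np

def thermal_region_features(region):
    """Statistical descriptors of a thermal region (2-D temperature array):
    mean, minimum, maximum, standard deviation, and max-min range."""
    region = np.asarray(region, dtype=float)
    return {
        "mean": region.mean(),
        "min": region.min(),
        "max": region.max(),
        "std": region.std(),
        "range": region.max() - region.min(),
    }

# Toy 3x3 "face region" in degrees Celsius (synthetic values).
face = [[33.1, 33.4, 33.0],
        [33.8, 34.2, 33.9],
        [33.2, 33.5, 33.3]]
feats = thermal_region_features(face)
print(round(feats["range"], 1))  # max 34.2 minus min 33.0 -> 1.2
```

The same five descriptors would be computed per region (face, neck, shoulder) and concatenated into the classifier's feature vector.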
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2">Multi-Modal Approaches to Stress Detection:</head><p>Multi-modal approaches use two or more modalities for the human stress detection task. These techniques have also been widely studied in the literature.</p><p>Cho et al. <ref type="bibr">[20]</ref> proposed a human stress measurement system using smartphone camera-based PPG and thermal video, achieving an average classification accuracy of 78.3%, which outperformed the single-modality baselines. Walambe et al. <ref type="bibr">[112]</ref> explored early-fusion and late-fusion techniques to predict stress from posture, physiological, and video data; their evaluation showed that early fusion outperformed late fusion by 5%. Can et al. <ref type="bibr">[14]</ref> tried different schemes for modality fusion and found that using multiple modalities improved the performance of their stress detection systems in all scenarios. Ghosh et al. <ref type="bibr">[31]</ref> converted data into Gramian Angular Fields (GAF), which represent temporal correlations between timestamps, before fusing the multi-modality data, and achieved significantly better performance than with raw data. Zhang et al. <ref type="bibr">[119]</ref> fused ECG, voice, and RGB facial video for acute stress detection; their ablation study showed that the overall performance was improved using ResNet 50 and an Inflated 3D-CNN.</p><p>In summary, multi-modal machine learning approaches leverage information from multiple modalities and often achieve higher accuracy than uni-modal approaches. However, not all modalities are available in real-life scenarios, and some can be costly and comparatively more invasive, limiting their practical use <ref type="bibr">[120]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.3">Co-Teaching Approaches.</head><p>To address this limitation of multi-modal approaches, many studies have attempted to reconstruct missing modalities from available ones at inference time. These approaches fall under the domain of co-teaching. E.g., Zheng et al. <ref type="bibr">[120]</ref> trained a prototype network to learn meta-sensory representations by modeling knowledge retention mechanisms. Rajan et al. <ref type="bibr">[93]</ref> proposed a modality translator that translates the weak modality to the strong modality, so that the weak modality alone can achieve better performance during evaluation. Fortin et al. <ref type="bibr">[29]</ref> proposed a multi-task learning framework that prepares multiple classifiers depending on the availability of modalities. Wang et al. <ref type="bibr">[113]</ref> designed a Generative Adversarial Network (GAN) to reconstruct the missing modality. Li et al. <ref type="bibr">[68]</ref> trained a Visual Hallucination Transformer that maps text to images and showed that visualizing scenes from text can improve machine translation systems. This paper considers the multi-task learning framework <ref type="bibr">[29]</ref> and the 'Hallucination Transformer' <ref type="bibr">[68]</ref> as baselines.</p><p>None of the above studies targets thermal imaging or stress. The closest state-of-the-art work to co-teaching on thermal imaging is StressNet <ref type="bibr">[62]</ref>, which obtained ECG attributes from thermal input and utilized the extracted ECG-relevant thermal embedding to predict stress. The study uses only close-up facial thermal frames, extracts the ECG-relevant embedding with a ResNet, and captures temporal dynamics with an LSTM backbone. Although it is not co-teaching in the strict sense, its conceptual similarity makes it one of this paper's baselines.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">DESCRIPTION OF THE DATASET AND OUR DATA COLLECTION PROCEDURE</head><p>To address the lack of an available dataset containing variable-distance, full- or partial-body thermal camera data and physiological parameters under different stress conditions, the paper collected data in an indoor setting. Participants engaged in various non-stress and stress-inducing tasks. The tasks were carefully designed in collaboration with a behavioral psychologist and approved by the X University Institutional Review Board (IRB) to ensure ethical compliance.</p><p>Participants: The participants were 32 undergraduate and graduate students enrolled at X University, comprising 12 male and 20 female participants (22-32 years of age). All data were collected in a single laboratory visit, and participants signed informed consent before initiating the study. Findings from this limited age group may not generalize to older adults or children.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Sensing Modalities</head><p>During the experiment, we collected thermal videos and electrodermal activity (EDA) physiological parameters. The following devices were utilized for the data collection:</p><p>Thermal Imaging: The Seek Thermal CompactPRO thermal camera<ref type="foot">foot_3</ref>  <ref type="bibr">[57]</ref> was used to capture thermal video data (i.e., sequences of thermal frames). Thermal frames were captured at a 240&#215;320-pixel resolution, with a 32-degree field of view, at 5 frames per second (fps). We used the libseek_thermal <ref type="bibr">[110]</ref> API for data collection. Although the camera can capture 10 fps, we observed that frame rates were most stable at 5 fps.</p><p>The Empatica E4 Wristband: The Empatica E4 wristband<ref type="foot">foot_4</ref> is a wearable device designed to monitor physiological signals and gather data about an individual's physical and emotional well-being. The E4 wristband encompasses an EDA sensor and a PPG (photoplethysmography) sensor; EDA data is sampled at 4 Hz, while PPG data is captured at 64 Hz.</p></div>
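Because the two modalities run at different rates (thermal at 5 fps, EDA at 4 Hz), training requires time-aligning them. Below is a minimal sketch of one plausible alignment via linear interpolation; the paper does not specify this exact procedure, and the timestamps and signal here are synthetic.

```python
import numpy as np

# Hypothetical 10-second recording: 50 thermal frames at 5 fps,
# 40 EDA samples at 4 Hz (both clocks start at t = 0 s).
thermal_t = np.arange(50) / 5.0   # thermal frame timestamps (s)
eda_t = np.arange(40) / 4.0       # EDA sample timestamps (s)
eda_values = np.sin(eda_t)        # placeholder EDA signal

# Align EDA to the thermal frame clock by linear interpolation, so each
# thermal frame has a matching EDA value during training. np.interp holds
# the edge value for thermal timestamps past the last EDA sample.
eda_on_thermal_clock = np.interp(thermal_t, eda_t, eda_values)

print(len(thermal_t), len(eda_t), len(eda_on_thermal_clock))  # 50 40 50
```

With aligned pairs, each thermal frame (or window of frames) can be matched to an EDA target for the teacher branch.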
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Data-Collection Experimental Procedure</head><p>This section discusses the experimental procedure we followed during the data collection session. Upon arrival, the participants were given time to read and sign the consent form. Next, participants were asked to stand in front of a computer screen and a Seek Thermal CompactPRO camera. Additionally, they wore an Empatica E4 wristband on their left hand, which recorded their EDA responses during the data collection. Figure <ref type="figure">2</ref> illustrates the data collection setup in the laboratory. Distances: Three distance lines were marked at 5, 7, and 9 feet from the thermal camera. Each participant was randomly assigned to stand at one of these lines and encouraged to limit their movement to within 1-2 feet behind it (i.e., farther from the sensor). In total, 10 participants stood at the 5-foot line, 12 at the 7-foot line, and 10 at the 9-foot line.</p><p>Ambient Temperature: Data collection was performed in indoor rooms at X University over two years, across all seasons. However, indoor temperatures were regulated: during the heating season (September 15 - May 15), the AC was set to 68 degrees Fahrenheit; during the cooling season (May 16 - September 14), it was set to 76 degrees Fahrenheit.</p><p>Study Protocol: Thermal and EDA physiological response data were collected under non-stress and stressful conditions. Notably, this study's data collection protocol did not adhere to a 1-to-1 non-stress vs. stress challenge design. Instead, it followed a protocol similar to the Trier Social Stress Test (TSST) <ref type="bibr">[34,</ref><ref type="bibr">46]</ref>, where one or more non-stress-inducing baselines are used for comparison with stress challenges. For instance, the protocol of Iqbal et al. 2022 <ref type="bibr">[46]</ref> had a single non-stress-inducing baseline task followed by three stress-inducing tasks in a fixed sequence. 
Similarly, in this paper's protocol, participants performed two non-stress tasks first, followed by four stress-inducing tasks known to elicit physiological responses <ref type="bibr">[11,</ref><ref type="bibr">34,</ref><ref type="bibr">46]</ref>. While the two non-stress tasks enable establishing baselines across various conditions and participant activities, each stress-inducing task presented unique stimuli to the participants, eliciting stress responses across various conditions and activities, as discussed below. The collected data from this protocol enables the development of a non-stress vs. stress detection approach applicable across various conditions. Moreover, human physiological responses, including skin temperature changes, are not momentary with respect to the onset of a stressor; rather, they may persist for several minutes <ref type="bibr">[42]</ref>. Following the literature <ref type="bibr">[11,</ref><ref type="bibr">34,</ref><ref type="bibr">46]</ref>, to avoid any bias from residual stress effects, the non-stress-inducing tasks were conducted first, and a fixed sequence of stress-inducing tasks was performed afterward. Additionally, as identified in <ref type="bibr">[34]</ref>, no interfering activities, such as questionnaires, occurred for at least 15 minutes before the four stress-inducing tasks were introduced.</p><p>The study protocol is presented in Figure <ref type="figure">3</ref>. On average, the data collection session lasted 15 minutes, including the gaps between tasks. Participants self-reported their subjective stress levels on a scale from 0 (no stress) to 5 (extreme stress): every 30 seconds during the calm-video and stress-inducing-video tasks, and at the end of each of the remaining tasks (the counting task, preparing a song, playing a number game, and recalling a negative memory). 
This approach was chosen to prevent potential disruption during task execution, which could affect the stressor's effectiveness. The details of the (1) non-stress-inducing and (2) stress-inducing tasks are below. (2) Counting task: We asked the participants to slowly count from 0 to 59, reflecting a non-stressful speaking condition. The task took, on average, 1 minute per participant. At the end of this task, the mean self-reported score was 0.83.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Stress Inducing Tasks:</head><p>The participants performed four stress-inducing tasks in this stressful-condition data collection phase. The tasks were designed based on prior studies <ref type="bibr">[11,</ref><ref type="bibr">34,</ref><ref type="bibr">100]</ref> in psychology and behavioral science.</p><p>(1) Passive stress-inducing video: Participants watched four stress-inducing video clips (3 minutes in total) from the emotional stimuli database <ref type="bibr">[100]</ref>, which contains movie clips labeled stressful, scary, fearful, and disgusting, annotated and ranked by 50 film experts and 364 volunteers. During this task, the mean self-reported score was 2.99.</p><p>(2) Sing-a-Song Stress Test (SSST): During the SSST <ref type="bibr">[11]</ref> task, participants were asked, without any prior notification, to prepare a song in 30 seconds in the presence of the experiment coordinators. Subsequently, they sang the song for up to 30 seconds. It is important to note that this study used only the 30-second preparation-phase data, excluding any data recorded during the actual singing. This is because research indicates that the SSST preparation phase induces stress through social evaluation and uncertainty about the confederates' reaction to the participant's performance <ref type="bibr">[24]</ref>. At the end of this task, the mean self-reported score was 1.42.</p><p>(3) Trier Stress Task (TST): This task follows the TST protocol <ref type="bibr">[58]</ref>, where the participants were given a surprise arithmetic task: counting backward from a large number by 17. For example, if the starting number is 1000, the participant should say 983, 966, 949, and so on. Every time the participants made a mistake or took too long, they were asked to start from the beginning. 
Studies <ref type="bibr">[34,</ref><ref type="bibr">40]</ref> have reported that the TST is the gold-standard protocol for eliciting a reliably high stress response. The task took, on average, about 3 mins for each participant. At the end of this task, the self-reporting mean score was 3.33. (4) Recalling a bad memory: According to the literature <ref type="bibr">[22]</ref>, when we reminisce about negative events, our bodies respond as if we were experiencing those events again, activating the fight-or-flight response and releasing stress hormones such as cortisol and adrenaline. This can lead to a high stress response <ref type="bibr">[60]</ref>. The task took, on average, about 2 mins for each participant. At the end of this task, the self-reporting mean score was 1.8.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Validating Stress-Response Due to the Stress-Inducing Tasks</head><p>To verify that the stress-inducing tasks induced stress, we assessed Heart Rate Variability (HRV) using Empatica E4 wristband data. The literature <ref type="bibr">[13,</ref><ref type="bibr">54,</ref><ref type="bibr">99,</ref><ref type="bibr">111]</ref> links HRV changes to stress, often tied to reduced parasympathetic activity, seen as decreased High Frequency (HF) and increased Low Frequency (LF) power. We compute the LF/HF ratio by dividing LF power by HF power, which rises under stress <ref type="bibr">[54]</ref>. Following <ref type="bibr">[13]</ref>, we extract photoplethysmogram (PPG) features from the Empatica E4, with 3-minute windows and 1-second steps, aligning with recommended HRV analysis window sizes <ref type="bibr">[73]</ref>. First, we separate the non-stress-inducing windows (i.e., 3-min windows belonging to the first two tasks in Figure <ref type="figure">3</ref>) from the stress-inducing task windows (i.e., 3-min windows belonging to the last four tasks in Figure <ref type="figure">3</ref>), then filter all of them using a Chebyshev II order-4 filter (20 dB stopband attenuation, 0.5-5 Hz passband). The PPG signals are converted into heartbeat intervals; intervals outside 500-1200 ms (heart rates of 50-120 bpm) are removed as outliers, and the resulting gaps are filled with linear interpolation. Finally, we extract LF/HF HRV values from the processed intervals. Figure <ref type="figure">4</ref> shows the LF/HF ratio during the non-stress- and stress-inducing tasks. We applied a one-way ANOVA, which yielded a statistically significant effect of stress on heart rate variability, i.e., the LF/HF ratio (p-value=0.0077). Consistent with the findings of <ref type="bibr">[54]</ref>, an elevated LF/HF ratio is noticeable during the stress-inducing tasks, indicating heightened participant stress levels resulting from the challenges introduced through these four tasks.</p></div>
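The HRV validation pipeline above can be sketched in Python. This is an illustrative reconstruction, not the authors' code: the function name `lf_hf_ratio`, the 4 Hz resampling rate, and the conventional LF (0.04-0.15 Hz) and HF (0.15-0.4 Hz) band edges are assumptions.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch

def lf_hf_ratio(rr_ms, fs_resample=4.0):
    """Hypothetical sketch of Section 4.3: LF/HF from heartbeat (RR) intervals.

    Steps: drop intervals outside 500-1200 ms (50-120 bpm), fill the gaps by
    linear interpolation onto a uniform time grid, estimate the PSD with
    Welch's method, then sum the LF and HF band powers.
    """
    rr_ms = np.asarray(rr_ms, dtype=float)
    t = np.cumsum(rr_ms) / 1000.0                      # beat times in seconds
    valid = (rr_ms >= 500) & (rr_ms <= 1200)           # outlier removal
    interp = interp1d(t[valid], rr_ms[valid], kind="linear",
                      fill_value="extrapolate")        # gap filling
    t_uniform = np.arange(t[0], t[-1], 1.0 / fs_resample)
    rr_uniform = interp(t_uniform)
    freqs, psd = welch(rr_uniform - rr_uniform.mean(),
                       fs=fs_resample, nperseg=min(256, len(rr_uniform)))
    lf = psd[(freqs >= 0.04) & (freqs < 0.15)].sum()   # low-frequency power
    hf = psd[(freqs >= 0.15) & (freqs < 0.40)].sum()   # high-frequency power
    return lf / hf
```

On a synthetic RR series modulated at 0.1 Hz (inside the LF band), the ratio comes out well above 1, consistent with the expectation that sympathetic-dominant activity raises LF/HF.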
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Data Preprocessing</head><p>Notably, throughout our assessment of ThermaStrain, we refrained from excluding any data segments based on self-reported information (described in Section 4.2) or stress validation through HRV (discussed in Section 4.3). This decision was made because the physiological stress response may persist for several minutes after the onset of a stressor <ref type="bibr">[42]</ref>, and in each of the stress-inducing tasks, both self-reports and HRV analysis indicated heightened stress levels, confirming the efficacy of the stressors. Given that each stress-inducing task in our protocol lasts between 30 seconds and 3 minutes, assuming a participant is stressed for only a portion of that duration could introduce bias.</p><p>In this section, we discuss our data preprocessing steps for the raw thermal and EDA data.</p><p>Normalizing the EDA data: Research shows that individuals from different populations may exhibit different levels of skin conductance due to genetics, skin thickness, environmental conditions, and other factors. Therefore, following previous research <ref type="bibr">[10]</ref>, z-score normalization was applied to each participant's data to control for individual differences in EDA level.</p><p>Extraction of stress event detection windows: A window size of 5 seconds with 2 seconds of overlap was used for real-time stress detection. We determined this window size empirically (through grid search), aiming to balance high stress-assessment efficacy with the real-time usability of the sensing system. Larger window sizes yielded similar effectiveness, while smaller sizes compromised stress-assessment performance. Notably, this aligns with the discussion in Section 2. 
Given that the thermal response latency is under 5 seconds, and the latency gap between EDA and thermal is approximately 2 seconds, a 5-second detection window can capture the physiological stress response at the onset of the stressor and facilitate knowledge transfer from EDA to thermal during training.</p><p>During data collection, each thermal and EDA data point was marked with an absolute global time to facilitate synchronization between modalities. We synchronized the thermal and EDA data by selecting 5 seconds of thermal data and retrieving the simultaneous EDA data according to the absolute global timestamp. After pre-processing, 5 seconds of thermal data had a dimension of [25 &#215; 1 &#215; 240 &#215; 320] (an individual thermal frame was of the shape [1&#215;240&#215;320]), and the EDA data had a dimension of [5 &#215; 4].</p><p>Human body detection: The human body constitutes a fraction of each thermal frame. Since only the human body's thermal information is pertinent to stress, we applied a human body segmentation (i.e., body-region identification) algorithm named DetectorRS <ref type="bibr">[91]</ref> to the thermal frames. We pre-trained the DetectorRS model on the Microsoft COCO dataset <ref type="bibr">[70]</ref> and evaluated its performance on our manually labeled (with pixel-wise body area and bounding box labels) thermal dataset. We used IoU <ref type="bibr">[95]</ref> as the evaluation metric, which represents the ratio of the overlap to the union of the predicted and ground-truth segmentation regions. The DetectorRS body segmentation model achieves an average IoU score of 85.02%, which is reasonably high. After identifying the body region, the thermal frame pixel values outside the human body segmentation mask are zeroed out. Finally, a window-wise z-score normalization is applied to the thermal images. 
In all of the evaluations, background-masked thermal frames are utilized during training and testing.</p><p>Such background masking enables ThermaStrain to be generalizable and readily deployable in unknown scenarios. For example, as discussed in Section 8.2, such masking allows for identifying high stress when multiple individuals are present in front of the thermal camera.</p></div>
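A minimal sketch of the preprocessing steps above, with hypothetical helper names; for brevity, the masking helper below normalizes per frame rather than per window:

```python
import numpy as np

def zscore_per_participant(eda):
    """Per-participant z-score normalization of raw EDA (Section 4.4)."""
    return (eda - eda.mean()) / (eda.std() + 1e-8)

def sliding_windows(n_samples, fs, win_s=5, overlap_s=2):
    """Start/end sample indices for 5 s detection windows with 2 s overlap."""
    win, hop = int(win_s * fs), int((win_s - overlap_s) * fs)
    return [(s, s + win) for s in range(0, n_samples - win + 1, hop)]

def mask_and_normalize(frame, body_mask):
    """Zero out non-body pixels, then z-score the remaining thermal values."""
    masked = np.where(body_mask, frame, 0.0)
    fg = masked[body_mask]
    masked[body_mask] = (fg - fg.mean()) / (fg.std() + 1e-8)
    return masked
```

With the paper's 25-frames-per-5-s thermal stream, `sliding_windows(n, fs=5)` yields 25-frame windows with a 15-frame hop (3 s stride).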
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">PROPOSED CO-TEACHING APPROACH 5.1 Problem Statement</head><p>Given a sequence of thermal frames, i.e., thermal video &#119905; &#8712; &#119879; , where &#119905; = (&#119905; 1 , ..., &#119905; &#119896; ) and synchronized EDA values, &#119890; &#8712; &#119864;, where &#119890; = (&#119890; 1 , ..., &#119890; &#119897; ), our goal is to train a model that can predict &#119910; &#8712; &#119884; , where &#119910; =('stress' or 'non-stress'), from only thermal video &#119905; without requiring the EDA values at inference time. Since thermal video and EDA sampling rates are not necessarily the same, for a fixed stress detection window, &#119896; &#8800; &#119897;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Approach Overview</head><p>This section presents a novel co-teaching approach named ThermaStrain. Since EDA is a strong indicator of stress <ref type="bibr">[102]</ref>, the ThermaStrain approach simultaneously learns separate stress-indicative embeddings from EDA and thermal video while enforcing them to be similar for the same objective: inferring the stress vs. non-stress class. Such enforcement enables the extraction of knowledge from thermal video similar to stress-indicative EDA physiological parameters, alongside other thermal attributes indicative of stress, resulting in higher stress inference performance.</p><p>The ThermaStrain model, shown in Figure <ref type="figure">5</ref>, comprises a thermal encoder, an EDA encoder, and a shared classifier, discussed in Section 5.3. </p><p>Task losses: One of the training objectives is to teach the model to infer &#119910; &#119905; and &#119910; &#119890; as close as possible to the target output &#119910;. To attain this objective, the proposed approach introduces two task losses &#119897; &#119905; and &#119897; &#119890; , which are cross-entropy losses between the target output &#119910; and the inferences &#119910; &#119905; and &#119910; &#119890; generated through the two streams, thermal and EDA. Minimizing &#119897; &#119905; and &#119897; &#119890; ensures the extraction of better embeddings &#119911; &#119905; and &#119911; &#119890; that encompass the respective modality's stress-indicative markers.</p><p>Similarity loss &#119897; &#119904; : Since the ThermaStrain approach also aims to learn to extract knowledge from thermal video similar to the knowledge from EDA information that is highly predictive of stress, it is important to enforce the embeddings &#119911; &#119905; and &#119911; &#119890; to be similar. Hence, ThermaStrain introduces a similarity loss &#119897; &#119904; to maximize the joint likelihood of &#119911; &#119905; and &#119911; &#119890; during training. 
Here we use the mean square error to measure the similarity of the embeddings.</p><p>Consistency loss &#119897; &#119888; : Another training objective is to encourage consistency between the inferences &#119910; &#119905; and &#119910; &#119890; . Although the similarity loss &#119897; &#119904; makes the thermal and EDA embeddings &#119911; &#119905; and &#119911; &#119890; similar during training, they are not identical; hence, the classifier module &#119865; &#119862; may generate mismatched &#119910; &#119905; and &#119910; &#119890; . This may result in inferior performance during testing, when the EDA sequence &#119890; is not available. Hence, to enforce consistency between &#119910; &#119905; and &#119910; &#119890; , we define a consistency loss &#119897; &#119888; .</p><p>Here, Equation <ref type="formula">5</ref> is the Kullback-Leibler divergence between the two conditional distributions &#119910; &#119905; and &#119910; &#119890; .</p><p>Overall Training Loss &#119871;: The overall optimization objective, i.e., the overall training loss &#119871; of the ThermaStrain approach, is finally defined as a weighted sum of the two task losses, the similarity loss, and the consistency loss:</p><p>where &#120572; and &#120573; are hyperparameters that control the weight of the co-teaching and consistency objectives during training. As shown in Figure <ref type="figure">5</ref>, during testing, the stress vs. non-stress output is inferred through the thermal stream, generating the inference &#375; from the input &#119905; using the equation below:</p></div>
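The four training objectives can be sketched in PyTorch as below. This is a hedged reconstruction, not the paper's exact Equations 1-6: the function name, the default α and β, and the KL direction (pulling the thermal stream's distribution toward the EDA stream's) are assumptions.

```python
import torch
import torch.nn.functional as F

def co_teaching_loss(logits_t, logits_e, z_t, z_e, y, alpha=0.5, beta=0.5):
    """L = l_t + l_e + alpha * l_s + beta * l_c (sketch of Section 5.2)."""
    l_t = F.cross_entropy(logits_t, y)        # thermal-stream task loss
    l_e = F.cross_entropy(logits_e, y)        # EDA-stream task loss
    l_s = F.mse_loss(z_t, z_e)                # similarity loss on embeddings
    # consistency loss: KL divergence between the two output distributions
    # (F.kl_div expects log-probabilities as input, probabilities as target)
    l_c = F.kl_div(F.log_softmax(logits_t, dim=1),
                   F.softmax(logits_e, dim=1), reduction="batchmean")
    return l_t + l_e + alpha * l_s + beta * l_c
```

When the two streams agree exactly, the similarity and consistency terms vanish and the loss reduces to the two task losses.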
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Discussion of the Modules</head><p>The modules of the ThermaStrain approach are discussed below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.1">Thermal</head><p>Encoder &#119865; &#119879; . It takes a thermal video in the form of a sequence of thermal frames, &#119905; = (&#119905; 1 , ..., &#119905; &#119896; ), as input and generates an aggregated embedding &#119911; &#119905; comprising stress-indicative thermal markers/information from each frame and the temporal thermal attributes depicted through the frame sequence. The module comprises a ResNet followed by a Transformer network. As shown in Figure <ref type="figure">5</ref>, the ResNet takes each frame &#119905; &#119894; to generate a framewise embedding &#119911; &#119894; &#119905; , representing the stress-indicative thermal information from the respective frame. The Transformer then takes all the framewise embeddings (&#119911; 1 &#119905; ,. . . ,&#119911; &#119879; &#119905; ) to aggregate the temporal information and generate the thermal video embedding &#119911; &#119905; . 5.3.2 EDA Encoder &#119865; &#119864; . The input EDA values &#119890; within a detection window have a [5&#215;4] dimension representation. First, we extract six features from the EDA values: mean, min, max, median, variability, and standard deviation. These six features are then fed into two linear layers followed by a ReLU activation function to generate the EDA embedding &#119911; &#119890; .</p></div>
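A sketch of the EDA encoder under stated assumptions: "variability" is interpreted here as the max-min range, and the hidden and output widths (64, 128) are placeholders rather than the tuned values.

```python
import torch
import torch.nn as nn

class EDAEncoder(nn.Module):
    """Sketch of F_E (Section 5.3.2): six summary statistics of the [5 x 4]
    EDA window fed through two linear layers with a ReLU in between."""
    def __init__(self, hidden=64, d=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(6, hidden), nn.ReLU(),
                                 nn.Linear(hidden, d))

    def forward(self, eda):                            # eda: [batch, 5, 4]
        x = eda.flatten(1)                             # pool the window values
        rng = x.max(1).values - x.min(1).values        # "variability" (assumed)
        feats = torch.stack([x.mean(1), x.min(1).values, x.max(1).values,
                             x.median(1).values, rng, x.std(1)], dim=1)
        return self.net(feats)                         # embedding z_e
```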
<div xmlns="http://www.tei-c.org/ns/1.0"><head>5.3.3</head><p>Classifier &#119865; &#119862; . In the ThermaStrain model, the &#119865; &#119862; module is a simple network comprising linear layers with a ReLU activation function. Notably, the generated thermal and EDA embeddings &#119911; &#119905; and &#119911; &#119890; have the same dimension &#119889;. During training, &#119865; &#119862; takes the two embeddings separately to predict &#119910; &#119905; and &#119910; &#119890; , respectively. Finally, during testing/evaluation, &#119865; &#119862; predicts the stress vs. non-stress inference &#375; from the thermal embedding alone.</p><p>This section discusses the efficiency and applicability of ThermaStrain by investigating some key questions. The presented network parameter configurations were optimized by performing a grid search over the possible parameter values. The presented evaluation results are end-to-end, incorporating the inaccuracy due to pre-processing errors. Finally, evaluations are presented with the metrics sensitivity, specificity, accuracy (%), and F1 score.</p><p>Evaluation Dataset-split: For each of the evaluations, we followed the person-disjoint hold-out method <ref type="bibr">[15]</ref>. Our collected data includes 32 sessions (each with a different participant). In this study, we performed a 5-fold evaluation where, in each fold, we split the dataset into a validation set (5 sessions) and a training set (the rest of the sessions). The training and validation sets were disjoint with respect to participants and sessions. Our dataset is imbalanced, having more stress samples than non-stress samples. Therefore, in each fold, we performed under-sampling on the stress samples to balance the dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Implementation of the Presented Approach</head><p>As discussed in Section 5.2, the presented approach has three modules, described below:</p><p>&#8226; The Thermal Encoder: Comprises a ResNet and a Transformer Encoder. The ResNet starts with a 7 &#215; 7 convolutional layer, followed by three residual blocks. Each residual block includes two convolutional layers, followed by batch normalization and ReLU activation. At the end of the ResNet, an adaptive pooling layer pools the feature map into a size of 2 &#215; 2. Finally, we set the channel count of the last convolutional layer to 64, resulting in an output embedding of shape 2 &#215; 2 &#215; 64, i.e., 256 dimensions. The embeddings of all frames are then fed into a Transformer encoder to aggregate information over time. After the Transformer, we perform mean pooling over the sequential dimension and use the resulting embeddings for the downstream tasks. &#8226; The EDA Encoder: Considering that the input EDA data has a simple feature dimension of 6, we design a simple EDA encoder that comprises two linear layers with ReLU activation and dropout. &#8226; The Classifier: The classifier module comprises two linear layers with a ReLU activation function and dropout.</p></div>
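The thermal encoder description above can be sketched as follows. The 7x7 stem, three residual blocks, 64 channels, and 2x2 adaptive pooling (256-dim frame embedding) follow the text; the stride, transformer depth, and head count are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 convs, each with batch norm and ReLU."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)

class ThermalEncoder(nn.Module):
    """Sketch of F_T per Section 6.1: small ResNet -> 256-dim frame
    embeddings -> Transformer encoder -> mean pooling over frames."""
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.stem = nn.Conv2d(1, 64, 7, stride=2, padding=3)
        self.blocks = nn.Sequential(ResBlock(64), ResBlock(64), ResBlock(64))
        self.pool = nn.AdaptiveAvgPool2d((2, 2))       # 2 x 2 x 64 = 256 dims
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, video):                          # video: [B, T, 1, H, W]
        B, T = video.shape[:2]
        x = video.flatten(0, 1)                        # fold time into batch
        z = self.pool(self.blocks(torch.relu(self.stem(x)))).flatten(1)
        z = z.view(B, T, -1)                           # [B, T, 256]
        return self.transformer(z).mean(dim=1)         # mean pool over time
```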
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Optimization</head><p>The weights &#120572; and &#120573; in Equation <ref type="formula">6</ref> and the learning rate are hyper-parameters that were identified through the Python toolkit Optuna <ref type="bibr">[2]</ref>. It uses a Bayesian optimization algorithm called the Tree-structured Parzen Estimator to identify the optimum set of values.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Comparison of Co-Teaching with Uni-and Multi-Modal Approaches</head><p>This section compares the ThermaStrain approach with uni-modal and multi-modal approaches leveraging the modalities at hand: thermal video and EDA. To ensure a fair comparison, we set the complexity of each component to be the same across approaches, including the number of neurons in the linear layers, the number of layers in the classifier, and the number of kernels in the CNN layers. We then use Optuna, a hyperparameter optimization framework, to comprehensively evaluate hyperparameters such as the learning rate to determine the optimal settings. Finally, we present the best results obtained for each model. Implementations: The thermal baseline takes only a 5-second thermal video as input and predicts stress. It uses a ResNet as a feature extractor, a Transformer to aggregate information over time, and a multi-layer perceptron classifier to make the classification.</p><p>The EDA baseline takes 5 seconds of EDA data and predicts stress. It uses a multi-layer perceptron followed by a Transformer to extract features and a multi-layer perceptron classifier to make the classification.</p><p>The multimodal baseline shares a similar structure with our ThermaStrain implementation, with a thermal feature extractor, an EDA encoder, and a classifier. The only difference is that, instead of enforcing the &#119911; &#119879; and &#119911; &#119864; embeddings to be similar and having the classifier take each of them separately, the classifier takes the concatenated embedding of &#119911; &#119879; and &#119911; &#119864; to make inferences.</p><p>Evaluation Result Discussion: Table <ref type="table">1</ref> presents the results of our stress vs. non-stress binary classification evaluation. The ThermaStrain approach achieved an accuracy of 83.17% and an F1-score of 0.8293. 
In comparison, the uni-modal thermal baseline model achieved only 76.2% accuracy and an F1 score of 0.7592. The ThermaStrain model outperforms the thermal baseline model by over 9%.</p><p>The EDA baseline and the multi-modality model achieved F1 scores of 0.8568 and 0.8897, respectively. These scores are higher than those of our thermal baseline and the ThermaStrain model.</p><p>This evaluation is coherent with the finding that EDA is a strong indicator of stress. The co-teaching approach successfully extracts EDA-relevant embeddings, i.e., skin conductance-relevant information, from thermal video, resulting in a significant performance improvement over the uni-modal thermal baseline. However, such extraction is lossy; hence, co-teaching still cannot outperform the multi-modality or EDA-based stress assessment approaches. Notably, unlike these approaches, ThermaStrain does not require EDA at inference time, enabling contactless deployment, i.e., less obtrusive sensing. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.4">Generalizability Evaluation</head><p>A generalizable stress-sensing solution needs to be robust and perform similarly in previously unseen stress conditions, meaning stressful situations that were not present in the training dataset.</p><p>To evaluate the generalizability of the ThermaStrain approach in unseen stress conditions, we performed a stress-task-disjoint evaluation over the 5 folds (discussed in Section 6). As mentioned in Section 4.2, each participant performed four distinct stress-inducing tasks in each data collection session, simulating four stress conditions. In this evaluation, only the 'Passive stress-inducing video' and 'TST' task data were used as stress samples during training, while only the 'SSST' and 'Recalling a bad memory' task data were used during evaluation.</p><p>As in the previous section, ThermaStrain's performance is compared with the EDA- and thermal-video-based uni-modal and multi-modal approaches; the results are shown in Table <ref type="table">2</ref>.</p><p>The uni-modal thermal baseline's performance decreased by 3% in accuracy and F1 score compared to the baseline evaluation in Table 1. In contrast, the ThermaStrain model's performance decreased by only 2%, and it remains significantly better than the thermal baseline.</p><p>The EDA and multi-modality baselines achieved even higher accuracy. These evaluations indicate that during the 'SSST' and 'Recalling a bad memory' tasks, EDA is an even stronger indicator of stress than the thermal modality.</p><p>It is important to note that, according to our study protocol discussed in Section 4.2, the stress responses for each task are not completely disjoint. This is due to the potential partial influence of residual physiological stress responses from one stress-inducing task on another. 
Nonetheless, the evaluation in this section demonstrates the relatively greater generalizability of the ThermaStrain approach compared to the unimodal thermal baseline, thereby showcasing its improved utility. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.5">Comparison with Other Co-Teaching Baselines</head><p>This section compares the ThermaStrain approach with existing co-teaching baselines. Following the state-of-the-art literature <ref type="bibr">[29,</ref><ref type="bibr">62,</ref><ref type="bibr">68]</ref>, we implemented three co-teaching approaches as baselines. Since none of them have leveraged the thermal and EDA modalities, we followed their structure and design but made the necessary changes to fit our dataset.</p><p>The results are shown in Table <ref type="table">3</ref>. As discussed in Section 3, the multi-task learning approach <ref type="bibr">[29]</ref> has multiple classifiers that fit different missing-modality scenarios. StressNet <ref type="bibr">[62]</ref> uses thermal data to predict the EDA modality and then uses the predicted EDA to predict stress. The vision hallucination model <ref type="bibr">[68]</ref> has a hallucination network that mimics the EDA embedding; during inference, the pseudo-embedding replaces the EDA embedding and is concatenated with the thermal-independent embedding.</p><p>As shown in Table 3, all of these models achieved lower performance than ThermaStrain. The reason is that the multi-task learning approach lacks the similarity and consistency losses that force each modality to learn joint patterns. StressNet takes only the reconstructed EDA modality to predict stress, which loses some of the thermal modality's independent information. The inferior performance of StressNet further emphasizes the impact of the presented co-teaching approach over merely simulating physiological parameters from thermal sensing and using the simulated physiological features to assess stress. Finally, the vision hallucination model is too complex, leading to overfitting on our limited dataset. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.6">Ablation Study</head><p>Table <ref type="table">4</ref> presents the results of the ablation study, where various components of ThermaStrain are modified while keeping the rest of the network constant. We discuss the evaluation results and observations below.</p><p>Using Central Moment Discrepancy (CMD) as &#119897; &#119904; : Both Mean Square Error (MSE) and CMD are popular distance metrics that measure the discrepancy between the distributions of two representations. We evaluated both thoroughly. As shown in Table 4, MSE achieves better performance; therefore, we choose MSE as our &#119897; &#119904; .</p><p>Not using &#119897; &#119888; : Since the similarity loss &#119897; &#119904; already forces the two embeddings &#119911; &#119879; and &#119911; &#119864; to be similar, it may be questioned whether the consistency loss &#119897; &#119888; is needed at all. Hence, we evaluated ThermaStrain without &#119897; &#119888; . The results show a drastic performance drop when the consistency loss is removed, demonstrating the importance of &#119897; &#119888; in encouraging consistency between the inferences &#119910; &#119905; and &#119910; &#119890; , which leads to better performance when the EDA modality, and consequently &#119910; &#119890; , is unavailable during evaluation.</p><p>Using a Vision Transformer to replace ResNet: The Vision Transformer <ref type="bibr">[25]</ref> is a transformer-based image feature extractor that outperforms ResNet-based models in the RGB-image literature. We attempted to use the Vision Transformer as our feature extractor for each frame; however, accuracy and F1 scores decreased by approximately 5%. Given the limited size of our dataset, the more complex transformer-based feature extractor likely leads to overfitting. </p></div>
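For reference, the CMD alternative tried as the similarity loss can be sketched as below. This follows one common formulation (matching the means plus higher central moments); the number of moments and the unweighted sum are assumptions.

```python
import torch

def cmd_loss(z1, z2, n_moments=5):
    """Central Moment Discrepancy between two batches of embeddings:
    the distance between their means plus the distances between their
    first n_moments central moments."""
    m1, m2 = z1.mean(0), z2.mean(0)
    loss = torch.norm(m1 - m2)                 # match the means
    c1, c2 = z1 - m1, z2 - m2
    for k in range(2, n_moments + 1):          # match higher central moments
        loss = loss + torch.norm((c1 ** k).mean(0) - (c2 ** k).mean(0))
    return loss
```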
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.7">Distance Evaluation</head><p>This section breaks down the performance of ThermaStrain based on the distance between the participant and the thermal camera. It is observed that the model performs better at closer distances. Specifically, the model achieves its highest performance in the 5-7 ft range, with an accuracy of 91.36% and an F1 score of 0.9126. However, the performance drops at larger distances, especially in the 9-11 ft range, where the accuracy is 72.58% and the F1 score is 0.6902. The performance deterioration at larger distances can be attributed to each pixel's reduced quality of thermal energy perception and the reduced number of thermal pixels covering the important body regions at longer ranges. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.8">Benchmarking for Real Time Execution</head><p>To evaluate our approach's real-time executability, we performed a run-time evaluation on an Nvidia Jetson Nano. The program reads 5 seconds of data at a time, analyzes it and infers the class result, and waits for the next 5 seconds of data. The binary classifier takes 0.324 s to process one 5-second video window on the Jetson Nano. The average CPU usage is 19.38%, and the average GPU usage is 9.64%. The average RAM usage is 1.85 GB. According to this evaluation, our presented approach is capable of real-time execution on a Jetson Nano module. Note that the reported times are for when only the stress assessment program is running; running additional programs will change these times.</p></div>
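A rough sketch of how such per-window latency can be measured; `benchmark` is a hypothetical helper, not the authors' harness:

```python
import time

def benchmark(model_fn, window, n_runs=20):
    """Average per-window inference latency in seconds. model_fn is any
    callable taking one 5 s data window (mirroring Section 6.8)."""
    model_fn(window)                    # warm-up so lazy init is excluded
    start = time.perf_counter()
    for _ in range(n_runs):
        model_fn(window)
    return (time.perf_counter() - start) / n_runs
```

A deployment would compare the returned latency against the 5 s window period to confirm real-time headroom.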
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">DISCUSSION ON THERMASTRAIN'S EFFICACY</head><p>This section further investigates ThermaStrain's capability for effective thermal information extraction (Section 7.1) and how co-teaching enables the development of a better stress-sensing solution, through loss landscape analysis (Section 7.2).</p><p>Fig. <ref type="figure">6</ref>. The SHAP interpretation to Visualize Models' Information Extraction Efficacy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1">Inference Interpretation Discussion: Visualizing ThermaStrain's Effective Information Extraction</head><p>Many studies have highlighted that stress-induced changes in temperature are primarily concentrated in the forehead, eyehole, and cheekbone regions <ref type="bibr">[82,</ref><ref type="bibr">88]</ref>. Therefore, these regions are more critical for improving the accuracy of stress detection models. To evaluate the developed models' capability to capture information from the critical body regions, we utilized the KernelSHAP model-agnostic interpretation framework <ref type="bibr">[72]</ref>. The SHAP values <ref type="bibr">[103]</ref> indicate the contribution of each input attribute in driving the model inference closer to or farther from the true/correct inference. We divided the human body region into an 8 &#215; 8 grid to compute the Shapley value for each grid cell. Figure <ref type="figure">6</ref> shows the generated explanations of the baseline and ThermaStrain classifiers' inferences for three thermal frames belonging to three different individuals. The baseline model's Shapley values appear more evenly spread, indicating that it had to draw information from the entire frame. In contrast, ThermaStrain effectively learns the critical regions to focus on, illustrated by the darker colors in the face, neck, and hand regions, which are established as crucial stress-indicative body areas according to the literature <ref type="bibr">[82,</ref><ref type="bibr">88]</ref>.</p><p>This visualization demonstrates that the EDA modality effectively guides the ThermaStrain model to extract better thermal embeddings. As a result, the model perceives highly stress-indicative and physiologically relevant information by focusing on crucial visible body regions.</p></div>
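As a simplified stand-in for the KernelSHAP analysis (which additionally averages contributions over feature coalitions), a grid-occlusion attribution over the 8 x 8 layout can be sketched as:

```python
import numpy as np

def grid_occlusion_attribution(predict, frame, grid=8):
    """Occlusion-based attribution (a simplified, non-SHAP sketch): split
    the frame into a grid x grid layout and score each cell by the drop in
    the model's output when that cell is zeroed out."""
    H, W = frame.shape
    base = predict(frame)
    attr = np.zeros((grid, grid))
    hs, ws = H // grid, W // grid
    for i in range(grid):
        for j in range(grid):
            occluded = frame.copy()
            occluded[i * hs:(i + 1) * hs, j * ws:(j + 1) * ws] = 0.0
            attr[i, j] = base - predict(occluded)   # importance of this cell
    return attr
```

Cells whose occlusion changes the stress score the most (e.g., face, neck, hands) receive the highest attribution, analogous to the darker regions in Figure 6.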
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2">Loss Landscape Visualization: Understanding How Co-teaching Facilitates Effective Model</head><p>Our evaluation in Section 6.3 shows that the presented Co-teaching approach outperforms the uni-modal thermal sensing baseline approach. This section investigates how the co-teaching approach enables better performance.</p><p>Li et al. <ref type="bibr">[67]</ref> showed that visualizing the loss landscape for neural network models provides a richer understanding of how the different approaches' design choices influence the optimization of the loss function. We used the loss-landscapes library <ref type="bibr">[75]</ref> to generate the 3D loss landscape plots of ThermaStrain and the uni-modal thermal baseline model, as shown in Figure <ref type="figure">7</ref>. A detailed discussion on the plot generation is in Appendix A.1.</p><p>Several prior works <ref type="bibr">[17,</ref><ref type="bibr">43,</ref><ref type="bibr">52,</ref><ref type="bibr">67]</ref> investigating the loss landscapes to understand the ability of neural networks to optimize better (i.e., obtaining better performance) emphasized that the 'flatness' of the loss landscape is a property of interest. Hochreiter et al. <ref type="bibr">[43]</ref> define the 'flatness' of the loss landscape as the region around the minima where the loss remains low. Literature <ref type="bibr">[43,</ref><ref type="bibr">50,</ref><ref type="bibr">67]</ref> suggests that the model with flatter loss surface optimizes better, i.e., effectively identifies the minima in the loss space, hence achieves better performance.</p><p>As shown in Figure <ref type="figure">7</ref>, the loss landscape is significantly 'flatter' in the blue regions (near minima) for the ThermaStrain than the baseline approach. 
This means that co-teaching enables easier traversal of the loss landscape and identification of the minima, resulting in more effective stress-sensing performance by the ThermaStrain model.</p></div>
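The visualization procedure can be sketched as evaluating the loss on a 2D grid around the trained weights along two random (typically filter-normalized) directions, following Li et al.; `loss_surface_2d` and its defaults are illustrative:

```python
import numpy as np

def loss_surface_2d(loss_fn, theta, d1, d2, span=1.0, steps=25):
    """Evaluate loss_fn on a steps x steps grid around the trained weight
    vector `theta`, perturbed along directions d1 and d2 (Section 7.2)."""
    alphas = np.linspace(-span, span, steps)
    surface = np.empty((steps, steps))
    for i, a in enumerate(alphas):
        for j, b in enumerate(alphas):
            surface[i, j] = loss_fn(theta + a * d1 + b * d2)
    return surface
```

A "flatter" model shows a wide low-loss basin around the center of the resulting surface, which is the property contrasted between ThermaStrain and the baseline in Figure 7.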
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">DISCUSSION ON APPLICATIONS AND DEPLOYMENT OF THE THERMASTRAIN</head><p>This section discusses the scenarios where ThermaStrain will be more effective than wearable-based stress sensing solutions (Section 8.1). Additionally, it expounds upon the challenges associated with deploying ThermaStrain in real-world settings (Section 8.2). This discussion encompasses scenarios involving multiple individuals, constrained visibility, and deployment conditions (camera angles, distances, backgrounds, participants' postures, etc.) that were not encountered during the training phase. Finally, Section 8.3 discusses the ethical and practical considerations for real-world deployment of ThermaStrain.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.1">Application Scenarios of ThermaStrain</head><p>As shown in Section 6.3, while ThermaStrain outperforms the thermal-video-based state-of-the-art solutions, its efficacy is relatively lower than the EDA-based stress sensing solutions. Nevertheless, contactless thermal video-based stress detection presents distinct applications and deployment possibilities that are challenging to achieve through EDA or other wearable-based solutions. Two such example scenarios are discussed below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.1.1">Application in Smart Health</head><p>While wearable sensors find extensive application in healthcare monitoring scenarios <ref type="bibr">[8]</ref>, contactless sensors present distinct advantages over wearables in specific situations.</p><p>For instance, the non-intrusive characteristics of contactless sensors render them especially suitable for assessing stress in vulnerable populations, such as elderly individuals who may have impaired memory function, as observed in cases of dementia <ref type="bibr">[28,</ref><ref type="bibr">59,</ref><ref type="bibr">98]</ref>. Wearable sensors require patients to wear or carry battery-powered devices, which can lead to discomfort and inconvenience due to frequent recharging <ref type="bibr">[115]</ref>. In the case of dementia, patients might forget to wear or charge these devices. Additionally, wearable sensors can pose safety risks to elderly individuals; e.g., a recent incident involved an elderly woman strangled by her fall detection pendant <ref type="foot">3</ref>.</p><p>Consequently, contactless sensing solutions present an effective alternative for continuous stress assessment. While RGB video-based solutions are already making their way into commercial use for monitoring elderly health <ref type="bibr">[97]</ref>, privacy concerns limit their widespread adoption <ref type="bibr">[124]</ref>. Thermal imaging offers relatively higher privacy protection than RGB-based alternatives: unlike RGB imaging, ThermaStrain protects contextual and environmental information by not capturing non-human-body content <ref type="bibr">[18,</ref><ref type="bibr">35]</ref>. Hence, ThermaStrain holds potential as a highly suitable option for such vulnerable populations. 
This approach requires no active participation from patients, such as wearing or recharging a device, and enhances safety through its passive, contactless operation.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.1.2">Application in the Smart Workplace</head><p>Stress monitoring in smart workplaces, whether in manufacturing contexts <ref type="bibr">[66]</ref> or smart offices <ref type="bibr">[5]</ref>, is crucial for safeguarding employee well-being, optimizing productivity, enhancing workplace safety, reducing staff turnover, and fostering a positive work environment.</p><p>Prevalent techniques measure EDA through disc electrodes <ref type="bibr">[9]</ref> or through wearable devices such as the Empatica E4 or the Apple Watch <ref type="bibr">[7]</ref>. Positioning disc electrodes at the most sensitive bodily sites, such as the feet or fingers <ref type="bibr">[109]</ref>, can impose significant inconvenience on users or may even be impractical, for example in office settings where hands are engaged in typing or other activities. Additionally, since workplaces involve multiple individuals, using wearables can incur substantial costs. Furthermore, some individuals find it uncomfortable to wear such devices consistently for extended durations <ref type="bibr">[48]</ref>.</p><p>Hence, the contactless yet relatively privacy-preserving ThermaStrain can be a suitable alternative capable of simultaneously assessing stress in multiple individuals at a relatively affordable cost. Notably, stress assessments aimed at quantifying employee well-being within smart workplaces <ref type="bibr">[64,</ref><ref type="bibr">74]</ref> are often aggregated rather than conducted in real time, such as on a per-minute basis. This ensures that the efficacy of the use case remains unaffected even when employees are momentarily obscured by occlusion. 
Notably, thermal camera-based solutions are already being incorporated into workplaces for tasks such as employee health screening <ref type="bibr">[16]</ref> and security measures <ref type="bibr">[101]</ref>. This trend paves the way for smoother integration of thermal-video-based stress assessment solutions like ThermaStrain into smart workplaces.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.2">Deployment Procedure of ThermaStrain in Real-World Scenarios</head><p>Three challenges require attention for the successful practical implementation of ThermaStrain.</p><p>(1) Achieving effective body segmentation and accurately identifying the segments corresponding to the target users undergoing stress assessment in scenarios involving multiple individuals (Discussed in Section 8.2.1). (2) Due to real-world occlusion in multi-person settings, only partial body segments might be accessible. In such cases, ThermaStrain must demonstrate superior performance compared to the thermal and co-teaching baselines outlined in Sections 6.3 and 6.5 when dealing with partial body-segment information (Discussed in Section 8.2.2). (3) ThermaStrain needs to maintain its stress-assessment performance when evaluated on distances, angles, indoor settings, and scenarios that are not present during its training (Discussed in Section 8.2.3).</p><p>Evaluation and discussion on these challenges follow.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.2.1">Segmentation and Person-Identification in Multi-Person Scenarios</head><p>We developed an integrated framework combining pre-trained human body segmentation and re-identification models to achieve simultaneous human body segmentation and identification. Initially, thermal frames are fed into the segmentation model to generate pixel-wise human segmentation, producing disjoint object segments. Subsequently, these object segments are fed into the human re-identification model to assess whether each belongs to one of the target individuals whose stress is being assessed.</p><p>As discussed in Section 4.4, for human body segmentation we use the DetectoRS <ref type="bibr">[91]</ref> model pre-trained on the Microsoft COCO dataset <ref type="bibr">[70]</ref>, which identifies the human body regions in the thermal frame and masks out all other parts of the background. 
We use the pre-trained Omni-Scale Network (OSNet) <ref type="bibr">[122]</ref> for human re-identification. OSNet comprises residual blocks composed of multiple convolutional feature streams, each detecting features at a certain scale, enabling omni-scale feature learning. We use the pre-trained checkpoint provided by the authors <ref type="bibr">[121]</ref>.</p><p>We conducted a small evaluation of the framework's effectiveness in a five-person indoor stress assessment scenario. Initially, we gathered a few seconds of data from each of the five participants separately for fine-tuning the human re-identification model. Subsequently, we conducted three sessions in which varying subsets of the five individuals appeared simultaneously in front of the camera: two sessions involved three different participants, while the remaining session had four participants. For the evaluation of body segmentation and person identification, the data were annotated by two graduate students, achieving an inter-rater agreement of 94.91% and a Cohen's kappa statistic <ref type="bibr">[78]</ref> above 0.94. Frames featuring multiple individuals were processed by the DetectoRS human body segmentation model to extract object segments. We fine-tuned the Omni-Scale Network using the data collected individually, then evaluated the model's performance on the three multi-person sessions. The fine-tuned Omni-Scale Network achieved 97.29% accuracy in re-identifying participants during concurrent appearances.</p><p>Figure <ref type="figure">8</ref> illustrates the integrated framework in action. When multiple individuals are present, the framework distinguishes and separates their respective body segments, detects the corresponding identifications, and generates a distinct frame for each identified body region along with the person's identification label. 
Each individual's frame, containing only that person's body-region thermal data, is fed into the ThermaStrain model for stress assessment.</p><p>Given that only partial body segments might be accessible in multi-person scenarios, the next section discusses the effectiveness of the ThermaStrain model, as well as the thermal and co-teaching baselines, when dealing with partially occluded body segments.</p></div>
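The per-frame flow above can be sketched as follows. This is a minimal illustration with toy data: `segments` stands in for the pixel-wise masks a segmentation model such as DetectoRS would produce, and `embed` is a placeholder for an OSNet appearance embedding; the function names, threshold, and cosine matching are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two 1-D embeddings
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def identify_and_isolate(frame, segments, embed, gallery, threshold=0.7):
    """For each detected body segment, match its re-identification embedding
    against the enrolled target gallery; return one masked frame per matched
    target with everything except that person's body region zeroed out."""
    per_person = {}
    for mask in segments:                        # one boolean mask per person
        crop = frame * mask                      # keep only this body region
        e = embed(crop)
        pid, score = max(((p, cosine(e, g)) for p, g in gallery.items()),
                         key=lambda t: t[1])
        if score >= threshold:                   # non-targets are dropped
            per_person[pid] = crop               # fed to ThermaStrain next
    return per_person

# Tiny demo with a fake 4x6 "thermal frame" holding two people.
frame = np.zeros((4, 6))
frame[:, :3] = 30.0                              # person A, left half
frame[:, 3:] = 34.0                              # person B, right half
m_a = np.zeros_like(frame, dtype=bool); m_a[:, :3] = True
m_b = np.zeros_like(frame, dtype=bool); m_b[:, 3:] = True

embed = lambda crop: crop.flatten()              # toy embedding (placeholder)
gallery = {"A": embed(frame * m_a)}              # only person A is enrolled

out = identify_and_isolate(frame, [m_a, m_b], embed, gallery)
print(sorted(out))                               # only the enrolled target
```

Non-enrolled individuals (person B here) never pass the similarity threshold, so their thermal data is discarded before any stress inference.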
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.2.2">Stress Assessment when the Human Body Segment is Partially Masked</head><p>Occlusion presents a challenge in stress detection. This section discusses a pseudo-partially-masked-body-segment data augmentation to address this challenge. In this approach, a portion of the participant's body is randomly masked with zeros during training. To facilitate this, we adopt a pre-trained multi-person human body part segmentation model, CDCL (Cross-Domain Complementary Learning) <ref type="bibr">[69]</ref>, to segment the human body into eight distinct sections. CDCL recognizes pixel-wise human body part segmentations, such as the head, torso, upper arms, and forearms, from thermal frames. These segmentations are further refined into left and right divisions for each body part, resulting in eight subdivisions: the left face, the right face, the left torso, the right torso, the left upper arm, the right upper arm, the left forearm, and the right forearm.</p><p>We trained the ThermaStrain model alongside the baselines outlined in Sections 6.3 and 6.5 with a 23% chance of masking one of the eight parts, a 23% chance of masking two parts, a 24% chance of masking three parts, and a 30% chance of keeping all parts unmasked. An example of a pseudo one-body-part-masked human-body segment frame, alongside the eight distinct body sections identified by CDCL from the original frame, is shown in Figure <ref type="figure">9</ref>. The evaluation outcomes for the pseudo-augmented ThermaStrain and the baselines are presented in Table <ref type="table">6</ref>. This evaluation encompasses scenarios where different body segments are masked/occluded. 
Comparing these results with those in Table <ref type="table">3</ref>, it is apparent that training with significant partial-masking noise reduces ThermaStrain's F1 score by approximately 2% in the scenario without any masked body segments. Nevertheless, the model sustains a satisfactory performance level across scenarios involving masking of different body parts, consistently outperforming all the baselines. Notably, according to Table <ref type="table">6</ref>, masking the head and torso exerts a more pronounced impact on performance than masking other parts. Additionally, masking any two body sections simultaneously reduces ThermaStrain's performance to, on average, 72% accuracy and 0.69 F1 score, still outperforming the thermal baseline (on average 68% accuracy and 0.65 F1 score) and the best co-teaching baseline, Hallucination (on average 69.7% accuracy and 0.66 F1 score).</p><p>More robust strategies are available to tackle the challenge of partial body masking due to occlusion, including reconstructing the missing segments <ref type="bibr">[114]</ref> and incorporating pyramid perception [?], which can potentially enhance partial-body stress sensing performance. However, since this paper's primary focus is co-teaching, the evaluation in this section was confined to pseudo-data augmentation. This evaluation demonstrates the viability of ThermaStrain in scenarios involving occlusion of partial body segments and its consistent outperformance of the baselines.</p><p>Finally, this section's evaluations and discussions demonstrate ThermaStrain's robustness against diverse occlusion scenarios compared to the baselines, thereby showcasing its improved utility in real-world settings.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.2.3">Evaluation in an Unseen Real-World Scenario</head><p>The setup of this evaluation is shown in Figure <ref type="figure">10b</ref>. The participant in this study was a male who did not appear in the training set or validation set of the ThermaStrain model. 
In contrast to the data collection setup described in Section 4, the thermal camera was positioned at a forty-five-degree angle to the participant's left front, at a distance of about three feet. The background environment (e.g., a whiteboard behind the participant) also differed. The experiment spanned two consecutive days with a single individual: on the first day, the participant engaged in a LeetCode contest, simulating stress conditions, while on the following day, the participant relaxed by watching random YouTube videos of his choice. Each session lasted approximately an hour. Data were captured using both the thermal camera and the Empatica E4 device. Following the procedure outlined in Section 4.3, we calculated the LF/HF ratio from the Empatica E4 data. The average LF/HF ratio on the first day was 1.71, exceeding the ratio of 1.57 observed on the second day, indicating that the participant experienced higher stress on the first day. To statistically confirm this observed trend, we conducted a one-way ANOVA test, which revealed a highly significant effect of stress (p-value = 6.397e-07). No data segments were excluded from this section's analysis. We treated all data from the first day as stress instances and all data from the second day as non-stress instances.</p><p>The data collected from this experiment represent a realistic deployment scenario for ThermaStrain, wherein the thermal camera angle, distance, participant's posture (i.e., seated behind a desk instead of standing, as in Section 4), stress-inducing task conditions, indoor environment, and background were all unobserved during the model's training. Without any further adjustment (i.e., re-training), we evaluated the previously trained ThermaStrain model from Section 8.2.2 on this indoor one-person workplace dataset. 
The evaluation yielded an accuracy of 84.59% and an F1 score of 0.8398, aligning with ThermaStrain's performance in Table <ref type="table">1</ref>.</p><p>While a comprehensive study involving numerous participants, diverse scenarios, and varied indoor settings would be essential to establish ThermaStrain's robustness to unseen data scenarios, this section's evaluation showcases its potential for real-world deployments across various applications.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.3">Ethical and Practical Considerations for Real-World Deployment</head><p>Before gathering and processing thermal information, the application will secure informed consent from individuals whose data will be utilized. This ensures that participants understand the purpose and potential consequences of data usage, allowing them to opt out. Clear communication regarding the aims, methods, and potential outcomes of thermal information processing will foster trust and facilitate informed choices.</p><p>For those who opt out or do not pertain to the target group for stress assessment, their data will remain unused and unprocessed. As discussed in Section 8.2.1, it is possible to identify the body segments of the target users accurately. Our proposed deployment procedure (discussed in Sections 4.4 &amp; 8.2) zeros out all content except the target body segments; hence, other individuals will be masked out (i.e., zeroed out), and no processing will be performed on their information. Similarly, no information from the indoor environment will be processed, such as furniture, personal items, books, addresses, displayed documents, or content within photo frames, safeguarding against the leakage of environmental information and preserving privacy <ref type="bibr">[18,</ref><ref type="bibr">21]</ref>.</p><p>Additionally, on-device computation offers superior security and privacy <ref type="bibr">[116]</ref> compared to cloud-based alternatives. 
Our real-time evaluation in Section 6.8 verifies that the ThermaStrain approach can operate efficiently on resource-constrained edge platforms like Nvidia Jetson Nano, ensuring secure and privacy-preserving stress assessment in real-world settings.</p><p>The proliferation of thermal and image-based human-centric applications is driven by advancements in sensors and AI. Collaboration among experts spanning ethics, law, social sciences, and technology is imperative for a comprehensive ethical approach. While this paper adheres to prevailing ethical standards, future multidisciplinary collaboration endeavors will yield more well-rounded solutions for thermal human-centric sensing applications. Nevertheless, such endeavors remain beyond the scope of this paper.</p></div>
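The pseudo-partially-masked-body-segment augmentation described in Section 8.2.2 can be sketched as follows. This is a minimal illustration assuming per-part boolean masks (as a part-segmentation model like CDCL would produce) and a toy frame; the masking probabilities are the ones stated in that section, but the rest is an illustrative assumption rather than the paper's training code.

```python
import numpy as np

def mask_random_parts(frame, part_masks, rng):
    """Zero out 0-3 of the eight body parts with the probabilities used in
    the augmentation: 30% no masking, 23% one part, 23% two parts, 24%
    three parts. `part_masks` holds one boolean mask per body part."""
    n_masked = int(rng.choice([0, 1, 2, 3], p=[0.30, 0.23, 0.23, 0.24]))
    out = frame.copy()
    for idx in rng.choice(len(part_masks), size=n_masked, replace=False):
        out[part_masks[idx]] = 0.0               # occlude this body part
    return out, n_masked

# Demo: a fake 8x8 thermal frame whose eight "body parts" are vertical bands.
rng = np.random.default_rng(1)
frame = np.full((8, 8), 33.0)
part_masks = []
for k in range(8):
    m = np.zeros_like(frame, dtype=bool)
    m[:, k] = True
    part_masks.append(m)

aug, n = mask_random_parts(frame, part_masks, rng)
print(n, int((aug == 0.0).sum()))                # n parts -> 8*n zeroed pixels
```

Applying this per training sample exposes the model to the same kinds of partial occlusion it encounters at deployment time.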
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9">DISCUSSION ON STUDY LIMITATIONS</head><p>This section discusses the study limitations and future research scope of the ThermaStrain approach.</p><p>Physiological Signals as Aiding Modality in Co-teaching: The presented solution showed that co-teaching enhances thermal stress sensing performance with the aid of EDA, a physiological sensing modality, during training. We also evaluated ECG and HR as aiding modalities; however, EDA outperformed the others. Hence, this paper includes only the EDA co-teaching solution and the corresponding results. Nevertheless, we cannot conclude that including ECG or HR as a co-teaching aiding modality would not be beneficial; exhaustive analysis with larger datasets and more scenarios would be needed to draw such a conclusion, which was out of the scope of this paper.</p><p>In-the-wild Evaluation: A limitation of this study is that we analyzed only indoor data collected in laboratory environments. Given the primary focus on co-teaching, a comprehensive evaluation in real-world conditions was not within the study's scope. It is worth noting that Section 8.2 offers extensive analysis and discussion regarding the deployment of ThermaStrain in real-world multi-person scenarios, highlighting its superior viability in comparison to the baselines. However, numerous real-world factors remain unexamined. For instance, the impact of ambient temperature on thermal stress assessment was not assessed, as data collection took place solely in temperature-controlled indoor settings, with no recording of ambient temperature for each session. 
Therefore, future work would benefit from sampling data from a broader range of in-the-wild situations to determine the boundaries of the ThermaStrain model's predictive validity.</p><p>Study Protocol: As detailed in Section 4.2, this study's data collection approach adhered to established literature in behavioral science and psychology <ref type="bibr">[11,</ref><ref type="bibr">34,</ref><ref type="bibr">46]</ref>. For instance, to avoid any bias from residual stress effects, non-stress-inducing tasks were followed by stress-inducing tasks <ref type="bibr">[11,</ref><ref type="bibr">34,</ref><ref type="bibr">46]</ref>. Additionally, as highlighted by <ref type="bibr">[34]</ref>, no interfering activities, such as questionnaires, occurred for at least 15 minutes before introducing the stress-inducing tasks. Our HRV analysis in Section 4.3 confirms heightened participant stress levels during the stress-inducing tasks, indicating the protocol's effectiveness. Nonetheless, a more comprehensive assessment, such as randomizing the order of the stress-inducing tasks, would be needed to identify the most efficacious stress-inducing protocol. This falls within the domain of behavioral science and psychology research and is beyond the scope of this paper.</p><p>Age, Sex, and Demography: With respect to sex, our dataset was relatively balanced (12 male, 20 female). Our analysis showed that ThermaStrain achieves similar F1 scores (about 1% higher for female than for male participants) and nearly identical accuracy across sexes. However, with respect to age and demography, the dataset was limited. Future studies involving a larger population with diverse age, sex, and demographic distributions would be highly beneficial, allowing a more comprehensive understanding of how the thermal signature of stress manifests across different populations and demographic groups and potentially uncovering variations or patterns. Such analysis was out of the scope of this paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="10">CONCLUSION</head><p>Existing studies have examined uni-modal and multimodal thermal stress sensing solutions, each with its advantages and limitations. While uni-modal thermal solutions offer non-intrusive sensing, they may lack effectiveness. On the other hand, multimodal approaches can improve performance but may compromise the non-intrusive nature. ThermaStrain combines the benefits of both approaches, providing enhanced stress-sensing performance in a non-intrusive and passive manner. The study collected a comprehensive multimodal thermal stress sensing dataset with diverse stressors and variable distances. Extensive evaluations demonstrated ThermaStrain's ability to generalize and adapt to unknown scenarios, conditions, and environments. These evaluations validated ThermaStrain's fidelity to the co-teaching paradigm and its capacity to enhance stress sensing.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="A.1">Loss Landscape Plot Generation</head><p>Visualizing the loss landscape of a neural network is challenging because the loss is a function of the model parameters, which live in a very high-dimensional hyperspace. To address this challenge, prior works <ref type="bibr">[33,</ref><ref type="bibr">45,</ref><ref type="bibr">67]</ref> choose a starting point θ in the parameter space, choose two random Gaussian direction vectors δ and η, and plot: f(α, β) = L(θ + αδ + βη)</p><p>This equation generates a 3D visualization of the loss landscape, with the XY region bounded by the two scalar step sizes α (x-axis) and β (y-axis), and the corresponding loss L(θ + αδ + βη) as the z-axis. Furthermore, Li et al. <ref type="bibr">[67]</ref> suggest that using filter-normalized direction vectors δ and η helps capture the natural distance scale of the loss surfaces (details can be found in the original work <ref type="bibr">[67]</ref>). 
We used the loss-landscapes library <ref type="bibr">[75]</ref>, which also uses the filter-normalization approach <ref type="bibr">[67]</ref>, to generate the 3D loss landscape plots for the best thermal baseline and our ThermaStrain approach, shown in Figure <ref type="figure">7</ref> of the paper. We used the cross-entropy loss for the plot generation, and both plots were generated for a randomly selected participant from the validation set with step sizes (α = 40, β = 40).</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Vol. 7, No. 4, Article 189. Publication date: December 2023. "Reading Between the Heat": Co-Teaching Body Thermal Signatures for Non-intrusive Stress Detection &#8226; 189:3</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_3"><p>https://www.thermal.com/compact-series.html</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_4"><p>https://www.empatica.com/research/e4/</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_10"><p>https://www.huffpost.com/entry/medical-necklace-strangles-woman_n_56d75817e4b0871f60edbb47</p></note>
		</body>
		</text>
</TEI>
