NSF PAR Search | NSF Public Access Repository

Multimodal Neurophysiological Transformer for Emotion Recognition

https://doi.org/10.1109/EMBC48229.2022.9871421

Koorathota, Sharath; Khan, Zain; Lapborisuth, Pawan; Sajda, Paul (July 2022, 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC))

Understanding neural function often requires multiple modalities of data, including electrophysiological data, imaging techniques, and demographic surveys. In this paper, we introduce a novel neurophysiological model to tackle major challenges in modeling multimodal data. First, we avoid non-alignment issues between raw signals and extracted, frequency-domain features by addressing the issue of variable sampling rates. Second, we encode modalities through “cross-attention” with other modalities. Lastly, we utilize properties of our parent transformer architecture to model long-range dependencies between segments across modalities and assess intermediary weights to better understand how source signals affect prediction. We apply our Multimodal Neurophysiological Transformer (MNT) to predict valence and arousal in an existing open-source dataset. Experiments on non-aligned multimodal time-series show that our model performs similarly to and, in some cases, outperforms existing methods in classification tasks. In addition, qualitative analysis suggests that MNT is able to model neural influences on autonomic activity in predicting arousal. Our architecture has the potential to be fine-tuned to a variety of downstream tasks, including BCI systems.
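As a rough illustration of the cross-attention idea in this abstract, the NumPy sketch below lets one modality query another even when the two have different numbers of time steps, which is one way non-aligned sampling rates can be accommodated. It is not the MNT implementation: the function names, the feature sizes, the random projection weights, and the "fast" and "slow" example modalities are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(target, source, w_q, w_k, w_v):
    """Scaled dot-product attention with queries from `target` and
    keys/values from `source`. The two sequences may differ in length."""
    q = target @ w_q                      # (T_target, d)
    k = source @ w_k                      # (T_source, d)
    v = source @ w_v                      # (T_source, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)    # each target step attends over source steps
    return weights @ v, weights

# Hypothetical shapes: a fast modality (128 steps, 32 features) and a
# slow modality (16 steps, 4 features) covering the same time window.
rng = np.random.default_rng(0)
fast_modality = rng.normal(size=(128, 32))
slow_modality = rng.normal(size=(16, 4))
d = 8  # assumed shared attention dimension
w_q = rng.normal(size=(32, d))
w_k = rng.normal(size=(4, d))
w_v = rng.normal(size=(4, d))

encoded, attn = cross_attention(fast_modality, slow_modality, w_q, w_k, w_v)
print(encoded.shape, attn.shape)  # (128, 8) (128, 16)
```

The attention matrix `attn` has one row per target step and one column per source step, so inspecting it is one simple way to see which source segments influence each encoded step.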
Predictive Power of Pupil Dynamics in a Team Based Virtual Reality Task

https://doi.org/10.1109/VRW55335.2022.00147

Qin, Yinuo; Zhang, Weijia; Lee, Richard; Sun, Xiaoxiao; Sajda, Paul (March 2022, 2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW))

Assessing and tracking physiological and cognitive states of multiple individuals interacting in virtual environments is of increasing interest to the virtual reality (VR) community. In this paper, we describe a team-based VR task termed the Apollo Distributed Control Task (ADCT), where individuals, each with a single independent degree-of-freedom control and a limited view of the environment, must work together to guide a virtual spacecraft back to Earth. Novel to the experiment is that 1) we simultaneously collect multiple physiological measures, including electroencephalography (EEG), pupillometry, speech signals, and individuals' actions, and 2) we regulate the difficulty of the task and the type of communication between the teammates. Focusing on the analysis of pupil dynamics, which have been linked to a number of cognitive and physiological processes such as arousal, cognitive control, and working memory, we find that pupil diameter changes are predictive of multiple task-related dimensions, including the difficulty of the task, the role of the team member, and the type of communication.
Integrating neural and ocular attention reorienting signals in virtual reality

https://doi.org/10.1088/1741-2552/ac4593

Lapborisuth, Pawan; Koorathota, Sharath; Wang, Qi; Sajda, Paul (January 2022, Journal of Neural Engineering)

Objective. Reorienting is central to how humans direct attention to different stimuli in their environment. Previous studies typically employ well-controlled paradigms with limited eye and head movements to study the neural and physiological processes underlying attention reorienting. Here, we aim to better understand the relationship between gaze and attention reorienting using a naturalistic virtual reality (VR)-based target detection paradigm. Approach. Subjects were navigated through a city and instructed to count the number of targets that appeared on the street. Subjects performed the task in a fixed condition with no head movement and in a free condition where head movements were allowed. Electroencephalography (EEG), gaze and pupil data were collected. To investigate how neural and physiological reorienting signals are distributed across different gaze events, we used hierarchical discriminant component analysis (HDCA) to identify EEG and pupil-based discriminating components. Mixed-effects general linear models (GLM) were used to determine the correlation between these discriminating components and the different gaze event times. HDCA was also used to combine EEG, pupil and dwell time signals to classify reorienting events. Main results. In both EEG and pupil, dwell time contributes most significantly to the reorienting signals. However, when dwell times were orthogonalized against other gaze events, the distributions of the reorienting signals were different across the two modalities, with EEG reorienting signals leading the pupil reorienting signals. We also found that the hybrid classifier that integrates EEG, pupil and dwell time features detects the reorienting signals in both the fixed (AUC = 0.79) and the free (AUC = 0.77) conditions. Significance. We show that the neural and ocular reorienting signals are distributed differently across gaze events when a subject is immersed in VR, but nevertheless can be captured and integrated to classify target vs. distractor objects to which the human subject orients.
A Recurrent Neural Network for Attenuating Non-cognitive Components of Pupil Dynamics

https://doi.org/10.3389/fpsyg.2021.604522

Koorathota, Sharath; Thakoor, Kaveri; Hong, Linbi; Mao, Yaoli; Adelman, Patrick; Sajda, Paul (February 2021, Frontiers in Psychology)
There is increasing interest in how the pupil dynamics of the eye reflect underlying cognitive processes and brain states. Problematic, however, is that pupil changes can be due to non-cognitive factors, for example luminance changes in the environment, accommodation and movement. In this paper we consider how, by modeling the response of the pupil in real-world environments, we can capture the non-cognitive changes and remove them to extract a residual signal that is a better index of cognition and performance. Specifically, we utilize sequence measures such as fixation position, duration, saccades, and blink-related information as inputs to a deep recurrent neural network (RNN) model for predicting subsequent pupil diameter. We build and evaluate the model for a task where subjects watch educational videos and are subsequently asked questions based on the content. Compared to commonly used models for this task, the RNN had the lowest error rates in predicting subsequent pupil dilation given sequence data. Most important was how the model output related to subjects' cognitive performance as assessed by a post-viewing test. Consistent with our hypothesis that the model captures non-cognitive pupil dynamics, we found (1) the model's root-mean-square error was lower for lower-performing subjects than for those having better performance on the post-viewing test, (2) the residuals of the RNN (LSTM) model had the highest correlation with subject post-viewing test scores and (3) the residuals had the highest discriminability (assessed via area under the ROC curve, AUC) for classifying high and low test performers, compared to the true pupil size or the RNN model predictions. This suggests that deep learning sequence models may be good for separating components of pupil responses that are linked to luminance and accommodation from those that are linked to cognition and arousal.
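The residual-based analysis in this abstract can be sketched in a few lines. The example below is a simplified stand-in, not the authors' pipeline: it swaps the RNN (LSTM) for an ordinary least-squares fit, and the function names `residual_signal` and `auc` as well as the toy numbers are assumptions chosen for illustration. It shows the two steps the abstract describes: subtract the model's prediction from the measured pupil signal, then score how well a per-subject summary separates two groups using the area under the ROC curve.

```python
import numpy as np

def residual_signal(features, pupil):
    """Fit pupil ~ features (plus an intercept) by least squares and return
    what the fit cannot explain. A linear model is an assumed stand-in for
    the RNN described in the abstract."""
    X = np.column_stack([features, np.ones(len(pupil))])
    coef, *_ = np.linalg.lstsq(X, pupil, rcond=None)
    return pupil - X @ coef

def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a positive
    example scores higher than a negative one (ties count as 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Toy check: a pupil trace fully explained by one feature leaves no residual.
feature = np.arange(5.0)
print(np.allclose(residual_signal(feature, 2.0 * feature + 1.0), 0.0))  # True

# Toy check: three of the four positive/negative pairs are ranked correctly.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

In practice the scores passed to `auc` would be a per-subject summary of the residual (for example its mean magnitude), compared against the same summary of the raw pupil size and of the model predictions.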
Investigating Evoked EEG Responses to Targets Presented in Virtual Reality

https://doi.org/10.1109/EMBC.2019.8856761

Lapborisuth, Pawan; Faller, Josef; Koss, Jonathan; Waytowich, Nicholas R.; Touryan, Jonathan; Sajda, Paul (July 2019, 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC))
Virtual reality (VR) offers the potential to study brain function in complex, ecologically realistic environments. However, the additional degrees of freedom make analysis more challenging, particularly with respect to evoked neural responses. In this paper we designed a target detection task in VR where we varied the visual angle of targets as subjects moved through a three-dimensional maze. We investigated how the latency and shape of the classic P300 evoked response varied as a function of locking the electroencephalogram data to the target image onset, the target-saccade intersection, and the first fixation on the target. We found, as expected, a systematic shift in the timing of the evoked responses as a function of the type of response locking, as well as a difference in the shape of the waveforms. Interestingly, single-trial analysis showed that the peak discriminability of the evoked responses does not differ between image-locked and saccade-locked analyses, though it decreases significantly when fixation-locked. These results suggest that there is a spread in the perception of visual information in VR environments across time and visual space. Our results point to the importance of considering how information may be perceived in naturalistic environments, specifically those that have more complexity and higher degrees of freedom than traditional laboratory paradigms.
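The effect of the locking event on response timing can be illustrated with a minimal epoching sketch. This is not the authors' analysis code: the function name `extract_epochs`, the synthetic single-channel signal, and the sample offsets (a response 30 samples after image onset, a fixation 20 samples after onset) are all assumptions chosen so the shift is easy to see.

```python
import numpy as np

def extract_epochs(signal, lock_samples, pre, post):
    """Cut windows of `pre` samples before and `post` samples after each
    locking event from a (channels, samples) array. Events whose window
    would run off either end of the recording are skipped."""
    n_samples = signal.shape[1]
    epochs = [signal[:, s - pre:s + post]
              for s in lock_samples
              if s - pre >= 0 and s + post <= n_samples]
    return np.stack(epochs)  # (n_epochs, channels, pre + post)

# Synthetic recording: one channel, an impulse "response" 30 samples
# after each hypothetical image onset.
signal = np.zeros((1, 1000))
onsets = np.array([100, 400, 700])
signal[0, onsets + 30] = 1.0
fixations = onsets + 20  # assumed: first fixation lags image onset by 20 samples

pre, post = 50, 100
erp_image = extract_epochs(signal, onsets, pre, post).mean(axis=0)
erp_fixation = extract_epochs(signal, fixations, pre, post).mean(axis=0)

# Peak latency relative to the locking event, in samples.
print(int(erp_image[0].argmax()) - pre)     # 30
print(int(erp_fixation[0].argmax()) - pre)  # 10
```

The same underlying response appears 30 samples after the image-onset lock but only 10 samples after the fixation lock, which mirrors the systematic timing shift the abstract reports across locking types. With real data the lag between events varies from trial to trial, which would also smear the averaged waveform.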
