- Award ID(s):
- 1724297
- PAR ID:
- 10327420
- Date Published:
- Journal Name:
- Frontiers in Systems Neuroscience
- Volume:
- 15
- ISSN:
- 1662-5137
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Multimodal fusion addresses the problem of analyzing spoken words in the multimodal context, including visual expressions and prosodic cues. Even when multimodal models lead to performance improvements, it is often unclear whether bimodal and trimodal interactions are learned or whether modalities are processed independently of each other. We propose Multimodal Residual Optimization (MRO) to separate unimodal, bimodal, and trimodal interactions in a multimodal model. This improves interpretability as the multimodal interaction can be quantified. Inspired by Occam’s razor, the main intuition of MRO is that (simpler) unimodal contributions should be learned before learning (more complex) bimodal and trimodal interactions. For example, bimodal predictions should learn to correct the mistakes (residuals) of unimodal predictions, thereby letting the bimodal predictions focus on the remaining bimodal interactions. Empirically, we observe that MRO successfully separates unimodal, bimodal, and trimodal interactions while not degrading predictive performance. We complement our empirical results with a human perception study and observe that MRO learns multimodal interactions that align with human judgments.
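A minimal sketch of the residual idea described in this abstract, assuming simple linear heads over text, audio, and video features; the module and dimension names are illustrative and do not reproduce the authors' implementation:

```python
# Hedged sketch of the MRO residual scheme (illustrative, not the authors' code).
# Unimodal heads are fit first; each bimodal head is then trained to predict the
# residual error left by the unimodal predictions, and the trimodal head models
# what remains after the unimodal and bimodal corrections.
import torch
import torch.nn as nn

class MROSketch(nn.Module):
    def __init__(self, d_text, d_audio, d_video, d_out):
        super().__init__()
        # Unimodal heads (text t, audio a, video v)
        self.f_t = nn.Linear(d_text, d_out)
        self.f_a = nn.Linear(d_audio, d_out)
        self.f_v = nn.Linear(d_video, d_out)
        # Bimodal heads over concatenated pairs
        self.f_ta = nn.Linear(d_text + d_audio, d_out)
        self.f_tv = nn.Linear(d_text + d_video, d_out)
        self.f_av = nn.Linear(d_audio + d_video, d_out)
        # Trimodal head over all three modalities
        self.f_tav = nn.Linear(d_text + d_audio + d_video, d_out)

    def forward(self, t, a, v):
        uni = self.f_t(t) + self.f_a(a) + self.f_v(v)
        bi = (self.f_ta(torch.cat([t, a], -1))
              + self.f_tv(torch.cat([t, v], -1))
              + self.f_av(torch.cat([a, v], -1)))
        tri = self.f_tav(torch.cat([t, a, v], -1))
        # The final prediction is the sum of the three terms. During training, one
        # would apply the loss to `uni`, then to `uni.detach() + bi`, then to
        # `(uni + bi).detach() + tri`, so higher-order terms only fit residuals.
        return uni + bi + tri
```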
-
This exploratory study examined the simultaneous interactions and relative contributions of bottom-up social information (regional dialect, speaking style), top-down contextual information (semantic predictability), and the internal dynamics of the lexicon (neighborhood density, lexical frequency) to lexical access and word recognition. Cross-modal matching and intelligibility in noise tasks were conducted with a community sample of adults at a local science museum. Each task featured one condition in which keywords were presented in isolation and one condition in which they were presented within a multiword phrase. Lexical processing was slower and more accurate when keywords were presented in their phrasal context, and was both faster and more accurate for auditory stimuli produced in the local Midland dialect. In both tasks, interactions were observed among stimulus dialect, speaking style, semantic predictability, phonological neighborhood density, and lexical frequency. These interactions revealed that bottom-up social information and top-down contextual information contribute more to speech processing than the internal dynamics of the lexicon. Moreover, the relatively stronger bottom-up social effects were observed in both the isolated word and multiword phrase conditions, suggesting that social variation is central to speech processing, even in non-interactive laboratory tasks. At the same time, the specific interactions observed differed between the two experiments, reflecting task-specific demands related to processing time constraints and signal degradation.
-
Comparing representations of complex stimuli in neural network layers to human brain representations or behavioral judgments can guide model development. However, even qualitatively distinct neural network models often predict similar representational geometries of typical stimulus sets. We propose a Bayesian experimental design approach to synthesizing stimulus sets for adjudicating among representational models efficiently. We apply our method to discriminate among candidate neural network models of behavioral face dissimilarity judgments. Our results indicate that a neural network trained to invert a 3D-face-model graphics renderer is more human-aligned than the same architecture trained on identification, classification, or autoencoding. Our proposed stimulus synthesis objective is generally applicable to designing experiments to be analyzed by representational similarity analysis for model comparison.
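For readers unfamiliar with the analysis this abstract targets, here is a minimal representational similarity analysis (RSA) model-comparison sketch; the model names and random features are illustrative placeholders, and this does not reproduce the paper's Bayesian stimulus-synthesis objective:

```python
# Minimal RSA model-comparison sketch (illustrative only).
# Each candidate model's features for a stimulus set are turned into a
# representational dissimilarity matrix (RDM), which is then correlated with
# human dissimilarity judgments over the same stimulus pairs.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    """Condensed vector of pairwise dissimilarities for a stimuli-by-features array."""
    return pdist(features, metric="correlation")

def compare_models(model_features, human_rdm):
    """Spearman correlation between each model's RDM and the human RDM."""
    return {name: spearmanr(rdm(feats), human_rdm).correlation
            for name, feats in model_features.items()}

# Toy usage with random features for two hypothetical candidate models.
rng = np.random.default_rng(0)
n_stimuli = 20
models = {"inverse_graphics": rng.normal(size=(n_stimuli, 64)),
          "classifier": rng.normal(size=(n_stimuli, 64))}
human = pdist(rng.normal(size=(n_stimuli, 8)), metric="correlation")
print(compare_models(models, human))
```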
-
Abstract Face perception is a fundamental aspect of human social interaction, yet most research on this topic has focused on single modalities and specific aspects of face perception. Here, we present a comprehensive multimodal dataset for examining facial emotion perception and judgment. This dataset includes EEG data from 97 unique neurotypical participants across 8 experiments, fMRI data from 19 neurotypical participants, single-neuron data from 16 neurosurgical patients (22 sessions), eye tracking data from 24 neurotypical participants, behavioral and eye tracking data from 18 participants with ASD and 15 matched controls, and behavioral data from 3 rare patients with focal bilateral amygdala lesions. Notably, participants from all modalities performed the same task. Overall, this multimodal dataset provides a comprehensive exploration of facial emotion perception, emphasizing the importance of integrating multiple modalities to gain a holistic understanding of this complex cognitive process. This dataset serves as a key missing link between human neuroimaging and neurophysiology literature, and facilitates the study of neuropsychiatric populations.
-
Multisensory cutaneous displays have been developed to enhance the realism of objects touched in virtual environments. However, when stimuli are presented concurrently, tactile stimuli can mask thermal perception and so both these modalities may not be available to convey information to the user. In this study, we aim to determine the simultaneity window using the Simultaneity Judgment Task. A device was created that could present both tactile and thermal stimuli to the thenar eminence of the participant’s left hand with various stimulus onset asynchronies (SOA). The experimental results indicated that the simultaneity window width was 639 ms ranging from -561 ms to 78 ms. The point of subjective simultaneity (PSS) was at -242 ms, indicating that participants perceived simultaneity best when the thermal stimulus preceded the tactile stimulus by 242 ms. These findings have implications for the design of stimulus presentation in multisensory cutaneous displays.more » « less
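As a rough illustration of how a PSS and simultaneity window are commonly derived from simultaneity judgment data, here is a hedged sketch; the SOA values and response proportions are hypothetical, and the study's exact fitting procedure and window criterion may differ:

```python
# Hedged sketch: estimate the PSS and a simultaneity window by fitting a Gaussian
# to the proportion of "simultaneous" responses as a function of SOA (ms).
# Negative SOA means the thermal stimulus led the tactile stimulus.
import numpy as np
from scipy.optimize import curve_fit

def gaussian(soa, amp, pss, sigma):
    """Proportion of 'simultaneous' responses as a function of SOA."""
    return amp * np.exp(-0.5 * ((soa - pss) / sigma) ** 2)

# Hypothetical data, not the study's measurements.
soas = np.array([-600, -400, -200, 0, 100, 200, 400])
p_simultaneous = np.array([0.15, 0.55, 0.90, 0.70, 0.40, 0.20, 0.05])

(amp, pss, sigma), _ = curve_fit(gaussian, soas, p_simultaneous, p0=[1.0, -200.0, 200.0])

# Window bounds where the fitted curve crosses 75% of its peak (one common choice).
half_width = sigma * np.sqrt(-2.0 * np.log(0.75))
print(f"PSS = {pss:.0f} ms, window = [{pss - half_width:.0f}, {pss + half_width:.0f}] ms")
```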