Title: The Role of Unimodal Feedback Pathways in Gender Perception During Activation of Voice and Face Areas
Cross-modal effects provide a model framework for investigating hierarchical inter-areal processing, particularly under conditions where unimodal cortical areas receive contextual feedback from other modalities. Here, using complementary behavioral and brain imaging techniques, we investigated the functional networks participating in face and voice processing during gender perception, a high-level feature of voice and face perception. Within the framework of a signal detection decision model, maximum likelihood conjoint measurement (MLCM) was used to estimate the contributions of the face and voice to gender comparisons between pairs of audio-visual stimuli in which the face and voice were independently modulated. Top-down contributions were varied by instructing participants to make judgments based on the gender of the face, the voice, or both modalities (N = 12 for each task). Estimated face and voice contributions to the judgments of the stimulus pairs were not independent; both contributed to all tasks, but their respective weights varied over a 40-fold range due to top-down influences. The models that best described the modal contributions required two different top-down interactions: (i) an interaction that depended on gender congruence across modalities (i.e., the difference in gender between the face and voice of each stimulus); and (ii) an interaction that depended on the gender magnitude within each modality. The significance of these interactions was task dependent: the gender congruence interaction was significant for the face and voice tasks, while the gender magnitude interaction was significant for the face and stimulus tasks. Subsequently, we used the same stimuli and related tasks in a functional magnetic resonance imaging (fMRI) paradigm (N = 12) to explore the neural correlates of these perceptual processes, analyzed with Dynamic Causal Modeling (DCM) and Bayesian Model Selection. The results revealed changes in effective connectivity between the unimodal Fusiform Face Area (FFA) and Temporal Voice Area (TVA) that paralleled the face and voice interactions observed in the psychophysical data. These findings point to a role in perception for multiple parallel feedback pathways between unimodal areas.
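As a concrete illustration of the MLCM procedure described in the abstract, below is a minimal, hypothetical Python sketch (not the authors' code): it simulates paired gender comparisons in which the face and voice scales receive unequal weights, then recovers both perceptual scales with a probit GLM, the standard signal-detection formulation of MLCM. The level counts, weights, and noise value are illustrative assumptions.

```python
# Minimal MLCM sketch: additive signal detection model for paired comparisons.
# All parameter values below are illustrative, not taken from the paper.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_levels = 5                    # hypothetical morph levels per modality
w_face, w_voice = 1.0, 0.25     # assumed unequal weighting (cf. the face task)
psi_face = w_face * np.linspace(0.0, 1.0, n_levels)
psi_voice = w_voice * np.linspace(0.0, 1.0, n_levels)

# Simulate comparisons between random (face, voice) stimulus combinations.
n_trials = 4000
f1, v1, f2, v2 = rng.integers(0, n_levels, size=(4, n_trials))
d = (psi_face[f1] - psi_face[f2]) + (psi_voice[v1] - psi_voice[v2])
resp = (d + rng.normal(scale=0.3, size=n_trials) > 0).astype(float)

# Design matrix: differences of level indicators (level 0 anchored at zero).
X = np.zeros((n_trials, 2 * (n_levels - 1)))
for lev in range(1, n_levels):
    X[:, lev - 1] = (f1 == lev).astype(float) - (f2 == lev).astype(float)
    X[:, n_levels - 2 + lev] = (v1 == lev).astype(float) - (v2 == lev).astype(float)

# Probit GLM: coefficients estimate the two perceptual scales
# (up to a common factor set by the decision noise).
fit = sm.GLM(resp, X, family=sm.families.Binomial(sm.families.links.Probit())).fit()
print("face scale:", fit.params[: n_levels - 1])
print("voice scale:", fit.params[n_levels - 1 :])
```

The congruence and magnitude interactions reported above would enter this design matrix as additional columns.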
Award ID(s): 1724297
NSF-PAR ID: 10327420
Journal Name: Frontiers in Systems Neuroscience
Volume: 15
ISSN: 1662-5137
Sponsoring Org: National Science Foundation
More Like this
  1. This exploratory study examined the simultaneous interactions and relative contributions of bottom-up social information (regional dialect, speaking style), top-down contextual information (semantic predictability), and the internal dynamics of the lexicon (neighborhood density, lexical frequency) to lexical access and word recognition. Cross-modal matching and intelligibility-in-noise tasks were conducted with a community sample of adults at a local science museum. Each task featured one condition in which keywords were presented in isolation and one condition in which they were presented within a multiword phrase. Lexical processing was slower but more accurate when keywords were presented in their phrasal context, and was both faster and more accurate for auditory stimuli produced in the local Midland dialect. In both tasks, interactions were observed among stimulus dialect, speaking style, semantic predictability, phonological neighborhood density, and lexical frequency. These interactions revealed that bottom-up social information and top-down contextual information contribute more to speech processing than the internal dynamics of the lexicon. Moreover, these relatively stronger bottom-up social effects were observed in both the isolated-word and multiword-phrase conditions, suggesting that social variation is central to speech processing, even in non-interactive laboratory tasks. At the same time, the specific interactions observed differed between the two experiments, reflecting task-specific demands related to processing-time constraints and signal degradation.
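As a concrete (and entirely hypothetical) sketch of how such simultaneous contributions can be quantified, the following snippet fits a logistic regression of keyword recognition accuracy on social, contextual, and lexical predictors, with the kind of interaction term the abstract describes; the column names, effect sizes, and data are invented for illustration, not taken from the study.

```python
# Hypothetical sketch: logistic regression with a social x contextual interaction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "dialect": rng.choice(["midland", "other"], n),    # bottom-up social cue
    "predictability": rng.choice(["high", "low"], n),  # top-down context
    "log_freq": rng.normal(2.0, 0.8, n),               # lexical frequency
    "density": rng.poisson(15, n),                     # neighborhood density
})

# Simulate accuracy with a local-dialect advantage, a predictability boost,
# and smaller lexical effects -- the qualitative pattern reported above.
logit = (0.8 * (df.dialect == "midland") + 0.9 * (df.predictability == "high")
         + 0.2 * df.log_freq - 0.02 * df.density - 0.5)
df["correct"] = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

fit = smf.logit("correct ~ dialect * predictability + log_freq + density",
                data=df).fit(disp=False)
print(fit.summary())
```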
  2. Multimodal fusion addresses the problem of analyzing spoken words in the multimodal context, including visual expressions and prosodic cues. Even when multimodal models lead to performance improvements, it is often unclear whether bimodal and trimodal interactions are learned or whether modalities are processed independently of each other. We propose Multimodal Residual Optimization (MRO) to separate unimodal, bimodal, and trimodal interactions in a multimodal model. This improves interpretability as the multimodal interaction can be quantified. Inspired by Occam’s razor, the main intuition of MRO is that (simpler) unimodal contributions should be learned before learning (more complex) bimodal and trimodal interactions. For example, bimodal predictions should learn to correct the mistakes (residuals) of unimodal predictions, thereby letting the bimodal predictions focus on the remaining bimodal interactions. Empirically, we observe that MRO successfully separates unimodal, bimodal, and trimodal interactions while not degrading predictive performance. We complement our empirical results with a human perception study and observe that MRO learns multimodal interactions that align with human judgments. 
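The residual staging at the heart of MRO can be shown numerically. The toy linear sketch below (not the authors' implementation) fits unimodal predictors first and then fits a bimodal term only to their residuals, so the interaction weight is recovered separately from the unimodal contributions.

```python
# Toy numerical sketch of MRO-style residual staging (illustrative only).
import numpy as np

rng = np.random.default_rng(2)
n = 1000
text = rng.normal(size=n)   # stand-in unimodal feature, modality 1
audio = rng.normal(size=n)  # stand-in unimodal feature, modality 2
# Target mixes unimodal effects with a purely bimodal interaction.
y = 1.5 * text + 0.5 * audio + 0.8 * text * audio + rng.normal(scale=0.1, size=n)

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: unimodal predictions, one modality at a time.
w_t = ols(text[:, None], y)[0]
w_a = ols(audio[:, None], y)[0]
uni = w_t * text + w_a * audio

# Stage 2: the bimodal model sees only the residual, so it captures the
# interaction without re-learning the unimodal contributions.
w_bi = ols((text * audio)[:, None], y - uni)[0]

print(f"unimodal weights: {w_t:.2f}, {w_a:.2f}")  # ~1.5 and ~0.5
print(f"bimodal interaction weight: {w_bi:.2f}")  # ~0.8
```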
  3. As we gather noisy sensory information from the environment, prior knowledge about the likely cause(s) of sensory input can be leveraged to facilitate perceptual judgments. Here, we investigated the computational and neural manifestations of cued expectations in human subjects as they performed a probabilistic face/house discrimination task in which face and house stimuli were preceded by informative or neutral cues. Drift-diffusion modeling of the behavioral data showed that cued expectations biased both the baseline (pre-sensory) and the drift rate (post-sensory) of evidence accumulation. By employing a catch-trial functional MRI design, we were able to isolate neural signatures of expectation during pre- and post-sensory stages of decision processing in face- and house-selective areas of inferior temporal cortex (ITC). Cue-evoked time courses were modulated in a manner consistent with a pre-sensory prediction signal that scaled with probability. Sensory-evoked time courses resembled a prediction-error signal, greater in magnitude for surprising than for expected stimuli. Individual differences in baseline and drift-rate biases showed a clear mapping onto pre- and post-sensory fMRI activity in ITC. These findings highlight the specificity of perceptual expectations and provide new insight into the convergence of top-down and bottom-up signals in ITC and their distinct interactions prior to and during sensory processing.
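The two expectation effects that the drift-diffusion analysis separates are easy to see in simulation. The toy model below uses invented parameter values (not the paper's fits) to show how a predictive cue can shift both the starting point (baseline, pre-sensory bias) and the drift rate (post-sensory bias), each pushing choices and response times toward the expected category.

```python
# Toy drift-diffusion simulation: baseline vs. drift-rate biases (illustrative).
import numpy as np

def ddm_trial(drift, start, rng, bound=1.0, dt=0.001, noise=1.0):
    """Accumulate noisy evidence to a bound; return (choice, reaction time)."""
    x, t = start, 0.0
    while abs(x) < bound:
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    return (1 if x > 0 else 0), t

rng = np.random.default_rng(3)
# Hypothetical parameters: a face cue shifts the start point toward the
# "face" bound (baseline bias) and increases the drift rate (drift bias).
neutral = [ddm_trial(drift=1.0, start=0.0, rng=rng) for _ in range(500)]
cued = [ddm_trial(drift=1.4, start=0.2, rng=rng) for _ in range(500)]

for name, trials in (("neutral", neutral), ("face-cued", cued)):
    choices, rts = zip(*trials)
    print(f"{name}: p(face) = {np.mean(choices):.2f}, mean RT = {np.mean(rts):.3f} s")
```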
  4. Visual statistical learning (VSL), the unsupervised learning of statistical contingencies across time and space, may play a key role in efficient and predictive encoding of the perceptual world. How VSL capabilities vary as a function of ongoing task demands is still poorly understood. VSL is modulated by selective attention and faces interference from some secondary tasks, but there is little evidence that the types of contingencies learned in VSL are sensitive to task demands. We found a powerful effect of task on what is learned in VSL. Participants first completed a visual familiarization task requiring judgments of face gender (female/male) or scene location (interior/exterior). Statistical regularities were embedded between stimulus pairs. During a surprise recognition phase, participants showed less recognition for pairs that had required a change in response key (e.g., female followed by male) or task (e.g., female followed by indoor) during familiarization. When familiarization required detection of "flicker" or "jiggle" events unrelated to image content, there was weaker, but uniform, VSL across pair types. These results suggest that simple task manipulations play a strong role in modulating the distribution of learning over different pair combinations. Such variations may arise from task and response conflict or because the manner in which images are processed is altered. 
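To make the pair-type manipulation concrete, the hypothetical generator below (a design sketch, not the study's code) builds a familiarization stream from fixed stimulus pairs and labels each pair by whether its members share a response key or a task, the distinction that modulated recognition above.

```python
# Hypothetical VSL familiarization-stream generator (illustrative only).
import numpy as np

rng = np.random.default_rng(4)
# Toy stimulus set: (task category, subcategory), where subcategory maps to a key.
stims = [("face", "female"), ("face", "male"),
         ("scene", "indoor"), ("scene", "outdoor")] * 2

def pair_type(a, b):
    if a[0] != b[0]:
        return "task-change"      # e.g., female followed by indoor
    if a[1] != b[1]:
        return "response-change"  # e.g., female followed by male
    return "same-response"

rng.shuffle(stims)
pairs = [(stims[i], stims[i + 1]) for i in range(0, len(stims), 2)]

# The stream repeats the fixed pairs in random order; the constant within-pair
# order is the statistical regularity that participants can learn.
stream = [s for _ in range(3)
          for p in rng.permutation(len(pairs)) for s in pairs[p]]

print(len(stream), "stimuli in the familiarization stream")
for a, b in pairs:
    print(pair_type(a, b), a, b)
```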
  5. It has been postulated that the brain is organized by “metamodal,” sensory-independent cortical modules capable of performing tasks (e.g., word recognition) in both “standard” and novel sensory modalities. Still, this theory has primarily been tested in sensory-deprived individuals, with mixed evidence in neurotypical subjects, thereby limiting its support as a general principle of brain organization. Critically, current theories of metamodal processing do not specify requirements for successful metamodal processing at the level of neural representations. Specification at this level may be particularly important in neurotypical individuals, where novel sensory modalities must interface with existing representations for the standard sense. Here we hypothesized that effective metamodal engagement of a cortical area requires congruence between stimulus representations in the standard and novel sensory modalities in that region. To test this, we first used fMRI to identify bilateral auditory speech representations. We then trained 20 human participants (12 female) to recognize vibrotactile versions of auditory words using one of two auditory-to-vibrotactile algorithms. The vocoded algorithm attempted to match the encoding scheme of auditory speech while the token-based algorithm did not. Crucially, using fMRI, we found that only in the vocoded group did trained-vibrotactile stimuli recruit speech representations in the superior temporal gyrus and lead to increased coupling between them and somatosensory areas. Our results advance our understanding of brain organization by providing new insight into unlocking the metamodal potential of the brain, thereby benefitting the design of novel sensory substitution devices that aim to tap into existing processing streams in the brain.

SIGNIFICANCE STATEMENT
    It has been proposed that the brain is organized by “metamodal,” sensory-independent modules specialized for performing certain tasks. This idea has inspired therapeutic applications, such as sensory substitution devices, for example, enabling blind individuals “to see” by transforming visual input into soundscapes. Yet, other studies have failed to demonstrate metamodal engagement. Here, we tested the hypothesis that metamodal engagement in neurotypical individuals requires matching the encoding schemes between stimuli from the novel and standard sensory modalities. We trained two groups of subjects to recognize words generated by one of two auditory-to-vibrotactile transformations. Critically, only vibrotactile stimuli that were matched to the neural encoding of auditory speech engaged auditory speech areas after training. This suggests that matching encoding schemes is critical to unlocking the brain's metamodal potential.
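To make the encoding-matching idea concrete, here is a simplified, hypothetical sketch of a vocoder-style audio-to-vibrotactile transformation in the spirit of the one described above; the band edges, filter order, channel count, and carrier frequency are all assumptions, not the authors' parameters.

```python
# Hypothetical vocoder-style audio-to-vibrotactile mapping (illustrative only).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def audio_to_vibrotactile(audio, fs, n_channels=4, carrier_hz=250.0):
    """Map audio onto n_channels of envelope-modulated vibrotactile carriers."""
    edges = np.geomspace(100.0, 4000.0, n_channels + 1)  # assumed band edges (Hz)
    t = np.arange(len(audio)) / fs
    channels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, audio)              # band-limit the speech signal
        env = np.abs(hilbert(band))               # extract the amplitude envelope
        channels.append(env * np.sin(2 * np.pi * carrier_hz * t))
    return np.stack(channels)                     # one row per tactor

fs = 16000
audio = np.random.default_rng(5).normal(size=fs)  # stand-in for a spoken word
print(audio_to_vibrotactile(audio, fs).shape)     # (4, 16000)
```

By contrast, a token-based scheme in the abstract's sense would not preserve the spectrotemporal structure retained here.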