skip to main content


Title: Narratives imagined in response to instrumental music reveal culture-bounded intersubjectivity
The scientific literature sometimes considers music an abstract stimulus, devoid of explicit meaning, and at other times considers it a universal language. Here, individuals in three geographically distinct locations spanning two cultures performed a highly unconstrained task: they provided free-response descriptions of stories they imagined while listening to instrumental music. Tools from natural language processing revealed that listeners provide highly similar stories to the same musical excerpts when they share an underlying culture, but when they do not, the generated stories show limited overlap. These results paint a more complex picture of music’s power: music can generate remarkably similar stories in listeners’ minds, but the degree to which these imagined narratives are shared depends on the degree to which culture is shared across listeners. Thus, music is neither an abstract stimulus nor a universal language but has semantic affordances shaped by culture, requiring more sustained attention from psychology.  more » « less
Award ID(s):
1734063
NSF-PAR ID:
10345047
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
119
Issue:
4
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Speech processing is highly incremental. It is widely accepted that human listeners continuously use the linguistic context to anticipate upcoming concepts, words, and phonemes. However, previous evidence supports two seemingly contradictory models of how a predictive context is integrated with the bottom-up sensory input: Classic psycholinguistic paradigms suggest a two-stage process, in which acoustic input initially leads to local, context-independent representations, which are then quickly integrated with contextual constraints. This contrasts with the view that the brain constructs a single coherent, unified interpretation of the input, which fully integrates available information across representational hierarchies, and thus uses contextual constraints to modulate even the earliest sensory representations. To distinguish these hypotheses, we tested magnetoencephalography responses to continuous narrative speech for signatures of local and unified predictive models. Results provide evidence that listeners employ both types of models in parallel. Two local context models uniquely predict some part of early neural responses, one based on sublexical phoneme sequences, and one based on the phonemes in the current word alone; at the same time, even early responses to phonemes also reflect a unified model that incorporates sentence-level constraints to predict upcoming phonemes. Neural source localization places the anatomical origins of the different predictive models in nonidentical parts of the superior temporal lobes bilaterally, with the right hemisphere showing a relative preference for more local models. These results suggest that speech processing recruits both local and unified predictive models in parallel, reconciling previous disparate findings. Parallel models might make the perceptual system more robust, facilitate processing of unexpected inputs, and serve a function in language acquisition. MEG Data MEG data is in FIFF format and can be opened with MNE-Python. Data has been directly converted from the acquisition device native format without any preprocessing. Events contained in the data indicate the stimuli in numerical order. Subjects R2650 and R2652 heard stimulus 11b instead of 11. Predictor Variables The original audio files are copyrighted and cannot be shared, but the make_audio folder contains make_clips.py which can be used to extract the exact clips from the commercially available audiobook (ISBN 978-1480555280). The predictors directory contains all the predictors used in the original study as pickled eelbrain objects. They can be loaded in Python with the eelbrain.load.unpickle function. The TextGrids directory contains the TextGrids aligned to the audio files. Source Localization The localization.zip file contains files needed for source localization. Structural brain models used in the published analysis are reconstructed by scaling the FreeSurfer fsaverage brain (distributed with FreeSurfer) based on each subject's `MRI scaling parameters.cfg` file. This can be done using the `mne.scale_mri` function. Each subject's MEG folder contains a `subject-trans.fif` file which contains the coregistration between MEG sensor space and (scaled) MRI space, which is used to compute the forward solution. 
    more » « less
  2. Musical imagery is the voluntary internal hearing of music in the mind without the need for physical action or external stimulation. Numerous studies have already revealed brain areas activated during imagery. However, it remains unclear to what extent imagined music responses preserve the detailed temporal dynamics of the acoustic stimulus envelope and, crucially, whether melodic expectations play any role in modulating responses to imagined music, as they prominently do during listening. These modulations are important as they reflect aspects of the human musical experience, such as its acquisition, engagement, and enjoyment. This study explored the nature of these modulations in imagined music based on EEG recordings from 21 professional musicians (6 females and 15 males). Regression analyses were conducted to demonstrate that imagined neural signals can be predicted accurately, similarly to the listening task, and were sufficiently robust to allow for accurate identification of the imagined musical piece from the EEG. In doing so, our results indicate that imagery and listening tasks elicited an overlapping but distinctive topography of neural responses to sound acoustics, which is in line with previous fMRI literature. Melodic expectation, however, evoked very similar frontal spatial activation in both conditions, suggesting that they are supported by the same underlying mechanisms. Finally, neural responses induced by imagery exhibited a specific transformation from the listening condition, which primarily included a relative delay and a polarity inversion of the response. This transformation demonstrates the top-down predictive nature of the expectation mechanisms arising during both listening and imagery. 
    more » « less
  3. null (Ed.)
    Previous research suggests that individuals with weaker receptive language show increased reliance on lexical information for speech perception relative to individuals with stronger receptive language, which may reflect a difference in how acoustic-phonetic and lexical cues are weighted for speech processing. Here we examined whether this relationship is the consequence of conflict between acoustic-phonetic and lexical cues in speech input, which has been found to mediate lexical reliance in sentential contexts. Two groups of participants completed standardized measures of language ability and a phonetic identification task to assess lexical recruitment (i.e., a Ganong task). In the high conflict group, the stimulus input distribution removed natural correlations between acoustic-phonetic and lexical cues, thus placing the two cues in high competition with each other; in the low conflict group, these correlations were present and thus competition was reduced as in natural speech. The results showed that 1) the Ganong effect was larger in the low compared to the high conflict condition in single-word contexts, suggesting that cue conflict dynamically influences online speech perception, 2) the Ganong effect was larger for those with weaker compared to stronger receptive language, and 3) the relationship between the Ganong effect and receptive language was not mediated by the degree to which acoustic-phonetic and lexical cues conflicted in the input. These results suggest that listeners with weaker language ability down-weight acoustic-phonetic cues and rely more heavily on lexical knowledge, even when stimulus input distributions reflect characteristics of natural speech input. 
    more » « less
  4. Abstract Research Highlights

    Children and adults conceptually and perceptually categorize speech and song from age 4.

    Listeners use F0 instability, harmonicity, spectral flux, and utterance duration to determine whether vocal stimuli sound like song.

    Acoustic cue weighting changes with age, becoming adult‐like at age 8 for perceptual categorization and at age 12 for conceptual differentiation.

    Young children are still learning to categorize speech and song, which leaves open the possibility that music‐ and language‐specific skills are not so domain‐specific.

     
    more » « less
  5. null (Ed.)
    Neural Machine Translation (NMT) models have been observed to produce poor translations when there are few/no parallel sentences to train the models. In the absence of parallel data, several approaches have turned to the use of images to learn translations. Since images of words, e.g., horse may be unchanged across languages, translations can be identified via images associated with words in different languages that have a high degree of visual similarity. However, translating via images has been shown to improve upon text-only models only marginally. To better understand when images are useful for translation, we study image translatability of words, which we define as the translatability of words via images, by measuring intra- and inter-cluster similarities of image representations of words that are translations of each other. We find that images of words are not always invariant across languages, and that language pairs with shared culture, meaning having either a common language family, ethnicity or religion, have improved image translatability (i.e., have more similar images for similar words) compared to its converse, regardless of their geographic proximity. In addition, in line with previous works that show images help more in translating concrete words, we found that concrete words have improved image translatability compared to abstract ones. 
    more » « less