Title: The “Narratives” fMRI dataset for evaluating models of naturalistic language comprehension
Abstract The “Narratives” collection aggregates a variety of functional MRI datasets collected while human subjects listened to naturalistic spoken stories. The current release includes 345 subjects, 891 functional scans, and 27 diverse stories of varying duration totaling ~4.6 hours of unique stimuli (~43,000 words). This data collection is well-suited for naturalistic neuroimaging analysis, and is intended to serve as a benchmark for models of language and narrative comprehension. We provide standardized MRI data accompanied by rich metadata, preprocessed versions of the data ready for immediate use, and the spoken story stimuli with time-stamped phoneme- and word-level transcripts. All code and data are publicly available with full provenance in keeping with current best practices in transparent and reproducible neuroimaging.
Award ID(s):
1912266
NSF-PAR ID:
10303996
Journal Name:
Scientific Data
Volume:
8
Issue:
1
ISSN:
2052-4463
Sponsoring Org:
National Science Foundation
More Like this
  1. Advances in artificial intelligence have inspired a paradigm shift in human neuroscience, yielding large-scale functional magnetic resonance imaging (fMRI) datasets that provide high-resolution brain responses to thousands of naturalistic visual stimuli. Because such experiments necessarily involve brief stimulus durations and few repetitions of each stimulus, achieving sufficient signal-to-noise ratio can be a major challenge. We address this challenge by introducing GLMsingle, a scalable, user-friendly toolbox available in MATLAB and Python that enables accurate estimation of single-trial fMRI responses (glmsingle.org). Requiring only fMRI time-series data and a design matrix as inputs, GLMsingle integrates three techniques for improving the accuracy of trial-wise general linear model (GLM) beta estimates. First, for each voxel, a custom hemodynamic response function (HRF) is identified from a library of candidate functions. Second, cross-validation is used to derive a set of noise regressors from voxels unrelated to the experiment. Third, to improve the stability of beta estimates for closely spaced trials, betas are regularized on a voxel-wise basis using ridge regression. Applying GLMsingle to the Natural Scenes Dataset and BOLD5000, we find that GLMsingle substantially improves the reliability of beta estimates across visually-responsive cortex in all subjects. Comparable improvements in reliability are also observed in a smaller-scale auditory dataset from the StudyForrest experiment. These improvements translate into tangible benefits for higher-level analyses relevant to systems and cognitive neuroscience. We demonstrate that GLMsingle: (i) helps decorrelate response estimates between trials nearby in time; (ii) enhances representational similarity between subjects within and across datasets; and (iii) boosts one-versus-many decoding of visual stimuli.
GLMsingle is a publicly available tool that can significantly improve the quality of past, present, and future neuroimaging datasets sampling brain activity across many experimental conditions. 
  2. Abstract

    Naturalistic stimuli evoke strong, consistent, and information-rich patterns of brain activity, and engage large extents of the human brain. They allow researchers to compare highly similar brain responses across subjects, and to study how complex representations are encoded in brain activity. Here, we describe and share a dataset where 25 subjects watched part of the feature film “The Grand Budapest Hotel” by Wes Anderson. The movie has a large cast with many famous actors. Throughout the story, the camera shots highlight faces and expressions, which are fundamental to understanding the complex narrative of the movie. This movie was chosen to sample brain activity specifically related to social interactions and face processing. This dataset provides researchers with fMRI data that can be used to explore social cognitive processes and face processing, adding to the existing neuroimaging datasets that sample brain activity with naturalistic movies.
  3. Abstract

    Quantifying how brain functional architecture differs from person to person is a key challenge in human neuroscience. Current individualized models of brain functional organization are based on brain regions and networks, limiting their use in studying fine-grained vertex-level differences. In this work, we present the individualized neural tuning (INT) model, a fine-grained individualized model of brain functional organization. The INT model is designed to have vertex-level granularity, to capture both representational and topographic differences, and to model stimulus-general neural tuning. Through a series of analyses, we demonstrate that (a) our INT model provides a reliable individualized measure of fine-grained brain functional organization, (b) it accurately predicts individualized brain response patterns to new stimuli, and (c) for many benchmarks, it requires only 10–20 minutes of data for good performance. The high reliability, specificity, precision, and generalizability of our INT model afford new opportunities for building brain-based biomarkers based on naturalistic neuroimaging paradigms.
  4. Abstract

    Occipital cortices of different sighted people contain analogous maps of visual information (e.g. foveal vs. peripheral). In congenital blindness, “visual” cortices respond to nonvisual stimuli. Do visual cortices of different blind people represent common informational maps? We leverage naturalistic stimuli and inter-subject pattern similarity analysis to address this question. Blindfolded sighted (n = 22) and congenitally blind (n = 22) participants listened to 6 sound clips (5–7 min each): 3 auditory excerpts from movies; a naturalistic spoken narrative; and matched degraded auditory stimuli (Backwards Speech, scrambled sentences), during functional magnetic resonance imaging scanning. We compared the spatial activity patterns evoked by each unique 10-s segment of the different auditory excerpts across blind and sighted people. Segments of meaningful naturalistic stimuli produced distinctive activity patterns in frontotemporal networks that were shared across blind and across sighted individuals. In the blind group only, segment-specific, cross-subject patterns emerged in visual cortex, but only for meaningful naturalistic stimuli and not Backwards Speech. Spatial patterns of activity within visual cortices are sensitive to time-varying information in meaningful naturalistic auditory stimuli in a broadly similar manner across blind individuals.

     
  5. Abstract

    Magnetic resonance imaging (MRI) is a technique that scans the anatomical structure of the brain, whereas functional magnetic resonance imaging (fMRI) uses the same basic principles of atomic physics as MRI scans but images metabolic function. A major goal of MRI and fMRI studies is to precisely delineate various types of tissue, anatomical structures, and pathologies, and to detect the brain regions that react to external stimuli (e.g., viewing an image). As a key feature of these MRI‐based neuroimaging data, voxels (cubic pixels of the brain volume) are highly correlated. However, the associations between voxels are often overlooked in statistical analysis. We adapt a recently proposed dimension reduction method called the envelope method to analyze neuroimaging data, taking into account correlation among voxels. We refer to the modified procedure as the envelope chain procedure. Because the envelope chain procedure has not been employed before, we demonstrate in simulations the empirical performance of the estimator and examine its sensitivity when our assumptions are violated. We use the estimator to analyze the MRI data from the ADHD‐200 study. Data analyses demonstrate that leveraging the correlations among voxels can significantly increase the efficiency of the regression analysis, thus achieving higher detection power with small sample sizes.