Decoding human speech from neural signals is essential for brain–computer interface (BCI) technologies that aim to restore speech in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarcity of neural signals with corresponding speech and by the complexity and high dimensionality of the data. Here we present a novel deep learning-based neural speech decoding framework that includes a decoder that translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters, and a novel differentiable speech synthesizer that maps those parameters to spectrograms. We developed a companion speech-to-speech auto-encoder, consisting of a speech encoder and the same speech synthesizer, to generate reference speech parameters that facilitate training of the ECoG decoder. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Our experimental results show that our models can decode speech with high correlation even when limited to causal operations, which is necessary for adoption by real-time neural prostheses. Finally, we successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses for patients with deficits resulting from left hemisphere damage.
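For intuition, the following is a minimal sketch of the two-stage design described in the abstract: a causal decoder maps ECoG channels to a small set of speech parameters, and a differentiable synthesizer maps those parameters to a spectrogram so the whole pipeline can be trained end-to-end. All layer sizes, the parameter count, and the stand-in synthesizer are illustrative assumptions, not the authors' architecture.

```python
# Sketch of a causal ECoG decoder followed by a differentiable "synthesizer".
# Shapes and the parameter set are illustrative assumptions.
import torch
import torch.nn as nn

N_ELECTRODES, N_PARAMS, N_MELS, T = 64, 18, 80, 200

class CausalECoGDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Left-padded (causal) temporal convolutions: the output at time t
        # depends only on ECoG samples at times <= t, matching the real-time
        # constraint mentioned in the abstract.
        self.net = nn.Sequential(
            nn.ConstantPad1d((4, 0), 0.0), nn.Conv1d(N_ELECTRODES, 128, 5), nn.ReLU(),
            nn.ConstantPad1d((4, 0), 0.0), nn.Conv1d(128, N_PARAMS, 5),
        )

    def forward(self, ecog):                 # (batch, electrodes, time)
        return self.net(ecog)                # (batch, params, time)

class Synthesizer(nn.Module):
    """Differentiable stand-in for the speech synthesizer: any smooth map
    from parameters to mel bins lets gradients flow back to the decoder."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Conv1d(N_PARAMS, N_MELS, 1)

    def forward(self, params):
        return self.proj(params)             # (batch, mels, time)

decoder, synth = CausalECoGDecoder(), Synthesizer()
ecog = torch.randn(8, N_ELECTRODES, T)
target_spec = torch.randn(8, N_MELS, T)      # reference spectrogram
loss = nn.functional.mse_loss(synth(decoder(ecog)), target_spec)
loss.backward()                              # end-to-end training through the synthesizer
```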
Region-based conversion of neural activity across sessions
A common way to advance our understanding of brain processing is to decode behavior from recorded neural signals. To study the neural correlates of learning a task, we would like to decode behavior across the entire timespan of learning, which can span multiple recording sessions over many days. However, decoding across sessions is hindered by high session-to-session variability in neural recordings. Here, we propose aligning neural recordings across sessions by combining multidimensional neural signals from localized semi-nonnegative matrix factorization (LocaNMF), which correlate strongly with behavior across sessions, with a novel data augmentation method and a region-based converter. We apply our method to widefield calcium activity recorded over many sessions while a mouse learns a decision-making task. We first decompose each session's neural activity into region-based spatial and temporal components that reconstruct the data with high explained variance. Next, we augment the neural data to smooth variability across trials. Finally, we design a region-based neural converter that transforms one session's neural signals into another's while preserving dimensionality. We test our approach by decoding the mouse's behavior in the decision-making task, and find that our method outperforms approaches that use purely anatomical information when analyzing neural activity across sessions. By preserving the high dimensionality of the neural data while converting activity across sessions, our method can support further analyses of neural data across sessions and of the neural correlates of learning.
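As a rough illustration of the converter step, the sketch below fits one linear map per brain region that transforms one session's temporal components into another's while keeping the component dimensionality fixed. The per-region component counts, the ridge regression, and the synthetic components are assumptions for illustration; the paper's converter and the LocaNMF decomposition are more involved.

```python
# Region-based converter sketch: one linear map per region between two
# sessions' temporal components (dimensions are illustrative assumptions).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_regions, comps_per_region, n_timepoints = 8, 4, 3000

# Temporal components for each region: dict of (time, components) arrays.
session_a = {r: rng.standard_normal((n_timepoints, comps_per_region))
             for r in range(n_regions)}
session_b = {r: session_a[r] @ rng.standard_normal((comps_per_region,) * 2)
                + 0.1 * rng.standard_normal((n_timepoints, comps_per_region))
             for r in range(n_regions)}

# Fit one linear map per region so that conversion respects anatomy and
# preserves the dimensionality of the neural representation.
converters = {r: Ridge(alpha=1.0).fit(session_a[r], session_b[r]) for r in range(n_regions)}
converted = {r: converters[r].predict(session_a[r]) for r in range(n_regions)}

for r in range(n_regions):
    resid = np.linalg.norm(converted[r] - session_b[r]) / np.linalg.norm(session_b[r])
    print(f"region {r}: relative conversion error = {resid:.3f}")
```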
- Award ID(s):
- 2219876
- PAR ID:
- 10466593
- Publisher / Repository:
- IEEE
- Date Published:
- ISSN:
- 1948-3546
- ISBN:
- 978-1-6654-6292-1
- Page Range / eLocation ID:
- 1 to 5
- Format(s):
- Medium: X
- Location:
- Baltimore, MD, USA
- Sponsoring Org:
- National Science Foundation
More Like this
Modern recordings of neural activity provide diverse observations of neurons across brain areas, behavioral conditions, and subjects, presenting an exciting opportunity to reveal the fundamentals of brain-wide dynamics. Current analysis methods, however, often fail to fully harness the richness of such data, as they provide either uninterpretable representations (e.g., via deep networks) or oversimplified models (e.g., by assuming stationary dynamics or analyzing each session independently). Here, instead of regarding asynchronous neural recordings that lack alignment in neural identity or brain areas as a limitation, we leverage these diverse views into the brain to learn a unified model of neural dynamics. Specifically, we assume that brain activity is driven by multiple hidden global sub-circuits. These sub-circuits represent global basis interactions between neural ensembles (functional groups of neurons), such that the time-varying decomposition of these sub-circuits defines how the ensembles' interactions evolve over time, non-stationarily and non-linearly. We discover the neural ensembles underlying non-simultaneous observations, along with their non-stationary evolving interactions, with our new model, CREIMBO (Cross-Regional Ensemble Interactions in Multi-view Brain Observations). CREIMBO identifies the hidden composition of per-session neural ensembles through novel graph-driven dictionary learning and models the ensemble dynamics on a low-dimensional manifold spanned by a sparse time-varying composition of the global sub-circuits. Thus, CREIMBO disentangles overlapping temporal neural processes while preserving interpretability through the shared underlying sub-circuit basis. Moreover, CREIMBO distinguishes session-specific computations from global (session-invariant) ones by identifying session covariates and variations in sub-circuit activations. We demonstrate CREIMBO's ability to recover true components in synthetic data and to uncover meaningful brain dynamics in human high-density electrode recordings, including cross-subject neural mechanisms as well as inter- vs. intra-region dynamical motifs. Furthermore, using mouse whole-brain recordings, we show CREIMBO's ability to discover dynamical interactions that capture task and behavioral variables and meaningfully align with the biological importance of the brain areas they represent.
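The core modeling assumption here (ensemble activity driven by a sparse, time-varying mixture of shared sub-circuit interaction matrices) can be illustrated with a toy simulation. The sketch below generates data under that assumption rather than performing CREIMBO's inference, and all dimensions and the switching schedule are made up for illustration.

```python
# Toy simulation: ensemble activity evolving under a sparse time-varying
# mixture of shared "sub-circuit" interaction matrices (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
n_ensembles, n_subcircuits, T = 10, 3, 500

# Global basis of sub-circuit interaction matrices, shared across sessions.
A = [0.1 * rng.standard_normal((n_ensembles, n_ensembles)) for _ in range(n_subcircuits)]

# Sparse time-varying coefficients: only one sub-circuit active at a time
# here, switching every 100 steps (non-stationary dynamics).
c = np.zeros((T, n_subcircuits))
for t in range(T):
    c[t, (t // 100) % n_subcircuits] = 1.0

x = np.zeros((T, n_ensembles))
x[0] = rng.standard_normal(n_ensembles)
for t in range(T - 1):
    drift = sum(c[t, k] * (A[k] @ x[t]) for k in range(n_subcircuits))
    x[t + 1] = x[t] + drift + 0.01 * rng.standard_normal(n_ensembles)

print("simulated ensemble trajectories:", x.shape)  # (time, ensembles)
```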
Breakspear, Michael (Ed.)

To efficiently yet reliably represent and process information, our brains need to produce information-rich signals that differentiate between moments or cognitive states, while also being robust to noise or corruption. For many, though not all, natural systems, these two properties are inversely related: more information-rich signals are less robust, and vice versa. Here, we examined how these properties change with ongoing cognitive demands. To this end, we applied dimensionality reduction algorithms and pattern classifiers to functional neuroimaging data collected as participants listened to a story, listened to temporally scrambled versions of the story, or underwent a resting-state scanning session. We considered two primary aspects of the neural data recorded in these conditions. First, we treated the maximum achievable decoding accuracy across participants as an indicator of the "informativeness" of the recorded patterns. Second, we treated the number of features (components) required to achieve a threshold decoding accuracy as a proxy for the "compressibility" of the neural patterns (fewer components indicate greater compression). Overall, we found that peak decoding accuracy (achievable without restricting the number of features) was highest in the intact (unscrambled) story-listening condition. However, the number of features required to achieve comparable classification accuracy was also lowest in the intact story-listening condition. Taken together, our work suggests that brain networks flexibly reconfigure according to ongoing task demands, and that the activity patterns associated with higher-order cognition and high engagement are both more informative and more compressible than those associated with lower-order tasks and lower engagement.
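The two measurements described here can be sketched on synthetic data: peak cross-validated decoding accuracy as "informativeness", and the number of principal components needed to reach a threshold accuracy as "compressibility". The synthetic patterns, the logistic-regression classifier, and the 90%-of-peak threshold below are assumptions standing in for the paper's fMRI data and analysis choices.

```python
# Informativeness vs. compressibility sketch on synthetic patterns.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_samples, n_features, n_classes = 300, 100, 10

# Synthetic "neural patterns": each sample is its class template plus noise.
labels = rng.integers(0, n_classes, n_samples)          # e.g., story-segment IDs
templates = rng.standard_normal((n_classes, n_features))
patterns = templates[labels] + rng.standard_normal((n_samples, n_features))

def accuracy(X):
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, labels, cv=5).mean()

peak = accuracy(patterns)                               # "informativeness"
threshold = 0.9 * peak                                  # assumed accuracy criterion
for k in range(1, n_features + 1):                      # "compressibility"
    if accuracy(PCA(n_components=k).fit_transform(patterns)) >= threshold:
        print(f"peak accuracy {peak:.2f}; {k} components reach {threshold:.2f}")
        break
```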
Objective. Neural decoding is an important tool in neural engineering and neural data analysis. Of the various machine learning algorithms adopted for neural decoding, the recently introduced deep learning methods are promising to excel. Therefore, we sought to apply deep learning to decode movement trajectories from the activity of motor cortical neurons. Approach. In this paper, we assessed the performance of deep learning methods in three decoding schemes: concurrent, time-delay, and spatiotemporal. In the concurrent decoding scheme, where the input to the network is the neural activity coincident with the movement, deep learning networks including an artificial neural network (ANN) and long short-term memory (LSTM) were applied to decode movement and compared with traditional machine learning algorithms. Both the ANN and LSTM were further evaluated in the time-delay decoding scheme, in which temporal delays are allowed between neural signals and movements. Lastly, in the spatiotemporal decoding scheme, we trained a convolutional neural network (CNN) to extract movement information from images representing the spatial arrangement of neurons, their activity, and connectomes (i.e., the relative strengths of connectivity between neurons), and combined the CNN and ANN to develop a hybrid spatiotemporal network. To reveal the input features that the CNN in the hybrid network discovered for movement decoding, we performed a sensitivity analysis and identified specific regions in the spatial domain. Main results. Deep learning networks (ANN and LSTM) outperformed traditional machine learning algorithms in the concurrent decoding scheme. The results of the ANN and LSTM in the time-delay decoding scheme showed that including neural data from time points preceding movement enabled decoders to perform more robustly when the temporal relationship between neural activity and movement changes dynamically over time. In the spatiotemporal decoding scheme, the hybrid spatiotemporal network containing the concurrent ANN decoder outperformed single-network concurrent decoders. Significance. Taken together, our study demonstrates that deep learning could become a robust and effective method for the neural decoding of behavior.
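As a concrete illustration of the time-delay decoding scheme, the sketch below stacks the preceding bins of neural activity into each feature vector before fitting a decoder. A linear (ridge) readout and toy Poisson data stand in for the paper's ANN/LSTM decoders and motor cortical recordings; the lag structure and bin counts are illustrative assumptions.

```python
# Time-delay decoding sketch: predict movement from a window of past activity.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
n_neurons, T, n_lags, lag = 50, 2000, 10, 3

# Toy data: movement follows a fixed linear readout of activity 3 bins earlier.
rates = rng.poisson(2.0, size=(T, n_neurons)).astype(float)   # binned spike counts
movement = np.zeros((T, 2))
movement[lag:] = rates[:-lag, :2] @ rng.standard_normal((2, 2))

# Time-delay scheme: each prediction sees the preceding n_lags bins of activity.
X = np.stack([rates[t - n_lags:t].ravel() for t in range(n_lags, T)])
y = movement[n_lags:]

split = int(0.8 * len(X))
model = Ridge(alpha=10.0).fit(X[:split], y[:split])
r = np.corrcoef(model.predict(X[split:]).ravel(), y[split:].ravel())[0, 1]
print(f"held-out decoding correlation: {r:.2f}")
```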
Attention to a feature enhances the sensory representation of that feature. However, it is less clear whether attentional modulation is limited when one must attend to multiple features. Here, we studied both the behavioral and neural correlates of this attentional limit by examining the effectiveness of attentional enhancement of one versus two color features. We recorded electroencephalography (EEG) while observers completed a color-coherence detection task in which they detected a weak coherence signal, an over-representation of a target color. Before stimulus onset, we presented either one or two valid color cues. We found that observers were faster and more accurate on one-cue trials than on two-cue trials, indicating that they could more effectively attend to a single color at a time. Similar behavioral deficits associated with attending to multiple colors were observed in a pre-EEG practice session with one-, two-, three-, and no-cue trials. Further, we were able to decode the target color from the EEG signals measured at the posterior electrodes. Notably, decoding accuracy was greater on one-cue than on two-cue trials, indicating a stronger color signal on one-cue trials, likely due to stronger attentional enhancement. Lastly, we observed a positive correlation between the decoding effect and the behavioral effect comparing one-cue and two-cue trials, suggesting that the decoded neural signals are functionally associated with behavior. Overall, these results provide behavioral and neural evidence for a strong limit on the attentional enhancement of multiple features and suggest that there is a cost to maintaining multiple attentional templates in an active state.
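The decoding comparison can be sketched as follows: classify the cued target color from multichannel patterns separately for one-cue and two-cue trials, then compare cross-validated accuracies. The synthetic EEG patterns (with a weaker color signal on two-cue trials, as the results suggest) and the LDA classifier are illustrative assumptions, not the study's pipeline.

```python
# Condition-wise decoding sketch: target color from synthetic EEG patterns.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n_trials, n_features, n_colors = 200, 32, 4   # e.g., 32 posterior electrodes

def simulate(signal_strength):
    """One condition's trials: a color template scaled by attention, plus noise."""
    colors = rng.integers(0, n_colors, n_trials)
    templates = rng.standard_normal((n_colors, n_features))
    eeg = signal_strength * templates[colors] + rng.standard_normal((n_trials, n_features))
    return eeg, colors

for condition, strength in [("one-cue", 0.6), ("two-cue", 0.3)]:
    X, y = simulate(strength)
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
    print(f"{condition}: decoding accuracy = {acc:.2f} (chance = {1 / n_colors:.2f})")
```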