skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Stimulus Speech Decoding from Human Cortex with Generative Adversarial Network Transfer Learning
Decoding auditory stimulus from neural activity can enable neuroprosthetics and direct communication with the brain. Some recent studies have shown successful speech decoding from intracranial recording using deep learning models. However, scarcity of training data leads to low quality speech reconstruction which prevents a complete brain-computer-interface (BCI) application. In this work, we propose a transfer learning approach with a pre-trained GAN to disentangle representation and generation layers for decoding. We first pre-train a generator to produce spectrograms from a representation space using a large corpus of natural speech data. With a small amount of paired data containing the stimulus speech and corresponding ECoG signals, we then transfer it to a bigger network with an encoder attached before, which maps the neural signal to the representation space. To further improve the network generalization ability, we introduce a Gaussian prior distribution regularizer on the latent representation during the transfer phase. With at most 150 training samples for each tested subject, we achieve a state-of-the-art decoding performance. By visualizing the attention mask embedded in the encoder, we observe brain dynamics that are consistent with findings from previous studies investigating dynamics in the superior temporal gyrus (STG), pre-central gyrus (motor) and inferior frontal gyrus (IFG). Our findings demonstrate a high reconstruction accuracy using deep learning networks together with the potential to elucidate interactions across different brain regions during a cognitive task.  more » « less
Award ID(s):
1912286
PAR ID:
10196072
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI)
Page Range / eLocation ID:
390 to 394
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Decoding human speech from neural signals is essential for brain–computer interface (BCI) technologies that aim to restore speech in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarce availability of neural signals with corresponding speech, data complexity and high dimensionality. Here we present a novel deep learning-based neural speech decoding framework that includes an ECoG decoder that translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters and a novel differentiable speech synthesizer that maps speech parameters to spectrograms. We have developed a companion speech-to-speech auto-encoder consisting of a speech encoder and the same speech synthesizer to generate reference speech parameters to facilitate the ECoG decoder training. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Our experimental results show that our models can decode speech with high correlation, even when limited to only causal operations, which is necessary for adoption by real-time neural prostheses. Finally, we successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses in patients with deficits resulting from left hemisphere damage. 
    more » « less
  2. Abstract Objective. Neurological disorders affecting speech production adversely impact quality of life for over 7 million individuals in the US. Traditional speech interfaces like eye-tracking devices and P300 spellers are slow and unnatural for these patients. An alternative solution, speech brain-computer interfaces (BCIs), directly decodes speech characteristics, offering a more natural communication mechanism. This research explores the feasibility of decoding speech features using non-invasive EEG.Approach. Nine neurologically intact participants were equipped with a 63-channel EEG system with additional sensors to eliminate eye artifacts. Participants read aloud sentences selected for phonetic similarity to the English language. Deep learning models, including Convolutional Neural Networks and Recurrent Neural Networks with and without attention modules, were optimized with a focus on minimizing trainable parameters and utilizing small input window sizes for real-time application. These models were employed for discrete and continuous speech decoding tasks.Main results. Statistically significant participant-independent decoding performance was achieved for discrete classes and continuous characteristics of the produced audio signal. A frequency sub-band analysis highlighted the significance of certain frequency bands (delta, theta, gamma) for decoding performance, and a perturbation analysis was used to identify crucial channels. Assessed channel selection methods did not significantly improve performance, suggesting a distributed representation of speech information encoded in the EEG signals. Leave-One-Out training demonstrated the feasibility of utilizing common speech neural correlates, reducing data collection requirements from individual participants.Significance. These findings contribute significantly to the development of EEG-enabled speech synthesis by demonstrating the feasibility of decoding both discrete and continuous speech features from EEG signals, even in the presence of EMG artifacts. By addressing the challenges of EMG interference and optimizing deep learning models for speech decoding, this study lays a strong foundation for EEG-based speech BCIs. 
    more » « less
  3. null (Ed.)
    Due to insufficient training data and the high computational cost to train a deep neural network from scratch, transfer learning has been extensively used in many deep-neural-network-based applications. A commonly used transfer learning approach involves taking a part of a pre-trained model, adding a few layers at the end, and re-training the new layers with a small dataset. This approach, while efficient and widely used, imposes a security vulnerability because the pre-trained model used in transfer learning is usually publicly available, including to potential attackers. In this paper, we show that without any additional knowledge other than the pre-trained model, an attacker can launch an effective and efficient brute force attack that can craft instances of input to trigger each target class with high confidence. We assume that the attacker has no access to any target-specific information, including samples from target classes, re-trained model, and probabilities assigned by Softmax to each class, and thus making the attack target-agnostic. These assumptions render all previous attack models inapplicable, to the best of our knowledge. To evaluate the proposed attack, we perform a set of experiments on face recognition and speech recognition tasks and show the effectiveness of the attack. Our work reveals a fundamental security weakness of the Softmax layer when used in transfer learning settings 
    more » « less
  4. Due to insufficient training data and the high computational cost to train a deep neural network from scratch, transfer learning has been extensively used in many deep-neural-network-based applications. A commonly used transfer learning approach involves taking a part of a pre-trained model, adding a few layers at the end, and re-training the new layers with a small dataset. This approach, while efficient and widely used, imposes a security vulnerability because the pre-trained model used in transfer learning is usually publicly available, including to potential attackers. In this paper, we show that without any additional knowledge other than the pre-trained model, an attacker can launch an effective and efficient brute force attack that can craft instances of input to trigger each target class with high confidence. We assume that the attacker has no access to any target-specific information, including samples from target classes, re-trained model, and probabilities assigned by Softmax to each class, and thus making the attack target-agnostic. These assumptions render all previous attack models inapplicable, to the best of our knowledge. To evaluate the proposed attack, we perform a set of experiments on face recognition and speech recognition tasks and show the effectiveness of the attack. Our work reveals a fundamental security weakness of the Softmax layer when used in transfer learning settings. 
    more » « less
  5. null (Ed.)
    In this paper, we propose a supervised graph representation learning method to model the relationship between brain functional connectivity (FC) and structural connectivity (SC) through a graph encoder-decoder system. The graph convolutional network (GCN) model is leveraged in the encoder to learn lower-dimensional node representations (i.e. node embeddings) integrating information from both node attributes and network topology. In doing so, the encoder manages to capture both direct and indirect interactions between brain regions in the node embeddings which later help reconstruct empirical FC networks. From node embeddings, graph representations are learnt to embed the entire graphs into a vector space. Our end-to-end model utilizes a multi-objective loss function to simultaneously learn node representations for FC network reconstruction and graph representations for subject classification. The experiment on a large population of non-drinkers and heavy drinkers shows that our model can provide a characterization of the population pattern in the SC-FC relationship, while also learning features that capture individual uniqueness for subject classification. The identified key brain subnetworks show significant between-group difference and support the promising prospect of GCN-based graph representation learning on brain networks to model human brain activity and function. 
    more » « less