Search for: All records

Creators/Authors contains: "Devinsky, Orrin"


  1. The ability to connect the form and meaning of a concept, known as word retrieval, is fundamental to human communication. While various input modalities could lead to identical word retrieval, the exact neural dynamics supporting this process relevant to daily auditory discourse remain poorly understood. Here, we recorded neurosurgical electrocorticography (ECoG) data from 48 patients and dissociated two key language networks that highly overlap in time and space, critical for word retrieval. Using unsupervised temporal clustering techniques, we found a semantic processing network located in the middle and inferior frontal gyri. This network was distinct from an articulatory planning network in the inferior frontal and precentral gyri, which was invariant to input modalities. Functionally, we confirmed that the semantic processing network encodes word surprisal during sentence perception. These findings elucidate neurophysiological mechanisms underlying the processing of semantic auditory inputs ranging from passive language comprehension to conversational speech. 
    Free, publicly-accessible full text available May 1, 2026
  2. Abstract Objective: This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior work is limited to electrodes on a 2D grid (i.e., an electrocorticographic or ECoG array) and data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface (ECoG) and depth (stereotactic EEG or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placements. The model should not have subject-specific layers, and the trained model should perform well on participants unseen during training. Approach: We propose a novel transformer-based model architecture named SwinTW that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train subject-specific models using data from a single participant and multi-subject models exploiting data from multiple participants. Main Results: The subject-specific models using only low-density 8x8 ECoG data achieved a high decoding Pearson Correlation Coefficient with the ground truth spectrogram (PCC=0.817) over N=43 participants, significantly outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating additional strip, depth, and grid electrodes available in each participant (N=39) led to further improvement (PCC=0.838). For participants with only sEEG electrodes (N=9), subject-specific models still achieved comparable performance, with an average PCC=0.798. A single multi-subject model trained on ECoG data from 15 participants yielded results (PCC=0.837) comparable to those of 15 models trained individually for these participants (PCC=0.831). Furthermore, the multi-subject models achieved high performance on unseen participants, with an average PCC=0.765 in leave-one-out cross-validation. 
Significance: The proposed SwinTW decoder enables future speech decoding approaches to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including using only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures. The success of the single multi-subject model when tested on participants within the training cohort demonstrates that the model architecture is capable of exploiting data from multiple participants with diverse electrode placements. The architecture’s flexibility in training with both single-subject and multi-subject data, as well as grid and non-grid electrodes, ensures its broad applicability. Importantly, the generalizability of the multi-subject models in our study population suggests that a model trained using paired acoustic and neural data from multiple patients can potentially be applied to new patients with speech disability where acoustic-neural training data is not feasible. 
  3. Across the animal kingdom, neural responses in the auditory cortex are suppressed during vocalization, and humans are no exception. A common hypothesis is that suppression increases sensitivity to auditory feedback, enabling the detection of vocalization errors. This hypothesis has been previously confirmed in non-human primates; however, a direct link between auditory suppression and sensitivity in human speech monitoring remains elusive. To address this issue, we obtained intracranial electroencephalography (iEEG) recordings from 35 neurosurgical participants during speech production. We first characterized the detailed topography of auditory suppression, which varied across the superior temporal gyrus (STG). Next, we performed a delayed auditory feedback (DAF) task to determine whether the suppressed sites were also sensitive to auditory feedback alterations. Indeed, overlapping sites showed enhanced responses to feedback, indicating sensitivity. Importantly, there was a strong correlation between the degree of auditory suppression and feedback sensitivity, suggesting suppression might be a key mechanism that underlies speech monitoring. Further, we found that when participants produced speech with simultaneous auditory feedback, posterior STG was selectively activated if participants were engaged in a DAF paradigm, suggesting that increased attentional load can modulate auditory feedback sensitivity. 
  4. When we vocalize, our brain distinguishes self-generated sounds from external ones. A corollary discharge signal supports this function in animals; however, in humans, its exact origin and temporal dynamics remain unknown. We report electrocorticographic recordings in neurosurgical patients and a connectivity analysis framework based on Granger causality that reveals major neural communications. We find a reproducible source for corollary discharge across multiple speech production paradigms localized to the ventral speech motor cortex before speech articulation. The uncovered discharge predicts the degree of auditory cortex suppression during speech, its well-documented consequence. These results reveal the human corollary discharge source and timing, with far-reaching implications for speech motor control as well as auditory hallucinations in human psychosis. 
  5. Decoding human speech from neural signals is essential for brain–computer interface (BCI) technologies that aim to restore speech in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarce availability of neural signals with corresponding speech, data complexity and high dimensionality. Here we present a novel deep learning-based neural speech decoding framework that includes an ECoG decoder that translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters and a novel differentiable speech synthesizer that maps speech parameters to spectrograms. We have developed a companion speech-to-speech auto-encoder consisting of a speech encoder and the same speech synthesizer to generate reference speech parameters to facilitate the ECoG decoder training. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Our experimental results show that our models can decode speech with high correlation, even when limited to only causal operations, which is necessary for adoption by real-time neural prostheses. Finally, we successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses in patients with deficits resulting from left hemisphere damage. 
  6. This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior works can only work with electrodes on a 2D grid (i.e., electrocorticographic or ECoG array) and data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface (ECoG) and depth (stereotactic EEG or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placements and the trained model should perform well on participants unseen during training. Approach: We propose a novel transformer-based model architecture named SwinTW that can work with arbitrarily positioned electrodes, by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train both subject-specific models using data from a single participant as well as multi-patient models exploiting data from multiple participants. Main Results: The subject-specific models using only low-density 8x8 ECoG data achieved high decoding Pearson Correlation Coefficient with ground truth spectrogram (PCC=0.817), over N=43 participants, outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating additional strip, depth, and grid electrodes available in each participant (N=39) led to further improvement (PCC=0.838). For participants with only sEEG electrodes (N=9), subject-specific models still enjoy comparable performance with an average PCC=0.798. The multi-subject models achieved high performance on unseen participants, with an average PCC=0.765 in leave-one-out cross-validation. Significance: The proposed SwinTW decoder enables future speech neuroprostheses to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including using only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures. 
Importantly, the generalizability of the multi-patient models suggests the exciting possibility of developing speech neuroprostheses for people with speech disability without relying on their own neural data for training, which is not always feasible. 
  7. Speech production is a complex human function requiring continuous feedforward commands together with reafferent feedback processing. These processes are carried out by distinct frontal and temporal cortical networks, but the degree and timing of their recruitment and dynamics remain poorly understood. We present a deep learning architecture that translates neural signals recorded directly from the cortex to an interpretable representational space that can reconstruct speech. We leverage learned decoding networks to disentangle feedforward vs. feedback processing. Unlike prevailing models, we find a mixed cortical architecture in which frontal and temporal networks each process both feedforward and feedback information in tandem. We elucidate the timing of feedforward and feedback–related processing by quantifying the derived receptive fields. Our approach provides evidence for a surprisingly mixed cortical architecture of speech circuitry together with decoding advances that have important implications for neural prosthetics. 
  8. Bizley, Jennifer K. (Ed.)
    Hearing one’s own voice is critical for fluent speech production as it allows for the detection and correction of vocalization errors in real time. This behavior, known as the auditory feedback control of speech, is impaired in various neurological disorders ranging from stuttering to aphasia; however, the underlying neural mechanisms are still poorly understood. Computational models of speech motor control suggest that, during speech production, the brain uses an efference copy of the motor command to generate an internal estimate of the speech output. When actual feedback differs from this internal estimate, an error signal is generated to correct the internal estimate and update necessary motor commands to produce intended speech. We were able to localize the auditory error signal using electrocorticographic recordings from neurosurgical participants during a delayed auditory feedback (DAF) paradigm. In this task, participants heard their voice with a time delay as they produced words and sentences (similar to an echo on a conference call), which is well known to disrupt fluency by causing slow and stutter-like speech in humans. We observed a significant response enhancement in auditory cortex that scaled with the duration of feedback delay, indicating an auditory speech error signal. Immediately following auditory cortex, dorsal precentral gyrus (dPreCG), a region that has not been implicated in auditory feedback processing before, exhibited a markedly similar response enhancement, suggesting a tight coupling between the 2 regions. Critically, response enhancement in dPreCG occurred only during articulation of long utterances due to a continuous mismatch between produced speech and reafferent feedback. These results suggest that dPreCG plays an essential role in processing auditory error signals during speech production to maintain fluency. 
  9. When we vocalize, our brain distinguishes self-generated sounds from external ones. A corollary discharge signal supports this function in animals; however, in humans, its exact origin and temporal dynamics remain unknown. We report electrocorticographic (ECoG) recordings in neurosurgical patients and a novel connectivity approach based on Granger causality that reveals major neural communications. We find a reproducible source for corollary discharge across multiple speech production paradigms localized to ventral speech motor cortex before speech articulation. The uncovered discharge predicts the degree of auditory cortex suppression during speech, its well-documented consequence. These results reveal the human corollary discharge source and timing, with far-reaching implications for speech motor control as well as auditory hallucinations in human psychosis. 
  10. Noncommunicable diseases (NCDs) are on the rise worldwide. Obesity, cardiovascular disease, and type 2 diabetes are among a long list of “lifestyle” diseases that were rare throughout human history but are now common. The evolutionary mismatch hypothesis posits that humans evolved in environments that radically differ from those we currently experience; consequently, traits that were once advantageous may now be “mismatched” and disease causing. At the genetic level, this hypothesis predicts that loci with a history of selection will exhibit “genotype by environment” (GxE) interactions, with different health effects in “ancestral” versus “modern” environments. To identify such loci, we advocate for combining genomic tools in partnership with subsistence-level groups experiencing rapid lifestyle change. In these populations, comparisons of individuals falling on opposite extremes of the “matched” to “mismatched” spectrum are uniquely possible. More broadly, the work we propose will inform our understanding of environmental and genetic risk factors for NCDs across diverse ancestries and cultures. 
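
Records 2 and 6 above report decoding quality as a Pearson Correlation Coefficient (PCC) between the decoded and ground-truth spectrograms, averaged over participants. The following is a minimal sketch of how such a score could be computed, not the authors' evaluation code. The function names `spectrogram_pcc` and `mean_pcc` are hypothetical, and the choice to flatten each spectrogram before correlating is an assumption: the abstracts do not say whether the correlation is taken per frequency bin, per frame, or over the whole spectrogram.

```python
import math
from typing import List, Sequence, Tuple

Spectrogram = Sequence[Sequence[float]]  # rows = time frames, columns = frequency bins


def spectrogram_pcc(decoded: Spectrogram, reference: Spectrogram) -> float:
    """Pearson correlation between two equally shaped spectrograms.

    Assumption: both spectrograms are flattened to one vector each
    before correlating.
    """
    x = [v for row in decoded for v in row]
    y = [v for row in reference for v in row]
    if len(x) != len(y) or not x:
        raise ValueError("spectrograms must be non-empty and the same size")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)

    # Correlation is undefined when either signal is constant.
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def mean_pcc(pairs: List[Tuple[Spectrogram, Spectrogram]]) -> float:
    """Average PCC over (decoded, reference) pairs, e.g. one pair per participant."""
    scores = [spectrogram_pcc(decoded, reference) for decoded, reference in pairs]
    return sum(scores) / len(scores)
```

In a leave-one-out setting such as the one the abstracts describe, each pair passed to `mean_pcc` would come from a participant held out of training, so the average reflects performance on unseen participants.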