-
Across the animal kingdom, neural responses in the auditory cortex are suppressed during vocalization, and humans are no exception. A common hypothesis is that suppression increases sensitivity to auditory feedback, enabling the detection of vocalization errors. This hypothesis has been confirmed in non-human primates; however, a direct link between auditory suppression and feedback sensitivity in human speech monitoring has remained elusive. To address this issue, we obtained intracranial electroencephalography (iEEG) recordings from 35 neurosurgical participants during speech production. We first characterized the detailed topography of auditory suppression, which varied across the superior temporal gyrus (STG). Next, we performed a delayed auditory feedback (DAF) task to determine whether the suppressed sites were also sensitive to alterations of auditory feedback. Indeed, overlapping sites showed enhanced responses to altered feedback, indicating feedback sensitivity. Importantly, there was a strong correlation between the degree of auditory suppression and feedback sensitivity, suggesting that suppression may be a key mechanism underlying speech monitoring. Further, we found that when participants produced speech with simultaneous auditory feedback, posterior STG was selectively activated when participants were engaged in the DAF paradigm, suggesting that increased attentional load can modulate auditory feedback sensitivity.
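The core analysis described above relates two per-site quantities: how strongly an electrode is suppressed during speaking and how strongly it responds to altered feedback. A minimal numpy sketch of that correlation, with entirely synthetic values; the variable names (`supp_index`, `feedback_gain`) and numbers are illustrative assumptions, not the paper's measures:

```python
import numpy as np

# All quantities below are synthetic stand-ins; the paper's actual
# per-site high-gamma responses are not reproduced here.
rng = np.random.default_rng(0)
n_sites = 50

listen = rng.uniform(1.0, 2.0, n_sites)        # response to playback
frac_supp = rng.uniform(0.0, 0.8, n_sites)     # fraction suppressed
speak = listen * (1.0 - frac_supp)             # response while speaking

# Suppression index: relative reduction of the auditory response
# during speaking compared with passive listening.
supp_index = (listen - speak) / (listen + speak)

# Feedback sensitivity: toy DAF response enhancement, constructed to
# covary with suppression plus measurement noise.
feedback_gain = 1.0 + 2.0 * frac_supp + 0.1 * rng.normal(size=n_sites)

r = np.corrcoef(supp_index, feedback_gain)[0, 1]
print(r > 0.5)   # strong site-by-site correlation, as in the abstract
```

The sketch only illustrates the analysis shape: a site-by-site Pearson correlation between a suppression index and a feedback-sensitivity measure.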
-
Decoding human speech from neural signals is essential for brain–computer interface (BCI) technologies that aim to restore speech in populations with neurological deficits. However, the task remains highly challenging, compounded by the scarcity of neural recordings with corresponding speech and by the complexity and high dimensionality of the data. Here we present a novel deep learning-based neural speech decoding framework comprising an ECoG decoder, which translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters, and a novel differentiable speech synthesizer, which maps those parameters to spectrograms. We also developed a companion speech-to-speech auto-encoder, consisting of a speech encoder and the same speech synthesizer, that generates reference speech parameters to facilitate training of the ECoG decoder. The framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Our experimental results show that our models can decode speech with high correlation even when limited to causal operations, a requirement for adoption in real-time neural prostheses. Finally, we successfully decoded speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses for patients with deficits resulting from left hemisphere damage.
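The framework decomposes decoding into two stages: ECoG signals → interpretable speech parameters → spectrogram. A toy numpy sketch of that two-stage data flow, with random linear maps standing in for the deep networks; all dimensions and weight matrices are hypothetical, not the paper's architecture:

```python
import numpy as np

# Hypothetical dimensions; the real networks are deep models, not the
# random linear maps used here to illustrate the two-stage data flow.
N_ELEC, N_TIME = 64, 100    # ECoG channels x time frames
N_PARAMS, N_FREQ = 18, 40   # speech parameters, spectrogram bins

rng = np.random.default_rng(0)
W_dec = rng.normal(size=(N_ELEC, N_PARAMS))
W_syn = rng.normal(size=(N_PARAMS, N_FREQ))

def ecog_decoder(ecog):
    """Stage 1 stand-in: neural features -> interpretable speech
    parameters, one parameter vector per time frame."""
    return ecog.T @ W_dec                      # (time, params)

def speech_synthesizer(params):
    """Stage 2 stand-in for the differentiable synthesizer: speech
    parameters -> non-negative spectrogram frames."""
    return np.maximum(params @ W_syn, 0.0)     # (time, freq)

ecog = rng.normal(size=(N_ELEC, N_TIME))
spec = speech_synthesizer(ecog_decoder(ecog))
print(spec.shape)   # (100, 40)
```

Because the intermediate parameter space is explicit, either stage can be swapped or inspected independently, which is what makes the decoded representation interpretable.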
-
This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior work handles only electrodes on a 2D grid (i.e., an electrocorticographic, or ECoG, array) and data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface (ECoG) and depth (stereotactic EEG, or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placement, and the trained model should perform well on participants unseen during training. Approach: We propose a novel transformer-based model architecture, SwinTW, that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train both subject-specific models, using data from a single participant, and multi-patient models that exploit data from multiple participants. Main results: Subject-specific models using only low-density 8×8 ECoG data achieved a high decoding Pearson correlation coefficient with the ground-truth spectrogram (PCC = 0.817) over N = 43 participants, outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating the additional strip, depth, and grid electrodes available in each participant (N = 39) led to further improvement (PCC = 0.838). For participants with only sEEG electrodes (N = 9), subject-specific models still achieved comparable performance, with an average PCC = 0.798. The multi-subject models achieved high performance on unseen participants, with an average PCC = 0.765 in leave-one-out cross-validation. Significance: The proposed SwinTW decoder enables future speech neuroprostheses to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures.
Importantly, the generalizability of the multi-patient models suggests the exciting possibility of developing speech neuroprostheses for people with speech disabilities without relying on their own neural data for training, which is not always feasible.
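The PCC figures quoted above (e.g., 0.817) are Pearson correlation coefficients between decoded and ground-truth spectrograms. A small sketch of that metric on synthetic data; the arrays here are random stand-ins, not real spectrograms:

```python
import numpy as np

def spectrogram_pcc(decoded, target):
    """Pearson correlation coefficient between two spectrograms,
    computed over all time-frequency bins."""
    x = decoded.ravel() - decoded.mean()
    y = target.ravel() - target.mean()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Synthetic stand-ins: a "ground-truth" spectrogram and a noisy
# "decoded" version of it.
rng = np.random.default_rng(1)
target = rng.normal(size=(100, 40))
decoded = target + 0.5 * rng.normal(size=target.shape)

print(spectrogram_pcc(target, target))    # ~1.0 (self-correlation)
print(spectrogram_pcc(decoded, target))   # high, but below 1.0
```

A single scalar per utterance (or per participant, averaged) is what allows the direct comparisons between ECoG-only, sEEG-only, and mixed-electrode models reported above.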
-
Speech production is a complex human function requiring continuous feedforward commands together with reafferent feedback processing. These processes are carried out by distinct frontal and temporal cortical networks, but the degree and timing of their recruitment and their dynamics remain poorly understood. We present a deep learning architecture that translates neural signals recorded directly from the cortex into an interpretable representational space from which speech can be reconstructed. We leverage the learned decoding networks to disentangle feedforward versus feedback processing. Unlike prevailing models, we find a mixed cortical architecture in which frontal and temporal networks each process both feedforward and feedback information in tandem. We elucidate the timing of feedforward- and feedback-related processing by quantifying the derived receptive fields. Our approach provides evidence for a surprisingly mixed cortical architecture of speech circuitry, together with decoding advances that have important implications for neural prosthetics.
-
Bizley, Jennifer K. (Ed.)
Hearing one's own voice is critical for fluent speech production, as it allows for the detection and correction of vocalization errors in real time. This behavior, known as auditory feedback control of speech, is impaired in various neurological disorders ranging from stuttering to aphasia; however, the underlying neural mechanisms remain poorly understood. Computational models of speech motor control suggest that, during speech production, the brain uses an efference copy of the motor command to generate an internal estimate of the speech output. When actual feedback differs from this internal estimate, an error signal is generated to correct the estimate and update the motor commands necessary to produce the intended speech. We localized this auditory error signal using electrocorticographic recordings from neurosurgical participants during a delayed auditory feedback (DAF) paradigm. In this task, participants heard their own voice with a time delay as they produced words and sentences (similar to an echo on a conference call), a manipulation well known to disrupt fluency by causing slow and stutter-like speech in humans. We observed a significant response enhancement in auditory cortex that scaled with the duration of the feedback delay, indicating an auditory speech error signal. Immediately following auditory cortex, the dorsal precentral gyrus (dPreCG), a region not previously implicated in auditory feedback processing, exhibited a markedly similar response enhancement, suggesting tight coupling between the two regions. Critically, response enhancement in dPreCG occurred only during articulation of long utterances, when there is a continuous mismatch between produced speech and reafferent feedback. These results suggest that dPreCG plays an essential role in processing auditory error signals during speech production to maintain fluency.
-
When we vocalize, our brain distinguishes self-generated sounds from external ones. A corollary discharge signal supports this function in animals; however, in humans its exact origin and temporal dynamics remain unknown. We report electrocorticographic (ECoG) recordings in neurosurgical patients and a novel connectivity approach, based on Granger causality, that reveals the major neural communications. We find a reproducible source of corollary discharge across multiple speech production paradigms, localized to ventral speech motor cortex before speech articulation. The uncovered discharge predicts the degree of auditory cortex suppression during speech, its well-documented consequence. These results reveal the source and timing of the human corollary discharge, with far-reaching implications for speech motor control as well as for auditory hallucinations in human psychosis.
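The connectivity approach above builds on Granger causality: a signal x "Granger-causes" y if past values of x improve the prediction of y beyond what y's own past provides. A minimal bivariate sketch using ordinary least squares; the lag order, simulated signals, and variable names are illustrative assumptions, not the paper's method:

```python
import numpy as np

def granger_causality(x, y, lag=2):
    """Log ratio of residual sums of squares for predicting y from its
    own past, with vs. without lagged copies of x. Values > 0 mean the
    past of x improves the prediction (x 'Granger-causes' y)."""
    n = len(y)
    Y = y[lag:]
    lags_y = [y[lag - k - 1: n - k - 1] for k in range(lag)]
    lags_x = [x[lag - k - 1: n - k - 1] for k in range(lag)]
    def rss(cols):
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        return resid @ resid
    return float(np.log(rss(lags_y) / rss(lags_y + lags_x)))

# Simulate a known direction of influence: y follows x with a one-step lag.
rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=499)

gc_xy = granger_causality(x, y)
gc_yx = granger_causality(y, x)
print(gc_xy > gc_yx)   # True: influence runs from x to y
```

Applied between motor and auditory electrodes, an asymmetry of this kind is what identifies a directed source of communication such as the corollary discharge.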
-
Noncommunicable diseases (NCDs) are on the rise worldwide. Obesity, cardiovascular disease, and type 2 diabetes are among a long list of “lifestyle” diseases that were rare throughout human history but are now common. The evolutionary mismatch hypothesis posits that humans evolved in environments that radically differ from those we currently experience; consequently, traits that were once advantageous may now be “mismatched” and disease causing. At the genetic level, this hypothesis predicts that loci with a history of selection will exhibit “genotype by environment” (GxE) interactions, with different health effects in “ancestral” versus “modern” environments. To identify such loci, we advocate for combining genomic tools in partnership with subsistence-level groups experiencing rapid lifestyle change. In these populations, comparisons of individuals falling on opposite extremes of the “matched” to “mismatched” spectrum are uniquely possible. More broadly, the work we propose will inform our understanding of environmental and genetic risk factors for NCDs across diverse ancestries and cultures.
-
Objective: Effective surgical treatment of drug-resistant epilepsy depends on accurate localization of the epileptogenic zone (EZ). High-frequency oscillations (HFOs) are potential biomarkers of the EZ. Previous research has shown that HFOs often occur within submillimeter areas of brain tissue and that the coarse spatial sampling of clinical intracranial electrode arrays may limit accurate capture of HFO activity. In this study, we sought to characterize microscale HFO activity captured on thin, flexible microelectrocorticographic (μECoG) arrays, which provide high spatial resolution over large cortical surface areas. Methods: We used novel liquid crystal polymer thin-film μECoG arrays (0.76–1.72-mm intercontact spacing) to capture HFOs in eight intraoperative recordings from seven patients with epilepsy. We identified ripple (80–250 Hz) and fast ripple (250–600 Hz) HFOs using a common energy-thresholding detection algorithm along with two stages of artifact rejection. We visualized microscale subregions of HFO activity using spatial maps of HFO rate, signal-to-noise ratio, and mean peak frequency. We quantified the spatial extent of HFO events by measuring covariance between detected HFOs and surrounding activity. We also compared HFO detection rates on microcontacts to simulated macrocontacts by spatially averaging the data. Results: We found visually delineable subregions of elevated HFO activity within each μECoG recording. Forty-seven percent of HFOs occurred on single 200-μm-diameter recording contacts, with minimal high-frequency activity on surrounding contacts. Other HFO events occurred across multiple contacts simultaneously, with covarying activity most often limited to a 0.95-mm radius. Through spatial averaging, we estimated that macrocontacts 2–3 mm in diameter would capture only 44% of the HFOs detected in our μECoG recordings.
Significance: These results demonstrate that thin-film microcontact surface arrays combining high resolution with large coverage accurately capture microscale HFO activity and may improve the utility of HFOs for localizing the EZ in the treatment of drug-resistant epilepsy.
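The "common energy thresholding detection algorithm" referenced in the Methods can be sketched as a moving-RMS detector applied to an already band-passed trace: flag samples whose short-time energy exceeds a baseline-derived threshold, then merge contiguous runs into events. The sampling rate, window length, threshold multiplier, and simulated burst below are illustrative assumptions, not the study's parameters:

```python
import numpy as np

FS = 2000        # sampling rate in Hz (illustrative)
WIN = 20         # 10-ms moving-RMS window, in samples
K_SD = 4         # threshold: mean + 4 SD of the energy trace

def detect_hfo_events(band_signal, win=WIN, k=K_SD):
    """Compute a moving RMS of a band-passed trace (e.g., the 80-250 Hz
    ripple band) and merge contiguous runs of supra-threshold samples
    into (start, end) sample-index events."""
    energy = np.sqrt(np.convolve(band_signal ** 2,
                                 np.ones(win) / win, mode="same"))
    above = energy > energy.mean() + k * energy.std()
    idx = np.flatnonzero(above)
    if idx.size == 0:
        return []
    runs = np.split(idx, np.flatnonzero(np.diff(idx) > 1) + 1)
    return [(int(r[0]), int(r[-1])) for r in runs]

# Synthetic trace: unit background noise plus one 150 Hz burst.
rng = np.random.default_rng(0)
sig = rng.normal(size=4000)
t = np.arange(4000) / FS
sig[1000:1100] += 8.0 * np.sin(2 * np.pi * 150 * t[1000:1100])

events = detect_hfo_events(sig)
print(events)
```

In the study, per-contact event counts from a detector of this kind (plus artifact rejection) are what populate the spatial maps of HFO rate.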
-
Perception results from the interplay of sensory input and prior knowledge. Despite behavioral evidence that long-term priors powerfully shape perception, the neural mechanisms underlying these interactions remain poorly understood. We obtained direct cortical recordings in neurosurgical patients as they viewed ambiguous images that elicit constant perceptual switching. We observed top-down influences from temporal to occipital cortex during the preferred percept, the interpretation congruent with the long-term prior. By contrast, stronger feedforward drive was observed during the non-preferred percept, consistent with a prediction error signal. A computational model based on hierarchical predictive coding and attractor networks reproduces all key experimental findings. These results suggest a pattern of large-scale information-flow changes underlying the influence of long-term priors on perception and provide constraints on theories of how such priors shape perceptual inference.
-
Objective: Sudden unexpected death in epilepsy (SUDEP) is the leading cause of epilepsy-related mortality. Although much effort has been devoted to identifying clinical risk factors for SUDEP in the literature, there are few validated methods to predict individual SUDEP risk. Prolonged postictal EEG suppression (PGES) is a potential SUDEP biomarker, but its occurrence is infrequent and requires epilepsy monitoring unit admission. We used machine learning methods to examine SUDEP risk using interictal EEG and ECG recordings from SUDEP cases and matched living epilepsy controls. Methods: This multicenter, retrospective cohort study examined interictal EEG and ECG recordings from 30 SUDEP cases and 58 age-matched living epilepsy patient controls. We trained machine learning models with interictal EEG and ECG features to predict the retrospective SUDEP risk for each patient. We assessed cross-validated classification accuracy and the area under the receiver operating characteristic curve (AUC). Results: The logistic regression (LR) classifier produced the best overall performance, outperforming the support vector machine (SVM), random forest (RF), and convolutional neural network (CNN). Among the 30 patients with SUDEP [14 females; mean age (SD), 31 (8.47) years] and 58 living epilepsy controls [26 females (43%); mean age (SD), 31 (8.5) years], the LR model achieved a median AUC of 0.77 [interquartile range (IQR), 0.73–0.80] in five-fold cross-validation, using the interictal alpha and low-gamma power ratio of the EEG and heart rate variability (HRV) features extracted from the ECG. The LR model achieved a mean AUC of 0.79 in leave-one-center-out prediction. Conclusions: Our results suggest that machine learning-driven models can quantify SUDEP risk for epilepsy patients; future refinements may enable individualized SUDEP risk prediction and help clinicians correlate predictive scores with clinical data. Low-cost, noninvasive interictal biomarkers of SUDEP risk may help clinicians identify high-risk patients and initiate preventive strategies.
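The headline evaluation above is a five-fold cross-validated median AUC for a logistic regression classifier. A self-contained numpy sketch of that pipeline on synthetic stand-ins for the EEG power-ratio and HRV features (the feature values, cohort sizes used for simulation, and hyperparameters are illustrative, not the study's data):

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC via the Mann-Whitney identity: probability that a random
    positive outranks a random negative (ties count half)."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

def fit_logreg(X, y, lr=0.1, steps=500):
    """Minimal unregularized logistic regression by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

def median_cv_auc(X, y, k=5, seed=0):
    """Median AUC over k shuffled folds, mirroring the five-fold
    cross-validated median AUC reported in the abstract."""
    idx = np.random.default_rng(seed).permutation(len(y))
    aucs = []
    for fold in np.array_split(idx, k):
        if y[fold].min() == y[fold].max():   # a fold needs both classes
            continue
        train = np.setdiff1d(idx, fold)
        w, b = fit_logreg(X[train], y[train])
        aucs.append(auc_score(y[fold], X[fold] @ w + b))
    return float(np.median(aucs))

# Synthetic cohort: 30 "cases" and 58 "controls" with a 1-SD mean
# shift on each of four stand-in features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.0, 1.0, (30, 4)), rng.normal(0.0, 1.0, (58, 4))])
y = np.concatenate([np.ones(30), np.zeros(58)])
print(median_cv_auc(X, y))
```

Leave-one-center-out evaluation follows the same pattern with folds defined by recording center instead of random shuffling.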