Objective: This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior work is limited to electrodes on a 2D grid (i.e., electrocorticographic, or ECoG, arrays) and to data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface (ECoG) and depth (stereotactic EEG, or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placements. The model should not have subject-specific layers, and the trained model should perform well on participants unseen during training. Approach: We propose a novel transformer-based model architecture named SwinTW that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train subject-specific models using data from a single participant and multi-subject models exploiting data from multiple participants. Main Results: The subject-specific models using only low-density 8x8 ECoG data achieved a high Pearson correlation coefficient (PCC) between the decoded and ground-truth spectrograms (PCC=0.817) over N=43 participants, significantly outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating the additional strip, depth, and grid electrodes available in each participant (N=39) led to further improvement (PCC=0.838). For participants with only sEEG electrodes (N=9), subject-specific models still achieve comparable performance, with an average PCC=0.798. A single multi-subject model trained on ECoG data from 15 participants yielded results (PCC=0.837) comparable to those of 15 models trained individually for these participants (PCC=0.831). Furthermore, the multi-subject models achieved high performance on unseen participants, with an average PCC=0.765 in leave-one-out cross-validation.
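SwinTW is described as handling arbitrarily positioned electrodes through their 3D cortical locations rather than their 2D grid positions. The abstract gives no windowing details, so the following is only a hypothetical sketch of that general idea: contacts from grids, strips, or depth shafts are bucketed into cubic 3D windows by coordinate, with an optional grid offset analogous to the shifted windows of Swin-style transformers. The function name, window size, and coordinate space (millimeters in a shared brain space) are assumptions, not the authors' design.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

import numpy as np


def group_by_3d_window(
    coords_mm: np.ndarray, window_mm: float, shift_mm: float = 0.0
) -> Dict[Tuple[int, int, int], List[int]]:
    """Bucket electrodes into non-overlapping cubic windows by 3D location.

    coords_mm: (n_electrodes, 3) array of x, y, z positions, assumed to be in
    millimeters in a shared brain coordinate space (hypothetical choice).
    Electrodes whose positions fall in the same cube of side window_mm share
    a key. A nonzero shift_mm offsets the cube grid, analogous to the shifted
    windows used in Swin-style transformers.
    Returns a dict mapping each occupied window index to electrode indices.
    """
    coords = np.asarray(coords_mm, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("coords_mm must have shape (n_electrodes, 3)")
    if window_mm <= 0:
        raise ValueError("window_mm must be positive")
    cells = np.floor((coords + shift_mm) / window_mm).astype(int)
    groups: Dict[Tuple[int, int, int], List[int]] = defaultdict(list)
    for idx, cell in enumerate(cells):
        groups[tuple(int(c) for c in cell)].append(idx)
    return dict(groups)


# Usage with made-up coordinates: a grid contact, a nearby strip contact,
# and a deeper sEEG contact.
electrodes = np.array([[40.0, -10.0, 20.0], [42.0, -12.0, 25.0], [30.0, -5.0, 5.0]])
print(group_by_3d_window(electrodes, window_mm=16.0))
# {(2, -1, 1): [0, 1], (1, -1, 0): [2]}
```

Because windows are defined by position rather than by grid row and column, the same function accepts any mix of electrode types; a transformer could then restrict attention to electrodes sharing a window, though the abstract does not say whether SwinTW does exactly this.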
Significance: The proposed SwinTW decoder enables future speech decoding approaches to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including using only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures. The success of the single multi-subject model when tested on participants within the training cohort demonstrates that the architecture can exploit data from multiple participants with diverse electrode placements. The architecture's flexibility in training with both single-subject and multi-subject data, as well as with grid and non-grid electrodes, ensures its broad applicability. Importantly, the generalizability of the multi-subject models in our study population suggests that a model trained on paired acoustic and neural data from multiple patients could potentially be applied to new patients with speech disability, for whom collecting paired acoustic-neural training data is not feasible.
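The claim about unseen participants rests on leave-one-out cross-validation across participants: each fold trains one multi-subject model on every participant except one and tests it on the participant left out. The sketch below shows only that fold structure; `train_fn` and `eval_fn` are hypothetical placeholders for the real training and PCC scoring, not the authors' code.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def leave_one_participant_out(
    participants: Sequence[str],
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training participants, held-out participant), one fold per participant."""
    if len(set(participants)) != len(participants):
        raise ValueError("participant IDs must be unique")
    for held_out in participants:
        yield [p for p in participants if p != held_out], held_out


def evaluate_unseen(
    participants: Sequence[str],
    train_fn: Callable[[List[str]], object],
    eval_fn: Callable[[object, str], float],
) -> Dict[str, float]:
    """Train one model per fold and score it only on the unseen participant."""
    scores: Dict[str, float] = {}
    for train_ids, held_out in leave_one_participant_out(participants):
        model = train_fn(train_ids)  # hypothetical multi-subject training
        scores[held_out] = eval_fn(model, held_out)  # e.g., mean PCC
    return scores


# Toy check: the "model" is just the set of training IDs, and the score is 1.0
# whenever the held-out participant was truly absent from training.
ids = ["P01", "P02", "P03"]
print(evaluate_unseen(ids, lambda train: set(train), lambda model, pid: float(pid not in model)))
# {'P01': 1.0, 'P02': 1.0, 'P03': 1.0}
```

Splitting by participant, rather than by trial, is what makes the held-out score a test of generalization to new people: no data from the test participant ever reaches training.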
-
When we vocalize, our brain distinguishes self-generated sounds from external ones. A corollary discharge signal supports this function in animals; however, in humans, its exact origin and temporal dynamics remain unknown. We report electrocorticographic recordings in neurosurgical patients and a connectivity analysis framework based on Granger causality that reveals the major patterns of neural communication. We find a reproducible source for corollary discharge across multiple speech production paradigms, localized to the ventral speech motor cortex before speech articulation. The uncovered discharge predicts the degree of auditory cortex suppression during speech, its well-documented consequence. These results reveal the source and timing of the human corollary discharge, with far-reaching implications for speech motor control as well as for auditory hallucinations in human psychosis.
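The connectivity framework is described only as based on Granger causality. As a generic illustration of that technique, and not the authors' framework (whose model order, frequency resolution, and multichannel conditioning the abstract does not give), the sketch below computes bivariate time-domain Granger causality from least-squares autoregressive fits on toy signals.

```python
import numpy as np


def _lagged(series: np.ndarray, order: int) -> np.ndarray:
    """Design columns series[t-1], ..., series[t-order] for t = order..T-1."""
    n = len(series)
    return np.column_stack([series[order - k:n - k] for k in range(1, order + 1)])


def granger_causality(x, y, order: int = 5) -> float:
    """Time-domain Granger causality from x to y (bivariate, least squares).

    Fits y[t] on its own past (restricted model) and on the past of both y
    and x (full model), then returns ln(var_restricted / var_full). Values
    near 0 mean the past of x adds no predictive information about y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[order:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, _lagged(y, order)])
    full = np.hstack([restricted, _lagged(x, order)])

    def residual_variance(design: np.ndarray) -> float:
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return float(np.var(target - design @ coef))

    return float(np.log(residual_variance(restricted) / residual_variance(full)))


# Toy signals: "driver" influences "driven" with a one-sample delay.
rng = np.random.default_rng(0)
n = 2000
driver = rng.standard_normal(n)
driven = np.zeros(n)
for t in range(1, n):
    driven[t] = 0.5 * driven[t - 1] + 0.8 * driver[t - 1] + 0.1 * rng.standard_normal()
print(granger_causality(driver, driven, order=2))  # large: driver -> driven
print(granger_causality(driven, driver, order=2))  # close to zero
```

The asymmetry between the two directions is what lets a Granger-based analysis assign a direction to an interaction, such as a motor-area signal preceding and predicting auditory-area activity.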
-
Decoding human speech from neural signals is essential for brain–computer interface (BCI) technologies that aim to restore speech in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarcity of neural recordings paired with speech and by the complexity and high dimensionality of the data. Here we present a novel deep learning-based neural speech decoding framework that includes an ECoG decoder, which translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters, and a novel differentiable speech synthesizer, which maps speech parameters to spectrograms. We have also developed a companion speech-to-speech auto-encoder, consisting of a speech encoder and the same speech synthesizer, that generates reference speech parameters to facilitate ECoG decoder training. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Our experimental results show that our models can decode speech with high correlation even when limited to causal operations, which is necessary for adoption by real-time neural prostheses. Finally, we successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses for patients with deficits resulting from left hemisphere damage.
Free, publicly accessible full text available April 1, 2025.
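The causal constraint mentioned above means that each decoded output frame may depend only on current and past neural samples, never on future ones. A generic NumPy sketch of a causal 1-D convolution, where left zero-padding removes any look-ahead, illustrates the property; it is an illustrative example, not the authors' ECoG decoder layer.

```python
import numpy as np


def causal_conv1d(signal, kernel) -> np.ndarray:
    """Causal 1-D convolution: out[t] depends only on signal[0..t].

    kernel[0] weights the current sample and kernel[k] the sample k steps
    back. Left zero-padding by len(kernel) - 1 keeps the output length equal
    to the input length without any look-ahead.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), signal])
    out = np.empty_like(signal)
    for t in range(len(signal)):
        window = padded[t:t + k]  # signal[t-k+1 .. t], oldest sample first
        out[t] = np.dot(window[::-1], kernel)
    return out


# Perturbing future samples must leave earlier outputs unchanged.
rng = np.random.default_rng(1)
x = rng.standard_normal(50)
h = np.array([0.5, 0.3, 0.2])
x_future_changed = x.copy()
x_future_changed[30:] += 10.0
same_past = np.allclose(causal_conv1d(x, h)[:30], causal_conv1d(x_future_changed, h)[:30])
print(same_past)  # True
```

Deep-learning frameworks typically obtain the same behavior by padding only the left side of the time axis before an ordinary convolution, which lets the layer run on streaming data.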
-
Objective: This study investigates speech decoding from neural signals captured by intracranial electrodes. Most prior work is limited to electrodes on a 2D grid (i.e., electrocorticographic, or ECoG, arrays) and to data from a single patient. We aim to design a deep-learning model architecture that can accommodate both surface (ECoG) and depth (stereotactic EEG, or sEEG) electrodes. The architecture should allow training on data from multiple participants with large variability in electrode placements, and the trained model should perform well on participants unseen during training. Approach: We propose a novel transformer-based model architecture named SwinTW that can work with arbitrarily positioned electrodes by leveraging their 3D locations on the cortex rather than their positions on a 2D grid. We train both subject-specific models, using data from a single participant, and multi-patient models, exploiting data from multiple participants. Main Results: The subject-specific models using only low-density 8x8 ECoG data achieved a high Pearson correlation coefficient (PCC) between the decoded and ground-truth spectrograms (PCC=0.817) over N=43 participants, outperforming our prior convolutional ResNet model and the 3D Swin transformer model. Incorporating the additional strip, depth, and grid electrodes available in each participant (N=39) led to further improvement (PCC=0.838). For participants with only sEEG electrodes (N=9), subject-specific models still achieve comparable performance, with an average PCC=0.798. The multi-subject models achieved high performance on unseen participants, with an average PCC=0.765 in leave-one-out cross-validation. Significance: The proposed SwinTW decoder enables future speech neuroprostheses to utilize any electrode placement that is clinically optimal or feasible for a particular participant, including using only depth electrodes, which are more routinely implanted in chronic neurosurgical procedures.
Importantly, the generalizability of the multi-patient models suggests the exciting possibility of developing speech neuroprostheses for people with speech disability without relying on their own neural data for training, which is not always feasible.
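Both versions of the SwinTW abstract report decoding accuracy as the Pearson correlation coefficient (PCC) between decoded and ground-truth spectrograms. Below is a minimal NumPy sketch of that metric, assuming PCC is computed per trial over all time-frequency bins of a (time, frequency) spectrogram and then averaged across trials; the authors' exact averaging scheme (e.g., per frequency bin or per participant) is not stated in the abstract, and the function names are hypothetical.

```python
import numpy as np


def spectrogram_pcc(decoded: np.ndarray, target: np.ndarray) -> float:
    """Pearson correlation between a decoded and a ground-truth spectrogram.

    Both arrays are assumed to have shape (time_frames, freq_bins); they are
    flattened so that every time-frequency bin counts as one sample.
    """
    if decoded.shape != target.shape:
        raise ValueError("decoded and target spectrograms must have the same shape")
    x = decoded.ravel().astype(float)  # astype copies, so in-place ops are safe
    y = target.ravel().astype(float)
    x -= x.mean()
    y -= y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    if denom == 0:
        raise ValueError("PCC is undefined for a constant spectrogram")
    return float((x * y).sum() / denom)


def mean_pcc(pairs) -> float:
    """Average the per-trial PCC over an iterable of (decoded, target) pairs."""
    return float(np.mean([spectrogram_pcc(d, t) for d, t in pairs]))


# Usage with synthetic data: a noisy copy of a random "spectrogram".
rng = np.random.default_rng(0)
target = rng.random((100, 80))
decoded = target + 0.1 * rng.standard_normal(target.shape)
print(round(spectrogram_pcc(decoded, target), 3))
```

PCC is insensitive to overall gain and offset, so a decoder that reproduces the spectro-temporal pattern at the wrong loudness still scores highly; that is one reason it is usually reported alongside other measures.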
-
Speech production is a complex human function requiring continuous feedforward commands together with reafferent feedback processing. These processes are carried out by distinct frontal and temporal cortical networks, but the degree and timing of their recruitment and dynamics remain poorly understood. We present a deep learning architecture that translates neural signals recorded directly from the cortex into an interpretable representational space that can reconstruct speech. We leverage learned decoding networks to disentangle feedforward from feedback processing. In contrast to prevailing models, we find a mixed cortical architecture in which frontal and temporal networks each process both feedforward and feedback information in tandem. We elucidate the timing of feedforward- and feedback-related processing by quantifying the derived receptive fields. Our approach provides evidence for a surprisingly mixed cortical architecture of speech circuitry, together with decoding advances that have important implications for neural prosthetics.
-
The [C II] 158 μm emission line and the underlying far-infrared (FIR) dust continuum are important tracers for studying the star formation and kinematic properties of early galaxies. We present a survey of the [C II] emission lines and FIR continua of 31 luminous quasars at $z > 6.5$ using the Atacama Large Millimeter/submillimeter Array (ALMA) and the NOrthern Extended Millimeter Array at sub-arcsecond resolution. This survey more than doubles the number of quasars with [C II] and FIR observations at these redshifts and enables statistical studies of quasar host galaxies deep into the epoch of reionization. We detect [C II] emission in 27 quasar hosts, with a luminosity range of $L_{\rm [C\,II]} = (0.3\text{--}5.5) \times 10^{9}\,L_\odot$, and detect the FIR continuum of 28 quasar hosts, with a luminosity range of $L_{\rm FIR} = (0.5\text{--}13.0) \times 10^{12}\,L_\odot$. Both $L_{\rm [C\,II]}$ and $L_{\rm FIR}$ are correlated ($\rho \simeq 0.4$) with the quasar bolometric luminosity, albeit with substantial scatter. The quasar hosts detected by ALMA are clearly resolved, with a median diameter of ∼5 kpc. About 40% of the quasar host galaxies show a velocity gradient in [C II] emission, while the rest show either dispersion-dominated or disturbed kinematics. Basic estimates of the dynamical masses of the rotation-dominated host galaxies yield $M_{\rm dyn} = (0.1\text{--}7.5) \times 10^{11}\,M_\odot$. Considering our findings alongside those of previous studies, we find that the ratio between $M_{\rm BH}$ and $M_{\rm dyn}$ is on average about 10 times higher than that of the local $M_{\rm BH}$–$M_{\rm dyn}$ relation, but with substantial scatter (the ratio difference ranges from ∼0.6 to 60) and large uncertainties.
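For the rotation-dominated hosts, a basic dynamical-mass estimate treats the [C II] emission as a rotating disk, $M_{\rm dyn} = v_{\rm circ}^2 R / G$ with $R = D/2$. The sketch below assumes that estimator together with one literature convention for the circular velocity, $v_{\rm circ} = 0.75\,{\rm FWHM}/\sin i$; the abstract does not say how the survey measured velocities or inclinations, so the functions and input values are illustrative only.

```python
import math

# Gravitational constant in kpc (km/s)^2 per solar mass.
G_KPC_KMS2_PER_MSUN = 4.30091e-6


def circular_velocity(fwhm_kms: float, inclination_deg: float) -> float:
    """v_circ = 0.75 * FWHM / sin(i): one literature convention (assumption)."""
    return 0.75 * fwhm_kms / math.sin(math.radians(inclination_deg))


def dynamical_mass(v_circ_kms: float, diameter_kpc: float) -> float:
    """M_dyn = v_circ^2 * R / G in solar masses, with R = D / 2.

    Numerically about 1.16e5 * v_circ^2 * D for v_circ in km/s and D in kpc.
    """
    radius_kpc = diameter_kpc / 2.0
    return v_circ_kms ** 2 * radius_kpc / G_KPC_KMS2_PER_MSUN


def bh_to_dynamical_ratio(m_bh_msun: float, m_dyn_msun: float) -> float:
    """M_BH / M_dyn, the quantity compared with the local relation."""
    return m_bh_msun / m_dyn_msun


# Hypothetical host: FWHM = 400 km/s, inclination 45 deg, D = 5 kpc,
# M_BH = 1e9 M_sun (illustrative numbers, not survey measurements).
v = circular_velocity(400.0, 45.0)
m_dyn = dynamical_mass(v, 5.0)
print(f"v_circ = {v:.0f} km/s, M_dyn = {m_dyn:.2e} M_sun, "
      f"M_BH/M_dyn = {bh_to_dynamical_ratio(1e9, m_dyn):.3f}")
```

The inclination term dominates the uncertainty for nearly face-on disks, since $1/\sin i$ grows rapidly at small $i$, which is consistent with the large uncertainties quoted for the $M_{\rm BH}/M_{\rm dyn}$ ratios.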
We report the first statistical analyses of [C II] and dust continuum observations in six strong O I absorber fields at the end of the reionization epoch, obtained with the Atacama Large Millimeter/submillimeter Array (ALMA). Including the [C II] emitter reported in Wu et al., we detect one O I-associated [C II] emitter across the six fields. At the redshifts of the O I absorbers in the nondetection fields, no emitters are brighter than our detection limit within impact parameters of 50 kpc and velocity offsets of ±200 km s$^{-1}$. The averaged [C II] detection upper limit is $< 0.06$ Jy km s$^{-1}$ ($3\sigma$), corresponding to a [C II] luminosity of $L_{\rm [C\,II]} < 5.8 \times 10^{7}\,L_\odot$ and a [C II]-based star formation rate of ${\rm SFR}_{\rm [C\,II]} < 5.5\,M_\odot\,{\rm yr}^{-1}$. Cosmological simulations suggest that only ∼$10^{-2.5}$ [C II] emitters around O I absorbers have an SFR comparable to our detection limit. Although [C II] is detected in only one of the six fields, the order-of-magnitude excess in the number of emitters obtained from our ALMA observations supports the view that the contribution of massive galaxies to the metal enrichment cannot be ignored. Further, we found 14 tentative galaxy candidates with signal-to-noise ratios of ≈4.3 at large impact parameters (> 50 kpc) and with larger outflow velocities, within ±600 km s$^{-1}$. If these detections are confirmed, the mechanisms that push metals to larger distances at higher velocities will need further theoretical exploration.
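Converting a velocity-integrated flux limit into a [C II] luminosity limit is commonly done with the standard line-luminosity relation $L_{\rm line} = 1.04\times10^{-3}\, S\Delta v\, \nu_{\rm obs}\, D_L^2\, L_\odot$, with $S\Delta v$ in Jy km s$^{-1}$, $\nu_{\rm obs}$ in GHz, and $D_L$ in Mpc. The sketch assumes that relation; the luminosity distance depends on a cosmology the abstract does not state, so it is passed in as a hypothetical input rather than computed. The SFR calibration is also not given and is therefore omitted.

```python
# Rest frequency of the [C II] 158 um fine-structure line, in GHz.
CII_REST_GHZ = 1900.537


def line_luminosity_lsun(flux_jy_kms: float, z: float, d_l_mpc: float,
                         rest_ghz: float = CII_REST_GHZ) -> float:
    """L_line = 1.04e-3 * (S dv) * nu_obs * D_L^2, in solar luminosities.

    flux_jy_kms: velocity-integrated line flux S dv in Jy km/s
    z:           source redshift, giving nu_obs = rest_ghz / (1 + z) in GHz
    d_l_mpc:     luminosity distance in Mpc, from an assumed cosmology
    """
    nu_obs_ghz = rest_ghz / (1.0 + z)
    return 1.04e-3 * flux_jy_kms * nu_obs_ghz * d_l_mpc ** 2


# Illustrative inputs only: a 0.06 Jy km/s flux limit at z = 6 and an
# assumed (hypothetical) luminosity distance of 5.8e4 Mpc.
print(f"{line_luminosity_lsun(0.06, 6.0, 5.8e4):.2e} L_sun")
```

Because $L$ scales with $D_L^2$, the luminosity limit is only as good as the adopted cosmology; in practice $D_L$ would come from a cosmology library rather than a hard-coded value.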
Macroclimate drives vegetation distributions, but fine-scale topographic variation can generate microclimate refugia that allow plants to persist in otherwise unsuitable areas. However, we lack quantitative descriptions of topography-driven microclimatic variation and of how it shapes forest structure, diversity, and composition. We hypothesized that topographic variation and the forest overstory create spatiotemporal microclimate variation that affects tree performance, so that forest structure, diversity, and composition vary with topography and microclimate, and that topography and the overstory buffer microclimate. In a 20.2-ha inventory plot in the North American Great Plains, we censused woody stems ≥1 cm in diameter and collected detailed topographic and microclimatic data. Across 59 m of elevation, microclimate covaried with topography to create a sharp desiccation gradient, and topography and the overstory buffered the understory microclimate. The magnitude of microclimatic variation mirrored that of regional-scale variation: with increasing elevation, soil moisture decreased by an amount corresponding to the difference across ~2.1° of longitude along the east-to-west aridity gradient, and air temperature increased by an amount corresponding to the difference across ~2.7° of latitude along the north-to-south gradient. More complex forest structure and higher diversity occurred in moister, less exposed habitats, and species occupied distinct topographic niches. Our study demonstrates how topographic and microclimatic gradients structure forests in putative climate-change refugia, revealing ecological processes that enable populations to persist during periods of unfavorable macroclimate.
-
The formation of the first supermassive black holes is expected to have occurred in some of the most pronounced matter and galaxy overdensities in the early universe. We have conducted a submillimeter continuum survey of 54 $z \sim 6$ quasars using the Submillimetre Common-User Bolometer Array-2 (SCUBA-2) on the James Clerk Maxwell Telescope to study the environments around $z \sim 6$ quasars. We identified 170 submillimeter galaxies (SMGs) detected above $3.5\sigma$ in the 450 or 850 μm maps. Their far-IR luminosities are $(2.2\text{--}6.4) \times 10^{12}\,L_\odot$, and their star formation rates are ∼400–1200 $M_\odot\,{\rm yr}^{-1}$. We also calculated the SMGs' differential and cumulative number counts in a combined area of ∼620 arcmin$^2$. To a $4\sigma$ detection limit (∼5.5 mJy), the SMG overdensity is (±0.19), exceeding the blank-field source counts by a factor of 1.68. We find that 13 of the 54 quasars show overdensities (at ∼5.5 mJy) of $\delta_{\rm SMG} \sim 1.5\text{--}5.4$. The combined area of these 13 quasars, ∼150 arcmin$^2$, exceeds the blank-field counts, with an overdensity to 5.5 mJy of $\delta_{\rm SMG} \sim$ (±0.25). However, the excess is insignificant at the bright end (e.g., 7.5 mJy). We also compare our results with previous environmental studies of Lyα emitters and Lyman break galaxies on similar scales. Our survey presents the first systematic study of the environments of quasars at $z \sim 6$. The newly discovered SMGs provide essential candidates for follow-up spectroscopic observations to test whether they reside in the same large-scale structures as the quasars and to search for protoclusters at an early epoch.
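The overdensities above compare source counts around the quasars with blank-field counts. A common definition is $\delta = N_{\rm field}/N_{\rm blank} - 1$; the sketch assumes that definition and simple cumulative counts $N(>S)$ per deg$^2$, without the completeness and flux-boosting corrections a real survey analysis applies. All numbers are toy values.

```python
import numpy as np

ARCMIN2_PER_DEG2 = 3600.0


def cumulative_counts(fluxes_mjy, area_arcmin2: float, thresholds_mjy) -> np.ndarray:
    """N(>S): number of sources brighter than each flux threshold, per deg^2.

    No completeness, flux-boosting, or false-detection corrections are applied.
    """
    fluxes = np.asarray(fluxes_mjy, dtype=float)
    area_deg2 = area_arcmin2 / ARCMIN2_PER_DEG2
    return np.array([np.count_nonzero(fluxes > s) / area_deg2 for s in thresholds_mjy])


def overdensity(field_counts, blank_field_counts) -> np.ndarray:
    """delta = N_field / N_blank - 1; zero means no excess over the blank field."""
    field = np.asarray(field_counts, dtype=float)
    blank = np.asarray(blank_field_counts, dtype=float)
    return field / blank - 1.0


# Toy numbers only: four sources in a 3600 arcmin^2 (1 deg^2) area,
# compared with hypothetical blank-field counts at two flux thresholds.
counts = cumulative_counts([4.0, 5.0, 6.0, 8.0], 3600.0, [4.5, 7.5])
print(counts, overdensity(counts, [2.0, 1.0]))
```

Evaluating $\delta$ at several thresholds is what distinguishes an excess at faint fluxes from one at the bright end, the distinction drawn above between the 5.5 mJy and 7.5 mJy levels.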