The evolution of Web Speech has made auditory description easier to develop and publicly available without screen reader software, broadening its exposure to users who may benefit from spoken descriptions. Building on an existing design framework for auditory description of interactive web media, we designed an optional Voicing feature instantiated in two PhET Interactive Simulations regularly used by students and educators globally. We surveyed over 2000 educators to investigate their perceptions of, and preferences for, the Web Speech-based Voicing feature and its broad appeal and effectiveness for teaching and learning. We find general approval of the Voicing feature among educators, and more moderate ratings than expected for the different preset speech levels we presented to them. Educators perceive the feature as beneficial both broadly and for specific populations, while some identify particular populations for whom it remains ineffective. Lastly, we identify some variance in perceptions of the feature across different aspects of the simulation experience.
Decoding speech sounds from neurophysiological data: Practical considerations and theoretical implications
Machine learning techniques have proven to be a useful tool in cognitive neuroscience. However, their implementation in scalp‐recorded electroencephalography (EEG) is relatively limited. To address this, we present three analyses using data from a previous study that examined event‐related potential (ERP) responses to a wide range of naturally‐produced speech sounds. First, we explore which features of the EEG signal best maximize machine learning accuracy for a voicing distinction, using a support vector machine (SVM). We manipulate three dimensions of the EEG signal as input to the SVM: number of trials averaged, number of time points averaged, and polynomial fit. We discuss the trade‐offs in using different feature sets and offer some recommendations for researchers using machine learning. Next, we use SVMs to classify specific pairs of phonemes, finding that we can detect differences in the EEG signal that are not otherwise detectable using conventional ERP analyses. Finally, we characterize the timecourse of phonetic feature decoding across three phonological dimensions (voicing, manner of articulation, and place of articulation), and find that voicing and manner are decodable from neural activity, whereas place of articulation is not. This set of analyses addresses both practical considerations in the application of machine learning to EEG, particularly for speech studies, and also sheds light on current issues regarding the nature of perceptual representations of speech.
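To make the first analysis concrete, here is a minimal sketch of the trial-averaging pipeline feeding a linear SVM. The data are synthetic, and every dimension (epoch length, averaging block size, number of time bins, polynomial degree) is an illustrative assumption rather than the study's actual parameters.

```python
# Minimal sketch of the trial-averaging + SVM pipeline described above.
# Inputs in the study would be single-trial ERP epochs labeled by voicing;
# here they are synthetic with a small injected class difference.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_times = 200, 128           # hypothetical epoch dimensions
X_raw = rng.normal(size=(n_trials, n_times))
y = rng.integers(0, 2, size=n_trials)  # 0 = voiceless, 1 = voiced
X_raw[y == 1] += 0.3                   # inject a small class difference

def make_features(epochs, labels, n_avg=5, n_timebins=16, poly_deg=None):
    """Average n_avg same-class trials, average adjacent time points down
    to n_timebins, and optionally replace the waveform with polynomial-fit
    coefficients -- the three feature dimensions manipulated above."""
    feats, out_labels = [], []
    for cls in np.unique(labels):
        cls_epochs = epochs[labels == cls]
        for i in range(0, len(cls_epochs) - n_avg + 1, n_avg):
            avg = cls_epochs[i:i + n_avg].mean(axis=0)
            binned = avg[:(len(avg) // n_timebins) * n_timebins]
            binned = binned.reshape(n_timebins, -1).mean(axis=1)
            if poly_deg is not None:   # keep only the fit coefficients
                t = np.linspace(0, 1, n_timebins)
                binned = np.polyfit(t, binned, deg=poly_deg)
            feats.append(binned)
            out_labels.append(cls)
    return np.array(feats), np.array(out_labels)

X, y_avg = make_features(X_raw, y, n_avg=5, n_timebins=16, poly_deg=3)
scores = cross_val_score(SVC(kernel="linear"), X, y_avg, cv=5)
print(f"voicing decoding accuracy: {scores.mean():.2f}")
```

Varying `n_avg`, `n_timebins`, and `poly_deg` in a sketch like this reproduces the trade-off the authors examine: more averaging cleans the signal but shrinks the training set.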
- Award ID(s): 1945069
- PAR ID: 10509978
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Psychophysiology
- Volume: 61
- Issue: 4
- ISSN: 0048-5772
- Subject(s) / Keyword(s): Analysis/Statistical Methods; Auditory Processes; EEG; ERPs; Language/Speech; Machine Learning
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Articulation, emotion, and personality play strong roles in orofacial movements. To improve the naturalness and expressiveness of virtual agents (VAs), it is important to carefully model the complex interplay between these factors. This paper proposes a conditional generative adversarial network, called conditional sequential GAN (CSG), which learns the relationship between emotion, lexical content, and lip movements in a principled manner. The model uses a set of spectral and emotional speech features extracted directly from the speech signal as conditioning inputs, generating realistic movements. A key feature of the approach is that it is a speech-driven framework that does not require transcripts. Our experiments show the superiority of this model over three state-of-the-art baselines in terms of objective and subjective evaluations. When the target emotion is known, we propose creating emotion-dependent models by either adapting the base model with the target emotional data (CSG-Emo-Adapted) or adding emotional conditions as input to the model (CSG-Emo-Aware). Objective evaluations show improvements for CSG-Emo-Adapted compared with the CSG model, as the generated trajectory sequences are closer to the original sequences. Subjective evaluations show significantly better results for this model compared with the CSG model when the target emotion is happiness.
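A minimal sketch of what a speech-conditioned sequential GAN of this kind might look like in PyTorch. The GRU-based layout, feature dimensions, and layer sizes are illustrative assumptions, not the paper's CSG specification.

```python
# Minimal sketch of a speech-conditioned sequential GAN in the spirit of
# CSG. Real inputs would be frame-level spectral/emotional speech features
# and lip-landmark trajectories; everything here is synthetic.
import torch
import torch.nn as nn

SPEECH_DIM, NOISE_DIM, LIP_DIM, HID = 40, 16, 20, 64  # assumed sizes

class Generator(nn.Module):
    """Map a sequence of speech features (plus noise) to lip movements."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(SPEECH_DIM + NOISE_DIM, HID, batch_first=True)
        self.out = nn.Linear(HID, LIP_DIM)

    def forward(self, speech, noise):
        h, _ = self.rnn(torch.cat([speech, noise], dim=-1))
        return self.out(h)                  # (batch, frames, LIP_DIM)

class Discriminator(nn.Module):
    """Score real vs. generated lip sequences, conditioned on the speech."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(SPEECH_DIM + LIP_DIM, HID, batch_first=True)
        self.out = nn.Linear(HID, 1)

    def forward(self, speech, lips):
        h, _ = self.rnn(torch.cat([speech, lips], dim=-1))
        return self.out(h[:, -1])           # score from final hidden state

G, D = Generator(), Discriminator()
speech = torch.randn(8, 50, SPEECH_DIM)     # 8 clips, 50 frames each
noise = torch.randn(8, 50, NOISE_DIM)
fake_lips = G(speech, noise)
score = D(speech, fake_lips)                # feeds the adversarial loss
```

In this framing, CSG-Emo-Aware would correspond to concatenating an emotion code onto the conditioning input, while CSG-Emo-Adapted would correspond to fine-tuning a trained base model on target-emotion data.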
Brain Computer Interfaces (BCIs) traditionally deploy visual or auditory stimuli to elicit brain signals. However, these stimuli are not very useful in situations where the visual or auditory senses are involved in other decision-making processes. In this paper, we explore the use of vibrotactile stimuli on the fingers as a viable replacement. Using five-level wavelet packet feature extraction on the obtained EEG signals, along with a kernel Support Vector Machine (SVM) algorithm, we were able to achieve 83% classification accuracy for binary user choices. This new BCI paradigm shows potential for use in situations where visual and auditory stimuli are not feasible.
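A minimal sketch of the described pipeline, assuming the five-level wavelet packet coefficients are summarized as per-node energies (an assumption; the abstract does not specify the summary) before classification with an RBF-kernel SVM.

```python
# Minimal sketch of a 5-level wavelet packet + kernel SVM pipeline.
# The EEG is synthetic; per-node energy as the feature summary is an
# illustrative assumption.
import numpy as np
import pywt
from sklearn.svm import SVC

def wp_features(signal, wavelet="db4", level=5):
    """Decompose a 1-D EEG channel into a 5-level wavelet packet tree and
    return the energy of each terminal node as the feature vector."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    nodes = wp.get_level(level, order="freq")
    return np.array([np.sum(node.data ** 2) for node in nodes])

rng = np.random.default_rng(1)
n_trials, n_samples = 100, 512
X_sig = rng.normal(size=(n_trials, n_samples))
y = rng.integers(0, 2, size=n_trials)          # binary user choice
X_sig[y == 1, :256] += 0.5                     # inject a class difference

X = np.array([wp_features(s) for s in X_sig])  # 2**5 = 32 features/trial
clf = SVC(kernel="rbf").fit(X[:80], y[:80])
print("held-out accuracy:", clf.score(X[80:], y[80:]))
```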
We present the first ultrasound analysis of the secondary palatalization contrast in Irish, analyzing data from five speakers from the Connemara dialect group. Word-initial /pʲ(bʲ),pˠ(bˠ),tʲ,tˠ,kʲ,kˠ,fʲ,fˠ,sʲ,sˠ,xʲ,xˠ/ are analyzed in the context of /iː,uː/. We find, first, that tongue body position robustly distinguishes palatalized from velarized consonants, across place of articulation, manner, and vowel place contexts, with palatalized consonants having fronter and/or higher tongue body realizations than their velarized counterparts. This conclusion holds equally for labial consonants, contrary to some previous descriptive claims. Second, the nature and degree of palatalization and velarization depend in systematic ways on consonant place and manner. In coronal consonants, for example, velarization is weaker or absent. Third, the Irish consonants examined resist coarticulation in backness with a following vowel. In all of these respects Irish palatalization is remarkably similar to that of Russian. Our results also support an independent role for pharyngeal cavity expansion/retraction in the production of the palatalization contrast. Finally, we discuss preliminary findings on the dynamics of the secondary articulation gestures. Our use of principal component analysis (PCA) in reaching these findings is also of interest, since PCA has not been employed a great deal in analyses of tongue body movement.
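To illustrate the PCA step, a minimal sketch on synthetic tongue contours: PCA compresses each contour to a few component scores, and separation along a component can then index the palatalized/velarized contrast. The contour construction here is entirely hypothetical, not the study's data or preprocessing.

```python
# Minimal sketch: PCA over tongue-contour coordinates. Real input would be
# splines fitted to ultrasound frames; these contours are synthetic.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
n_tokens, n_points = 120, 32                    # tokens x contour points
base = np.sin(np.linspace(0, np.pi, n_points))  # a generic tongue shape
# palatalized tokens get a fronter/higher tongue body (toy manipulation)
palatalized = base + 0.4 * np.linspace(0, 1, n_points)
contours = np.vstack([
    base + 0.05 * rng.normal(size=(n_tokens // 2, n_points)),
    palatalized + 0.05 * rng.normal(size=(n_tokens // 2, n_points)),
])

pca = PCA(n_components=2).fit(contours)
scores = pca.transform(contours)
print("variance explained:", pca.explained_variance_ratio_)
# PC1 scores should separate the two secondary articulations
print(scores[:n_tokens // 2, 0].mean(), scores[n_tokens // 2:, 0].mean())
```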
Record linkage, often called entity resolution or de-duplication, refers to identifying the same entities across one or more databases. As the amount of data generated grows at an exponential rate, it becomes increasingly important to be able to integrate data from several sources to perform richer analysis. In this paper, we present an open-source, comprehensive, end-to-end hybrid record linkage framework that combines automatic and manual review. Using this framework, we train several models based on different machine learning algorithms, such as random forests, linear SVM, radial SVM, and dense neural networks (DNNs), and compare the effectiveness and efficiency of these models for record linkage in different settings. We evaluate model performance based on recall, F1-score (quality of linkages), and the number of uncertain pairs, i.e., pairs that need manual review. We also test our trained models on a new dataset to see how well they transfer to a new setting. The RF, linear SVM, and radial SVM models transfer much better than the DNN. Finally, we study the effect of the name2vec (n2v) feature, a letter embedding for names, on model performance. Using n2v results in a smaller manual review set with a slightly lower F1-score. Overall, the SVM models performed best in all experiments.
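A minimal sketch of the evaluation loop described: train the candidate models on record-pair features, then report recall, F1, and the size of the manual-review set via a probability band. The features, thresholds, and data are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of hybrid record-linkage evaluation: classifiers score
# candidate record pairs, and pairs with mid-range match probabilities
# are routed to manual review. Synthetic pair features stand in for real
# similarity features (e.g., name, address, n2v distances).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import f1_score, recall_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=3)
X_tr, X_te, y_tr, y_te = X[:1500], X[1500:], y[:1500], y[1500:]

models = {
    "RF": RandomForestClassifier(random_state=3),
    "linear SVM": SVC(kernel="linear", probability=True, random_state=3),
    "radial SVM": SVC(kernel="rbf", probability=True, random_state=3),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    pred = (proba >= 0.5).astype(int)
    # pairs in an uncertainty band (threshold assumed) go to manual review
    uncertain = np.sum((proba > 0.3) & (proba < 0.7))
    print(f"{name}: recall={recall_score(y_te, pred):.2f} "
          f"F1={f1_score(y_te, pred):.2f} uncertain={uncertain}")
```

Widening the uncertainty band trades reviewer effort for linkage quality, which is the balance the framework's comparison of models is designed to expose.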