Prosody perception is fundamental to spoken language communication as it supports comprehension, pragmatics, morphosyntactic parsing of speech streams, and phonological awareness. A particular aspect of prosody: perceptual sensitivity to speech rhythm patterns in words (i.e., lexical stress sensitivity), is also a robust predictor of reading skills, though it has received much less attention than phonological awareness in the literature. Given the importance of prosody and reading in educational outcomes, reliable and valid tools are needed to conduct large-scale health and genetic investigations of individual differences in prosody, as groundwork for investigating the biological underpinnings of the relationship between prosody and reading. Motivated by this need, we present the Test of Prosody via Syllable Emphasis (“TOPsy”) and highlight its merits as a phenotyping tool to measure lexical stress sensitivity in as little as 10 min, in scalable internet-based cohorts. In this 28-item speech rhythm perception test [modeled after the stress identification test from Wade-Woolley (2016) ], participants listen to multi-syllabic spoken words and are asked to identify lexical stress patterns. Psychometric analyses in a large internet-based sample shows excellent reliability, and predictive validity for self-reported difficulties with speech-language, reading, and musical beat synchronization. Further, items loaded onto two distinct factors corresponding to initially stressed vs. non-initially stressed words. These results are consistent with previous reports that speech rhythm perception abilities correlate with musical rhythm sensitivity and speech-language/reading skills, and are implicated in reading disorders (e.g., dyslexia). We conclude that TOPsy can serve as a useful tool for studying prosodic perception at large scales in a variety of different settings, and importantly can act as a validated brief phenotype for future investigations of the genetic architecture of prosodic perception, and its relationship to educational outcomes.
more »
« less
Before, Between, and After: Enriching Robot Communication Surrounding Collaborative Creative Activities
Research in creative robotics continues to expand across all creative domains, including art, music and language. Creative robots are primarily designed to be task specific, with limited research into the implications of their design outside their core task. In the case of a musical robot, this includes when a human sees and interacts with the robot before and after the performance, as well as in between pieces. These non-musical interaction tasks such as the presence of a robot during musical equipment set up, play a key role in the human perception of the robot however have received only limited attention. In this paper, we describe a new audio system using emotional musical prosody, designed to match the creative process of a musical robot for use before, between and after musical performances. Our generation system relies on the creation of a custom dataset for musical prosody. This system is designed foremost to operate in real time and allow rapid generation and dialogue exchange between human and robot. For this reason, the system combines symbolic deep learning through a Conditional Convolution Variational Auto-encoder, with an emotion-tagged audio sampler. We then compare this to a SOTA text-to-speech system in our robotic platform, Shimon the marimba player.We conducted a between-groups study with 100 participants watching a musician interact for 30 s with Shimon. We were able to increase user ratings for the key creativity metrics; novelty and coherence, while maintaining ratings for expressivity across each implementation. Our results also indicated that by communicating in a form that relates to the robot’s core functionality, we can raise likeability and perceived intelligence, while not altering animacy or anthropomorphism. These findings indicate the variation that can occur in the perception of a robot based on interactions surrounding a performance, such as initial meetings and spaces between pieces, in addition to the core creative algorithms.
more »
« less
- Award ID(s):
- 1925178
- PAR ID:
- 10286310
- Date Published:
- Journal Name:
- Frontiers in Robotics and AI
- Volume:
- 8
- ISSN:
- 2296-9144
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)Robots are increasingly being introduced into domains where they assist or collaborate with human counterparts. There is a growing body of literature on how robots might serve as collaborators in creative activities, but little is known about the factors that shape human perceptions of robots as creative collaborators. This paper investigates the effects of a robot’s social behaviors on people’s creative thinking and their perceptions of the robot. We developed an interactive system to facilitate collaboration between a human and a robot in a creative activity. We conducted a user study (n = 12), in which the robot and adult participants took turns to create compositions using tangram pieces projected on a shared workspace. We observed four human behavioral traits related to creativity in the interaction: accepting robot inputs as inspiration, delegating the creative lead to the robot, communicating creative intents, and being playful in the creation. Our findings suggest designs for co-creation in social robots that consider the adversarial effect of giving the robot too much control in creation, as well as the role playfulness plays in the creative process.more » « less
-
Every artist has a creative process that draws inspiration from previous artists and their works. Today, “inspiration” has been automated by generative music models. The black box nature of these models obscures the identity of the works that influence their creative output. As a result, users may inadvertently appropriate, misuse, or copy existing artists’ works. We establish a replicable methodology to systematically identify similar pieces of music audio in a manner that is useful for understanding training data attribution. A key aspect of our approach is to harness an effective music audio similarity measure. We compare the effect of applying CLMR [50] and CLAP [55] embeddings to similarity measurement in a set of 5 million audio clips used to train VampNet [24], a recent open source generative music model. We validate this approach with a human listening study. We also explore the effect that modifications of an audio example (e.g., pitch shifting, time stretching, background noise) have on similarity measurements. This work is foundational to incorporating automated influence attribution into generative modeling, which promises to let model creators and users move from ignorant appropriation to informed creation. Audio samples that accompany this paper are available at https://tinyurl.com/exploring-musical-roots.more » « less
-
We present VoiceCraft-Dub, a novel approach for automated video dubbing that synthesizes high-quality speech from text and facial cues. This task has broad applications in filmmaking, multimedia creation, and assisting voice-impaired individuals. Building on the success of Neural Codec Language Models (NCLMs) for speech synthesis, our method extends their capabilities by incorporating video features, ensuring that synthesized speech is time-synchronized and expressively aligned with facial movements while preserving natural prosody. To inject visual cues, we design adapters to align facial features with the NCLM token space and introduce audio-visual fusion layers to merge audio-visual information within the NCLM framework. Additionally, we curate CelebV-Dub, a new dataset of expressive, real-world videos specifically designed for automated video dubbing. Extensive experiments show that our model achieves high-quality, intelligible, and natural speech synthesis with accurate lip synchronization, outperforming existing methods in human perception and performing favorably in objective evaluations. We also adapt VoiceCraft-Dub for the video-to-speech task, demonstrating its versatility for various applications.more » « less
-
Pleasure in music has been linked to predictive coding of melodic and rhythmic patterns, subserved by connectivity between regions in the brain's auditory and reward networks. Specific musical anhedonics derive little pleasure from music and have altered auditory-reward connectivity, but no difficulties with music perception abilities and no generalized physical anhedonia. Recent research suggests that specific musical anhedonics experience pleasure in nonmusical sounds, suggesting that the implicated brain pathways may be specific to music reward. However, this work used sounds with clear real-world sources (e.g., babies laughing, crowds cheering), so positive hedonic responses could be based on the referents of these sounds rather than the sounds themselves. We presented specific musical anhedonics and matched controls with isolated short pleasing and displeasing synthesized sounds of varying timbres with no clear real-world referents. While the two groups found displeasing sounds equally displeasing, the musical anhedonics gave substantially lower pleasure ratings to the pleasing sounds, indicating that their sonic anhedonia is not limited to musical rhythms and melodies. Furthermore, across a large sample of participants, mean pleasure ratings for pleasing synthesized sounds predicted significant and similar variance in six dimensions of musical reward considered to be relatively independent, suggesting that pleasure in sonic timbres play a role in eliciting reward-related responses to music. We replicate the earlier findings of preserved pleasure ratings for semantically referential sounds in musical anhedonics and find that pleasure ratings of semantic referents, when presented without sounds, correlated with ratings for the sounds themselves. This association was stronger in musical anhedonics than in controls, suggesting the use of semantic knowledge as a compensatory mechanism for affective sound processing. Our results indicate that specific musical anhedonia is not entirely specific to melodic and rhythmic processing, and suggest that timbre merits further research as a source of pleasure in music.more » « less
An official website of the United States government

