Title: Matching human vocal imitations to birdsong: An exploratory analysis
We explore computational strategies for matching human vocal imitations of birdsong to actual birdsong recordings. We recorded human vocal imitations of birdsong and subsequently analysed these data using three categories of audio features for matching imitations to original birdsong: spectral, temporal, and spectrotemporal. These exploratory analyses suggest that spectral features can help distinguish imitation strategies (e.g. whistling vs. singing) but are insufficient for distinguishing species. Similarly, although temporal features are correlated between human imitations and natural birdsong, they too are insufficient on their own. Spectrotemporal features showed the greatest promise, in particular when used to extract a representation of the pitch contour of birdsong and human imitations. This finding suggests a link between the task of matching human imitations to birdsong and retrieval tasks in the music domain such as query-by-humming and cover song retrieval; we borrow from such existing methodologies to outline directions for future research.
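The pitch-contour approach described in the abstract maps naturally onto query-by-humming-style retrieval. As a minimal sketch, assuming librosa is available, the code below extracts a voiced-only pitch contour from an imitation and a birdsong recording and compares them with dynamic time warping; the function names, frequency ranges, and median-normalisation step are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's implementation): pitch-contour extraction
# and DTW-based matching, in the spirit of query-by-humming retrieval.
import numpy as np
import librosa

def pitch_contour(path, sr=22050, fmin=200.0, fmax=8000.0):
    """Voiced-only pitch contour in semitones, centred on its median."""
    y, sr = librosa.load(path, sr=sr)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    f0 = f0[voiced_flag & ~np.isnan(f0)]
    semitones = 12.0 * np.log2(f0 / 440.0)   # log-frequency scale
    return semitones - np.median(semitones)  # rough transposition invariance

def contour_distance(contour_a, contour_b):
    """DTW cost between two contours, normalised by warping-path length."""
    D, wp = librosa.sequence.dtw(X=contour_a[np.newaxis, :],
                                 Y=contour_b[np.newaxis, :])
    return D[-1, -1] / len(wp)

# Hypothetical usage: rank candidate birdsong recordings for one imitation.
# imitation = pitch_contour("imitation.wav")
# scores = {name: contour_distance(imitation, pitch_contour(path))
#           for name, path in birdsong_files.items()}
```

In a retrieval setting, the candidate with the lowest normalised DTW cost would be returned as the best match; cover-song-style approaches would add further invariances (e.g. to key and tempo) on top of such a representation.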
Award ID(s):
1633206
PAR ID:
10118935
Author(s) / Creator(s):
Date Published:
Journal Name:
2nd International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    The development of rhythmicity is foundational to communicative and social behaviours in humans and many other species, and mechanisms of synchrony could be conserved across species. The goal of the current paper is to explore evolutionary hypotheses linking vocal learning and beat synchronization through genomic approaches, testing the prediction that genetic underpinnings of birdsong also contribute to the aetiology of human interactions with musical beat structure. We combined state-of-the-art genomic datasets that account for underlying polygenicity of these traits: birdsong genome-wide transcriptomics linked to singing in zebra finches, and a human genome-wide association study of beat synchronization. Results of competitive gene set analysis revealed that the genetic architecture of human beat synchronization is significantly enriched for birdsong genes expressed in songbird Area X (a key nucleus for vocal learning, and homologous to human basal ganglia). These findings complement ethological and neural evidence of the relationship between vocal learning and beat synchronization, supporting a framework of some degree of common genomic substrates underlying rhythm-related behaviours in two clades, humans and songbirds (the largest evolutionary radiation of vocal learners). Future cross-species approaches investigating the genetic underpinnings of beat synchronization in a broad evolutionary context are discussed. This article is part of the theme issue ‘Synchrony and rhythm interaction: from the brain to behavioural ecology’.
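    The competitive gene-set analysis mentioned above is typically run with dedicated tools such as MAGMA; the snippet below is only a simplified, hypothetical illustration of the underlying idea (are gene-level association scores higher inside a candidate gene set than outside it?), with placeholder data, and it omits the corrections for gene size, gene density, and linkage disequilibrium that real analyses apply.

```python
# Simplified, hypothetical illustration of a competitive gene-set test:
# compare gene-level association scores (e.g. Z-scores from a GWAS of beat
# synchronization) for genes inside a candidate set (e.g. genes linked to
# singing in songbird Area X) against genes outside it, via permutation.
import numpy as np

def competitive_gene_set_test(z_scores, in_set, n_perm=10_000, seed=0):
    """Permutation p-value for mean(in-set Z) minus mean(out-of-set Z)."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z_scores, dtype=float)
    mask = np.asarray(in_set, dtype=bool)
    observed = z[mask].mean() - z[~mask].mean()
    k = int(mask.sum())
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(z)
        null[i] = perm[:k].mean() - perm[k:].mean()
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Toy, made-up example (placeholders, not real GWAS or transcriptomic data):
# z = np.random.default_rng(1).normal(size=5000)   # gene-level Z-scores
# birdsong_genes = np.zeros(5000, dtype=bool)
# birdsong_genes[:200] = True                      # hypothetical gene set
# effect, p = competitive_gene_set_test(z, birdsong_genes)
```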
  2. Most studies of acoustic communication focus on short units of vocalization such as songs, yet these units are often hierarchically organized into higher-order sequences and, outside human language, little is known about the drivers of sequence structure. Here, we investigate the organization, transmission and function of vocal sequences sung by male Albert's lyrebirds (Menura alberti), a species renowned for vocal imitations of other species. We quantified the organization of mimetic units into sequences, and examined the extent to which these sequences are repeated within and between individuals and shared among populations. We found that individual males organized their mimetic units into stereotyped sequences. Sequence structures were shared within and to a lesser extent among populations, implying that sequences were socially transmitted. Across the entire species range, mimetic units were sung with immediate variety and a high acoustic contrast between consecutive units, suggesting that sequence structure is a means to enhance receiver perceptions of repertoire complexity. Our results provide evidence that higher-order sequences of vocalizations can be socially transmitted, and that the order of vocal units can be functionally significant. We conclude that, to fully understand vocal behaviours, we must study both the individual vocal units and their higher-order temporal organization.
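    As a rough illustration of how such sequence organization can be quantified, the sketch below computes a first-order transition matrix and an immediate-repetition rate from sequences of labelled units (low repetition corresponds to the "immediate variety" described above). The unit labels and helper names are hypothetical, not the study's analysis pipeline.

```python
# Hypothetical sketch: quantify the organization of labelled vocal units as
# first-order transition probabilities and the rate of immediate repetition.
import numpy as np

def transition_matrix(sequence, labels):
    """Row-normalised first-order transition probabilities between unit labels."""
    idx = {u: i for i, u in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for a, b in zip(sequence[:-1], sequence[1:]):
        counts[idx[a], idx[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

def immediate_repetition_rate(sequence):
    """Fraction of consecutive unit pairs in which the same unit repeats."""
    pairs = list(zip(sequence[:-1], sequence[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

# Hypothetical mimetic-unit sequence (labels are placeholders):
# song = ["whipbird", "rosella", "shrikethrush", "rosella", "whipbird"]
# labels = sorted(set(song))
# P = transition_matrix(song, labels)
# rep = immediate_repetition_rate(song)   # 0.0 -> high immediate variety
```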
  3. Birdsong has long been a subject of extensive research in ethology and neuroscience. Neural and behavioral mechanisms underlying song acquisition and production in male songbirds are particularly well studied, mainly because birdsong shares important features with human speech, such as its critical dependence on vocal learning. However, birdsong, like human speech, primarily functions as a communication signal. The mechanisms of song perception and recognition should therefore also be investigated to attain a deeper understanding of the nature of complex vocal signals. Although song receivers have received less attention than signalers, recent studies on female songbirds have begun to reveal the neural basis of song preference. Moreover, studies of song preference in juvenile birds suggest possible functions of preference in social contexts, including the sensory phase of song learning. Understanding the behavioral and neural mechanisms underlying the formation, maintenance, expression, and alteration of such song preferences in birds will potentially give insight into the mechanisms of speech communication in humans. To pursue this line of research, however, it is necessary to understand current methodological challenges in defining and measuring song preference. In addition, consideration of ultimate questions is also important for laboratory researchers in designing experiments and interpreting results. Here we summarize the current understanding of song preference in female and juvenile songbirds in the context of Tinbergen’s four questions, incorporating results ranging from ethological field research to the latest neuroscience findings. We also discuss problems and remaining questions in this field and suggest possible solutions and future directions.
  4. Candolin, Ulrika (Ed.)
    Learned traits, such as foraging strategies and communication signals, can change over time via cultural evolution. Using historical recordings, we investigate the cultural evolution of birdsong over nearly a 50-year period. Specifically, we examine the parts of white-crowned sparrow (Zonotrichia leucophrys nuttalli) songs used for mate attraction and territorial defense. We compared historical (early 1970s) recordings with contemporary (mid-2010s) recordings from populations within and near San Francisco, CA and assessed the vocal performance of these songs. Because birds exposed to anthropogenic noise tend to sing at higher minimum frequencies with narrower frequency bandwidths, potentially reducing one measure of song performance, we hypothesized that other song features, such as syllable complexity, might be exaggerated, as an alternative means to display performance capabilities. We found that vocal performance increased between historical and contemporary songs, with a larger effect size for urban songs, and that syllable complexity, measured as the number of frequency modulations per syllable, was historically low for urban males but increased significantly in urban songs. We interpret these results as evidence for males increasing song complexity and trill performance over time in urban habitats, despite performance constraints from urban noise, and suggest a new line of inquiry into how environments alter vocal performance over time.
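    As an illustration only (not the study's measurement pipeline), the sketch below computes the kinds of per-syllable metrics discussed above from a pitch trace: minimum frequency, frequency bandwidth, and a count of frequency modulations as direction changes in the contour. The threshold and function names are assumptions.

```python
# Illustrative sketch: per-syllable acoustic metrics from a pitch trace
# (Hz per frame), e.g. one obtained with librosa.pyin. Threshold values
# are placeholders, not the study's settings.
import numpy as np

def syllable_metrics(f0_hz, min_change_hz=50.0):
    """Minimum frequency, bandwidth, and frequency-modulation count."""
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[~np.isnan(f0)]                      # keep voiced frames only
    min_freq = f0.min()
    bandwidth = f0.max() - f0.min()
    # Count direction changes in the frame-to-frame frequency slope,
    # ignoring changes smaller than min_change_hz to reduce noise.
    diffs = np.diff(f0)
    diffs = diffs[np.abs(diffs) >= min_change_hz]
    modulations = int(np.sum(np.diff(np.sign(diffs)) != 0))
    return {"min_freq_hz": float(min_freq),
            "bandwidth_hz": float(bandwidth),
            "frequency_modulations": modulations}

# metrics = syllable_metrics(f0_trace)   # f0_trace: one syllable's pitch trace
```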
  5. We present Sketch2Sound, a generative audio model capable of creating high-quality sounds from a set of interpretable time-varying control signals: loudness, brightness, and pitch, as well as text prompts. Sketch2Sound can synthesize arbitrary sounds from sonic imitations (i.e., a vocal imitation or a reference sound-shape). Sketch2Sound can be implemented on top of any text-to-audio latent diffusion transformer (DiT), and requires only 40k steps of fine-tuning and a single linear layer per control, making it more lightweight than existing methods like ControlNet. To synthesize from sketch-like sonic imitations, we propose applying random median filters to the control signals during training, allowing Sketch2Sound to be prompted using controls with flexible levels of temporal specificity. We show that Sketch2Sound can synthesize sounds that follow the gist of input controls from a vocal imitation while retaining adherence to an input text prompt and audio quality comparable to a text-only baseline. Sketch2Sound allows sound artists to create sounds with the semantic flexibility of text prompts and the expressivity and precision of a sonic gesture or vocal imitation.
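    A minimal sketch of the control-signal idea described above, assuming librosa and scipy are available: extract loudness (RMS), brightness (here approximated by the spectral centroid), and pitch from a sonic imitation, then smooth a control with a median filter of randomly chosen width, mimicking the training-time trick the abstract describes. Frequency ranges and function names are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the authors' code): extract loudness, brightness, and pitch
# control signals from a recording, and apply a random-width median filter
# of the kind described for Sketch2Sound's training.
import numpy as np
import librosa
from scipy.signal import medfilt

def control_signals(path, sr=44100, hop_length=512):
    """Frame-wise loudness (RMS), brightness (spectral centroid), and pitch."""
    y, sr = librosa.load(path, sr=sr)
    loudness = librosa.feature.rms(y=y, hop_length=hop_length)[0]
    brightness = librosa.feature.spectral_centroid(
        y=y, sr=sr, hop_length=hop_length)[0]
    f0, _, _ = librosa.pyin(y, fmin=60.0, fmax=2000.0, sr=sr,
                            hop_length=hop_length)
    return loudness, brightness, np.nan_to_num(f0)

def random_median_filter(signal, rng, max_width=31):
    """Median-filter a control signal with a randomly chosen odd width."""
    width = int(rng.integers(1, max_width // 2 + 1)) * 2 + 1
    return medfilt(signal, kernel_size=width)

# rng = np.random.default_rng(0)
# loud, bright, pitch = control_signals("vocal_imitation.wav")
# loud_coarse = random_median_filter(loud, rng)   # coarser temporal detail
```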