

Title: Matching human vocal imitations to birdsong: An exploratory analysis
We explore computational strategies for matching human vocal imitations of birdsong to actual birdsong recordings. We recorded human vocal imitations of birdsong and then analysed these data using three categories of audio features for matching imitations to original birdsong: spectral, temporal, and spectrotemporal. These exploratory analyses suggest that spectral features can help distinguish imitation strategies (e.g. whistling vs. singing) but are insufficient for distinguishing species. Similarly, temporal features are correlated between human imitations and natural birdsong, yet they too are insufficient on their own. Spectrotemporal features showed the greatest promise, in particular when used to extract a representation of the pitch contour of birdsong and human imitations. This finding links the task of matching human imitations to birdsong to retrieval tasks in the music domain, such as query-by-humming and cover song retrieval; we borrow from these existing methodologies to outline directions for future research.
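
As a concrete illustration of the spectrotemporal direction described above, the sketch below extracts pitch contours (here with librosa's pYIN estimator) and ranks candidate birdsong recordings by dynamic time warping (DTW) cost against an imitation, in the spirit of query-by-humming. The function names, parameter values, and the median-normalisation step are illustrative assumptions, not the authors' implementation.

```python
# Sketch: rank birdsong recordings against a vocal imitation by comparing
# pitch contours with dynamic time warping (DTW). Illustrative only.
import numpy as np
import librosa


def pitch_contour(path, sr=22050, fmin=300.0, fmax=8000.0):
    """Log-frequency pitch contour via pYIN; unvoiced/NaN frames dropped."""
    y, sr = librosa.load(path, sr=sr)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    f0 = f0[voiced_flag & ~np.isnan(f0)]
    contour = np.log2(f0)
    return contour - np.median(contour)  # rough transposition invariance


def dtw_cost(a, b):
    """DTW alignment cost between two 1-D contours, normalised by path length."""
    D, wp = librosa.sequence.dtw(X=a[np.newaxis, :], Y=b[np.newaxis, :],
                                 metric='euclidean')
    return D[-1, -1] / len(wp)


def rank_candidates(imitation_path, birdsong_paths):
    """Return (path, cost) pairs sorted from best to worst match."""
    query = pitch_contour(imitation_path)
    costs = {p: dtw_cost(query, pitch_contour(p)) for p in birdsong_paths}
    return sorted(costs.items(), key=lambda kv: kv[1])
```

In a query-by-humming framing, the lowest-cost recording would be returned as the best match; cover-song methods would layer tempo- and key-invariant representations on top of a contour comparison like this one.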
Award ID(s):
1633206
NSF-PAR ID:
10118935
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
2nd International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Birdsong is a longstanding model system for studying evolution and biodiversity. Here, we collected and analyzed high-quality song recordings from seven species in the family Estrildidae. We measured the acoustic features of syllables and then used dimensionality reduction and machine learning classifiers to identify features that accurately assigned syllables to species. Species differences were captured by the first three principal components, corresponding to basic frequency, power distribution, and spectrotemporal features. We then identified the measured features underlying classification accuracy. We found that fundamental frequency, mean frequency, spectral flatness, and syllable duration were the most informative features for species identification. Next, we tested whether specific acoustic features of species' songs predicted phylogenetic distance. We found significant phylogenetic signal in syllable frequency features, but not in power distribution or spectrotemporal features. These results suggest that frequency features are more constrained by species' genetics than are other features, and are the best signal features for identifying species from song recordings. The absence of phylogenetic signal in power distribution and spectrotemporal features suggests that these song features are labile, reflecting learning processes and individual recognition.
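
    A minimal sketch of the kind of feature-then-classify pipeline this abstract describes is given below: per-syllable acoustic measurements (fundamental frequency, a mean-frequency proxy, spectral flatness, duration), PCA, and a supervised classifier. The specific librosa features, the random forest, and all parameter values are illustrative assumptions, not the study's actual analysis code.

```python
# Sketch of a syllable-classification pipeline of the kind described above:
# per-syllable acoustic features -> PCA -> species classifier. All choices
# here (features, classifier, parameters) are illustrative placeholders.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def syllable_features(y, sr):
    """Crude per-syllable features: fundamental frequency, centroid
    (mean-frequency proxy), spectral flatness, and duration."""
    f0, _, _ = librosa.pyin(y, fmin=500, fmax=8000, sr=sr)
    S = np.abs(librosa.stft(y))
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr)
    flatness = librosa.feature.spectral_flatness(S=S)
    return np.array([
        float(np.nanmean(f0)),     # fundamental frequency (Hz)
        float(np.mean(centroid)),  # mean-frequency proxy (Hz)
        float(np.mean(flatness)),  # spectral flatness (0..1)
        len(y) / sr,               # syllable duration (s)
    ])


def train_species_classifier(syllable_waveforms, species_labels, sr):
    """Fit standardise -> PCA -> random forest on per-syllable features."""
    X = np.vstack([syllable_features(y, sr) for y in syllable_waveforms])
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=3),
                          RandomForestClassifier(n_estimators=200, random_state=0))
    model.fit(X, species_labels)
    return model
```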

     
  2. Candolin, Ulrika (Ed.)
    Learned traits, such as foraging strategies and communication signals, can change over time via cultural evolution. Using historical recordings, we investigate the cultural evolution of birdsong over nearly a 50-year period. Specifically, we examine the parts of white-crowned sparrow (Zonotrichia leucophrys nuttalli) songs used for mate attraction and territorial defense. We compared historical (early 1970s) recordings with contemporary (mid-2010s) recordings from populations within and near San Francisco, CA, and assessed the vocal performance of these songs. Because birds exposed to anthropogenic noise tend to sing at higher minimum frequencies with narrower frequency bandwidths, potentially reducing one measure of song performance, we hypothesized that other song features, such as syllable complexity, might be exaggerated as an alternative means of displaying performance capabilities. We found that vocal performance increased between historical and contemporary songs, with a larger effect size for urban songs, and that syllable complexity, measured as the number of frequency modulations per syllable, was historically low for urban males but increased significantly in urban songs. We interpret these results as evidence that males have increased song complexity and trill performance over time in urban habitats, despite performance constraints from urban noise, and we suggest a new line of inquiry into how environments alter vocal performance over time.
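
    The sketch below illustrates, under stated assumptions, two of the measurements this abstract relies on: frequency bandwidth from an amplitude-thresholded spectrogram, and syllable complexity counted as direction changes in the pitch trace. The dB threshold, pitch bounds, and function names are placeholders, not the authors' measurement protocol.

```python
# Sketch of two measurements mentioned above: song frequency bandwidth from a
# thresholded spectrogram, and syllable complexity as the number of direction
# changes in the pitch trace. Thresholds and bounds are illustrative.
import numpy as np
import librosa


def frequency_bandwidth(y, sr, db_threshold=-24.0):
    """Minimum and maximum frequency (Hz) whose energy exceeds the threshold."""
    S = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
    freqs = librosa.fft_frequencies(sr=sr)
    active = freqs[np.any(S > db_threshold, axis=1)]
    return float(active.min()), float(active.max())


def modulations_per_syllable(y, sr, fmin=1000.0, fmax=10000.0):
    """Count slope sign changes in a syllable's pYIN pitch contour."""
    f0, _, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    f0 = f0[~np.isnan(f0)]
    slopes = np.sign(np.diff(f0))
    slopes = slopes[slopes != 0]          # ignore flat segments
    return int(np.sum(np.diff(slopes) != 0))
```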
    Birdsong has long been a subject of extensive research in ethology and neuroscience. The neural and behavioral mechanisms underlying song acquisition and production in male songbirds are particularly well studied, mainly because birdsong shares important features with human speech, such as a critical dependence on vocal learning. However, birdsong, like human speech, primarily functions as a communication signal, so the mechanisms of song perception and recognition must also be investigated to attain a deeper understanding of complex vocal signals. Although less attention has been paid to song receivers than to signalers, recent studies on female songbirds have begun to reveal the neural basis of song preference, and studies of song preference in juvenile birds suggest possible functions of preference in social contexts, including the sensory phase of song learning. Understanding the behavioral and neural mechanisms underlying the formation, maintenance, expression, and alteration of such song preference in birds may give insight into the mechanisms of speech communication in humans. To pursue this line of research, however, it is necessary to understand current methodological challenges in defining and measuring song preference. Consideration of ultimate questions can also help laboratory researchers design experiments and interpret results. Here we summarize the current understanding of song preference in female and juvenile songbirds in the context of Tinbergen's four questions, incorporating results ranging from ethological field research to the latest neuroscience findings. We also discuss open problems in this field and suggest possible solutions and future directions.
  4.
    The development of rhythmicity is foundational to communicative and social behaviours in humans and many other species, and mechanisms of synchrony could be conserved across species. The goal of the current paper is to explore evolutionary hypotheses linking vocal learning and beat synchronization through genomic approaches, testing the prediction that genetic underpinnings of birdsong also contribute to the aetiology of human interactions with musical beat structure. We combined state-of-the-art genomic datasets that account for underlying polygenicity of these traits: birdsong genome-wide transcriptomics linked to singing in zebra finches, and a human genome-wide association study of beat synchronization. Results of competitive gene set analysis revealed that the genetic architecture of human beat synchronization is significantly enriched for birdsong genes expressed in songbird Area X (a key nucleus for vocal learning, and homologous to human basal ganglia). These findings complement ethological and neural evidence of the relationship between vocal learning and beat synchronization, supporting a framework of some degree of common genomic substrates underlying rhythm-related behaviours in two clades, humans and songbirds (the largest evolutionary radiation of vocal learners). Future cross-species approaches investigating the genetic underpinnings of beat synchronization in a broad evolutionary context are discussed. This article is part of the theme issue 'Synchrony and rhythm interaction: from the brain to behavioural ecology'.
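
    As a toy illustration of the competitive gene-set idea, the sketch below asks whether genes in a candidate set (e.g. genes expressed in songbird Area X) carry larger GWAS gene-level statistics than the remaining genes, using a one-sided rank test. The real analysis (e.g. MAGMA) conditions on gene size, SNP density, and linkage disequilibrium, which this simplified version ignores; the function and its inputs are assumptions for illustration only.

```python
# Toy competitive gene-set test: do genes in a birdsong-related set carry
# larger GWAS gene-level Z-scores than the remaining genes? Real pipelines
# (e.g. MAGMA) also adjust for gene size, SNP density and LD; this does not.
import numpy as np
from scipy import stats


def competitive_gene_set_test(gene_z, gene_set):
    """One-sided rank test of gene-set Z-scores vs. all other genes.

    gene_z   -- dict mapping gene symbol -> gene-level association Z-score
    gene_set -- iterable of gene symbols (e.g. genes expressed in Area X)
    """
    members = set(gene_set)
    in_set = np.array([z for g, z in gene_z.items() if g in members])
    out_set = np.array([z for g, z in gene_z.items() if g not in members])
    stat, p_value = stats.mannwhitneyu(in_set, out_set, alternative='greater')
    return stat, p_value
```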
  5. Abstract

    Human speech possesses a rich hierarchical structure that allows for meaning to be altered by words spaced far apart in time. Conversely, the sequential structure of nonhuman communication is thought to follow non-hierarchical Markovian dynamics operating over only short distances. Here, we show that human speech and birdsong share a similar sequential structure indicative of both hierarchical and Markovian organization. We analyze the sequential dynamics of song from multiple songbird species and speech from multiple languages by modeling the information content of signals as a function of the sequential distance between vocal elements. Across short sequence-distances, an exponential decay dominates the information in speech and birdsong, consistent with underlying Markovian processes. At longer sequence-distances, the decay in information follows a power law, consistent with underlying hierarchical processes. Thus, the sequential organization of acoustic elements in two learned vocal communication signals (speech and birdsong) shows functionally equivalent dynamics, governed by similar processes.
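
    A simplified sketch of the information-decay analysis described above: estimate mutual information between sequence elements separated by distance d with a plug-in estimator, then fit exponential and power-law decay curves. The estimator (no bias correction) and the fitting details are illustrative assumptions, not the authors' exact models.

```python
# Sketch of the information-decay analysis described above: plug-in mutual
# information (bits) between elements d positions apart, then exponential and
# power-law decay fits. Estimator and fitting details are simplified.
import numpy as np
from collections import Counter
from scipy.optimize import curve_fit


def mutual_information_at_distance(seq, d):
    """Plug-in MI estimate between symbols separated by distance d."""
    pairs = list(zip(seq[:-d], seq[d:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * np.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def fit_decays(distances, mi_values):
    """Fit exponential and power-law decay curves to MI vs. distance."""
    d = np.asarray(distances, dtype=float)
    mi = np.asarray(mi_values, dtype=float)
    exp_model = lambda x, a, b, c: a * np.exp(-b * x) + c
    pow_model = lambda x, a, b, c: a * np.power(x, -b) + c
    exp_params, _ = curve_fit(exp_model, d, mi, p0=(1.0, 0.5, 0.0), maxfev=10000)
    pow_params, _ = curve_fit(pow_model, d, mi, p0=(1.0, 0.5, 0.0), maxfev=10000)
    return exp_params, pow_params
```

A typical use would compute `mutual_information_at_distance(song, d)` for a range of distances d and compare the two fits; short-range behaviour dominated by the exponential term and long-range behaviour by the power law is the pattern the abstract reports.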

     