Title: Exploring Acoustic Similarity and Preference for Novel Music Recommendation
Most commercial music services rely on collaborative filtering to recommend artists and songs. While this method is effective for popular artists with large fanbases, it can present difficulties for recommending novel, lesser-known artists due to a relative lack of user preference data. In this paper, we therefore seek to understand how content-based approaches can be used to more effectively recommend songs from these lesser-known artists. Specifically, we conduct a user study to answer three questions. Firstly, do most users agree which songs are most acoustically similar? Secondly, is acoustic similarity a good proxy for how an individual might construct a playlist or recommend music to a friend? Thirdly, if so, can we find acoustic features that are related to human judgments of acoustic similarity? To answer these questions, our study asked 117 test subjects to compare two unknown candidate songs relative to a third known reference song. Our findings show that 1) judgments about acoustic similarity are fairly consistent, 2) acoustic similarity is highly correlated with playlist selection and recommendation, but not necessarily personal preference, and 3) a subset of acoustic features from the Spotify Web API is particularly predictive of human similarity judgments.
Award ID(s):
1901330
NSF-PAR ID:
10290593
Journal Name:
International Symposium on Music Information Retrieval
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
  2. With digital music consumption being at an all-time high, online music encyclopedias like MusicBrainz and music intelligence platforms like The Echo Nest are becoming increasingly important in identifying, organizing, and recommending music for listeners around the globe. As a byproduct, such sites collect comprehensive information about a vast number of artists, their recorded songs, institutional support, and the collaborations between them. Using a unique mash-up of crowdsourced, curated, and algorithmically augmented data, this paper unpacks an unsolved problem that is key to promoting artistic innovation, i.e., how gender penetrates into artistic context leading to the globally perceived gender gap in the music industry. Specifically, we investigate gender-related differences in the sonic features of artists’ work, artists’ tagging by listeners, their record label affiliations, and collaboration networks. We find statistically significant disparities along all these dimensions. Moreover, the differences allow models to reliably identify the gender of songs’ creators and help elucidate the role of cultural and structural factors in sustaining inequality. Our findings contribute to a better understanding of gender differences in music production and inspire strategies that could improve the recognition of female artists and advance gender equity in artistic leadership and innovation.
  3. Most studies of acoustic communication focus on short units of vocalization such as songs, yet these units are often hierarchically organized into higher-order sequences and, outside human language, little is known about the drivers of sequence structure. Here, we investigate the organization, transmission and function of vocal sequences sung by male Albert's lyrebirds (Menura alberti), a species renowned for vocal imitations of other species. We quantified the organization of mimetic units into sequences, and examined the extent to which these sequences are repeated within and between individuals and shared among populations. We found that individual males organized their mimetic units into stereotyped sequences. Sequence structures were shared within and to a lesser extent among populations, implying that sequences were socially transmitted. Across the entire species range, mimetic units were sung with immediate variety and a high acoustic contrast between consecutive units, suggesting that sequence structure is a means to enhance receiver perceptions of repertoire complexity. Our results provide evidence that higher-order sequences of vocalizations can be socially transmitted, and that the order of vocal units can be functionally significant. We conclude that, to fully understand vocal behaviours, we must study both the individual vocal units and their higher-order temporal organization.
  4. Over the past two decades, innovations powered by artificial intelligence (AI) have extended into nearly all facets of human experience. Our ethnographic research suggests that while young people sense they can't “trust” AI, many are not sure how it works or how much control they have over its growing role in their lives. In this study, we attempt to answer the following questions: 1) What can we learn about young people's understandings of AI when they produce media with and about it? 2) What are the design features of an ethics-centered pedagogy that promotes STEM engagement via AI? To answer these questions, we co-developed and documented three projects at YR Media, a national network of youth journalists and artists who create multimedia for public distribution. Participants are predominantly youth of color and those contending with economic and other barriers to full participation in STEM fields. Findings showed that by creating a learning ecology that centered the cultures and experiences of its learners while leveraging familiar tools for critical analysis, youth deepened their understanding of AI. Our study also showed that providing opportunities for youth to produce ethics-centered interactive stories interrogating invisibilized AI functionalities, and to release those stories to the public, empowered them to creatively express their understandings and apprehensions about AI.
  5. Assessing similarity between design ideas is an inherent part of many design evaluations to measure novelty. In such evaluation tasks, humans excel at making mental connections among diverse knowledge sets and scoring ideas on their uniqueness. However, their decisions on novelty are often subjective and difficult to explain. In this paper, we demonstrate a way to uncover human judgment of design idea similarity using two-dimensional idea maps. We derive these maps by asking humans for simple similarity comparisons of the form “Is idea A more similar to idea B or to idea C?” We show that these maps give insight into the relationships between ideas and help understand the domain. We also propose that the novelty of ideas can be estimated by measuring how far apart items are on these maps. We demonstrate our methodology through experimental evaluations on two datasets, colored polygons (known answer) and milk frother sketches (unknown answer). We show that these maps shed light on factors considered by raters in judging idea similarity. We also show how maps change when less data is available or false/noisy ratings are provided. This method provides a new direction of research into deriving ground truth novelty metrics by combining human judgments and computational methods.