Beginner musicians often struggle to identify specific errors in their performances, such as incorrect notes or rhythms. Existing tools for music error detection have two limitations: (1) they rely on automatic alignment and are therefore prone to errors caused by small deviations between alignment targets; (2) there is insufficient data to train music error detection models, which leads to over-reliance on heuristics. To address (1), we propose a novel transformer model, Polytune, that takes audio inputs and outputs annotated music scores. The model can be trained end-to-end to implicitly align and compare performance audio with music scores through latent space representations. To address (2), we present a novel data generation technique capable of creating large-scale synthetic music error datasets. Our approach achieves a 64.1% average Error Detection F1 score, improving upon prior work by 40 percentage points across 14 instruments. Additionally, unlike existing transcription methods repurposed for music error detection, our model can handle multiple instruments.
This content will become publicly available on June 11, 2026
pyAMPACT: A Score-Audio Alignment Toolkit for Performance Data Estimation and Multi-modal Processing
pyAMPACT (Python-based Automatic Music Performance Analysis and Comparison Toolkit) links symbolic and audio music representations to facilitate score-informed estimation of performance data from audio. It can read a range of symbolic formats and can output note-linked audio descriptors and performance data into MEI-formatted files. pyAMPACT uses score alignment to calculate time-frequency regions of importance for each note in the symbolic representation, and from these regions it estimates a range of parameters from the corresponding audio. These include frame-wise and note-level tuning-, dynamics-, and timbre-related performance descriptors, with timing-related information available from the score alignment. Beyond performance data estimation, pyAMPACT also facilitates multi-modal investigations through its infrastructure for linking symbolic representations and annotations to audio.
- Award ID(s): 2228910
- PAR ID: 10657702
- Publisher / Repository: 2025 International Computer Music Conference
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Music is one of the most universal forms of communication and entertainment across cultures. This can largely be credited to synesthesia, or the combining of senses. Based on this concept, we explore whether generative AI can create visual representations for music. The aim is to inspire the user’s imagination and enhance the user experience when enjoying music. Our approach has the following steps: (a) Music is analyzed and classified along multiple dimensions (including instruments, emotion, tempo, pitch range, harmony, and dynamics) to produce textual descriptions. (b) The texts form inputs of machine models that can predict the genre of the input audio. (c) The prompts are inputs of generative machine models that create visual representations. The visual representations are continuously updated as the music plays, ensuring that the visual effects mirror the musical changes. A comprehensive user study with 88 users confirms that our approach is able to generate visual art reflecting the music pieces. Given a list of images covering both abstract and realistic images, users considered that our system-generated images represent pieces of music better than human-chosen images. This suggests that generative art can become a promising way to enhance users' listening experience. Our method provides a new approach to visualizing and enjoying music through generative art.
- This note revisits the classical orthogonal Procrustes problem and investigates the norm-dependent geometric behavior underlying Procrustes alignment for subspaces. It presents generic, deterministic bounds quantifying the performance of a specified Procrustes-based choice of subspace alignment. Numerical examples illustrate the theoretical observations and offer additional, empirical findings which are discussed in detail. This note complements recent advances in statistics involving Procrustean matrix perturbation decompositions and eigenvector estimation.
- Co-array-based Direction of Arrival (DoA) estimation using Sparse Linear Arrays (SLAs) has recently gained considerable attention in array processing thanks to its capability of providing enhanced degrees of freedom for DoAs that can be resolved. Additionally, deployment of one-bit Analog-to-Digital Converters (ADCs) has become an important topic in array processing, as it offers both a low-cost and a low-complexity implementation. Although the problem of DoA estimation from one-bit SLA measurements has been studied in some prior works, its analytical performance has not yet been investigated and characterized. In this paper, to provide valuable insights into the performance of DoA estimation from one-bit SLA measurements, we derive an asymptotic closed-form expression for the performance of One-Bit Co-Array-Based MUSIC (OBCAB-MUSIC). Further, numerical simulations are provided to validate the asymptotic closed-form expression for the performance of OBCAB-MUSIC and to show an interesting use case of it in evaluating the resolution of OBCAB-MUSIC.
- We consider the problem of personalizing audio to maximize user experience. Briefly, we aim to find a filter h*, which, applied to any music or speech, will maximize the user’s satisfaction. This is a black-box optimization problem since the user’s satisfaction function is unknown. Substantive work has been done on this topic, where the key idea is to play audio samples to the user, each shaped by a different filter h_i, and query the user for their satisfaction scores f(h_i). A family of “surrogate” functions is then designed to fit these scores, and the optimization method gradually refines these functions to arrive at the filter ĥ* that maximizes satisfaction. In certain applications, we observe that a second type of querying is possible, where users can tell us the individual elements h*[j] of the optimal filter h*. Consider an analogy from cooking, where the goal is to cook a recipe that maximizes user satisfaction. A user can be asked to score various cooked recipes (e.g., tofu fried rice) or to score individual ingredients (say, salt, sugar, rice, chicken, etc.). Given a budget of B queries, where a query can be of either type, our goal is to find the recipe that will maximize this user’s satisfaction. Our proposal builds on Sparse Gaussian Process Regression (GPR) and shows how a hybrid approach can outperform any one type of querying. Our results are validated through simulations and real-world experiments, where volunteers gave feedback on music/speech audio and were able to achieve high satisfaction levels. We believe this idea of hybrid querying opens new problems in black-box optimization, and solutions can benefit other applications beyond audio personalization.
