Beginner musicians often struggle to identify specific errors in their performances, such as playing incorrect notes or rhythms. There are two limitations in existing tools for music error detection: (1) Existing approaches rely on automatic alignment; therefore, they are prone to errors caused by small deviations between alignment targets; (2) There is insufficient data to train music error detection models, resulting in over-reliance on heuristics. To address (1), we propose a novel transformer model, Polytune, that takes audio inputs and outputs annotated music scores. This model can be trained end-to-end to implicitly align and compare performance audio with music scores through latent space representations. To address (2), we present a novel data generation technique capable of creating large-scale synthetic music error datasets. Our approach achieves a 64.1% average Error Detection F1 score, improving upon prior work by 40 percentage points across 14 instruments. Additionally, our model can handle multiple instruments compared with existing transcription methods repurposed for music error detection.
more »
« less
Limitations of Post-Hoc Feature Alignment for Robustness
Feature alignment is an approach to improving robustness to distribution shift that matches the distribution of feature activations between the training distribution and test distribution. A particularly simple but effective approach to feature alignment involves aligning the batch normalization statistics between the two distributions in a trained neural network. This technique has received renewed interest lately because of its impressive performance on robustness benchmarks. However, when and why this method works is not well understood. We investigate the approach in more detail and identify several limitations. We show that it only significantly helps with a narrow set of distribution shifts and we identify several settings in which it even degrades performance. We also explain why these limitations arise by pinpointing why this approach can be so effective in the first place. Our findings call into question the utility of this approach and Unsupervised Domain Adaptation more broadly for improving robustness in practice.
more »
« less
- Award ID(s):
- 2031899
- PAR ID:
- 10251087
- Date Published:
- Journal Name:
- IEEE Computer Society Conference on Computer Vision and Pattern Recognition
- ISSN:
- 2332-564X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Polymer porous membranes are crucial in various applications, including water filtration, tissue engineering, and drug administration. Conventional far‐field electrospinning (FFES) is widely used for producing polymeric membranes due to its cost‐effectiveness, scalability, and flexibility in using many polymers. However, FFES has limitations in controlling pore form and size, as it produces randomly oriented fibers that lead to inconsistent and noncustomizable pore sizes. To address these limitations, this work combines near‐field electrospinning (NFES) with thermal treatment of polymer fibers and membranes. NFES offers more precise control over fiber placement and alignment, producing well‐defined fiber patterns with consistent and customizable pore sizes without compromising the thickness of membranes. By exploring the interplay between polymer behavior, thermal effects, and capillary action, the differences in pore area under various temperatures and fiber spacings are characterized. Additionally, this study investigates the influence of multilayer infusion on pore size and geometric arrangement by examining multilayer configurations stacked at various angles. The results indicate that increasing the number of layers leads to decreased pore size, while the alignment of infused fibers affects pore shape. This integrated approach enhances control over membrane characteristics, improving the performance and consistency of polymer porous membrane fabrication across various applications.more » « less
-
na (Ed.)Most existing audio-text emotion recognition studies have focused on the computational modeling aspects, including strategies for fusing the modalities. An area that has received less attention is understanding the role of proper temporal synchronization between the modalities in the model performance. This study presents a transformer-based model designed with a word-chunk concept, which offers an ideal framework to explore different strategies to align text and speech. The approach creates chunks with alternative alignment strategies with different levels of dependency on the underlying lexical boundaries. A key contribution of this study is the multi-scale chunk alignment strategy, which generates random alignments to create the chunks without considering lexical boundaries. For every epoch, the approach generates a different alignment for each sentence, serving as an effective regularization method for temporal dependency. Our experimental results based on the MSP-Podcast corpus indicate that providing precise temporal alignment information to create the audio-text chunks does not improve the performance of the system. The attention mechanisms in the transformer-based approach are able to compensate for imperfect synchronization between the modalities. However, using exact lexical boundaries makes the system highly vulnerable to missing modalities. In contrast, the model trained with the proposed multi-scale chunk regularization strategy using random alignment can significantly increase its robustness against missing data and remain effective, even under a single audio-only emotion recognition task. The code is available at: https://github.com/winston-lin-wei-cheng/MultiScale-Chunk-Regularizationmore » « less
-
The study of language variation examines how language varies between and within different groups of speakers, shedding light on how we use language to construct identities and how social contexts affect language use. A common method is to identify instances of a certain linguistic feature - say, the zero copula construction - in a corpus, and analyze the feature’s distribution across speakers, topics, and other variables, to either gain a qualitative understanding of the feature’s function or systematically measure variation. In this paper, we explore the challenging task of automatic morphosyntactic feature detection in low-resource English varieties. We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits. We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.more » « less
-
Many promising applications of supervised machine learning face hurdles in the acquisition of labeled data in sufficient quantity and quality, creating an expensive bottleneck. To overcome such limitations, techniques that do not depend on ground truth labels have been studied, including weak supervision and generative modeling. While these techniques would seem to be usable in concert, improving one another, how to build an interface between them is not well-understood. In this work, we propose a model fusing programmatic weak supervision and generative adversarial networks and provide theoretical justification motivating this fusion. The proposed approach captures discrete latent variables in the data alongside the weak supervision derived label estimate. Alignment of the two allows for better modeling of sample-dependent accuracies of the weak supervision sources, improving the estimate of unobserved labels. It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels. Additionally, its learned latent variables can be inspected qualitatively. The model outperforms baseline weak supervision label models on a number of multiclass image classification datasets, improves the quality of generated images, and further improves end-model performance through data augmentation with synthetic samples.more » « less
An official website of the United States government

