Estimating Shapley Values of Training Utterances for Automatic Speech Recognition Models

Raza Syed, Ali; Mandel, Michael I.

doi:10.1109/ICASSP49357.2023.10097237

Citation Details

Data valuation in machine learning is concerned with quantifying the relative contribution of a training example to a model’s performance. Quantifying the importance of training examples is useful for identifying high- and low-quality data when curating training datasets and for addressing data quality issues. Shapley values have gained traction in machine learning for both purposes. While computing the exact Shapley values of training examples is computationally prohibitive, approximation methods have been used successfully for classification models in computer vision tasks. We investigate data valuation for Automatic Speech Recognition (ASR) models, which perform a structured prediction task, and propose a method for estimating Shapley values for these models. We show that a proxy model can be learned for the acoustic model component of an end-to-end ASR model and used to estimate Shapley values for acoustic frames. We present a method for using the proxy acoustic model to estimate Shapley values for variable-length utterances and demonstrate that the Shapley values provide a signal of example quality.
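To make the idea of approximating Shapley values of training examples concrete, the following is a minimal sketch of a generic Monte Carlo permutation-sampling estimator. It is not the proxy-model method proposed in the paper, whose details are not given in this abstract. The function name `monte_carlo_shapley`, the `utility` callback, and the toy additive utility are all hypothetical and chosen only for illustration; in practice `utility` would train a model on the given subset of examples and return a validation metric.

```python
import random
from typing import Callable, List, Sequence


def monte_carlo_shapley(
    n_points: int,
    utility: Callable[[Sequence[int]], float],
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate the Shapley value of each training example by sampling permutations.

    `utility(subset)` is an assumed callback that returns the performance of a
    model trained on the examples whose indices are in `subset`.
    """
    rng = random.Random(seed)
    values = [0.0] * n_points
    indices = list(range(n_points))
    for _ in range(n_permutations):
        rng.shuffle(indices)
        subset: List[int] = []
        previous = utility(subset)  # utility of the empty training set
        for i in indices:
            subset.append(i)
            current = utility(subset)
            # Marginal contribution of example i given the examples before it.
            values[i] += current - previous
            previous = current
    # Average the marginal contributions over all sampled permutations.
    return [v / n_permutations for v in values]


# Toy stand-in for "train and evaluate": each example adds a fixed amount
# to the score, so the true Shapley value of example i is quality[i].
quality = [1.0, 0.5, -0.5, 0.0]


def toy_utility(subset: Sequence[int]) -> float:
    return sum(quality[i] for i in subset)


estimates = monte_carlo_shapley(len(quality), toy_utility)
```

With this additive toy utility every marginal contribution of example `i` equals `quality[i]`, so the estimates recover the per-example quality scores, and examples with negative values are the ones that hurt performance. With a real utility each call requires training a model, which is why the cost grows quickly and why cheaper approximations are attractive.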

Award ID(s): 1750383

PAR ID: 10439006

Author(s) / Creator(s): Raza Syed, Ali; Mandel, Michael I.

Date Published: 2023-06-04

Journal Name: IEEE International Conference on Acoustics Speech and Signal Processing

Page Range / eLocation ID: 1 to 5

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1109/ICASSP49357.2023.10097237
