In many high-impact applications, it is important to ensure the quality of a machine learning algorithm's output, as well as its reliability, relative to the complexity of the algorithm used. In this paper, we have initiated a mathematically rigorous theory for deciding which models (algorithms applied to data sets) are close to each other in terms of certain metrics, such as performance and the complexity level of the algorithm. This involves creating a grid on the hypothetical spaces of data sets and algorithms so as to identify a finite set of probability distributions from which the data sets are sampled and a finite set of algorithms. A given threshold metric acting on this grid will express the nearness (or statistical distance) of each algorithm and data set of interest to any given application. A technically difficult part of this project is to estimate the so-called metric entropy of compact sets of functions of infinitely many variables that arise in the definition of these spaces.
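For reference, the metric entropy invoked above is the standard covering-number notion (the textbook definition, not this paper's specific construction):

```latex
% Metric entropy of a compact set K in a metric space (X, d):
% N(eps, K) is the smallest number of eps-balls needed to cover K,
% and the metric entropy is its logarithm.
\[
  N(\varepsilon, K) = \min\Bigl\{\, n \in \mathbb{N} :
    \exists\, x_1, \dots, x_n \in X \ \text{such that}\
    K \subseteq \bigcup_{i=1}^{n} B(x_i, \varepsilon) \,\Bigr\},
  \qquad
  H(\varepsilon, K) = \log N(\varepsilon, K).
\]
```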
This content will become publicly available on April 6, 2026
CorrGAN: Simultaneous Learning of Speech Enhancement and Perceptual Quality Loss Functions
Deep-learning models have enabled effective end-to-end systems in the Speech Enhancement (SE) field. Most of these methods are trained with a fixed reconstruction loss in a supervised setting. Often, these losses do not perfectly represent the desired perceptual quality metrics, resulting in sub-optimal performance. Recently, there have been efforts to learn the behavior of those metrics directly via neural networks for training SE models. However, accurately estimating the true metric function introduces statistical complexity into training, because it attempts to capture the exact value of the metric. We propose an adversarial training strategy based on statistical correlation that avoids the complexity of estimating the SE metric while learning to mimic its overall behavior. We call this framework CorrGAN and show that it yields significant improvements over the standard losses of state-of-the-art (SOTA) baselines, achieving SOTA performance on the VoiceBank+DEMAND dataset.
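The paper's exact architecture and objectives are not reproduced in this listing; the following is a minimal PyTorch-style sketch of the core idea as stated in the abstract: train a discriminator whose batch-level scores correlate (Pearson) with a perceptual metric, so the enhancer can be trained against a trend-matching surrogate rather than an exact metric estimator. All names here are illustrative assumptions.

```python
# Minimal sketch of a correlation-based adversarial objective in the spirit
# of CorrGAN; every name below is an illustrative assumption.
import torch

def pearson_corr(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Differentiable Pearson correlation between two 1-D batches of scores."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / (xc.norm() * yc.norm() + eps)

def discriminator_loss(d_scores: torch.Tensor, metric_scores: torch.Tensor) -> torch.Tensor:
    # Train D so its batch-level scores move *with* the (non-differentiable)
    # perceptual metric: maximize correlation, i.e. minimize its negation.
    return -pearson_corr(d_scores, metric_scores)

def enhancer_loss(d_scores_on_enhanced: torch.Tensor) -> torch.Tensor:
    # Train the enhancer to push up D's metric-correlated score.
    return -d_scores_on_enhanced.mean()
```

Because only the batch-level trend of the metric is matched, the discriminator never has to regress the metric's exact value; this is the statistical simplification the abstract alludes to.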
- PAR ID: 10616198
- Publisher / Repository: IEEE
- Date Published:
- ISBN: 979-8-3503-6874-1
- Page Range / eLocation ID: 1 to 5
- Format(s): Medium: X
- Location: Hyderabad, India
- Sponsoring Org: National Science Foundation
More Like this
Machine learning models—including prominent zero-shot models—are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes—or, in the case of zero-shot prediction, to improve its performance—without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping arg max with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active-learning-like next-class selection procedure to obtain optimal training classes when it is not possible to predict the entire range of unobserved classes. Empirically, using easily available external metrics, our proposed approach, LOKI, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, LOKI can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.
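As a hedged illustration of the drop-in rule described in this abstract (not the authors' released code), the Fréchet mean over a finite label metric space reduces to a weighted medoid search over the label distance matrix:

```python
# Sketch of a Frechet-mean prediction rule over a finite label metric space,
# in the spirit of LOKI; array shapes and names are assumptions.
import numpy as np

def frechet_mean_predict(probs: np.ndarray, dist: np.ndarray) -> int:
    """Drop-in replacement for arg max prediction.

    probs: (C_obs,)        model scores over the trained (observed) classes.
    dist:  (C_all, C_obs)  label-metric distances from every candidate class,
                           including unseen ones, to each trained class.
    Returns the candidate minimizing the weighted sum of squared distances,
    i.e. the Frechet mean of the predictive distribution in the label metric.
    """
    costs = (dist ** 2) @ probs   # expected squared distance per candidate
    return int(np.argmin(costs))
```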
Previous research demonstrates that users' actions in search interactions are associated with gains and losses relative to reference points, known as the reference dependence effect. However, this widely confirmed effect is not represented in most user models underpinning existing search evaluation metrics. In this study, we propose a new evaluation metric framework, namely the Reference Dependent Metric (ReDeM), for assessing query-level search by incorporating the effect of reference dependence into the modelling of user search behavior. To test the overall effectiveness of the proposed framework, (1) we evaluate the performance, in terms of correlation with user satisfaction, of ReDeMs built upon different reference points against that of widely-used metrics on three search datasets; (2) we examine the performance of ReDeMs under different task states, such as task difficulty and task urgency; and (3) we analyze the statistical reliability of ReDeMs in terms of discriminative power. Experimental results indicate that: (1) ReDeMs integrated with a proper reference point achieve better correlations with user satisfaction than most existing metrics, such as Discounted Cumulative Gain (DCG) and Rank-Biased Precision (RBP), even when their parameters have already been well tuned; (2) ReDeMs perform relatively better than existing metrics when the task triggers a high cognitive load; (3) the discriminative power of ReDeMs is far stronger than Expected Reciprocal Rank (ERR), slightly stronger than Precision, and similar to DCG, RBP, and INST. To our knowledge, this study is the first to explicitly incorporate the reference dependence effect into the user browsing model and offline evaluation metrics. Our work illustrates a promising approach to leveraging insights about user biases from cognitive psychology to better evaluate user search experience and enhance user models.
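ReDeM's concrete user model is not reproduced in this abstract; purely as an illustrative sketch, a DCG-style metric becomes reference-dependent if each gain is first passed through a prospect-theory-style value function relative to a reference point. The constants below are the classic Kahneman-Tversky estimates, assumed for illustration, not ReDeM's fitted parameters.

```python
# Illustrative sketch only: a DCG-style metric made reference-dependent by
# passing each document's gain through an asymmetric value function.
import math

def value_fn(delta: float, alpha: float = 0.88, lam: float = 2.25) -> float:
    # Losses relative to the reference point loom larger than equal gains.
    return delta ** alpha if delta >= 0 else -lam * ((-delta) ** alpha)

def reference_dependent_dcg(gains: list[float], reference: float) -> float:
    # Accumulate reference-relative values with a DCG-style rank discount.
    return sum(
        value_fn(g - reference) / math.log2(rank + 2)
        for rank, g in enumerate(gains)
    )
```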
We study training of Graph Neural Networks (GNNs) for large-scale graphs. We revisit the premise of using distributed training for billion-scale graphs and show that for graphs that fit in main memory or the SSD of a single machine, out-of-core pipelined training with a single GPU can outperform state-of-the-art (SoTA) multi-GPU solutions. We introduce MariusGNN, the first system that utilizes the entire storage hierarchy—including disk—for GNN training. MariusGNN introduces a series of data organization and algorithmic contributions that 1) minimize the end-to-end time required for training and 2) ensure that models learned with disk-based training exhibit accuracy similar to those fully trained in memory. We evaluate MariusGNN against SoTA systems for learning GNN models and find that single-GPU training in MariusGNN achieves the same level of accuracy up to 8× faster than multi-GPU training in these systems, thus introducing an order-of-magnitude reduction in monetary cost. MariusGNN is open-sourced at www.marius-project.org.
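MariusGNN's actual data layout and replacement policies are not shown in the abstract; the sketch below only illustrates the generic out-of-core pipelining idea, with `load_batch` and `train_step` as caller-supplied placeholders.

```python
# Generic out-of-core pipelining sketch: a background thread prefetches
# mini-batches from disk into a bounded queue while the GPU trains on the
# previous batch. This is not MariusGNN's implementation.
import queue
import threading
from typing import Any, Callable, Iterable

def train_out_of_core(
    batch_paths: Iterable[str],
    load_batch: Callable[[str], Any],    # reads one mini-batch from disk/SSD
    train_step: Callable[[Any], None],   # runs one GPU training step
    depth: int = 4,                      # bounded queue caps host-memory use
) -> None:
    q: "queue.Queue[Any]" = queue.Queue(maxsize=depth)

    def producer() -> None:
        for path in batch_paths:
            q.put(load_batch(path))      # disk I/O runs ahead of compute
        q.put(None)                      # sentinel: end of epoch

    threading.Thread(target=producer, daemon=True).start()
    while (batch := q.get()) is not None:
        train_step(batch)                # overlaps with the next disk read
```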
Environmental decisions with substantial social and environmental implications are regularly informed by model predictions, incurring inevitable uncertainty. The selection of a set of model predictions to inform a decision is usually based on model performance, measured by goodness-of-fit metrics. Yet goodness-of-fit metrics have a questionable relationship to a model's value to end users, particularly when validation data are themselves uncertain. For example, decisions based on flow frequency models are not necessarily improved by adopting models with the best overall goodness of fit. We propose an alternative model evaluation approach based on the conditional value of sample information, first defined in 1961, which has found extensive use in sampling design optimization but which has not previously been used for model evaluation. The metric uses observations from a validation set to estimate the expected monetary costs associated with model prediction uncertainties. A model is only considered superior to alternatives if (i) its predictions reduce these costs and (ii) sufficient validation data are available to distinguish its performance from alternative models. By describing prediction uncertainties in monetary terms, the metric facilitates the communication of prediction uncertainty by end users, supporting the inclusion of uncertainty analysis in decision making.
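As a hedged sketch of the evaluation logic this abstract describes (not the paper's estimator), one can score each model by the expected monetary cost of acting on its predictions over a validation set, and prefer model A only when the paired cost reduction is statistically distinguishable. The cost function and the crude z-test below are illustrative assumptions.

```python
# Sketch: compare two models by expected decision cost on a validation set.
import math
import statistics
from typing import Callable, Sequence

def decision_costs(
    preds: Sequence[float],
    obs: Sequence[float],
    cost: Callable[[float, float], float],  # maps (prediction, outcome) to $
) -> list[float]:
    # Monetary cost incurred by acting on each validation-set prediction.
    return [cost(p, o) for p, o in zip(preds, obs)]

def model_a_is_superior(costs_a: Sequence[float], costs_b: Sequence[float]) -> bool:
    # (i) A must reduce expected cost, and (ii) the validation sample must be
    # large enough to distinguish the two models' performance.
    diffs = [a - b for a, b in zip(costs_a, costs_b)]
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean < 0 and abs(mean) > 1.96 * se
```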