Title: An Evaluative Measure of Clustering Methods Incorporating Hyperparameter Sensitivity
Clustering algorithms are often evaluated using metrics that compare against ground-truth cluster assignments, such as the Rand index and NMI. Algorithm performance may vary widely across hyperparameters, however, and thus model selection based on optimal performance for these metrics is discordant with how these algorithms are applied in practice, where labels are unavailable and tuning is often more art than science. It is therefore desirable to compare clustering algorithms not only on their optimally tuned performance, but also on some notion of how realistic it would be to obtain this performance in practice. We propose an evaluation of clustering methods capturing this ease-of-tuning by modeling the expected best clustering score under a given computation budget. To encourage the adoption of the proposed metric alongside classic clustering evaluations, we provide an extensible benchmarking framework. We perform an extensive empirical evaluation of our proposed metric on popular clustering algorithms over a large collection of datasets from different domains, and find that our new metric leads to several noteworthy observations.
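The abstract does not state how the expected best score is estimated, so the following is only a minimal sketch of one plausible reading. It assumes the computation budget is a number of hyperparameter configurations drawn independently (with replacement) from a pool of already-scored configurations, and it computes the expected maximum score over such a draw from the empirical distribution. The function name `expected_best_score` and the example scores are hypothetical and are not taken from the paper or its benchmarking framework.

```python
def expected_best_score(scores, budget):
    """Expected maximum of `budget` i.i.d. draws (with replacement)
    from the empirical distribution of `scores`.

    Assumption: each draw corresponds to evaluating one randomly
    chosen hyperparameter configuration, and `scores` holds a
    ground-truth-based score (e.g. NMI) per configuration.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if budget < 1:
        raise ValueError("budget must be at least 1")

    ordered = sorted(scores)
    n = len(ordered)
    total = 0.0
    for i, score in enumerate(ordered, start=1):
        # P(max of `budget` draws equals the i-th smallest score)
        # = (i/n)^budget - ((i-1)/n)^budget
        weight = (i / n) ** budget - ((i - 1) / n) ** budget
        total += score * weight
    return total


# Hypothetical scores for three hyperparameter settings of one algorithm.
example_scores = [0.2, 0.5, 0.9]
curve = [expected_best_score(example_scores, k) for k in (1, 2, 5, 20)]
```

Under these assumptions, a budget of 1 gives the mean score, and the value approaches the best observed score as the budget grows. An algorithm that is easy to tune climbs toward its optimum at small budgets, while a sensitive one needs many more draws.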
Award ID(s):
1763618 2106391
NSF-PAR ID:
10356094
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
36
Issue:
7
ISSN:
2159-5399
Page Range / eLocation ID:
7788 to 7796
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Intrinsic image decomposition and inverse rendering are long-standing problems in computer vision. To evaluate albedo recovery, most algorithms report their quantitative performance with a mean Weighted Human Disagreement Rate (WHDR) metric on the IIW dataset. However, WHDR focuses only on relative albedo values and often fails to capture overall quality of the albedo. In order to comprehensively evaluate albedo, we collect a new dataset, Measured Albedo in the Wild (MAW), and propose three new metrics that complement WHDR: intensity, chromaticity and texture metrics. We show that existing algorithms often improve WHDR metric but perform poorly on other metrics. We then finetune different algorithms on our MAW dataset to significantly improve the quality of the reconstructed albedo both quantitatively and qualitatively. Since the proposed intensity, chromaticity, and texture metrics and the WHDR are all complementary we further introduce a relative performance measure that captures average performance. By analysing existing algorithms we show that there is significant room for improvement. Our dataset and evaluation metrics will enable researchers to develop algorithms that improve albedo reconstruction. Code and Data available at: https://measuredalbedo.github.io/ 
  2. Statistical relational learning (SRL) frameworks are effective at defining probabilistic models over complex relational data. They often use weighted first-order logical rules where the weights of the rules govern probabilistic interactions and are usually learned from data. Existing weight learning approaches typically attempt to learn a set of weights that maximizes some function of data likelihood; however, this does not always translate to optimal performance on a desired domain metric, such as accuracy or F1 score. In this paper, we introduce a taxonomy of search-based weight learning approaches for SRL frameworks that directly optimize weights on a chosen domain performance metric. To effectively apply these search-based approaches, we introduce a novel projection, referred to as scaled space (SS), that is an accurate representation of the true weight space. We show that SS removes redundancies in the weight space and captures the semantic distance between the possible weight configurations. In order to improve the efficiency of search, we also introduce an approximation of SS which simplifies the process of sampling weight configurations. We demonstrate these approaches on two state-of-the-art SRL frameworks: Markov logic networks and probabilistic soft logic. We perform empirical evaluation on five real-world datasets and evaluate them each on two different metrics. We also compare them against four other weight learning approaches. Our experimental results show that our proposed search-based approaches outperform likelihood-based approaches and yield up to a 10% improvement across a variety of performance metrics. Further, we perform an extensive evaluation to measure the robustness of our approach to different initializations and hyperparameters. The results indicate that our approach is both accurate and robust.
  3. Recommendation and ranking systems are known to suffer from popularity bias: the tendency of the algorithm to favor a few popular items while under-representing the majority of other items. Prior research has examined various approaches for mitigating popularity bias and enhancing the recommendation of long-tail, less popular items. The effectiveness of these approaches is often assessed using different metrics to evaluate the extent to which over-concentration on popular items is reduced. However, not much attention has been given to the user-centered evaluation of this bias, that is, how different users with different levels of interest towards popular items are affected by such algorithms. In this paper, we show the limitations of the existing metrics to evaluate popularity bias mitigation when we want to assess these algorithms from the users’ perspective, and we propose a new metric that can address these limitations. In addition, we present an effective approach that mitigates popularity bias from the user-centered point of view. Finally, we investigate several state-of-the-art approaches proposed in recent years to mitigate popularity bias and evaluate their performances using the existing metrics and also from the users’ perspective. Our experimental results using two publicly available datasets show that existing popularity bias mitigation techniques ignore the users’ tolerance towards popular items. Our proposed user-centered method can tackle popularity bias effectively for different users while also improving the existing metrics.
  4. A common way to integrate and analyze large amounts of biological “omic” data is through pathway reconstruction: using condition-specific omic data to create a subnetwork of a generic background network that represents some process or cellular state. A challenge in pathway reconstruction is that adjusting pathway reconstruction algorithms’ parameters produces pathways with drastically different topological properties and biological interpretations. Due to the exploratory nature of pathway reconstruction, there is no ground truth for direct evaluation, so parameter tuning methods typically used in statistics and machine learning are inapplicable. We developed the pathway parameter advising algorithm to tune pathway reconstruction algorithms to minimize biologically implausible predictions. We leverage background knowledge in pathway databases to select pathways whose high-level structure resembles that of manually curated biological pathways. At the core of this method is a graphlet decomposition metric, which measures topological similarity to curated biological pathways. In order to evaluate pathway parameter advising, we compare its performance in avoiding implausible networks and reconstructing pathways from the NetPath database with other parameter selection methods across four pathway reconstruction algorithms. We also demonstrate how pathway parameter advising can guide reconstruction of an influenza host factor network. Pathway parameter advising is method agnostic; it is applicable to any pathway reconstruction algorithm with tunable parameters.

     
  5. The squared Pearson correlation coefficient, $r^2$, is an important tool used in the analysis of neural data to quantify the similarity between neural tuning curves. Yet this metric is biased by trial-to-trial variability; as trial-to-trial variability increases, measured correlation decreases. Major lines of research are confounded by this bias, including those involving the study of invariance of neural tuning across conditions and the analysis of the similarity of tuning across neurons. To address this, we extend an estimator, $\hat{r}^2_{\mathrm{ER}}$, that was recently developed for estimating model-to-neuron correlation, in which a noisy signal is compared with a noise-free prediction, to the case of neuron-to-neuron correlation, in which two noisy signals are compared with each other. We compare the performance of our novel estimator to a prior method developed by Spearman, commonly used in other fields but widely overlooked in neuroscience, and find that our method has less bias. We then apply our estimator to demonstrate how it avoids drastic confounds introduced by trial-to-trial variability using data collected in two prior studies (macaque, both sexes) that examined two different forms of invariance in the neural encoding of visual inputs: translation invariance and fill-outline invariance. Our results quantify for the first time the gradual falloff with spatial offset of translation-invariant shape selectivity within visual cortical neuronal receptive fields and offer a principled method to compare invariance in noisy biological systems to that in noise-free models.

    SIGNIFICANCE STATEMENT: Quantifying the similarity between two sets of averaged neural responses is fundamental to the analysis of neural data. A ubiquitous metric of similarity, the correlation coefficient, is attenuated by trial-to-trial variability that arises from many irrelevant factors. Spearman recognized this problem and proposed corrected methods that have been extended over a century. We show this method has large asymptotic biases that can be overcome using a novel estimator. Despite the frequent use of the correlation coefficient in neuroscience, consensus on how to address this fundamental statistical issue has not been reached. We provide an accurate estimator of the correlation coefficient and apply it to gain insight into visual invariance.

     