Title: Measuring Class-Imbalance Sensitivity of Deterministic Performance Evaluation Metrics
The class-imbalance issue is intrinsic to many real-world machine learning tasks, particularly rare-event classification problems. Although the impact and treatment of imbalanced data are widely known, the magnitude of a metric's sensitivity to class imbalance has attracted little attention. As a result, sensitive metrics are often dismissed even when their sensitivity is only marginal. In this paper, we introduce an intuitive evaluation framework that quantifies metrics' sensitivity to class imbalance. Moreover, we reveal that metrics' sensitivity behaves logarithmically: higher imbalance ratios are associated with lower sensitivity. Our framework builds an intuitive understanding of how class imbalance affects metrics. We believe this can help avoid many common mistakes, especially the less-emphasized and incorrect assumption that all metrics' values are comparable across different class-imbalance ratios.
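To make the abstract's central point concrete, here is a minimal Python sketch. It is not the authors' evaluation framework, which is not reproduced here. It holds a hypothetical classifier's true-positive rate (TPR) and false-positive rate (FPR) fixed and varies only r, the number of negatives per positive. With P positives and N = rP negatives, TP = TPR·P and FP = FPR·N, so precision = TPR / (TPR + FPR·r) depends on r, while recall (TPR) does not. The function names and the TPR/FPR values are illustrative assumptions.

```python
"""Sketch: precision vs. recall for a classifier whose per-class
behaviour (TPR, FPR) is held fixed while the imbalance ratio changes.
Names and numbers are hypothetical, chosen only for illustration."""


def precision_at_ratio(tpr: float, fpr: float, ratio: float) -> float:
    """Precision when there are `ratio` negatives per positive.

    With P positives and N = ratio * P negatives:
        TP = tpr * P,  FP = fpr * N
        precision = TP / (TP + FP) = tpr / (tpr + fpr * ratio)
    """
    return tpr / (tpr + fpr * ratio)


def recall_at_ratio(tpr: float, fpr: float, ratio: float) -> float:
    """Recall equals TPR; it uses only the positive class, so `ratio` drops out."""
    return tpr


if __name__ == "__main__":
    tpr, fpr = 0.9, 0.05  # hypothetical, fixed classifier behaviour
    for ratio in (1, 10, 100, 1000):
        p = precision_at_ratio(tpr, fpr, ratio)
        r = recall_at_ratio(tpr, fpr, ratio)
        print(f"N:P = {ratio:>4}:1  precision = {p:.3f}  recall = {r:.3f}")
```

For these inputs the printed precision falls from 0.947 (1:1) to 0.643 (10:1), 0.153 (100:1) and 0.018 (1000:1), while recall stays at 0.900. The same classifier therefore produces very different precision values at different ratios, which is why comparing such values across datasets with different imbalance ratios can mislead. The sketch only illustrates this dependence; it does not attempt to reproduce the logarithmic sensitivity behaviour reported in the paper.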
Award ID(s):
1931555
PAR ID:
10402092
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2022 IEEE International Conference on Image Processing (ICIP)
Page Range / eLocation ID:
51 - 55
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More like this
  1. Rutkowski L.; Scherer R.; Korytkowski M.; Pedrycz W.; Tadeusiewicz R.; Zurada J. (Ed.)
    In this work, we investigate the impact of class imbalance on the accuracy and diversity of synthetic samples generated by conditional generative adversarial network (CGAN) models. Though many studies utilizing GANs have seen extraordinary success in producing realistic image samples, these studies generally assume the use of well-processed and balanced benchmark image datasets, including MNIST and CIFAR-10. However, well-balanced data is uncommon in real-world applications such as detecting fraud, diagnosing diabetes, and predicting solar flares. It is well known that when class labels are not distributed uniformly, the predictive ability of classification algorithms suffers significantly, a phenomenon known as the "class-imbalance problem." We show that imbalance in the training set can also affect sample generation by CGAN models. We use the well-known MNIST dataset, controlling the imbalance ratio of certain classes within the data through sampling. We show that both the quality and the diversity of generated samples suffer in the presence of class imbalance, and we propose a novel framework named Two-stage CGAN to produce high-quality synthetic samples in such cases. Our results indicate that the proposed framework provides a significant improvement over the typical oversampling and undersampling techniques used for class-imbalance remediation.
  2. We present a case study of solar flare forecasting using metadata feature time series, treating it as a prominent class-imbalanced and temporally coherent problem. Taking full advantage of pre-flare time series in solar active regions is made possible by the Space Weather Analytics for Solar Flares (SWAN-SF) benchmark data set, a partitioned collection of multivariate time series of active region properties comprising 4075 regions and spanning over 9 years of the Solar Dynamics Observatory period of operations. We showcase the general concept of temporal coherence arising from the demand for continuity in time series forecasting and show that a lack of proper understanding of this effect may spuriously inflate models' apparent performance. We further address another well-known challenge in rare-event prediction, namely, the class-imbalance issue. The SWAN-SF is an appropriate data set for this, with a 60:1 imbalance ratio for GOES M- and X-class flares and an 800:1 imbalance ratio for X-class flares against flare-quiet instances. We revisit the main remedies for these challenges and present several experiments to illustrate the exact impact that each of these remedies may have on performance. Moreover, we acknowledge that some basic data manipulation tasks, such as data normalization and cross-validation, may also affect performance; we discuss these problems as well. In this framework we also review the primary advantages and disadvantages of using the true skill statistic and the Heidke skill score, two widely used performance verification metrics for the flare-forecasting task. In conclusion, we show and advocate for the benefits of time series versus point-in-time forecasting, provided that the above challenges are measurably and quantitatively addressed.
  3. Assessing sensitivity to unmeasured confounding is an important step in observational studies, which typically estimate effects under the assumption that all confounders are measured. In this paper, we develop a sensitivity analysis framework for balancing weights estimators, an increasingly popular approach that solves an optimization problem to obtain weights that directly minimize covariate imbalance. In particular, we adapt a sensitivity analysis framework using the percentile bootstrap for a broad class of balancing weights estimators. We prove that the percentile bootstrap procedure can, with only minor modifications, yield valid confidence intervals for causal effects under restrictions on the level of unmeasured confounding. We also propose an amplification (a mapping from a one-dimensional sensitivity analysis to a higher-dimensional sensitivity analysis) to allow for interpretable sensitivity parameters in the balancing weights framework. We illustrate our method through extensive real data examples.
  4. Fraud detection has emerged as a pivotal process in different fields (e.g., e-commerce, social networks). Since interactions among entities provide valuable insights into fraudulent activities, such behaviors can be naturally represented as graphs, where graph neural networks (GNNs) have been developed as prominent models to boost the efficacy of fraud detection. However, the application of GNNs in this domain encounters significant challenges, primarily due to class imbalance and a mixture of homophily and heterophily of fraud graphs. To address these challenges, in this paper, we propose LACA, which implements fraud detection on graphs using Label-Aware feature aggregation to advance GNN training, which is regularized by Clustering Augmented optimization. Specifically, label-aware feature aggregation simplifies adaptive aggregation in homophily-heterophily mixed neighborhoods, preventing gradient domination by legitimate nodes and mitigating class imbalance in message passing. Clustering-augmented optimization provides fine-grained subclass semantics to improve detection performance, and yields additional benefit in addressing class imbalance. Extensive experiments on four fraud datasets demonstrate that LACA can significantly improve fraud detection performance on graphs with different imbalance ratios and homophily ratios, outperforming state-of-the-art GNN models. 
  5. Data imbalance is a fundamental challenge in applying language models to biomedical applications, particularly in ICD code prediction tasks where label and demographic distributions are uneven. While state-of-the-art language models have been increasingly adopted in biomedical tasks, few studies have systematically examined how data imbalance affects model performance and fairness across demographic groups. This study fills the gap by statistically probing the relationship between data imbalance and model performance in ICD code prediction. We analyze imbalances in a standard benchmark dataset across gender, age, ethnicity, and social determinants of health using state-of-the-art biomedical language models. By deploying diverse performance metrics and statistical analyses, we explore the influence of data imbalance on performance variations and demographic fairness. Our study shows that data imbalance significantly impacts model performance and fairness, but feature similarity to the majority class may be a more critical factor. We believe this study provides valuable insights for developing more equitable and robust language models in healthcare applications.
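Item 2 above reviews the true skill statistic (TSS) and the Heidke skill score (HSS) for flare forecasting. The following minimal Python sketch shows the practical difference between them. It assumes the standard confusion-matrix definitions: TSS = TPR - FPR, and HSS in the two-class form commonly used in the flare-forecasting literature. The cited abstract does not say which HSS variant it reviews. The sketch multiplies the number of negatives by k while holding TPR and FPR fixed. The counts and helper names are hypothetical.

```python
"""Sketch: TSS is unchanged when only the negative class grows, HSS is not.
Counts and helper names are hypothetical, chosen only for illustration."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # true positives (flares predicted as flares)
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives (quiet instances predicted as quiet)


def tss(c: Confusion) -> float:
    """True skill statistic: TPR - FPR."""
    return c.tp / (c.tp + c.fn) - c.fp / (c.fp + c.tn)


def hss(c: Confusion) -> float:
    """Two-class Heidke skill score (form common in flare forecasting)."""
    num = 2 * (c.tp * c.tn - c.fn * c.fp)
    den = (c.tp + c.fn) * (c.fn + c.tn) + (c.tp + c.fp) * (c.fp + c.tn)
    return num / den


def scale_negatives(c: Confusion, k: int) -> Confusion:
    """Multiply the negative class by k; FPR = fp / (fp + tn) stays fixed."""
    return Confusion(c.tp, c.fn, c.fp * k, c.tn * k)


if __name__ == "__main__":
    base = Confusion(tp=90, fn=10, fp=50, tn=950)  # hypothetical 10:1 counts
    for k in (1, 10):  # 10:1 and 100:1 negatives-to-positives
        c = scale_negatives(base, k)
        print(f"k={k:>2}  TSS={tss(c):.3f}  HSS={hss(c):.3f}")
```

With these counts both runs report TSS = 0.850, while HSS drops from 0.720 at 10:1 to 0.248 at 100:1. This illustrates why the two metrics are not interchangeable when data sets differ in imbalance ratio, such as the 60:1 and 800:1 ratios quoted in item 2.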