skip to main content

Search for: All records

Editors contains: "Pedrycz W."

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Rutkowski L. ; Scherer R. ; Korytkowski M. ; Pedrycz W. ; Tadeusiewicz R. ; Zurada J. (Ed.)
    In this work, we investigate the impact of class imbalance on the accuracy and diversity of synthetic samples generated by conditional generative adversarial networks (CGAN) models. Though many studies utilizing GANs have seen extraordinary success in producing realistic image samples, these studies generally assume the use of well-processed and balanced benchmark image datasets, including MNIST and CIFAR-10. However, well-balanced data is uncommon in real world applications such as detecting fraud, diagnosing diabetes, and predicting solar flares. It is well known that when class labels are not distributed uniformly, the predictive ability of classification algorithms suffers significantly, a phenomenon known as the "class-imbalance problem." We show that the imbalance in the training set can also impact sample generation of CGAN models. We utilize the well known MNIST datasets, controlling the imbalance ratio of certain classes within the data through sampling. We are able to show that both the quality and diversity of generated samples suffer in the presence of class imbalances and propose a novel framework named Two-stage CGAN to produce high-quality synthetic samples in such cases. Our results indicate that the proposed framework provides a significant improvement over typical oversampling and undersampling techniques utilized for class imbalance remediation. 
    more » « less
    Free, publicly-accessible full text available September 14, 2024
  2. Rutkowski, L. ; Scherer, R. ; Korytkowski, M. ; Pedrycz W. ; Tadeusiewicz R. ; Zurada J. (Ed.)
    Solar flares not only pose risks to outer space technologies and astronauts’ well being, but also cause disruptions on earth to our high-tech, interconnected infrastructure our lives highly depend on. While a number of machine-learning methods have been proposed to improve flare prediction, none of them, to the best of our knowledge, have investigated the impact of outliers on the reliability and robustness of those models’ performance. In this study, we investigate the impact of outliers in a multivariate time series benchmark dataset, namely SWAN-SF, on flare prediction models, and test our hypothesis. That is, there exist outliers in SWAN-SF, removal of which enhances the performance of the prediction models on unseen datasets. We employ Isolation Forest to detect the outliers among the weaker flare instances. Several experiments are carried out using a large range of contamination rates which determine the percentage of present outliers. We assess the quality of each dataset in terms of its actual contamination using TimeSeriesSVC. In our best findings, we achieve a 279% increase in True Skill Statistic and 68% increase in Heidke Skill Score. The results show that overall a significant improvement can be achieved for flare prediction if outliers are detected and removed properly. 
    more » « less