Title: Statistical Significance of Clustering with Multidimensional Scaling
Clustering is a fundamental tool for exploratory data analysis. One central problem in clustering is deciding if the clusters discovered by clustering methods are reliable as opposed to being artifacts of natural sampling variation. Statistical significance of clustering (SigClust) is a recently developed cluster evaluation tool for high-dimension, low-sample size data. Despite its successful application to many scientific problems, there are cases where the original SigClust may not work well. Furthermore, for specific applications, researchers may not have access to the original data and only have the dissimilarity matrix. In this case, clustering is still a valuable exploratory tool, but the original SigClust is not applicable. To address these issues, we propose a new SigClust method using multidimensional scaling (MDS). The underlying idea behind MDS-based SigClust is that one can achieve low-dimensional representations of the original data via MDS using only the dissimilarity matrix and then apply SigClust on the low-dimensional MDS space. The proposed MDS-based SigClust can circumvent the challenge of parameter estimation of the original method in high-dimensional spaces while keeping the essential clustering structure in the MDS space. Both simulations and real data applications demonstrate that the proposed method works remarkably well for assessing the statistical significance of clustering. Supplementary materials for this article are available online.
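The MDS step described above can be sketched directly from a dissimilarity matrix. Below is a minimal Python illustration of classical (Torgerson) MDS; it is not the authors' code, the SigClust test itself is not reimplemented, and the toy data, `n_components`, and the suggestion to pass the coordinates to an existing SigClust implementation (for example, the R package sigclust) are illustrative assumptions.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS: embed points given only an n x n dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pos = np.clip(eigvals[:n_components], 0, None)
    return eigvecs[:, :n_components] * np.sqrt(pos)

# Toy example: two well-separated groups in 50 dimensions.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (20, 50)), rng.normal(3, 1, (20, 50))])
D = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
X = classical_mds(D, n_components=3)
# X would then be handed to a cluster-significance test such as SigClust.
```

Because only the dissimilarity matrix D is needed, this step remains available even when the original features are not, which is precisely the setting the abstract highlights.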
Award ID(s):
2217440 2100729 2113662
PAR ID:
10505583
Author(s) / Creator(s):
; ;
Publisher / Repository:
Taylor and Francis
Date Published:
Journal Name:
Journal of Computational and Graphical Statistics
Volume:
33
Issue:
1
ISSN:
1061-8600
Page Range / eLocation ID:
219 to 230
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Birol, Inanc (Ed.)
    Abstract. Motivation: Clustering patients into subgroups based on their microbial compositions can greatly enhance our understanding of the role of microbes in human health and disease etiology. Distance-based clustering methods, such as partitioning around medoids (PAM), are popular due to their computational efficiency and absence of distributional assumptions. However, the performance of these methods can be suboptimal when true cluster memberships are driven by differences in the abundance of only a few microbes, a situation known as the sparse signal scenario. Results: We demonstrate that classical multidimensional scaling (MDS), a widely used dimensionality reduction technique, effectively denoises microbiome data and enhances the clustering performance of distance-based methods. We propose a two-step procedure that first applies MDS to project high-dimensional microbiome data into a low-dimensional space, followed by distance-based clustering using the low-dimensional data. Our extensive simulations demonstrate that our procedure offers superior performance compared to directly conducting distance-based clustering under the sparse signal scenario. The advantage of our procedure is further showcased in several real data applications. Availability and implementation: The R package MDSMClust is available at https://github.com/wxy929/MDS-project.
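As a rough illustration of the two-step idea (MDS to denoise, then distance-based clustering), here is a hedged Python sketch. The paper's implementation is the R package MDSMClust; the synthetic Dirichlet-style abundances, Bray-Curtis dissimilarities, and the use of k-means on the MDS coordinates as a simple stand-in for PAM are all assumptions made for illustration only.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

def cmdscale(D, k):
    """Classical MDS coordinates from an n x n dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0, None))

# Hypothetical microbiome-style input: samples x taxa relative abundances,
# with a "sparse signal" carried by only a few taxa.
rng = np.random.default_rng(1)
abund = rng.dirichlet(np.ones(200), size=60)
abund[:30, :5] *= 4
abund /= abund.sum(axis=1, keepdims=True)

D = squareform(pdist(abund, metric='braycurtis'))   # ecological dissimilarity
Z = cmdscale(D, k=5)                                 # step 1: denoise via MDS
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)  # step 2
```

In the paper the second step is a distance-based method such as PAM; k-means is used here only because the MDS coordinates live in a Euclidean space and scikit-learn ships no PAM implementation.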
  2. A novel methodology is proposed for clustering multivariate time series data using energy distance defined in Székely and Rizzo (2013). Specifically, a dissimilarity matrix is formed using the energy distance statistic to measure the separation between the finite‐dimensional distributions for the component time series. Once the pairwise dissimilarity matrix is calculated, a hierarchical clustering method is then applied to obtain the dendrogram. This procedure is completely nonparametric as the dissimilarities between stationary distributions are directly calculated without making any model assumptions. In order to justify this procedure, asymptotic properties of the energy distance estimates are derived for general stationary and ergodic time series. The method is illustrated in a simulation study for various component time series that are either linear or nonlinear. Finally, the methodology is applied to two examples; one involves the GDP of selected countries and the other is the population size of various states in the U.S.A. in the years 1900–1999. 
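A schematic of the dissimilarity-then-dendrogram pipeline described above, assuming the simplest plug-in estimate of the energy distance between two multivariate samples (the paper works with finite-dimensional distributions of stationary series; here each whole series is treated as one sample). The toy series, linkage method, and array shapes are illustrative choices, not the paper's setup.

```python
import numpy as np
from scipy.spatial.distance import cdist, squareform
from scipy.cluster.hierarchy import linkage

def energy_distance(x, y):
    """Plug-in energy distance between two multivariate samples (rows = observations)."""
    a = cdist(x, y).mean()   # mean cross distance
    b = cdist(x, x).mean()   # mean within-x distance
    c = cdist(y, y).mean()   # mean within-y distance
    return 2 * a - b - c

# Hypothetical collection of bivariate time series, each of shape (T, d).
rng = np.random.default_rng(2)
series = [rng.normal(0, 1, (300, 2)) for _ in range(5)] + \
         [rng.normal(2, 1, (300, 2)) for _ in range(5)]

n = len(series)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = energy_distance(series[i], series[j])

Z = linkage(squareform(D), method='average')   # hierarchical clustering on the dissimilarities
# scipy.cluster.hierarchy.dendrogram(Z) would display the resulting tree.
```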
  3. Kernel dimensionality reduction (KDR) algorithms find a low-dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships. The standard strategy is to first map the data into a high-dimensional feature space using kernels prior to a projection onto a low-dimensional space. While KDR methods can be easily solved by keeping the most dominant eigenvectors of the kernel matrix, the resulting features are no longer easy to interpret. Alternatively, Interpretable KDR (IKDR) projects onto a subspace before the kernel feature mapping; therefore, the projection matrix indicates how the original features linearly combine to form the new features. Unfortunately, the IKDR objective requires a non-convex manifold optimization that is difficult to solve and can no longer be handled by a single eigendecomposition. Recently, an efficient iterative spectral (eigendecomposition) method (ISM) has been proposed for this objective in the context of alternative clustering. However, ISM only provides theoretical guarantees for the Gaussian kernel. This greatly constrains ISM's usage, since any kernel method using ISM is limited to a single kernel. This work extends the theoretical guarantees of ISM to an entire family of kernels, thereby empowering ISM to solve any kernel method with the same objective. In identifying this family, we prove that each kernel within the family has a surrogate Φ matrix and that the optimal projection is formed by its most dominant eigenvectors. With this extension, we establish how a wide range of IKDR applications across different learning paradigms can be solved by ISM. To support reproducible results, the source code is made publicly available at https://github.com/ANONYMIZED
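The fixed-point structure described above, alternating between building a surrogate Φ matrix from the current projection and replacing the projection with Φ's most dominant eigenvectors, can be sketched as follows. The particular Gaussian-kernel surrogate, the coefficient matrix A, the bandwidth sigma, and the convergence check are illustrative assumptions and do not reproduce the exact derivation in the paper.

```python
import numpy as np

def ism_like_projection(X, A, q=2, sigma=1.0, n_iter=30, tol=1e-6):
    """Schematic ISM-style fixed point: rebuild a surrogate matrix from the
    current projection W, then replace W with its most dominant eigenvectors.
    The surrogate below is only an illustrative Gaussian-kernel form."""
    n, d = X.shape
    diffs = X[:, None, :] - X[None, :, :]            # pairwise differences, shape (n, n, d)
    W = np.linalg.qr(np.random.default_rng(0).normal(size=(d, q)))[0]
    for _ in range(n_iter):
        proj = diffs @ W                              # projected differences, (n, n, q)
        weights = A * np.exp(-np.sum(proj ** 2, axis=-1) / (2 * sigma ** 2))
        Phi = np.einsum('ij,ijk,ijl->kl', weights, diffs, diffs)   # d x d surrogate
        vals, vecs = np.linalg.eigh(Phi)
        W_new = vecs[:, np.argsort(vals)[::-1][:q]]   # most dominant eigenvectors
        if np.linalg.norm(W_new @ W_new.T - W @ W.T) < tol:
            return W_new
        W = W_new
    return W   # columns show how original features linearly combine
```

The point of the sketch is only the alternation itself: each iteration costs one eigendecomposition of a d x d matrix, which is what makes the spectral approach attractive compared with generic manifold optimization.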
  4. Summary: Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high-dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this article, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets.
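A minimal Monte Carlo test for a single two-way split conveys the core idea: compare an observed cluster-strength statistic against the same statistic computed on data simulated under a null of no clustering. The single-Gaussian null, Ward linkage, and within-to-total sum-of-squares index below are simple illustrative choices; the paper's sequential, family-wise-error-controlling procedure over the full dendrogram is not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def two_way_split_index(X):
    """Cluster index for a 2-way split: within-cluster SS / total SS (smaller = stronger split)."""
    labels = fcluster(linkage(X, method='ward'), t=2, criterion='maxclust')
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
                 for k in np.unique(labels))
    return within / total

def monte_carlo_split_pvalue(X, n_sim=200, seed=0):
    """Monte Carlo p-value for the top split under a single-Gaussian null
    fitted to the observed data (one simple choice of null model)."""
    rng = np.random.default_rng(seed)
    obs = two_way_split_index(X)
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    null = [two_way_split_index(rng.multivariate_normal(mu, cov, size=X.shape[0]))
            for _ in range(n_sim)]
    return (1 + sum(s <= obs for s in null)) / (n_sim + 1)
```

In a hierarchical setting, a test of this kind would be applied node by node down the dendrogram, with thresholds chosen so that the sequence of tests controls the family-wise error rate.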
  5. We introduce Non-Euclidean-MDS (Neuc-MDS), an extension of classical Multidimensional Scaling (MDS) that accommodates non-Euclidean and non-metric inputs. The main idea is to generalize the standard inner product to symmetric bilinear forms to utilize the negative eigenvalues of dissimilarity Gram matrices. Neuc-MDS efficiently optimizes the choice of (both positive and negative) eigenvalues of the dissimilarity Gram matrix to reduce STRESS, the sum of squared pairwise error. We provide an in-depth error analysis and proofs of the optimality in minimizing lower bounds of STRESS. We demonstrate Neuc-MDS’s ability to address limitations of classical MDS raised by prior research, and test it on various synthetic and real-world datasets in comparison with both linear and non-linear dimension reduction methods. 
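A sketch of the indefinite-embedding idea: keep eigenvalues of the double-centered Gram matrix by magnitude regardless of sign, and record the sign of each kept eigenvalue as the signature of the bilinear form. Magnitude-based selection is a heuristic stand-in for the optimized eigenvalue selection the abstract describes, and the reconstruction comment reflects the standard pseudo-Euclidean identity rather than the paper's exact procedure.

```python
import numpy as np

def neuc_mds_like(D, k):
    """Sketch of a non-Euclidean MDS embedding: keep the k eigenvalues of the
    double-centered Gram matrix with largest magnitude (positive or negative)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(np.abs(w))[::-1][:k]           # largest |eigenvalue| first
    coords = V[:, idx] * np.sqrt(np.abs(w[idx]))
    signs = np.sign(w[idx])                         # signature of the bilinear form
    return coords, signs

# Squared dissimilarities are then approximated with a signed inner product:
# d2_ij ≈ sum_k signs[k] * (coords[i, k] - coords[j, k]) ** 2
```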