Title: Clustering quality metrics for subspace clustering
Award ID(s):
1845076
NSF-PAR ID:
10224893
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Pattern Recognition
Volume:
104
Issue:
C
ISSN:
0031-3203
Page Range / eLocation ID:
107328
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Multi-View Clustering (MVC) aims to find the cluster structure shared by multiple views of a particular dataset. Existing MVC methods mainly integrate the raw data from different views while ignoring high-level information. Thus, their performance may degrade due to conflicts between heterogeneous features and the noise present in each individual view. To overcome this problem, we propose a novel Multi-View Ensemble Clustering (MVEC) framework that solves MVC in an Ensemble Clustering (EC) way: it generates Basic Partitions (BPs) for each view individually and seeks a consensus partition among all the BPs. In this way, we naturally leverage the complementary information of multi-view data in the same partition space. Instead of directly fusing BPs, we employ low-rank and sparse decomposition to explicitly consider the connection between different views and detect the noise in each view. Moreover, the spectral ensemble clustering task is also incorporated into our framework through a carefully designed constraint, making MVEC a unified optimization framework for obtaining the final consensus partition. Experimental results on six real-world datasets show the efficacy of our approach compared with both MVC and EC methods.

     
  2. In this work, we analyze the activity of bees starting at 6 days old. The data were collected at INRA (France) during 2014 and 2016. Activity is counted according to whether the bees enter or leave the hive. After data wrangling, we decided to analyze data corresponding to a period of 10 days. We use a clustering method to identify bees with similar activity and to estimate the time of day when the bees are most active. To achieve our objective, the data were analyzed at three different time resolutions within a day: first, daily activity in two periods (morning and afternoon); then, activity in 3-hour periods from 8:00 am to 8:00 pm; and finally, hourly activity from 8:00 am to 8:00 pm. Our study found two clusters of bees, and in one of them the bees' activity clearly increased on day 5. The smaller cluster included the most active bees, representing about 24 percent of the total bees under study. Also, the highest bee activity was registered between 2:00 pm and 3:00 pm. A chi-square test shows that there is a combined Treatment × Colony effect on cluster formation.
  3. Bellomo, N.; Carrillo, J.A.; Tadmor, E. (Eds.)
    In this work, we build a unifying framework to interpolate between density-driven and geometry-based algorithms for data clustering and, specifically, to connect the mean shift algorithm with spectral clustering at discrete and continuum levels. We seek this connection through the introduction of Fokker–Planck equations on data graphs. Besides introducing new forms of mean shift algorithms on graphs, we provide new theoretical insights on the behavior of the family of diffusion maps in the large sample limit, as well as new connections between diffusion maps and mean shift dynamics on a fixed graph. Several numerical examples illustrate our theoretical findings and highlight the benefits of interpolating between density-driven and geometry-based clustering algorithms.