skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Multi-view clustering by CPS-merge analysis with application to multimodal single-cell data
Multi-view data can be generated from diverse sources, by different technologies, and in multiple modalities. In various fields, integrating information from multi-view data has pushed the frontier of discovery. In this paper, we develop a new approach for multi-view clustering, which overcomes the limitations of existing methods such as the need of pooling data across views, restrictions on the clustering algorithms allowed within each view, and the disregard for complementary information between views. Our new method, called CPS-merge analysis , merges clusters formed by the Cartesian product of single-view cluster labels, guided by the principle of maximizing clustering stability as evaluated by CPS analysis. In addition, we introduce measures to quantify the contribution of each view to the formation of any cluster. CPS-merge analysis can be easily incorporated into an existing clustering pipeline because it only requires single-view cluster labels instead of the original data. We can thus readily apply advanced single-view clustering algorithms. Importantly, our approach accounts for both consensus and complementary effects between different views, whereas existing ensemble methods focus on finding a consensus for multiple clustering results, implying that results from different views are variations of one clustering structure. Through experiments on single-cell datasets, we demonstrate that our approach frequently outperforms other state-of-the-art methods.  more » « less
Award ID(s):
2013905
PAR ID:
10437321
Author(s) / Creator(s):
; ;
Editor(s):
Alber, Mark
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
19
Issue:
4
ISSN:
1553-7358
Page Range / eLocation ID:
e1011044
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Multi-View Clustering (MVC) aims to find the cluster structure shared by multiple views of a particular dataset. Existing MVC methods mainly integrate the raw data from different views, while ignoring the high-level information. Thus, their performance may degrade due to the conflict between heterogeneous features and the noises existing in each individual view. To overcome this problem, we propose a novel Multi-View Ensemble Clustering (MVEC) framework to solve MVC in an Ensemble Clustering (EC) way, which generates Basic Partitions (BPs) for each view individually and seeks for a consensus partition among all the BPs. By this means, we naturally leverage the complementary information of multi-view data in the same partition space. Instead of directly fusing BPs, we employ the low-rank and sparse decomposition to explicitly consider the connection between different views and detect the noises in each view. Moreover, the spectral ensemble clustering task is also involved by our framework with a carefully designed constraint, making MVEC a unified optimization framework to achieve the final consensus partition. Experimental results on six real-world datasets show the efficacy of our approach compared with both MVC and EC methods. 
    more » « less
  2. In recent years, incomplete multi-view clustering (IMVC), which studies the challenging multi-view clustering problem on missing views, has received growing research interests. Previous IMVC methods suffer from the following issues: (1) the inaccurate imputation for missing data, which leads to suboptimal clustering performance, and (2) most existing IMVC models merely consider the explicit presence of graph structure in data, ignoring the fact that latent graphs of different views also provide valuable information for the clustering task. To overcome such challenges, we present a novel method, termed Adaptive feature imputation with latent graph for incomplete multi-view clustering (AGDIMC). Specifically, it captures the embbedded features of each view by incorporating the view-specific deep encoders. Then, we construct partial latent graphs on complete data, which can consolidate the intrinsic relationships within each view while preserving the topological information. With the aim of estimating the missing sample based on the available information, we utilize an adaptive imputation layer to impute the embedded feature of missing data by using cross-view soft cluster assignments and global cluster centroids. As the imputation progresses, the portion of complete data increases, contributing to enhancing the discriminative information contained in global pseudo-labels. Meanwhile, to alleviate the negative impact caused by inferior impute samples and the discrepancy of cluster structures, we further design an adaptive imputation strategy based on the global pseudo-label and the local cluster assignment. Experimental results on multiple real-world datasets demonstrate the effectiveness of our method over existing approaches. 
    more » « less
  3. In mixed multi-view data, multiple sets of diverse features are measured on the same set of samples. By integrating all available data sources, we seek to discover common group structure among the samples that may be hidden in individualistic cluster analyses of a single data view. While several techniques for such integrative clustering have been explored, we propose and develop a convex formalization that enjoys strong empirical performance and inherits the mathematical properties of increasingly popular convex clustering methods. Specifically, our Integrative Generalized Convex Clustering Optimization (iGecco) method employs different convex distances, losses, or divergences for each of the different data views with a joint convex fusion penalty that leads to common groups. Additionally, integrating mixed multi-view data is often challenging when each data source is high-dimensional. To perform feature selection in such scenarios, we develop an adaptive shifted group-lasso penalty that selects features by shrinking them towards their loss-specific centers. Our so-called iGecco+ approach selects features from each data view that are best for determining the groups, often leading to improved integrative clustering. To solve our problem, we develop a new type of generalized multi-block ADMM algorithm using sub-problem approximations that more efficiently fits our model for big data sets. Through a series of numerical experiments and real data examples on text mining and genomics, we show that iGecco+ achieves superior empirical performance for high-dimensional mixed multi-view data. 
    more » « less
  4. Abstract Multi‐view data, which is matched sets of measurements on the same subjects, have become increasingly common with advances in multi‐omics technology. Often, it is of interest to find associations between the views that are related to the intrinsic class memberships. Existing association methods cannot directly incorporate class information, while existing classification methods do not take into account between‐views associations. In this work, we propose a framework for Joint Association and Classification Analysis of multi‐view data (JACA). Our goal is not to merely improve the misclassification rates, but to provide a latent representation of high‐dimensional data that is both relevant for the subtype discrimination and coherent across the views. We motivate the methodology by establishing a connection between canonical correlation analysis and discriminant analysis. We also establish the estimation consistency of JACA in high‐dimensional settings. A distinct advantage of JACA is that it can be applied to the multi‐view data with block‐missing structure, that is to cases where a subset of views or class labels is missing for some subjects. The application of JACA to quantify the associations between RNAseq and miRNA views with respect to consensus molecular subtypes in colorectal cancer data from The Cancer Genome Atlas project leads to improved misclassification rates and stronger found associations compared to existing methods. 
    more » « less
  5. null (Ed.)
    Recently, significant efforts are made to explore device-free human activity recognition techniques that utilize the information collected by existing indoor wireless infrastructures without the need for the monitored subject to carry a dedicated device. Most of the existing work, however, focuses their attention on the analysis of the signal received by a single device. In practice, there are usually multiple devices "observing" the same subject. Each of these devices can be regarded as an information source and provides us an unique "view" of the observed subject. Intuitively, if we can combine the complementary information carried by the multiple views, we will be able to improve the activity recognition accuracy. Towards this end, we propose DeepMV, a unified multi-view deep learning framework, to learn informative representations of heterogeneous device-free data. DeepMV can combine different views' information weighted by the quality of their data and extract commonness shared across different environments to improve the recognition performance. To evaluate the proposed DeepMV model, we set up a testbed using commercialized WiFi and acoustic devices. Experiment results show that DeepMV can effectively recognize activities and outperform the state-of-the-art human activity recognition methods. 
    more » « less