
Title: Multi-view clustering by CPS-merge analysis with application to multimodal single-cell data
Multi-view data can be generated from diverse sources, by different technologies, and in multiple modalities. In various fields, integrating information from multi-view data has pushed the frontier of discovery. In this paper, we develop a new approach for multi-view clustering, which overcomes the limitations of existing methods such as the need to pool data across views, restrictions on the clustering algorithms allowed within each view, and the disregard for complementary information between views. Our new method, called CPS-merge analysis, merges clusters formed by the Cartesian product of single-view cluster labels, guided by the principle of maximizing clustering stability as evaluated by CPS analysis. In addition, we introduce measures to quantify the contribution of each view to the formation of any cluster. CPS-merge analysis can be easily incorporated into an existing clustering pipeline because it only requires single-view cluster labels instead of the original data. We can thus readily apply advanced single-view clustering algorithms. Importantly, our approach accounts for both consensus and complementary effects between different views, whereas existing ensemble methods focus on finding a consensus for multiple clustering results, implying that results from different views are variations of one clustering structure. Through experiments on single-cell datasets, we demonstrate that our approach frequently outperforms other state-of-the-art methods.
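As a rough illustration of the starting point described in the abstract, the sketch below forms Cartesian-product clusters from two sets of single-view labels and then applies a merge. It is a minimal sketch, not the authors' implementation: the function names `cartesian_product_labels` and `apply_merge`, the toy labels, and the merge map are all hypothetical. In particular, the CPS-based stability criterion that decides which clusters to merge is not reproduced here; the merge map is a hand-picked placeholder.

```python
import numpy as np

def cartesian_product_labels(*view_labels):
    """Combine per-view cluster labels into one product label per sample.

    Two samples share a product label only if they share a label in every view.
    (Hypothetical helper for illustration.)
    """
    stacked = np.column_stack(view_labels)  # shape (n_samples, n_views)
    # np.unique over rows gives one integer id per distinct label tuple,
    # with tuples sorted lexicographically.
    combos, product = np.unique(stacked, axis=0, return_inverse=True)
    return combos, product.ravel()

def apply_merge(product, merge_map):
    """Relabel product clusters according to a merge map (placeholder step).

    In CPS-merge analysis the merges are guided by clustering stability;
    here the map is supplied by hand purely to show the relabeling.
    """
    return np.array([merge_map.get(int(c), int(c)) for c in product])

# Toy single-view labels for six samples (assumed data).
view1 = np.array([0, 0, 0, 1, 1, 1])
view2 = np.array([0, 0, 1, 1, 1, 0])

combos, product = cartesian_product_labels(view1, view2)
sizes = np.bincount(product)             # size of each product cluster
merged = apply_merge(product, {2: 3})    # hypothetical merge of cluster 2 into 3
```

Because only the label vectors are used, any single-view clustering algorithm can supply `view1` and `view2`, which matches the abstract's point that the original data are not required.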
Author(s) / Creator(s):
; ;
Alber, Mark
Journal Name:
PLOS Computational Biology
Sponsoring Org:
National Science Foundation
More Like this
  1. Multi-View Clustering (MVC) aims to find the cluster structure shared by multiple views of a particular dataset. Existing MVC methods mainly integrate the raw data from different views, while ignoring the high-level information. Thus, their performance may degrade due to the conflict between heterogeneous features and the noise in each individual view. To overcome this problem, we propose a novel Multi-View Ensemble Clustering (MVEC) framework to solve MVC in an Ensemble Clustering (EC) way, which generates Basic Partitions (BPs) for each view individually and seeks a consensus partition among all the BPs. By this means, we naturally leverage the complementary information of multi-view data in the same partition space. Instead of directly fusing BPs, we employ the low-rank and sparse decomposition to explicitly consider the connection between different views and detect the noise in each view. Moreover, our framework also incorporates the spectral ensemble clustering task through a carefully designed constraint, making MVEC a unified optimization framework to achieve the final consensus partition. Experimental results on six real-world datasets show the efficacy of our approach compared with both MVC and EC methods.

  2. ABSTRACT CONTEXT Culture influences the dynamics and outcomes of organizations in profound ways, including individual-level outcomes (like the quality of work products) and collective impacts (such as reputation or influence). As such, understanding organizational culture is a crucial element of understanding performance; from an anthropological perspective, ‘performance’ is not an outcome of culture, it is a part of culture. A key challenge in understanding organizational culture, especially in complex academic organizations, is the lack of a flexible, scalable approach for data collection and analysis. PURPOSE OR GOAL In this study, we report on our development of a survey-based cultural characterization tool that leverages both lightweight data collection from stakeholders in the organization and public information about that organization. We also integrate perspectives from prior literature about faculty, students, and staff in academic departments. Taken together, the resulting survey covers key elements of culture and allows for scalable data collection across settings via customizations and embedded logic in the survey itself. The outcome of this work is a design process for a new and promising tool for scalable cultural characterization, and we have deployed this tool across two institutions. APPROACH OR METHODOLOGY/METHODS We leverage prior research, our own preliminary data collection, and our experience with this approach in a different setting to develop a cultural characterization survey suitable for delivery to multiple engineering department stakeholders (faculty, staff, and students). We start with a modest number of interviews, stratified by these three groups and achieving saturation of responses, to understand their views on their organization, its strengths and weaknesses, and their perceptions of how it ‘works’. 
We merge this information with public data (for instance, departmental vision or mission statements, which convey a sense of priorities or values) as well as prior literature about higher education culture. We also draw upon our experience in another setting as well as pilot testing data, and the result is a carefully-constructed set of dichotomous items that are offered to department stakeholders in survey form using an electronic survey platform. We also collect background and demographic information in the survey. The resulting data are analyzed using Cultural Consensus Theory (CCT) to extract meaningful information about the departmental culture from the perspectives of the stakeholder groups. ACTUAL OR ANTICIPATED OUTCOMES The resulting survey consists of two parts, each with sub-components. The two top level survey parts contain: (i) items common to all respondents in all settings (i.e. all institutions in this study), and (ii) a set of institution-specific items. Within those sections, the framing of the items is calibrated for the stakeholder groups so that items make sense to them within the context of their experience. The survey has been administered, and the data are being analyzed and interpreted presently. We expect the results to capture the specific elements of local culture within these institutions, as well as differences in perspectives and experience among the three primary stakeholder groups. CONCLUSIONS/RECOMMENDATIONS/SUMMARY This study demonstrates a scalable approach to survey development for the purposes of cultural characterization, and its use across settings and with multiple stakeholder groups. This work enables a very nuanced view of culture within a department, and these results can be used within academic departments to enable discussion about change, priorities, performance, and the work environment. 
  3. In mixed multi-view data, multiple sets of diverse features are measured on the same set of samples. By integrating all available data sources, we seek to discover common group structure among the samples that may be hidden in individualistic cluster analyses of a single data view. While several techniques for such integrative clustering have been explored, we propose and develop a convex formalization that enjoys strong empirical performance and inherits the mathematical properties of increasingly popular convex clustering methods. Specifically, our Integrative Generalized Convex Clustering Optimization (iGecco) method employs different convex distances, losses, or divergences for each of the different data views with a joint convex fusion penalty that leads to common groups. Additionally, integrating mixed multi-view data is often challenging when each data source is high-dimensional. To perform feature selection in such scenarios, we develop an adaptive shifted group-lasso penalty that selects features by shrinking them towards their loss-specific centers. Our so-called iGecco+ approach selects features from each data view that are best for determining the groups, often leading to improved integrative clustering. To solve our problem, we develop a new type of generalized multi-block ADMM algorithm using sub-problem approximations that more efficiently fits our model for big data sets. Through a series of numerical experiments and real data examples on text mining and genomics, we show that iGecco+ achieves superior empirical performance for high-dimensional mixed multi-view data. 
  4. Recently, significant efforts have been made to explore device-free human activity recognition techniques that utilize the information collected by existing indoor wireless infrastructures without the need for the monitored subject to carry a dedicated device. Most of the existing work, however, focuses on the analysis of the signal received by a single device. In practice, there are usually multiple devices "observing" the same subject. Each of these devices can be regarded as an information source and provides us a unique "view" of the observed subject. Intuitively, if we can combine the complementary information carried by the multiple views, we will be able to improve the activity recognition accuracy. Towards this end, we propose DeepMV, a unified multi-view deep learning framework, to learn informative representations of heterogeneous device-free data. DeepMV can combine different views' information weighted by the quality of their data and extract commonness shared across different environments to improve the recognition performance. To evaluate the proposed DeepMV model, we set up a testbed using commercialized WiFi and acoustic devices. Experiment results show that DeepMV can effectively recognize activities and outperform the state-of-the-art human activity recognition methods.
  5. As one of the most important research topics in the unsupervised learning field, Multi-View Clustering (MVC) has been widely studied in the past decade and numerous MVC methods have been developed. Among these methods, the recently emerged Graph Neural Networks (GNN) shed light on modeling both topological structure and node attributes in the form of graphs, to guide unified embedding learning and clustering. However, the effectiveness of existing GNN-based MVC methods is still limited by insufficient use of the self-supervised information and graph information, which is reflected in the following two aspects: 1) most of these models merely use the self-supervised information to guide the feature learning and fail to realize that such information can also be applied in graph learning and sample weighting; 2) the usage of graph information is generally limited to the feature aggregation in these models, yet it also provides valuable evidence in detecting noisy samples. To this end, in this paper we propose Self-Supervised Graph Attention Networks for Deep Weighted Multi-View Clustering (SGDMC), which promotes the performance of GNN-based deep MVC models by making full use of the self-supervised information and graph information. Specifically, a novel attention-allocating approach that considers both the similarity of node attributes and the self-supervised information is developed to comprehensively evaluate the relevance among different nodes. Meanwhile, to alleviate the negative impact caused by noisy samples and the discrepancy of cluster structures, we further design a sample-weighting strategy based on the attention graph as well as the discrepancy between the global pseudo-labels and the local cluster assignment. Experimental results on multiple real-world datasets demonstrate the effectiveness of our method over existing approaches.