skip to main content


Title: High-Order Co-Clustering via Strictly Orthogonal and Symmetric L1-Norm Nonnegative Matrix Tri-Factorization

Different to traditional clustering methods that deal with one single type of data, High-Order Co- Clustering (HOCC) aims to cluster multiple types of data simultaneously by utilizing the inter- or/and intra-type relationships across different data types. In existing HOCC methods, data points routinely enter the objective functions with squared residual errors. As a result, outlying data samples can dominate the objective functions, which may lead to incorrect clustering results. Moreover, existing methods usually suffer from soft clustering, where the probabilities to different groups can be very close. In this paper, we propose an L1 -norm symmetric nonnegative matrix tri-factorization method to solve the HOCC problem. Due to the orthogonal constraints and the symmetric L1 -norm formulation in our new objective, conventional auxiliary function approach no longer works. Thus we derive the solution algorithm using the alternating direction method of multipliers. Extensive experiments have been conducted on a real world data set, in which promising empirical results, including less time consumption, strictly orthogonal membership matrix, lower local minima etc., have demonstrated the effectiveness of our proposed method.

 
more » « less
Award ID(s):
1652943
PAR ID:
10084447
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Page Range / eLocation ID:
2454 to 2460
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Laplacian Embedding (LE) is a powerful method to reveal the intrinsic geometry of high-dimensional data by using graphs. Imposing the orthogonal and nonnegative constraints onto the LE objective has proved to be effective to avoid degenerate and negative solutions, which, though, are challenging to achieve simultaneously because they are nonlinear and nonconvex. In addition, recent studies have shown that using the p-th order of the L2-norm distances in LE can find the best solution for clustering and promote the robustness of the embedding model against outliers, although this makes the optimization objective nonsmooth and difficult to efficiently solve in general. In this work, we study LE that uses the p-th order of the L2-norm distances and satisfies both orthogonal and nonnegative constraints. We introduce a novel smoothed iterative reweighted method to tackle this challenging optimization problem and rigorously analyze its convergence. We demonstrate the effectiveness and potential of our proposed method by extensive empirical studies on both synthetic and real data sets.

     
    more » « less
  2. The low-rank matrix recovery is an important machine learning research topic with various scientific applications. Most existing low-rank matrix recovery methods relax the rank minimization problem via the trace norm minimization. However, such a relaxation makes the solution seriously deviate from the original one. Meanwhile, most matrix recovery methods minimize the squared prediction errors on the observed entries, which is sensitive to outliers. In this paper, we propose a new robust matrix recovery model to address the above two challenges. The joint capped trace norm and capped $\ell_1$-norm are used to tightly approximate the rank minimization and enhance the robustness to outliers. The evaluation experiments are performed on both synthetic data and real world applications in collaborative filtering and social network link prediction. All empirical results show our new method outperforms the existing matrix recovery methods.

     
    more » « less
  3. Abstract

    Multimodal integration combines information from different sources or modalities to gain a more comprehensive understanding of a phenomenon. The challenges in multi-omics data analysis lie in the complexity, high dimensionality, and heterogeneity of the data, which demands sophisticated computational tools and visualization methods for proper interpretation and visualization of multi-omics data. In this paper, we propose a novel method, termed Orthogonal Multimodality Integration and Clustering (OMIC), for analyzing CITE-seq. Our approach enables researchers to integrate multiple sources of information while accounting for the dependence among them. We demonstrate the effectiveness of our approach using CITE-seq data sets for cell clustering. Our results show that our approach outperforms existing methods in terms of accuracy, computational efficiency, and interpretability. We conclude that our proposed OMIC method provides a powerful tool for multimodal data analysis that greatly improves the feasibility and reliability of integrated data.

     
    more » « less
  4. Modern graph or network datasets often contain rich structure that goes beyond simple pairwise connections between nodes. This calls for complex representations that can capture, for instance, edges of different types as well as so-called “higher-order interactions” that involve more than two nodes at a time. However, we have fewer rigorous methods that can provide insight from such representations. Here, we develop a computational framework for the problem of clustering hypergraphs with categorical edge labels — or different interaction types — where clusters corresponds to groups of nodes that frequently participate in the same type of interaction. Our methodology is based on a combinatorial objective function that is related to correlation clustering on graphs but enables the design of much more efficient algorithms that also seamlessly generalize to hypergraphs. When there are only two label types, our objective can be optimized in polynomial time, using an algorithm based on minimum cuts. Minimizing our objective becomes NP-hard with more than two label types, but we develop fast approximation algorithms based on linear programming relaxations that have theoretical cluster quality guarantees. We demonstrate the efficacy of our algorithms and the scope of the model through problems in edge-label community detection, clustering with temporal data, and exploratory data analysis. 
    more » « less
  5. Bringmann, Karl ; Grohe, Martin ; Puppis, Gabriele ; Svensson, Ola (Ed.)
    This paper considers correlation clustering on unweighted complete graphs. We give a combinatorial algorithm that returns a single clustering solution that is simultaneously O(1)-approximate for all 𝓁_p-norms of the disagreement vector; in other words, a combinatorial O(1)-approximation of the all-norms objective for correlation clustering. This is the first proof that minimal sacrifice is needed in order to optimize different norms of the disagreement vector. In addition, our algorithm is the first combinatorial approximation algorithm for the 𝓁₂-norm objective, and more generally the first combinatorial algorithm for the 𝓁_p-norm objective when 1 < p < ∞. It is also faster than all previous algorithms that minimize the 𝓁_p-norm of the disagreement vector, with run-time O(n^ω), where O(n^ω) is the time for matrix multiplication on n × n matrices. When the maximum positive degree in the graph is at most Δ, this can be improved to a run-time of O(nΔ² log n). 
    more » « less