Unlike traditional clustering methods that deal with a single type of data, High-Order Co-Clustering (HOCC) aims to cluster multiple types of data simultaneously by utilizing the inter- and/or intra-type relationships across different data types. In existing HOCC methods, data points routinely enter the objective functions with squared residual errors. As a result, outlying data samples can dominate the objective functions, which may lead to incorrect clustering results. Moreover, existing methods usually suffer from soft clustering, where the membership probabilities for different groups can be very close. In this paper, we propose an L1-norm symmetric nonnegative matrix tri-factorization method to solve the HOCC problem. Due to the orthogonal constraints and the symmetric L1-norm formulation in our new objective, the conventional auxiliary function approach no longer works. Thus we derive the solution algorithm using the alternating direction method of multipliers. Extensive experiments have been conducted on a real-world data set, in which promising empirical results, including lower time consumption, a strictly orthogonal membership matrix, and lower local minima, have demonstrated the effectiveness of our proposed method.
Laplacian Embedding (LE) is a powerful method to reveal the intrinsic geometry of high-dimensional data by using graphs. Imposing orthogonal and nonnegative constraints on the LE objective has proved effective in avoiding degenerate and negative solutions; these constraints, however, are challenging to satisfy simultaneously because they are nonlinear and nonconvex. In addition, recent studies have shown that using the p-th order of the L2-norm distances in LE can find the best solution for clustering and promote the robustness of the embedding model against outliers, although this makes the optimization objective nonsmooth and difficult to solve efficiently in general. In this work, we study LE that uses the p-th order of the L2-norm distances and satisfies both orthogonal and nonnegative constraints. We introduce a novel smoothed iterative reweighted method to tackle this challenging optimization problem and rigorously analyze its convergence. We demonstrate the effectiveness and potential of our proposed method by extensive empirical studies on both synthetic and real data sets.
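To make the idea of a smoothed iterative reweighting scheme concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the objective is the sum over pairs of w_ij * ||x_i - x_j||_2^p with 0 < p <= 2, smooths each distance as (||x_i - x_j||^2 + eps)^(p/2), and enforces only the orthogonality constraint; the nonnegativity constraint studied in the paper is deliberately omitted. All function names, the smoothing constant `eps`, and the centering constraint (embedding columns orthogonal to the all-ones vector) are assumptions made for illustration.

```python
import numpy as np

def pairwise_sq_dists(X):
    # Squared Euclidean distances between all rows of X.
    sq = np.sum(X * X, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off

def laplacian_embedding(S, k):
    # Standard LE step: minimize tr(X^T L X) s.t. X^T X = I and X^T 1 = 0.
    # Adding a multiple of the all-ones matrix moves the trivial constant
    # eigenvector to the top of the spectrum, so the k smallest
    # eigenvectors are orthogonal to it (requires k <= n - 1).
    n = S.shape[0]
    L = np.diag(S.sum(axis=1)) - S
    shift = (np.trace(L) + 1.0) / n
    _, vecs = np.linalg.eigh(L + shift * np.ones((n, n)))
    return vecs[:, :k]

def smoothed_objective(W, X, p, eps):
    # Smoothed p-th order objective; 0.5 because each pair is counted twice.
    return 0.5 * np.sum(W * (pairwise_sq_dists(X) + eps) ** (p / 2.0))

def reweighted_le(W, k, p=1.0, eps=1e-6, n_iter=20):
    # W: symmetric nonnegative affinity matrix (diagonal is ignored).
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    X = laplacian_embedding(W, k)  # ordinary LE (p = 2) as a feasible start
    history = [smoothed_objective(W, X, p, eps)]
    for _ in range(n_iter):
        D = pairwise_sq_dists(X)
        # Reweighting: derivative of t^(p/2) at t = D + eps, times W.
        S = W * (p / 2.0) * (D + eps) ** (p / 2.0 - 1.0)
        X = laplacian_embedding(S, k)
        history.append(smoothed_objective(W, X, p, eps))
    return X, np.array(history)
```

Because t^(p/2) is concave for p <= 2, each reweighted step in this sketch cannot increase the smoothed objective, which is the usual argument behind iteratively reweighted methods. Handling the nonnegative constraint at the same time requires additional machinery that this sketch does not attempt.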
- PAR ID:
- 10129598
- Date Published:
- Journal Name:
- Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
- Page Range / eLocation ID:
- 4040 to 4046
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Metric Learning, which aims at learning a distance metric for a given data set, plays an important role in measuring the distance or similarity between data objects. Due to its broad usefulness, it has attracted a lot of interest in machine learning and related areas in the past few decades. This paper proposes to learn the distance metric from side information in the form of must-links and cannot-links. Given the pairwise constraints, our goal is to learn a Mahalanobis distance that minimizes the ratio of the distances of the data pairs in the must-links to those in the cannot-links. Different from many existing papers that use the traditional squared L2-norm distance, we develop a robust model that is less sensitive to data noise or outliers by using the non-squared L2-norm distance. In our objective, the orthonormal constraint is enforced to avoid degenerate solutions. To solve our objective, we have derived an efficient iterative solution algorithm. We have conducted extensive experiments, which demonstrated the superiority of our method over state-of-the-art methods.
-
Linear discriminant analysis (LDA) is widely used for dimensionality reduction under supervised learning settings. The traditional LDA objective aims to minimize a ratio of squared Euclidean distances, which may not perform optimally on noisy data sets. Multiple robust LDA objectives have been proposed to address this problem, but their implementations have two major limitations. One is that their mean calculations use the squared l2-norm distance to center the data, which is not valid when the objective does not use the Euclidean distance. The second problem is that there is no generalized optimization algorithm to solve different robust LDA objectives. In addition, most existing algorithms can only guarantee the solution to be locally optimal, rather than globally optimal. In this paper, we review multiple robust loss functions and propose a new and generalized robust objective for LDA. Besides, to better remove the mean value within data, our objective uses an optimal way to center the data through learning. As one important algorithmic contribution, we derive an efficient iterative algorithm to optimize the resulting non-smooth and non-convex objective function. We theoretically prove that our solution algorithm guarantees that both the objective and the solution sequences converge to globally optimal solutions at a sub-linear convergence rate. The experimental results demonstrate the effectiveness of our new method, achieving significant improvements compared to the other competing methods.
-
Visual place recognition is essential for large-scale simultaneous localization and mapping (SLAM). Long-term robot operations across different times of day, months, and seasons introduce new challenges from significant environment appearance variations. In this paper, we propose a novel method to learn a location representation that can integrate the semantic landmarks of a place with its holistic representation. To promote the robustness of our new model against the drastic appearance variations due to long-term visual changes, we formulate our objective to use non-squared ℓ2-norm distances, which leads to a difficult optimization problem that minimizes the ratio of the ℓ2,1-norms of matrices. To solve our objective, we derive a new efficient iterative algorithm, whose convergence is rigorously guaranteed by theory. In addition, because our solution is strictly orthogonal, the learned location representations can have better place recognition capabilities. We evaluate the proposed method using two large-scale benchmark data sets, the CMU-VL and Nordland data sets. Experimental results have validated the effectiveness of our new method in long-term visual place recognition applications.
-
The COVID-19 pandemic caused by SARS-CoV-2 has emphasized the importance of studying virus-host protein-protein interactions (PPIs) and drug-target interactions (DTIs) to discover effective antiviral drugs. While several computational algorithms have been developed for this purpose, most of them overlook the interplay pathways during infection along PPIs and DTIs. In this paper, we present a novel multipartite graph learning approach to uncover hidden binding affinities in PPIs and DTIs. Our method leverages a comprehensive biomolecular mechanism network that integrates protein-protein, genetic, and virus-host interactions, enabling us to learn a new graph that accurately captures the underlying connected components. Notably, our method identifies clustering structures directly from the new graph, eliminating the need for post-processing steps. To mitigate the detrimental effects of noisy or outlier data in sparse networks, we propose a robust objective function that incorporates the L2,p-norm and a constraint based on the pth-order Ky-Fan norm applied to the graph Laplacian matrix. Additionally, we present an efficient optimization method tailored to our framework. Experimental results demonstrate the superiority of our approach over existing state-of-the-art techniques, as it successfully identifies potential repurposable drugs for SARS-CoV-2, offering promising therapeutic options for COVID-19 treatment.