We investigate the approximability of the following optimization problem. The input is an n× n matrix A=(Aij) with real entries and an origin-symmetric convex body K⊂ ℝn that is given by a membership oracle. The task is to compute (or approximate) the maximum of the quadratic form ∑i=1n∑j=1n Aij xixj=⟨ x,Ax⟩ as x ranges over K. This is a rich and expressive family of optimization problems; for different choices of matrices A and convex bodies K it includes a diverse range of optimization problems like max-cut, Grothendieck/non-commutative Grothendieck inequalities, small set expansion and more. While the literature studied these specialmore »
Subspace clustering using ensembles of K -subspaces
Abstract Subspace clustering is the unsupervised grouping of points lying near a union of low-dimensional linear subspaces. Algorithms based directly on geometric properties of such data tend to either provide poor empirical performance, lack theoretical guarantees or depend heavily on their initialization. We present a novel geometric approach to the subspace clustering problem that leverages ensembles of the $K$-subspace (KSS) algorithm via the evidence accumulation clustering framework. Our algorithm, referred to as ensemble $K$-subspaces (EKSSs), forms a co-association matrix whose $(i,j)$th entry is the number of times points $i$ and $j$ are clustered together by several runs of KSS with random initializations. We prove general recovery guarantees for any algorithm that forms an affinity matrix with entries close to a monotonic transformation of pairwise absolute inner products. We then show that a specific instance of EKSS results in an affinity matrix with entries of this form, and hence our proposed algorithm can provably recover subspaces under similar conditions to state-of-the-art algorithms. The finding is, to the best of our knowledge, the first recovery guarantee for evidence accumulation clustering and for KSS variants. We show on synthetic data that our method performs well in the traditionally challenging settings of subspaces with more »
- Publication Date:
- NSF-PAR ID:
- 10206908
- Journal Name:
- Information and Inference: A Journal of the IMA
- ISSN:
- 2049-8772
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) are fundamental methods in machine learning for dimensionality reduction. The former is a technique for finding this approximation in finite dimensions and the latter is often in an infinite dimensional Reproducing Kernel Hilbert-space (RKHS). In this paper, we present a geometric framework for computing the principal linear subspaces in both situations as well as for the robust PCA case, that amounts to computing the intrinsic average on the space of all subspaces: the Grassmann manifold. Points on this manifold are defined as the subspaces spanned by K -tuples of observations.more »
-
Discovering and clustering subspaces in high-dimensional data is a fundamental problem of machine learning with a wide range of applications in data mining, computer vision, and pattern recognition. Earlier methods divided the problem into two separate stages of finding the similarity matrix and finding clusters. Similar to some recent works, we integrate these two steps using a joint optimization approach. We make the following contributions: (i) we estimate the reliability of the cluster assignment for each point before assigning a point to a subspace. We group the data points into two groups of “certain” and “uncertain”, with the assignment ofmore »
-
In the non-negative matrix factorization (NMF) problem, the input is an m×n matrix M with non-negative entries and the goal is to factorize it as M ≈ AW. The m × k matrix A and the k × n matrix W are both constrained to have non-negative entries. This is in contrast to singular value decomposition, where the matrices A and W can have negative entries but must satisfy the orthogonality constraint: the columns of A are orthogonal and the rows of W are also orthogonal. The orthogonal non-negative matrix factorization (ONMF) problem imposes both the non-negativity and the orthogonalitymore »
-
We extend the self-organizing mapping algorithm to the problem of visualizing data on Grassmann manifolds. In this setting, a collection of k points in n-dimensions is represented by a k-dimensional subspace, e.g., via the singular value or QR-decompositions. Data assembled in this way is challenging to visualize given abstract points on the Grassmannian do not reside in Euclidean space. The extension of the SOM algorithm to this geometric setting only requires that distances between two points can be measured and that any given point can be moved towards a presented pattern. The similarity between two points on the Grassmannian ismore »