skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: S-Isomap++: Multi manifold learning from streaming data
Manifold learning based methods have been widely used for non-linear dimensionality reduction (NLDR). However, in many practical settings, the need to process streaming data is a challenge for such methods, owing to the high computational complexity involved. Moreover, most methods operate under the assumption that the input data is sampled from a single manifold, embedded in a high dimensional space. We propose a method for streaming NLDR when the observed data is either sampled from multiple manifolds or irregularly sampled from a single manifold. We show that existing NLDR methods, such as Isomap, fail in such situations, primarily because they rely on smoothness and continuity of the underlying manifold, which is violated in the scenarios explored in this paper. However, the proposed algorithm is able to learn effectively in presence of multiple, and potentially intersecting, manifolds, while allowing for the input data to arrive as a massive stream.  more » « less
Award ID(s):
1651475
PAR ID:
10064701
Author(s) / Creator(s):
;
Date Published:
Journal Name:
IEEE Bigdata 2017
Page Range / eLocation ID:
716 to 725
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The manifold scattering transform is a deep feature extractor for data defined on a Riemannian manifold. It is one of the first examples of extending convolutional neural network-like operators to general manifolds. The initial work on this model focused primarily on its theoretical stability and invariance properties but did not provide methods for its numerical implementation except in the case of two-dimensional surfaces with predefined meshes. In this work, we present practical schemes, based on the theory of diffusion maps, for implementing the manifold scattering transform to datasets arising in naturalistic systems, such as single cell genetics, where the data is a high-dimensional point cloud modeled as lying on a low-dimensional manifold. We show that our methods are effective for signal classification and manifold classification tasks. 
    more » « less
  2. We propose a novel, angle-based path metric for the multi-manifold clustering problem. This metric, which we call the largest-angle path distance (LAPD), is computed as a bottleneck path distance in a graph constructed on d-simplices of data points. When data is sampled from a collection of d-dimensional manifolds which may intersect, the method can cluster the manifolds with high accuracy and automatically detect how many manifolds are present. By leveraging fast approximation schemes for bottleneck distance, this method exhibits quasi-linear computational complexity in the number of data points. In addition to being highly scalable, the method outperforms existing algorithms in numerous numerical experiments on intersecting manifolds, and exhibits robustness with respect to noise and curvature in the data. 
    more » « less
  3. Gaussian processes (GPs) are very widely used for modeling of unknown functions or surfaces in applications ranging from regression to classification to spatial processes. Although there is an increasingly vast literature on applications, methods, theory and algorithms related to GPs, the overwhelming majority of this literature focuses on the case in which the input domain corresponds to a Euclidean space. However, particularly in recent years with the increasing collection of complex data, it is commonly the case that the input domain does not have such a simple form. For example, it is common for the inputs to be restricted to a non-Euclidean manifold, a case which forms the motivation for this article. In particular, we propose a general extrinsic framework for GP modeling on manifolds, which relies on embedding of the manifold into a Euclidean space and then constructing extrinsic kernels for GPs on their images. These extrinsic Gaussian processes (eGPs) are used as prior distributions for unknown functions in Bayesian inferences. Our approach is simple and general, and we show that the eGPs inherit fine theoretical properties from GP models in Euclidean spaces. We consider applications of our models to regression and classification problems with predictors lying in a large class of manifolds, including spheres, planar shape spaces, a space of positive definite matrices, and Grassmannians. Our models can be readily used by practitioners in biological sciences for various regression and classification problems, such as disease diagnosis or detection. Our work is also likely to have impact in spatial statistics when spatial locations are on the sphere or other geometric spaces. 
    more » « less
  4. Image datasets in specialized fields of science, such as biomedicine, are typically smaller than traditional machine learning datasets. As such, they present a problem for training many models. To address this challenge, researchers often attempt to incorporate priors, i.e., external knowledge, to help the learning procedure. Geometric priors, for example, offer to restrict the learning process to the manifold to which the data belong. However, learning on manifolds is sometimes computationally intensive to the point of being prohibitive. Here, we ask a provocative question: is machine learning on manifolds really more accurate than its linear counterpart to the extent that it is worth sacrificing significant speedup in computation? We answer this question through an extensive theoretical and experimental study of one of the most common learning methods for manifold-valued data: geodesic regression. 
    more » « less
  5. The shape and orientation of data clouds reflect variability in observations that can confound pattern recognition systems. Subspace methods, utilizing Grassmann manifolds, have been a great aid in dealing with such variability. However, this usefulness begins to falter when the data cloud contains sufficiently many outliers corresponding to stray elements from another class or when the number of data points is larger than the number of features. We illustrate how nested subspace methods, utilizing flag manifolds, can help to deal with such additional confounding factors. Flag manifolds, which are parameter spaces for nested sequences of subspaces, are a natural geometric generalization of Grassmann manifolds. We utilize and extend known algorithms for determining the minimal length geodesic, the initial direction generating the minimal length geodesic, and the distance between any pair of points on a flag manifold. The approach is illustrated in the context of (hyper) spectral imagery showing the impact of ambient dimension, sample dimension, and flag structure. 
    more » « less