skip to main content


Title: Self-Organizing Mappings on the Flag Manifold
A flag is a nested sequence of vector spaces. The type of the flag is determined by the sequence of dimensions of the vector spaces making up the flag. A flag manifold is a manifold whose points parameterize all flags of a particular type in a fixed vector space. This paper provides the mathematical framework necessary for implementing self-organizing mappings on flag manifolds. Flags arise implicitly in many data analysis techniques for instance in wavelet, Fourier, and singular value decompositions. The proposed geometric framework in this paper enables the computation of distances between flags, the computation of geodesics between flags, and the ability to move one flag a prescribed distance in the direction of another flag. Using these operations as building blocks, we implement the SOM algorithm on a flag manifold. The basic algorithm is applied to the problem of parameterizing a set of flags of a fixed type.  more » « less
Award ID(s):
1830676
NSF-PAR ID:
10099059
Author(s) / Creator(s):
Date Published:
Journal Name:
Advances in intelligent systems and computing
Volume:
976
ISSN:
2194-5357
Page Range / eLocation ID:
13-22
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A flag is a nested sequence of vector spaces. The type of the flag encodes the sequence of dimensions of the vector spaces making up the flag. A flag manifold is a manifold whose points parameterize all flags of a fixed type in a fixed vector space. This paper provides the mathematical framework necessary for implementing self-organizing mappings on flag manifolds. Flags arise implicitly in many data analysis contexts including wavelet, Fourier, and singular value decompositions. The proposed geometric framework in this paper enables the computation of distances between flags, the computation of geodesics between flags, and the ability to move one flag a prescribed distance in the direction of another flag. Using these operations as building blocks, we implement the SOM algorithm on a flag manifold. The basic algorithm is applied to the problem of parameterizing a set of flags of a fixed type. 
    more » « less
  2. A flag is a nested sequence of vector spaces. The type of the flag encodes the sequence of dimensions of the vector spaces making up the flag. A flag manifold is a manifold whose points parameterize all flags of a fixed type in a fixed vector space. This paper provides the mathematical framework necessary for implementing self-organizing mappings on flag manifolds. Flags arise implicitly in many data analysis contexts including wavelet, Fourier, and singular value decompositions. The proposed geometric framework in this paper enables the computation of distances between flags, the computation of geodesics between flags, and the ability to move one flag a prescribed distance in the direction of another flag. Using these operations as building blocks, we implement the SOM algorithm on a flag manifold. The basic algorithm is applied to the problem of parameterizing a set of flags of a fixed type. 
    more » « less
  3. A fundamental question in many data analysis settings is the problem of discerning the “natural” dimension of a data set. That is, when a data set is drawn from a manifold (possibly with noise), a meaningful aspect of the data is the dimension of that manifold. Various approaches exist for estimating this dimension, such as the method of Secant-Avoidance Projection (SAP). Intuitively, the SAP algorithm seeks to determine a projection which best preserves the lengths of all secants between points in a data set; by applying the algorithm to find the best projections to vector spaces of various dimensions, one may infer the dimension of the manifold of origination. That is, one may learn the dimension at which it is possible to construct a diffeomorphic copy of the data in a lower-dimensional Euclidean space. Using Whitney's embedding theorem, we can relate this information to the natural dimension of the data. A drawback of the SAP algorithm is that a data set with T points has O(T 2 ) secants, making the computation and storage of all secants infeasible for very large data sets. In this paper, we propose a novel algorithm that generalizes the SAP algorithm with an emphasis on addressing this issue. That is, we propose a hierarchical secant-based dimensionality-reduction method, which can be employed for data sets where explicitly calculating all secants is not feasible. 
    more » « less
  4. A fundamental question in many data analysis settings is the problem of discerning the ``natural'' dimension of a data set. That is, when a data set is drawn from a manifold (possibly with noise), a meaningful aspect of the data is the dimension of that manifold. Various approaches exist for estimating this dimension, such as the method of Secant-Avoidance Projection (SAP). Intuitively, the SAP algorithm seeks to determine a projection which best preserves the lengths of all secants between points in a data set; by applying the algorithm to find the best projections to vector spaces of various dimensions, one may infer the dimension of the manifold of origination. That is, one may learn the dimension at which it is possible to construct a diffeomorphic copy of the data in a lower-dimensional Euclidean space. Using Whitney's embedding theorem, we can relate this information to the natural dimension of the data. A drawback of the SAP algorithm is that a data set with $n$ points has $n(n-1)/2$ secants, making the computation and storage of all secants infeasible for very large data sets. In this paper, we propose a novel algorithm that generalizes the SAP algorithm with an emphasis on addressing this issue. That is, we propose a hierarchical secant-based dimensionality-reduction method, which can be employed for data sets where explicitly calculating all secants is not feasible. 
    more » « less
  5. Given a sequence of random graphs, we address the problem of online monitoring and detection of changes in the underlying data distribution. To this end, we adopt the Random Dot Product Graph (RDPG) model which postulates each node has an associated latent vector, and inner products between these vectors dictate the edge formation probabilities. Existing approaches for graph change-point detection (CPD) rely either on extensive computation, or they store and process the entire observed time series. In this paper we consider the cumulative sum of a judicious monitoring function, which quantifies the discrepancy between the streaming graph observations and the nominal model. This reference distribution is inferred via spectral embeddings of the first few graphs in the sequence, and the monitoring function can be updated in an efficient, online fashion. We characterize the distribution of this running statistic, allowing us to select appropriate thresholding parameters that guarantee error-rate control. The end result is a lightweight online CPD algorithm, with a proven capability to flag distribution shifts in the arriving graphs. The novel method is tested on both synthetic and real network data, corroborating its effectiveness in quickly detecting changes in the input graph sequence. 
    more » « less