Graph convolutional neural networks (GCNs) embed nodes in a graph into Euclidean
space, which has been shown to incur a large distortion when embedding
real-world graphs with scale-free or hierarchical structure. Hyperbolic geometry
offers an exciting alternative, as it enables embeddings with much smaller
distortion. However, extending GCNs to hyperbolic geometry presents several
unique challenges because it is not clear how to define neural network operations,
such as feature transformation and aggregation, in hyperbolic space. Furthermore,
since input features are often Euclidean, it is unclear how to transform the features
into hyperbolic embeddings with the right amount of curvature. Here we propose
Hyperbolic Graph Convolutional Neural Network (HGCN), the first inductive
hyperbolic GCN that leverages both the expressiveness of GCNs and hyperbolic
geometry to learn inductive node representations for hierarchical and scale-free
graphs. We derive GCN operations in the hyperboloid model of hyperbolic space
and map Euclidean input features to embeddings in hyperbolic spaces with a
different trainable curvature at each layer. Experiments demonstrate that HGCN learns
embeddings that preserve hierarchical structure, and leads to improved performance
when compared to Euclidean analogs, even with very low dimensional embeddings:
compared to state-of-the-art GCNs, HGCN achieves an error reduction of up to
63.1% in ROC AUC for link prediction and of up to 47.5% in F1 score for node
classification, also improving the state of the art on the PubMed dataset.
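The hyperboloid operations the abstract refers to can be made concrete with a small sketch. The snippet below is a minimal NumPy illustration, not the authors' released code: it shows the exponential and logarithmic maps at the hyperboloid origin for a curvature parameter K, plus a tangent-space feature transform of the kind hyperbolic GCN layers are typically built from; the function names (expmap_origin, logmap_origin, hyperbolic_linear) are ours.

```python
import numpy as np

def expmap_origin(v, K):
    """Exponential map at the hyperboloid origin o = (sqrt(K), 0, ..., 0), curvature -1/K.
    v is a tangent vector at o (its 0-th component is 0); the result lies on the hyperboloid."""
    sqrtK = np.sqrt(K)
    norm = np.sqrt(np.clip(np.sum(v[..., 1:] ** 2, axis=-1, keepdims=True), 1e-12, None))
    time = sqrtK * np.cosh(norm / sqrtK)
    space = sqrtK * np.sinh(norm / sqrtK) * v[..., 1:] / norm
    return np.concatenate([time, space], axis=-1)

def logmap_origin(x, K):
    """Logarithmic map at the origin: pulls a hyperboloid point back to the tangent space at o."""
    sqrtK = np.sqrt(K)
    norm = np.sqrt(np.clip(np.sum(x[..., 1:] ** 2, axis=-1, keepdims=True), 1e-12, None))
    dist = sqrtK * np.arccosh(np.clip(x[..., :1] / sqrtK, 1.0, None))
    return np.concatenate([np.zeros_like(dist), dist * x[..., 1:] / norm], axis=-1)

def hyperbolic_linear(x, W, K):
    """Tangent-space feature transform: log-map to the tangent space at the origin,
    apply a Euclidean weight matrix W (d_out x d_in) to the spatial coordinates,
    and exp-map the result back onto the hyperboloid."""
    v = logmap_origin(x, K)
    v_spatial = v[..., 1:] @ W.T
    v_new = np.concatenate([np.zeros_like(v_spatial[..., :1]), v_spatial], axis=-1)
    return expmap_origin(v_new, K)
```

Under this convention the curvature scale K is just a scalar, so it can differ, and be trained, layer by layer, which is the knob the abstract refers to.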
Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design
Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space, namely the hyperbolic space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group, which is a semi-Riemannian manifold, i.e. a manifold equipped with an indefinite metric. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity all within the hyperbolic space. The novelty here lies in the projection, which is designed to project data onto a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation independently useful for dimensionality reduction. The main theoretical contribution is that the proposed embedding is proved to be isometric and equivariant under the Lorentz transformations, which are the natural isometric transformations in hyperbolic spaces. This projection is computationally efficient since it can be expressed by simple linear operations, and, due to the aforementioned equivariance property, it allows for weight sharing. The nested hyperbolic space representation is the core component of our network and therefore, we first compare this representation (independent of the network) with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA) and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating comparative performance of our network on several publicly available data sets.
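As a rough picture of what a nested hyperbolic representation involves, the sketch below projects points on the unit hyperboloid H^n onto a totally geodesic copy of H^m by keeping the first m spatial coordinates and recomputing the time-like coordinate so the hyperboloid constraint is satisfied again. This is a deliberately simplified stand-in, with no learned Lorentz transformation and fixed unit curvature, rather than the equivariant projection developed in the paper; the function name is ours.

```python
import numpy as np

def project_to_nested_hyperboloid(x, m):
    """Project points on H^n = {x : -x0^2 + ||x_rest||^2 = -1, x0 > 0} onto a nested,
    totally geodesic copy of H^m: keep the first m spatial coordinates and recompute
    the time-like coordinate so the result lies on the lower-dimensional hyperboloid."""
    spatial = x[..., 1:m + 1]                                       # keep m spatial coordinates
    x0 = np.sqrt(1.0 + np.sum(spatial ** 2, axis=-1, keepdims=True))  # re-satisfy the constraint
    return np.concatenate([x0, spatial], axis=-1)
```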
 Award ID(s):
 1724174
 NSF-PAR ID:
 10467120
 Publisher / Repository:
 IEEE Computer Society
 Date Published:
 Page Range / eLocation ID:
 356–365
 Format(s):
 Medium: X
 Location:
 New Orleans, LA
 Sponsoring Org:
 National Science Foundation
More Like this


A fundamental question in many data analysis settings is the problem of discerning the “natural” dimension of a data set. That is, when a data set is drawn from a manifold (possibly with noise), a meaningful aspect of the data is the dimension of that manifold. Various approaches exist for estimating this dimension, such as the method of Secant-Avoidance Projection (SAP). Intuitively, the SAP algorithm seeks to determine a projection which best preserves the lengths of all secants between points in a data set; by applying the algorithm to find the best projections to vector spaces of various dimensions, one may infer the dimension of the manifold of origination. That is, one may learn the dimension at which it is possible to construct a diffeomorphic copy of the data in a lower-dimensional Euclidean space. Using Whitney's embedding theorem, we can relate this information to the natural dimension of the data. A drawback of the SAP algorithm is that a data set with T points has O(T²) secants, making the computation and storage of all secants infeasible for very large data sets. In this paper, we propose a novel algorithm that generalizes the SAP algorithm with an emphasis on addressing this issue. That is, we propose a hierarchical secant-based dimensionality-reduction method, which can be employed for data sets where explicitly calculating all secants is not feasible.
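The secant criterion behind SAP is easy to state concretely. The sketch below is an illustrative NumPy rendering rather than the published implementation; the function names and the choice of representing the projection as a matrix P with orthonormal rows are ours. It builds the full set of O(T²) unit secants, whose quadratic growth is exactly the bottleneck the hierarchical method targets, and scores a candidate projection by the length of its most-shrunk secant.

```python
import numpy as np

def unit_secants(X):
    """All O(T^2) unit secants between the rows of X (T points in R^D).
    The quadratic number of pairs is what makes this infeasible for very large T."""
    diffs = X[:, None, :] - X[None, :, :]
    iu = np.triu_indices(len(X), k=1)          # each unordered pair of distinct points once
    secants = diffs[iu]
    return secants / np.linalg.norm(secants, axis=1, keepdims=True)

def worst_secant_shrinkage(X, P):
    """Smallest norm of a projected unit secant under the linear projection P (k x D).
    A SAP-style search favors projections that keep this value large, i.e. that avoid
    collapsing any secant direction."""
    s = unit_secants(X)
    return np.min(np.linalg.norm(s @ P.T, axis=1))
```

A SAP-style procedure would then compare candidate projections of increasing rank k and read off the smallest k at which no secant is badly shrunk.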


Phylogenetic placement, used widely in ecological analyses, seeks to add a new species to an existing tree. A deep learning approach was previously proposed to estimate the distance between query and backbone species by building a map from gene sequences to a high-dimensional space that preserves species tree distances. They then use a distance-based placement method to place the queries on that species tree. In this paper, we examine the appropriate geometry for faithfully representing tree distances while embedding gene sequences. Theory predicts that hyperbolic spaces should provide a drastic reduction in distance distortion compared to the conventional Euclidean space. Nevertheless, hyperbolic embedding imposes its own unique challenges related to arithmetic operations, exponentially growing functions, and limited bit precision, and we address these challenges. Our results confirm that hyperbolic embeddings have substantially lower distance errors than Euclidean space. However, these better-estimated distances do not always lead to better phylogenetic placement. We then show that the deep learning framework can be used not just to place on a backbone tree but to update it to obtain a fully resolved tree. With our hyperbolic embedding framework, species trees can be updated remarkably accurately with only a handful of genes.
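One of the precision issues alluded to above already appears in the basic hyperbolic distance. A minimal sketch, assuming the unit-curvature hyperboloid (Lorentz) model; the clamp before arccosh is the standard guard against floating-point error pushing the Lorentzian inner product past its theoretical bound for nearly identical points.

```python
import numpy as np

def hyperboloid_distance(x, y):
    """Geodesic distance on the unit hyperboloid: d(x, y) = arccosh(-<x, y>_L),
    where <., .>_L is the Lorentzian inner product. Floating-point error can push
    -<x, y>_L slightly below 1 for nearly identical points, so clamp before arccosh."""
    lorentz_inner = -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)
    return np.arccosh(np.maximum(-lorentz_inner, 1.0))
```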

Dimensionality-reduction techniques are a fundamental tool for extracting useful information from high-dimensional data sets. Because secant sets encode manifold geometry, they are a useful tool for designing meaningful data-reduction algorithms. In one such approach, the goal is to construct a projection that maximally avoids secant directions and hence ensures that distinct data points are not mapped too close together in the reduced space. This type of algorithm is based on a mathematical framework inspired by the constructive proof of Whitney's embedding theorem from differential topology. Computing all (unit) secants for a set of points is by nature computationally expensive, thus opening the door for exploitation of GPU architecture for achieving fast versions of these algorithms. We present a polynomial-time data-reduction algorithm that produces a meaningful low-dimensional representation of a data set by iteratively constructing improved projections within the framework described above. Key to our algorithm design and implementation is the use of GPUs which, among other things, minimizes the computational time required for the calculation of all secant lines. One goal of this report is to share ideas with GPU experts and to discuss a class of mathematical algorithms that may be of interest to the broader GPU community.
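The "iteratively constructing improved projections" step can be sketched as follows. This is a schematic NumPy version of the idea, not the authors' GPU implementation or exact update rule: at each step it finds the unit secant that the current projection shrinks most and nudges the projection toward preserving it, re-orthonormalizing afterwards. The dense matrix products over all secants are the bulk work that the report offloads to the GPU; secants here is a (num_pairs, D) array of unit secants such as the one built in the earlier SAP sketch.

```python
import numpy as np

def improve_projection(P, secants, step=0.1, iters=50):
    """Schematic secant-avoiding refinement of a projection P (k x D, orthonormal rows).
    Each iteration projects every unit secant, picks the one shrunk the most, and takes a
    small gradient-like step that increases its projected length, then re-orthonormalizes."""
    for _ in range(iters):
        proj = secants @ P.T                                  # project all unit secants at once
        worst = secants[np.argmin(np.linalg.norm(proj, axis=1))]
        P = P + step * np.outer(P @ worst, worst)             # pull P toward preserving the worst secant
        Q, _ = np.linalg.qr(P.T)                              # re-orthonormalize the rows via QR
        P = Q.T
    return P
```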