<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE Computer Society</publisher>
				<date>06/20/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10467120</idno>
					<idno type="doi"></idno>
					
<author>Xiran Fan</author><author>Chun-Hao Yang</author><author>Baba C. Vemuri</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
<abstract><ab><![CDATA[Hyperbolic neural networks have been popular in the recent past due to their ability to represent hierarchical data sets effectively and efficiently. The challenge in developing these networks lies in the nonlinearity of the embedding space, namely the hyperbolic space. Hyperbolic space is a homogeneous Riemannian manifold of the Lorentz group, which is a semi-Riemannian manifold, i.e. a manifold equipped with an indefinite metric. Most existing methods (with some exceptions) use local linearization to define a variety of operations paralleling those used in traditional deep neural networks in Euclidean spaces. In this paper, we present a novel fully hyperbolic neural network which uses the concept of projections (embeddings) followed by an intrinsic aggregation and a nonlinearity, all within the hyperbolic space. The novelty here lies in the projection, which is designed to project data onto a lower-dimensional embedded hyperbolic space and hence leads to a nested hyperbolic space representation that is independently useful for dimensionality reduction. The main theoretical contribution is that the proposed embedding is proved to be isometric and equivariant under the Lorentz transformations, which are the natural isometric transformations in hyperbolic spaces. This projection is computationally efficient since it can be expressed by simple linear operations, and, due to the aforementioned equivariance property, it allows for weight sharing. The nested hyperbolic space representation is the core component of our network and therefore we first compare this representation, independent of the network, with other dimensionality reduction methods such as tangent PCA, principal geodesic analysis (PGA) and HoroPCA. Based on this equivariant embedding, we develop a novel fully hyperbolic graph convolutional neural network architecture to learn the parameters of the projection. Finally, we present experiments demonstrating comparative performance of our network on several publicly available data sets.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Hyperbolic geometry is a centuries old field of non-Euclidean geometry and has recently found its way into the field of machine learning, in particular into deep learning in the form of hyperbolic neural networks (HNNs) or hyperbolic graph convolutional networks (HGCNs) and recently for dimensionality reduction of data embedded in the hyperbolic space. In this paper, we will discuss both problems namely, dimensionality reduction in hyperbolic spaces and HNN architectures. In particular, we will present novel techniques for both these problems. In the following, we present literature review of the two above stated problems and establish the motivation for our work. A word on terminology, we will use the term hyperbolic neural network and hyperbolic graph (convolutional) neural network synonymously in the rest of the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Dimensionality Reduction of Manifold-valued Data</head><p>Dimensionality reduction is a fundamental problem in machine learning with applications in computer vision and many other fields of engineering and sciences. The simplest and most popular method among these is the principal component analysis (PCA), which was proposed more than a century ago (see <ref type="bibr">[22]</ref> for a review and some recent developments on PCA). PCA however is limited to data in vector spaces. For data that are manifold-valued, principal geodesic analysis (PGA) was presented in <ref type="bibr">[10]</ref>, which yields the projection of data onto principal geodesic submanifolds passing through an intrinsic <ref type="bibr">(Fr&#233;chet)</ref> mean (FM) <ref type="bibr">[11]</ref> of the data (assuming it exists). They find the geodesic submanifold of a lower dimension that maximizes the projected variance and computationally, this was achieved via linear approximation, i.e., applying PCA on the tangent space anchored at the FM. This is sometimes referred to as the tangent PCA (tPCA). This approximation however requires the data to be clustered around the FM, otherwise the tangent space approximation to the manifold leads to Figure <ref type="figure">1</ref>. Projections of data from a 2D to 1D hyperbolic space using different dimensionality reduction methods. The results are visualized in the Poincar&#233; disk. Original data (blue dots) lie in a 2D hyperbolic space and have a zero mean (origin of the Poincar&#233; disk). The HoroPCA direction (red dotted line) and the principal geodesic obtained by tPCA (orange dashed line) and EPGA (purple dash-dotted line) fail to capture the main trend of the data since they are restricted to learn a geodesic submanifold passing through the FM. In contrast, our NH model (green solid line), captures the data trend more accurately. 
The diamond markers on each line represent the reconstructed data from each method. The reconstruction errors for HoroPCA, tPCA, EPGA and the NH model in this example are, 0.1708, 0.1202, 0.1638 and 0.0062 respectively. inaccuracies. Subsequently, <ref type="bibr">[42]</ref> presented the Exact PGA (EPGA) algorithm, which does not use any linear approximation. However, EPGA is computationally expensive as it requires two non-linear optimizations steps per iteration (projection to the geodesic submanifold and finding the new geodesic direction such that the reconstruction error is minimized). Later, authors in <ref type="bibr">[4]</ref> developed a version of EPGA for constant sectional curvature manifolds, namely the hypersphere and the hyperbolic space, by deriving closed form formulae for the projection. There are many variants of PGA and we refer the reader to <ref type="bibr">[1,</ref><ref type="bibr">21,</ref><ref type="bibr">48]</ref> for the details. More recently, Barycentric subspace analysis (BSA) was proposed in <ref type="bibr">[36]</ref> which finds a more general parameterization of a nested sequence of submanifolds via the minimization of unexplained variance. Another useful dimensionality reduction scheme is the Principal curves <ref type="bibr">[18]</ref> and their generalization to Riemannian manifolds <ref type="bibr">[19]</ref> that are more appropriate for certain applications.</p><p>A salient feature of PCA is that it yields nested linear subspaces, i.e., the reduced dimensional principal subspaces form a nested hierarchy. This idea was exploited in <ref type="bibr">[23]</ref> where authors proposed the principal nested spheres (PNS) by embedding an (n 1)-sphere in to an n-sphere, the embedding however is not necessarily isometric. Hence, PNS is more general than PGA in that PNS does not have to be geodesic. 
Similarly, for the manifold P_n of (n × n) symmetric positive definite (SPD) matrices, the authors of <ref type="bibr">[17]</ref> proposed a geometry-aware dimensionality reduction by projecting data on P_n to P_m for some m &lt; n. More recently, the idea of constructing a nested sequence of manifolds was presented in <ref type="bibr">[47]</ref>, where the authors unified and generalized the nesting concept to general Riemannian homogeneous manifolds, which form a large class of Riemannian manifolds including the hypersphere, P_n, the Grassmannian, the Stiefel manifold, Lie groups, and others. Although the general framework in <ref type="bibr">[47]</ref> seems straightforward and applicable to hyperbolic spaces, many significantly important technical aspects need to be addressed and derived in detail. In this paper, we will present novel derivations suited for hyperbolic spaces: a projection operator which is proved to yield an isometric embedding, and a proof of equivariance of this projection operator to isometries. These will facilitate the construction of nested hyperbolic spaces and the hyperbolic neural network. Note that there are five models of the hyperbolic space, namely the hyperboloid (Lorentz) model, the Poincaré disk/ball model, the Poincaré half-plane model, the Klein model and the hemisphere model <ref type="bibr">[3]</ref>. All these models are isometrically equivalent, but some are better suited than others depending on the application. We choose the Lorentz model of the hyperbolic space with a Lorentzian metric in our work. The choice of this model and the associated metric over other models is motivated by the Riemannian optimization efficiency and numerical stability they afford <ref type="bibr">[7,</ref><ref type="bibr">34]</ref>.</p><p>Most recently, an elegant approach called HoroPCA was proposed in <ref type="bibr">[5]</ref> for dimensionality reduction in hyperbolic spaces.
In particular, the authors represented the hyperbolic space using the Poincaré model and proposed to generalize the notions of a direction and the coordinates in a given direction using ideal points (points at infinity) and the Busemann coordinates (defined using the Busemann function) <ref type="bibr">[2]</ref>. The level sets of the Busemann function, called horospheres, resemble the hyperplanes (or affine subspaces) in Euclidean spaces, and hence the dimensionality reduction is achieved by a projection that moves points along a horosphere. The data is then projected to a geodesic hull of a base point b and a number of ideal points p_1, . . . , p_K, which is also a geodesic submanifold. This is the key difference between HoroPCA and our proposed method, and it leads to a significant difference in performance. This is evident from the toy example in Figure <ref type="figure">1</ref>, which depicts the reduced dimensional representations obtained by our method in comparison to those from EPGA, HoroPCA, and tPCA. Note that all of the other methods yield submanifold representations that do not capture the data trend accurately, unlike ours. More comprehensive comparisons will be made in a later section.</p><p>To briefly summarize, our first goal in this paper is to present a nested hyperbolic space representation for dimensionality reduction and then demonstrate, via synthesized and real datasets, that it achieves a lower reconstruction error in comparison to competing methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Hyperbolic Neural Networks</head><p>Several researchers have demonstrated that the hyperbolic space is apt for modeling hierarchically organized data, for example, graphs and trees <ref type="bibr">[33,</ref><ref type="bibr">38,</ref><ref type="bibr">39]</ref>. Recently, the formalism of Gyrovector spaces (an algebraic structure) <ref type="bibr">[44]</ref> was applied to the hyperbolic space to define basic operations paralleling those in vector spaces and were used to build a hyperbolic neural network (HNN) <ref type="bibr">[13,</ref><ref type="bibr">41]</ref>. The Gyrovector space formalism facilitates performing M&#246;bius additions and subtractions in the Poincar&#233; model of the hyperbolic space. HNNs have been successfully applied to word embeddings <ref type="bibr">[43]</ref> as well as image embeddings <ref type="bibr">[24]</ref>. Additionally, several existing deep network architectures have been modified to suit hyperbolic embeddings of data, e.g., graph networks <ref type="bibr">[6,</ref><ref type="bibr">29]</ref>, attention module <ref type="bibr">[15]</ref>, and variational auto-encoders <ref type="bibr">[30,</ref><ref type="bibr">35]</ref>. These hyperbolic networks were shown to perform comparably or even better than their euclidean counterparts.</p><p>Existing HNNs have achieved moderate to great successes in multiple areas and shown great potential in solving complex problems. However, most of them use tangent space approximations to facilitate the use of vector space operations prevalent in existing neural network architectures. There are however some exceptions, for instance, the authors in <ref type="bibr">[8]</ref> developed what they call a Hyperbolicto-Hyperbolic network and the authors in <ref type="bibr">[7]</ref> also developed a fully Hyperbolic network. 
They both considered the use of Lorentz transformations on hyperbolic features, since the Lorentz transformation matrix acts transitively on a hyperbolic space and thus preserves the global hyperbolic structure. Each Lorentz transformation is a composition of a Lorentz rotation and a rotation-free Lorentz transformation called the Lorentz boost. The authors of <ref type="bibr">[8]</ref> use only the Lorentz rotation for hyperbolic feature transformations, while the authors of <ref type="bibr">[7]</ref> build a fully-connected layer in hyperbolic space (called a hyperbolic linear layer) parameterized by an arbitrary weight matrix (not necessarily invertible) which is applied to each data point in the hyperbolic space, resulting in a mapping from a hyperbolic space to itself. This procedure is ad hoc in the sense that it does not use the intrinsic characterization of the hyperbolic space as a homogeneous space with the isometry group being the Lorentz group.</p><p>Lorentz transformations are, however, inappropriate for defining projection operations (required for reducing the dimensionality), as they preserve the Lorentz model only when there is no change in dimension. In other words, to find a lower-dimensional hyperbolic space representation for data embedded in a higher-dimensional hyperbolic space, one cannot use Lorentz transformations directly. Hence, we propose to use the isometric embedding operation mentioned in the previous subsection as the building block to design a hyperbolic neural network. We will now briefly summarize our proposed model and the contributions of our work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3.">Proposed Model and Contributions</head><p>Inspired by <ref type="bibr">[23]</ref> and <ref type="bibr">[47]</ref>, we construct a nested representation in a hyperbolic space to extract the hyperbolic features. Such a nested (hierarchical) hyperbolic space representation has the advantage that the data in reduced dimensions remains in a hyperbolic space. Hereafter, we refer to these nested hyperbolic spaces as nested hyperboloids (NHs). As a dimensionality reduction method in Riemannian manifolds, the learned lower dimensional submanifold in NH is not required to pass the FM unlike in PGA and need not be a geodesic submanifold as in HoroPCA, PGA or EPGA. In the experiments section, we will demonstrate that this leads to much lower reconstruction error in comparison to the aforementioned dimensionality reduction methods.</p><p>After defining the projection which leads to an embedding within hyperbolic spaces of different dimensions, these projections/embeddings are used to define a feature transformation layer in the hyperbolic space. This layer is then composed with a hyperbolic neighborhood aggregation operation/layer and an appropriate non-linear operations in between namely, the tangent-ReLU, to define a novel nested hyperbolic graph convolutional network (NHGCN) architecture.</p><p>The rest of the paper is organized as follows. In Section 2, we briefly review the geometry of hyperbolic space. In Section 3, we explicitly give the projection and embedding to map data between hyperbolic spaces of different dimensions. We also present a novel hyperbolic graph convolutional neural network architecture based on these projections and the tangent-ReLU activation. In Section 4, we first present the performance of the NH model as a dimensionality reduction method and compare with other competing methods, including EPGA, tPCA and HoroPCA. 
Next, we compare our NHGCN with other hyperbolic networks in tackling the problems of link prediction and node classification on four graph datasets described and used in <ref type="bibr">[6]</ref>. Finally, we draw conclusions in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Preliminaries</head><p>In this section, we briefly review relevant concepts of hyperbolic geometry. In this paper, we will regard the hyperbolic space as a homogeneous Riemannian manifold of the Lorentz group and present a few important geometric concepts, including the geodesic distance and the exponential map, in the hyperbolic space, which are used in our work. The materials presented in this section can be found in most textbooks on hyperbolic spaces, for example <ref type="bibr">[3,</ref><ref type="bibr">37]</ref>.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Lorentzian Space and Hyperbolic Space</head><p>As mentioned in Section 1, there are several (isometrically) equivalent models of a hyperbolic space, including the Poincar&#233; model, Klein model, the upper-half space model, and the Jemisphere model <ref type="bibr">[3]</ref>. We choose to use the hyperboloid (Lorentz) model of the hyperbolic space in this paper due to its numerical stability property which is very useful for the optimization problem involved in the training and test phases. Our technique is however applicable to all of the models due to the isometric equivalence of the models.</p><p>The (n + 1)-dimensional Lorentzian space R 1,n is the Euclidean space R n+1 equipped with a bilinear form</p><p>where x = [x 0 , x 1 , . . . , x n ] T , y = [y 0 , y 1 , . . . , y n ] T 2 R n+1 . This bilinear form is sometimes referred to as the Lorentzian inner product although it is not positive-definite. We denote the norm, called Lorentzian norm, induced by the Lorentzian inner product by kxk L = p hx, xi L . Note that kxk L is either positive, zero, or positive imaginary.</p><p>We consider the following submanifold of R 1,n</p><p>) This is called the n-dimensional hyperboloid model of one sheet of a hyperbolic space defined in R n+1 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Lorentz Transformations</head><p>In the Lorentzian space, the linear isometries are called the Lorentz transformation, i.e. the map : R n+1 ! R n+1 is a Lorentz transformation if h (x), (y)i L = hx, yi L for any x, y 2 R n+1 . It is easy to see that all Lorentz transformations form a group under composition, and this group is denoted by O(1, n), called the Lorentz group. The matrix representation of O(1, n) in R n+1 is defined as follows. Let J n = diag( 1, I n ) where I n is the n &#8677; n identity matrix and diag(&#8226;) denotes a diagonal matrix. Then, </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Riemannian Geometry of Hyperbolic Space</head><p>A commonly used Riemannian metric for L n &#8674; R n+1 is the restriction of the Lorentz inner product to the tangent space of L n . Note that even though the Lorentz inner product is not positive-definite, when restricted to the tangent space of L n , it is positive-definite. Hence, L n is a Riemannian manifold with constant negative sectional curvature. Furthermore, the group of isometries of L n is precisely O + (1, n) and the group of orientation-preserving isometries is SO + (1, n). We now state a few useful facts about the group of isometries that are used in this paper and refer the interested reader to <ref type="bibr">[12]</ref> for details. where the group action is defined as x 7 ! Ax for x 2 L n and A 2 SO + (1, n).</p><p>Fact 2. Let x = [1, 0, . . . , 0] T 2 L n . The isotropy subgroup G x is given by</p><p>where SO(n) is the group of n &#8677; n orthogonal matrices with determinant 1.</p><p>Hence, the hyperbolic space is a homogeneous Riemannian manifold and can be written as a quotient space,</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fact 3 ( [31]). A Lorentz transformation</head><p>can be decomposed using a polar decomposition and expressed as</p><p>where R 2 SO(n), v 2 R n and c = p kvk 2 + 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The first component in the decomposition is called a</head><p>Lorentz rotation and the second component a Lorentz boost. See Figure <ref type="figure">2</ref> for example illustrations of the Lorentz rotation and the Lorentz boost respectively. Fact 4. Every Lorentz transformation matrix A 2</p><p>where P, Q 2 SO(n), &#8629; 2 R, I n 1 is the (n 1) &#8677; (n 1) identity matrix.</p><p>The matrix in the middle is the Lorentz boost along the first coordinate axis. This decomposition will be very useful in the optimization problem stated in Section 3.3, equation <ref type="bibr">(14)</ref>.</p><p>We now conclude this section by presenting the explicit closed form formulae for the exponential map and the geodesic distance. For any x 2 L n and v 2 T p L n (the tangent space of L n at x), the exponential map at x is given by Exp</p><p>Since L n is a negatively curved Riemannian manifold, its exponential map is invertible and the inverse of the exponential map, also called the Log map, is given by Log</p><p>where x, y 2 L n and &#10003; is the geodesic distance between x and y given by &#10003; = d L (x, y) = cosh 1 ( hx, yi L ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Nested Hyperbolic Spaces and Networks</head><p>In this section, we first present the construction of nested hyperboloids (NHs); an illustration of the NHs are given in Figure <ref type="figure">3</ref>. We also prove that the proposed NHs possess several nice properties, including the isometry property and the equivariance under the Lorentz transformations. Then we use the NH representations to design a novel graph convolutional network architecture, called Nested Hyperbolic Graph Convolutional Network (NHGCN).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">The Nested Hyperboloid Representation</head><p>The key steps to the development of the NHs are the embedding of L m into L n for m &lt; n and the projection from L n to L m . The principle is to define an embedding of the corresponding groups of isometries, SO + (1, m) and</p><p>First, we consider the embedding &#9670;m :</p><p>where</p><p>b, and &#8676; 2 SO + (1, m + 1). The function adapted-GS(&#8226;) is an adaptation of the standard Gram-Schmidt process to orthonormalize vectors with respect to the Lorentz inner product defined earlier.</p><p>The Riemannian submersion (see <ref type="bibr">[20]</ref> for the definition of a Riemannian submersion) &#8673; :</p><p>where</p><p>This class of embeddings is quite general as it includes isometric embeddings as special cases.</p><p>Proposition 1. The embedding &#9670; m : L m ! L m+1 is isometric when r = 0.</p><p>Proof. It follows directly from the definitions of the Lorentz transformation and the geodesic distance on L m . Furthermore, the embedding ( <ref type="formula">9</ref>) is equivariant under Lorentz transformations.</p><p>The projection &#8673; m+1 : L m+1 ! L m corresponding to &#9670; m is given by,</p><p>for x 2 L m+1 . Hence, the reconstructed point</p><p>The unknowns &#8676; = [ &#8676; v] and r can then be obtained by minimizing the reconstruction error</p><p>The projection of x 2 L n into L m for n &gt; m can be obtained via the composition</p><p>where</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Nested Hyperbolic Graph Convolutional Network (NHGCN)</head><p>The Hyperbolic Graph Convolutional Network (HGCN) proposed in <ref type="bibr">[6]</ref> is a generalization of Euclidean Graph Network to a hyperbolic space. There are three different layers in HGCN: feature transformation, neighborhood aggregation and non-linear activation. We use our NH representation to define a hyperbolic feature transformation, the weighted centroid w.r.t the squared Lorentzian distance to define the neighborhood aggregation and use a tangent ReLU activation. This leads to a novel HGCN architecture. Figure <ref type="figure">4</ref> depicts the HGCN architecture. Each of the three distinct layers are described in detail below.</p><p>Hyperbolic Feature Transformation: Given x 2 L n , the hyperbolic feature transformation is defined using <ref type="bibr">(13)</ref> as follows</p><p>where W 2 R (m+1)&#8677;(n+1) . It is easy to prove that y 2 L m . At the l-th layer, the inputs are the hyperbolic representation x l 1 i from the previous layer and the feature transformation matrix is W l . The intermediate hyperbolic representation of i-th node is computed as follows</p><p>Hyperbolic Neighborhood Aggregation: In GCNs, the neighborhood aggregation is used to combine neighboring features by computing the weighted centroid of these features. The weighted centroid in hyperbolic space of a point set {x i } i=1 2 L n is obtained using the weighted Fr&#233;chet mean. However, it does not have closed form expression in hyperbolic space. 
We therefore use the hyperbolic neighborhood aggregation proposed in <ref type="bibr">[7,</ref><ref type="bibr">49]</ref>, where the aggregated representation for a node x_i^l at the l-th layer is the weighted centroid μ_i^l of its neighboring nodes {x_j^l}_{j=1}^p ⊂ L^{n_l} w.r.t. the squared Lorentzian distance, namely μ_i^l = argmin_{μ ∈ L^{n_l}} Σ_j ν_j^l d_L^2(μ, x_j^l), where ν_j^l is the weight for x_j^l and d_L^2(x, y) = −1 − ⟨x, y⟩_L is the squared Lorentzian distance <ref type="bibr">[37]</ref>. The authors of <ref type="bibr">[27]</ref> proved that this problem has the closed-form solution μ_i^l = Σ_j ν_j^l x_j^l / |‖Σ_j ν_j^l x_j^l‖_L|.</p><p>Hyperbolic Nonlinear Activation: A nonlinear activation is required in our network since the feature transform is a linear operation. We choose to apply the tangent ReLU to prevent our multi-layer network from collapsing into a single-layer network. The tangent ReLU in the hyperbolic space is defined as TReLU(x) = Exp_0(ReLU(Log_0(x))). Here 0 = [1, 0, . . . , 0]^T ∈ L^{n_l} (corresponding to the origin in the Poincaré model) is chosen as the base point in the tangent ReLU.</p></div>
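Both layers above reduce to a few lines of code. The sketch below (our own illustration following the formulas above, not the released implementation) computes the closed-form weighted centroid and the tangent ReLU:

```python
import numpy as np

def lorentz_inner(x, y):
    return -x[0] * y[0] + x[1:] @ y[1:]

def weighted_centroid(X, w):
    """Closed-form centroid w.r.t. squared Lorentzian distance [27]:
    mu = a / |  ||a||_L  |  with a = sum_j w_j x_j (rows of X on L^n)."""
    a = w @ X
    return a / np.sqrt(-lorentz_inner(a, a))

def tangent_relu(x):
    """TReLU(x) = Exp_o(ReLU(Log_o(x))) at the base point o = [1, 0, ..., 0].
    Assumes x is neither o itself nor mapped to the zero tangent vector."""
    o = np.zeros_like(x); o[0] = 1.0
    delta = np.arccosh(-lorentz_inner(o, x))
    v = delta / np.sinh(delta) * (x - np.cosh(delta) * o)  # Log_o(x)
    u = np.maximum(v, 0.0)                                 # ReLU in T_o L^n
    u[0] = 0.0                                             # keep u tangent at o
    nu = np.sqrt(lorentz_inner(u, u))
    return np.cosh(nu) * o + np.sinh(nu) * u / nu          # Exp_o(u)

X = np.array([[np.cosh(0.5), 0.6 * np.sinh(0.5), 0.8 * np.sinh(0.5)],
              [np.cosh(0.3), np.sinh(0.3), 0.0]])
mu = weighted_centroid(X, np.array([0.5, 0.5]))
print(np.isclose(lorentz_inner(mu, mu), -1.0))  # True: centroid stays on L^2
```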
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Optimization</head><p>In this section, we will explain how to update parameters in network, i.e. transformation matrix W in <ref type="bibr">(14)</ref>. Instead of updating W directly, we find an alternative way by decomposing W into three matrices using <ref type="bibr">(5)</ref>. More specifically, we write</p><p>R and e P is the first m rows of a P 2 SO(n) which is from a Stiefel manifold <ref type="bibr">[9]</ref>. We can now update these factors sequentially in the optimization (see supplement material for details).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head><p>In this section, we will first evaluate the NH model as a dimensionality reduction method compared with HoroPCA, tPCA and EPGA. We show that the proposed NH model outperforms all of these method on both synthetic data and real data in terms of reconstruction error. Then, we apply the proposed NHGCN to the problems of link prediction and node classification on four graph data sets described in <ref type="bibr">[6]</ref>. Our method yields results that are better or comparable to existing hyperbolic graph networks. The implementations 1 are based on Pymanopt <ref type="bibr">[26]</ref> and GeoTorch <ref type="bibr">[28]</ref> for dimensionality reduction and NHGCN respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Dimensionality Reduction in Hyperbolic Space</head><p>First we present synthetic data experiments followed by experiments on real data.</p><p>Synthetic Experiments As a dimensionality reduction method, we compare the NH model with three other competing methods: tPCA, EPGA, and HoroPCA. Note that the first two are applicable on any Riemannian manifolds and HoroPCA is proposed specifically for hyperbolic spaces as is our NH model. The major difference between NH and the aforementioned methods is that NH does not require the fitted submanifold to pass through the FM whereas the others do. This extra requirement can sometimes lead to failure in capturing the data trend as shown in Figure <ref type="figure">5</ref>.</p><p>Apart from visual inspection, we use the reconstruction error as a measure of the goodness of fit. To see how NH performs in comparison to others under different levels of noise, we generate synthetic data from the wrapped normal distribution <ref type="bibr">[30]</ref> on L 10 with variance ranging from 0.2 to 2. Then we apply different dimensionality reduction methods to reduce the dimension down to 2. The result is 1 <ref type="url">https://github.com/cvgmi/Nested-Hyperbolic-DimReduc-and-HNN</ref>.  shown in Figure <ref type="figure">6</ref>. The results for EPGA and NH are essentially the same since, for wrapped normal distribution, the data is distributed symmetrically around the FM and hence the assumption of submanifold passing through the FM is valid here. Even in this case, we observe a significant improvement of NH over tPCA and HoroPCA especially in the large variance scenario. The main reasons are that (i) tPCA uses local linearization which would lead to inaccuracies when the data is not tightly clustered around the FM and (ii) the HoroPCA seeks to maximize the projected variance on the submanifold, which, as is well known, not equivalent to minimizing the reconstruction error. 
There is a clear justification for choosing the reconstruction error as the objective function: we want a good approximation of the original data with the lower-dimensional representation. For additional experimental results, please see the supplementary material.</p><p>Hyperbolic Embeddings of Trees: For real data experiments, we consider reducing the dimensionality of trees that are embedded into a hyperbolic space. We validate our method on the four datasets described in <ref type="bibr">[38]</ref>, including (i) a fully balanced tree, (ii) a phylogenetic tree, (iii) a biological graph comprising disease relationships, and (iv) a graph of Computer Science (CS) Ph.D. advisor-advisee relationships. We also create two additional datasets by removing some edges in the balanced tree dataset. We apply the method in <ref type="bibr">[14]</ref> to embed the tree datasets into a Poincaré ball of dimension 10 and then apply our NH model along with the other competing methods to reduce the dimension down to 2. (Note that in <ref type="bibr">[49]</ref>, the authors did not test their network on the Airport dataset.) The results are reported in Table <ref type="table">1</ref>, where we list the means and standard deviations of the reconstruction errors and the running times in seconds for EPGA, HoroPCA and NH respectively. From the table, it is evident that our method performs the best, in particular better than HoroPCA, the SOTA. Specifically, HoroPCA is worse than tPCA and EPGA in terms of reconstruction error, though it yields higher explained variance, as shown in <ref type="bibr">[5]</ref>. The reason might be that HoroPCA seeks projections that maximize the explained variance, which is not equivalent to minimizing the reconstruction error in the Riemannian manifold case.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Nested Hyperbolic Graph Networks</head><p>To evaluate the power of the proposed NHGCN, we apply it to problems of link prediction (LP) and node classification (NC). We use four public domain datasets: Disease [6], Airport <ref type="bibr">[6]</ref>, PubMed <ref type="bibr">[32]</ref>, and Cora <ref type="bibr">[40]</ref>. We compare our NHGCN with many other graph neural networks and the results are reported in Table <ref type="table">2</ref>. For the LP, we report the means and the standard deviation of the area under the receiver operating characteristic (ROC) curve on the test data; for the problem of NC, we report the mean and the standard deviation of the F1 scores. As evident from the table, our results are comparable to the state-of-the-art (SOTA) and in three cases better. A noteworthy point about our model is that all the operations used in the model are intrinsic to L n unlike the others in Table <ref type="table">2</ref>. Intrinsic operations by definition should yield better accuracy. Thus, we attribute the lower accuracy of our model in Table <ref type="table">2</ref> to the sub-optimal optimization approach used here. This op-timization problem is on a semi-Riemannian manifold, an open problem that will be addressed in future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In this paper, we presented a novel dimensionality reduction technique in hyperbolic spaces called the NH representation. NH representation was constructed using a projection operator that was shown to yield isometric embeddings and further was shown to be equivariant to the isometry group admitted by the hyperbolic space. Further, we empirically showed that it yields lower reconstruction error compared to the state-of-the-art (HorroPCA, EPGA, tPCA). Using the NH representation, we developed a novel fully HGCN and tested it on several data sets. Our NHGCN was shown to achieve comparable to superior performance w.r.t. several competing methods.</p></div></body>
		</text>
</TEI>
