skip to main content

Title: Real-Time Streaming Graph Embedding Through Local Actions
Recently, considerable research attention has been paid to graph embedding, a popular approach to construct representations of vertices in latent space. Due to the curse of dimensionality and sparsity in graphical datasets, this approach has become indispensable for machine learning tasks over large networks. The majority of the existing literature has considered this technique under the assumption that the network is static. However, networks in many applications, including social networks, collaboration networks, and recommender systems, nodes, and edges accrue to a growing network as streaming. A small number of very recent results have addressed the problem of embedding for dynamic networks. However, they either rely on knowledge of vertex attributes, su er high-time complexity or need to be re-trained without closed-form expression. Thus the approach of adapting the existing methods designed for static networks or dynamic networks to the streaming environment faces non-trivial technical challenges. These challenges motivate developing new approaches to the problems of streaming graph embedding. In this paper, we propose a new framework that is able to generate latent representations for new vertices with high e ciency and low complexity under speci ed iteration rounds. We formulate a constrained optimiza- tion problem for the modi cation of the representation resulting from a stream arrival. We show this problem has no closed-form solution and instead develop an online approximation solution. Our solution follows three steps: (1) identify vertices a ected by newly arrived ones, (2) generating latent features for new vertices, and (3) updating the latent features of the most a ected vertices. The new representations are guaranteed to be feasible in the original constrained optimization problem. Meanwhile, the solution only brings about a small change to existing representations and only slightly changes the value of the objective function. Multi-class clas- si cation and clustering on ve real-world networks demonstrate that our model can e ciently update vertex representations and simultaneously achieve comparable or even better performance compared with model retraining.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Companion Proceedings of the 2019 World Wide Web Conference (WWW ’19 Companion)
Page Range / eLocation ID:
285 to 293
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Graph clustering is a fundamental problem in social network analysis, the goal of which is to group vertices of a graph into a series of densely knitted clusters with each cluster well separated from all the others. Classical graph clustering methods take advantage of the graph topology to model and quantify vertex proximity. With the proliferation of rich graph contents, such as user profiles in social networks, and gene annotations in protein interaction networks, it is essential to consider both the structure and content information of graphs for high-quality graph clustering. In this paper, we propose a graph embedding approach to clustering content-enriched graphs. The key idea is to embed each vertex of a graph into a continuous vector space where the localized structural and attributive information of vertices can be encoded in a unified, latent representation. Specifically, we quantify vertex-wise attribute proximity into edge weights, and employ truncated, attribute-aware random walks to learn the latent representations for vertices. We evaluate our attribute-aware graph embedding method in real-world attributed graphs, and the results demonstrate its effectiveness in comparison with state-of-the-art algorithms. 
    more » « less
  2. null (Ed.)
    In traditional graph learning tasks, such as node classification, learning is carried out in a closed-world setting where the number of classes and their training samples are provided to help train models, and the learning goal is to correctly classify unlabeled nodes into classes already known. In reality, due to limited labeling capability and dynamic evolving of networks, some nodes in the networks may not belong to any existing/seen classes, and therefore cannot be correctly classified by closed-world learning algorithms. In this paper, we propose a new open-world graph learning paradigm, where the learning goal is to not only classify nodes belonging to seen classes into correct groups, but also classify nodes not belonging to existing classes to an unseen class. The essential challenge of the openworld graph learning is that (1) unseen class has no labeled samples, and may exist in an arbitrary form different from existing seen classes; and (2) both graph feature learning and prediction should differentiate whether a node may belong to an existing/seen class or an unseen class. To tackle the challenges, we propose an uncertain node representation learning approach, using constrained variational graph autoencoder networks, where the label loss and class uncertainty loss constraints are used to ensure that the node representation learning are sensitive to unseen class. As a result, node embedding features are denoted by distributions, instead of deterministic feature vectors. By using a sampling process to generate multiple versions of feature vectors, we are able to test the certainty of a node belonging to seen classes, and automatically determine a threshold to reject nodes not belonging to seen classes as unseen class nodes. Experiments on real-world networks demonstrate the algorithm performance, comparing to baselines. Case studies and ablation analysis also show the rationale of our design for open-world graph learning. 
    more » « less
  3. Network embedding is an effective approach to learn the low-dimensional representations of vertices in networks, aiming to capture and preserve the structure and inherent properties of networks. The vast majority of existing network embedding methods exclusively focus on vertex proximity of networks, while ignoring the network internal community structure. However, the homophily principle indicates that vertices within the same community are more similar to each other than those from different communities, thus vertices within the same community should have similar vertex representations. Motivated by this, we propose a novel network embedding framework NECS to learn the Network Embedding with Community Structural information, which preserves the high-order proximity and incorporates the community structure in vertex representation learning. We formulate the problem into a principled optimization framework and provide an effective alternating algorithm to solve it. Extensive experimental results on several benchmark network datasets demonstrate the effectiveness of the proposed framework in various network analysis tasks including network reconstruction, link prediction and vertex classification.

    more » « less
  4. Embedding properties of network realizations of dissipative reduced order models Jörn Zimmerling, Mikhail Zaslavsky,Rob Remis, Shasri Moskow, Alexander Mamonov, Murthy Guddati, Vladimir Druskin, and Liliana Borcea Mathematical Sciences Department, Worcester Polytechnic Institute Abstract Realizations of reduced order models of passive SISO or MIMO LTI problems can be transformed to tridiagonal and block-tridiagonal forms, respectively, via dierent modications of the Lanczos algorithm. Generally, such realizations can be interpreted as ladder resistor-capacitor-inductor (RCL) networks. They gave rise to network syntheses in the rst half of the 20th century that was at the base of modern electronics design and consecutively to MOR that tremendously impacted many areas of engineering (electrical, mechanical, aerospace, etc.) by enabling ecient compression of the underlining dynamical systems. In his seminal 1950s works Krein realized that in addition to their compressing properties, network realizations can be used to embed the data back into the state space of the underlying continuum problems. In more recent works of the authors Krein's ideas gave rise to so-called nite-dierence Gaussian quadrature rules (FDGQR), allowing to approximately map the ROM state-space representation to its full order continuum counterpart on a judicially chosen grid. Thus, the state variables can be accessed directly from the transfer function without solving the full problem and even explicit knowledge of the PDE coecients in the interior, i.e., the FDGQR directly learns" the problem from its transfer function. This embedding property found applications in PDE solvers, inverse problems and unsupervised machine learning. Here we show a generalization of this approach to dissipative PDE problems, e.g., electromagnetic and acoustic wave propagation in lossy dispersive media. Potential applications include solution of inverse scattering problems in dispersive media, such as seismic exploration, radars and sonars. To x the idea, we consider a passive irreducible SISO ROM fn(s) = Xn j=1 yi s + σj , (62) assuming that all complex terms in (62) come in conjugate pairs. We will seek ladder realization of (62) as rjuj + vj − vj−1 = −shˆjuj , uj+1 − uj + ˆrj vj = −shj vj , (63) for j = 0, . . . , n with boundary conditions un+1 = 0, v1 = −1, and 4n real parameters hi, hˆi, ri and rˆi, i = 1, . . . , n, that can be considered, respectively, as the equivalent discrete inductances, capacitors and also primary and dual conductors. Alternatively, they can be viewed as respectively masses, spring stiness, primary and dual dampers of a mechanical string. Reordering variables would bring (63) into tridiagonal form, so from the spectral measure given by (62 ) the coecients of (63) can be obtained via a non-symmetric Lanczos algorithm written in J-symmetric form and fn(s) can be equivalently computed as fn(s) = u1. The cases considered in the original FDGQR correspond to either (i) real y, θ or (ii) real y and imaginary θ. Both cases are covered by the Stieltjes theorem, that yields in case (i) real positive h, hˆ and trivial r, rˆ, and in case (ii) real positive h,r and trivial hˆ,rˆ. This result allowed us a simple interpretation of (62) as the staggered nite-dierence approximation of the underlying PDE problem [2]. For PDEs in more than one variables (including topologically rich data-manifolds), a nite-dierence interpretation is obtained via a MIMO extensions in block form, e.g., [4, 3]. The main diculty of extending this approach to general passive problems is that the Stieltjes theory is no longer applicable. Moreover, the tridiagonal realization of a passive ROM transfer function (62) via the ladder network (63) cannot always be obtained in port-Hamiltonian form, i.e., the equivalent primary and dual conductors may change sign [1]. 100 Embedding of the Stieltjes problems, e.g., the case (i) was done by mapping h and hˆ into values of acoustic (or electromagnetic) impedance at grid cells, that required a special coordinate stretching (known as travel time coordinate transform) for continuous problems. Likewise, to circumvent possible non-positivity of conductors for the non-Stieltjes case, we introduce an additional complex s-dependent coordinate stretching, vanishing as s → ∞ [1]. This stretching applied in the discrete setting induces a diagonal factorization, removes oscillating coecients, and leads to an accurate embedding for moderate variations of the coecients of the continuum problems, i.e., it maps discrete coecients onto the values of their continuum counterparts. Not only does this embedding yields an approximate linear algebraic algorithm for the solution of the inverse problems for dissipative PDEs, it also leads to new insight into the properties of their ROM realizations. We will also discuss another approach to embedding, based on Krein-Nudelman theory [5], that results in special data-driven adaptive grids. References [1] Borcea, Liliana and Druskin, Vladimir and Zimmerling, Jörn, A reduced order model approach to inverse scattering in lossy layered media, Journal of Scientic Computing, V. 89, N1, pp. 136,2021 [2] Druskin, Vladimir and Knizhnerman, Leonid, Gaussian spectral rules for the three-point second dierences: I. A two-point positive denite problem in a semi-innite domain, SIAM Journal on Numerical Analysis, V. 37, N 2, pp.403422, 1999 [3] Druskin, Vladimir and Mamonov, Alexander V and Zaslavsky, Mikhail, Distance preserving model order reduction of graph-Laplacians and cluster analysis, Druskin, Vladimir and Mamonov, Alexander V and Zaslavsky, Mikhail, Journal of Scientic Computing, V. 90, N 1, pp 130, 2022 [4] Druskin, Vladimir and Moskow, Shari and Zaslavsky, Mikhail LippmannSchwingerLanczos algorithm for inverse scattering problems, Inverse Problems, V. 37, N. 7, 2021, [5] Mark Adolfovich Nudelman The Krein String and Characteristic Functions of Maximal Dissipative Operators, Journal of Mathematical Sciences, 2004, V 124, pp 49184934 Go back to Plenary Speakers Go back to Speakers Go back 
    more » « less
  5. Network alignment (NA) is a fundamental problem in many application domains – from social networks, through biology and communications, to neuroscience. The main objective is to identify common nodes and most similar connections across multiple networks (resp. graphs). Many of the existing efforts focus on efficient anchor node linkage by leveraging various features and optimizing network mapping functions with the pairwise similarity between anchor nodes. Despite the recent advances, there still exist two kinds of challenges: (1) entangled node embeddings, arising from the contradictory goals of NA: embedding proximal nodes in a closed form for representation in a single network vs. discriminating among them when mapping the nodes across networks; and (2) lack of interpretability about the node matching and alignment, essential for understanding prediction tasks. We propose dNAME (disentangled Network Alignment with Matching Explainability) – a novel solution for NA in heterogeneous networks settings, based on a matching technique that embeds nodes in a disentangled and faithful manner. The NA task is cast as an adversarial optimization problem which learns a proximity-preserving model locally around the anchor nodes, while still being discriminative. We also introduce a method to explain our semi-supervised model with the theory of robust statistics, by tracing the importance of each anchor node and its explanations on the NA performance. This is extensible to many other NA methods, as it provides model interpretability. Experiments conducted on several public datasets show that dNAME outperforms the state-of-the-art methods in terms of both network alignment precision and node matching ranking. 
    more » « less