skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, September 13 until 2:00 AM ET on Saturday, September 14 due to maintenance. We apologize for the inconvenience.


Title: Efficient and Degree-Guided Graph Generation via Discrete Diffusion Modeling
Diffusion-based graph generative models are effective in generating high-quality small graphs. However, it is hard to scale them to large graphs that contain thousands of nodes. In this work, we propose EDGE, a new diffusion-based graph generative model that addresses generative tasks for large graphs. The model is developed by reversing a discrete diffusion process that randomly removes edges until obtaining an empty graph. It leverages graph sparsity in the diffusion process to improve computational efficiency. In particular, EDGE only focuses on a small portion of graph nodes and only adds edges between these nodes. Without compromising modeling ability, it makes much fewer edge predictions than previous diffusion-based generative models. Furthermore, EDGE can explicitly model the node degrees of training graphs and then gain performance improvement in capturing graph statistics. The empirical study shows that EDGE is much more efficient than competing methods and can generate large graphs with thousands of nodes. It also outperforms baseline models in generation quality: graphs generated by the proposed model have graph statistics more similar to those of training graphs.  more » « less
Award ID(s):
2239869
NSF-PAR ID:
10496782
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
JMLR.org
Date Published:
Journal Name:
International Conference on Machine Learning
Format(s):
Medium: X
Location:
Honolulu, HI
Sponsoring Org:
National Science Foundation
More Like this
  1. Diffusion-based graph generative models are effective in generating high-quality small graphs. However, it is hard to scale them to large graphs that contain thousands of nodes. In this work, we propose EDGE, a new diffusion-based graph generative model that addresses generative tasks for large graphs. The model is developed by reversing a discrete diffusion process that randomly removes edges until obtaining an empty graph. It leverages graph sparsity in the diffusion process to improve computational efficiency. In particular, EDGE only focuses on a small portion of graph nodes and only adds edges between these nodes. Without compromising modeling ability, it makes much fewer edge predictions than previous diffusion-based generative models. Furthermore, EDGE can explicitly model the node degrees of training graphs and then gain performance improvement in capturing graph statistics. The empirical study shows that EDGE is much more efficient than competing methods and can generate large graphs with thousands of nodes. It also outperforms baseline models in generation quality: graphs generated by the proposed model have graph statistics more similar to those of training graphs. 
    more » « less
  2. Abstract

    In this work, we explore multiplex graph (networks with different types of edges) generation with deep generative models. We discuss some of the challenges associated with multiplex graph generation that make it a more difficult problem than traditional graph generation. We propose TenGAN, the first neural network for multiplex graph generation, which greatly reduces the number of parameters required for multiplex graph generation. We also propose 3 different criteria for evaluating the quality of generated graphs: a graph-attribute-based, a classifier-based, and a tensor-based method. We evaluate its performance on 4 datasets and show that it generally performs better than other existing statistical multiplex graph generative models. We also adapt HGEN, an existing deep generative model for heterogeneous information networks, to work for multiplex graphs and show that our method generally performs better.

     
    more » « less
  3. Why do many modern neural-network-based graph generative models fail to reproduce typical real-world network characteristics, such as high triangle density? In this work we study the limitations of edge independent random graph models, in which each edge is added to the graph independently with some probability. Such models include both the classic Erdös-Rényi and stochastic block models, as well as modern generative models such as NetGAN, variational graph autoencoders, and CELL. We prove that subject to a bounded overlap condition, which ensures that the model does not simply memorize a single graph, edge independent models are inherently limited in their ability to generate graphs with high triangle and other subgraph densities. Notably, such high densities are known to appear in real-world social networks and other graphs. We complement our negative results with a simple generative model that balances overlap and accuracy, performing comparably to more complex models in reconstructing many graph statistics. 
    more » « less
  4. null (Ed.)
    This paper proposes a scalable multilevel framework for the spectral embedding of large undirected graphs. The proposed method first computes much smaller yet sparse graphs while preserving the key spectral (structural) properties of the original graph, by exploiting a nearly-linear time spectral graph coarsening approach. Then, the resultant spectrally-coarsened graphs are leveraged for the development of much faster algorithms for multilevel spectral graph embedding (clustering) as well as visualization of large data sets. We conducted extensive experiments using a variety of large graphs and datasets and obtained very promising results. For instance, we are able to coarsen the "coPapersCiteseer" graph with 0.43 million nodes and 16 million edges into a much smaller graph with only 13K (32X fewer) nodes and 17K (950X fewer) edges in about 16 seconds; the spectrally-coarsened graphs allow us to achieve up to 1,100X speedup for multilevel spectral graph embedding (clustering) and up to 60X speedup for t-SNE visualization of large data sets. 
    more » « less
  5. null (Ed.)
    Abstract Directed acyclic graphical models are widely used to represent complex causal systems. Since the basic task of learning such a model from data is NP-hard, a standard approach is greedy search over the space of directed acyclic graphs or Markov equivalence classes of directed acyclic graphs. As the space of directed acyclic graphs on p nodes and the associated space of Markov equivalence classes are both much larger than the space of permutations, it is desirable to consider permutation-based greedy searches. Here, we provide the first consistency guarantees, both uniform and high-dimensional, of a greedy permutation-based search. This search corresponds to a simplex-like algorithm operating over the edge-graph of a subpolytope of the permutohedron, called a directed acyclic graph associahedron. Every vertex in this polytope is associated with a directed acyclic graph, and hence with a collection of permutations that are consistent with the directed acyclic graph ordering. A walk is performed on the edges of the polytope maximizing the sparsity of the associated directed acyclic graphs. We show via simulated and real data that this permutation search is competitive with current approaches. 
    more » « less