skip to main content


Title: Long Range Graph Benchmark
Graph Neural Networks (GNNs) that are based on the message passing (MP) paradigm generally exchange information between 1-hop neighbors to build node representations at each layer. In principle, such networks are not able to capture long-range interactions (LRI) that may be desired or necessary for learning a given task on graphs. Recently, there has been an increasing interest in development of Transformer-based methods for graphs that can consider full node connectivity beyond the original sparse structure, thus enabling the modeling of LRI. However, MP-GNNs that simply rely on 1-hop message passing often fare better in several existing graph benchmarks when combined with positional feature representations, among other innovations, hence limiting the perceived utility and ranking of Transformer-like architectures. Here, we present the Long Range Graph Benchmark (LRGB)1 with 5 graph learning datasets: PascalVOC-SP, COCO-SP, PCQM-Contact, Peptides-func and Peptides-struct that arguably require LRI reasoning to achieve strong performance in a given task. We benchmark both baseline GNNs and Graph Transformer networks to verify that the models which capture long-range dependencies perform significantly better on these tasks. Therefore, these datasets are suitable for benchmarking and exploration of MP-GNNs and Graph Transformer architectures that are intended to capture LRI.  more » « less
Award ID(s):
2039863
NSF-PAR ID:
10426040
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
NeurIPS
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Message passing Graph Neural Networks (GNNs) provide a powerful modeling framework for relational data. However, the expressive power of existing GNNs is upper-bounded by the 1-Weisfeiler-Lehman (1-WL) graph isomorphism test, which means GNNs that are not able to predict node clustering coefficients and shortest path distances, and cannot differentiate between different d regular graphs. Here we develop a class of message passing GNNs, named Identity-aware Graph Neural Networks (ID-GNNs), with greater expressive power than the 1-WL test. ID-GNN offers a minimal but powerful solution to limitations of existing GNNs. ID-GNN extends existing GNN architectures by inductively considering nodes’ identities during message passing. To embed a given node, IDGNN first extracts the ego network centered at the node, then conducts rounds of heterogeneous message passing, where different sets of parameters are applied to the center node than to other surrounding nodes in the ego network. We further propose a simplified but faster version of ID-GNN that injects node identity information as augmented node features. Altogether, both versions of ID GNN represent general extensions of message passing GNNs, where experiments show that transforming existing GNNs to ID-GNNs yields on average 40% accuracy improvement on challenging node, edge, and graph property prediction tasks; 3% accuracy improvement on node and graph classification benchmarks; and 15% ROC AUC improvement on real-world link prediction tasks. Additionally, ID-GNNs demonstrate improved or comparable performance over other task-specific graph networks. 
    more » « less
  2. null (Ed.)
    Message passing Graph Neural Networks (GNNs) provide a powerful modeling framework for relational data. However, the expressive power of existing GNNs is upper-bounded by the 1-Weisfeiler-Lehman (1-WL) graph isomorphism test, which means GNNs that are not able to predict node clustering coefficients and shortest path distances, and cannot differentiate between different d-regular graphs. Here we develop a class of message passing GNNs, named Identity-aware Graph Neural Networks (ID-GNNs), with greater expressive power than the 1-WL test. ID-GNN offers a minimal but powerful solution to limitations of existing GNNs. ID-GNN extends existing GNN architectures by inductively considering nodes’ identities during message passing. To embed a given node, IDGNN first extracts the ego network centered at the node, then conducts rounds of heterogeneous message passing, where different sets of parameters are applied to the center node than to other surrounding nodes in the ego network. We further propose a simplified but faster version of ID-GNN that injects node identity information as augmented node features. Altogether, both versions of ID-GNN represent general extensions of message passing GNNs, where experiments show that transforming existing GNNs to ID-GNNs yields on average 40% accuracy improvement on challenging node, edge, and graph property prediction tasks; 3% accuracy improvement on node and graph classification benchmarks; and 15% ROC AUC improvement on real-world link prediction tasks. Additionally, ID-GNNs demonstrate improved or comparable performance over other task-specific graph networks. 
    more » « less
  3. There has been significant progress in improving the performance of graph neural networks (GNNs) through enhancements in graph data, model architecture design, and training strategies. For fairness in graphs, recent studies achieve fair representations and predictions through either graph data pre-processing (e.g., node feature masking, and topology rewiring) or fair training strategies (e.g., regularization, adversarial debiasing, and fair contrastive learning). How to achieve fairness in graphs from the model architecture perspective is less explored. More importantly, GNNs exhibit worse fairness performance compared to multilayer perception since their model architecture (i.e., neighbor aggregation) amplifies biases. To this end, we aim to achieve fairness via a new GNN architecture. We propose Fair Message Passing (FMP) designed within a unified optimization framework for GNNs. Notably, FMP explicitly renders sensitive attribute usage in forward propagation for node classification task using cross-entropy loss without data pre-processing. In FMP, the aggregation is first adopted to utilize neighbors' information and then the bias mitigation step explicitly pushes demographic group node presentation centers together.In this way, FMP scheme can aggregate useful information from neighbors and mitigate bias to achieve better fairness and prediction tradeoff performance. Experiments on node classification tasks demonstrate that the proposed FMP outperforms several baselines in terms of fairness and accuracy on three real-world datasets. The code is available at https://github.com/zhimengj0326/FMP.

     
    more » « less
  4. Most state-of-the-art Graph Neural Networks (GNNs) can be defined as a form of graph convolution which can be realized by message passing between direct neighbors or beyond. To scale such GNNs to large graphs, various neighbor-, layer-, or subgraph-sampling techniques are proposed to alleviate the "neighbor explosion" problem by considering only a small subset of messages passed to the nodes in a mini-batch. However, sampling-based methods are difficult to apply to GNNs that utilize many-hops-away or global context each layer, show unstable performance for different tasks and datasets, and do not speed up model inference. We propose a principled and fundamentally different approach, VQ-GNN, a universal framework to scale up any convolution-based GNNs using Vector Quantization (VQ) without compromising the performance. In contrast to sampling-based techniques, our approach can effectively preserve all the messages passed to a mini-batch of nodes by learning and updating a small number of quantized reference vectors of global node representations, using VQ within each GNN layer. Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix. We show that such a compact low-rank version of the gigantic convolution matrix is sufficient both theoretically and experimentally. In company with VQ, we design a novel approximated message passing algorithm and a nontrivial back-propagation rule for our framework. Experiments on various types of GNN backbones demonstrate the scalability and competitive performance of our framework on large-graph node classification and link prediction benchmarks. 
    more » « less
  5. Abstract—Materials Genomics initiative has the goal of rapidly synthesizing materials with a given set of desired properties using data science techniques. An important step in this direction is the ability to predict the outcomes of complex chemical reactions. Some graph-based feature learning algorithms have been proposed recently. However, the comprehensive relationship between atoms or structures is not learned properly and not explainable, and multiple graphs cannot be handled. In this paper, chemical reaction processes are formulated as translation processes. Both atoms and edges are mapped to vectors represent- ing the structural information. We employ the graph convolution layers to learn meaningful information of atom graphs, and further employ its variations, message passing networks (MPNN) and edge attention graph convolution network (EAGCN) to learn edge representations. Particularly, multi-view EAGCN groups and maps edges to a set of representations for the properties of the chemical bond between atoms from multiple views. Each bond is viewed from its atom type, bond type, distance and neighbor environment. The final node and edge representations are mapped to a sequence defined by the SMILES of the molecule and then fed to a decoder model with attention. To make full usage of multi-view information, we propose multi-view attention model to handle self correlation inside each atom or edge, and mutual correlation between edges and atoms, both of which are important in chemical reaction processes. We have evaluated our method on the standard benchmark datasets (that have been used by all the prior works), and the results show that edge embedding with multi-view attention achieves superior accuracy compared to existing techniques. 
    more » « less