Title: Multi-Class Imbalanced Graph Convolutional Network Learning
Abstract: Networked data often exhibit the Pareto principle (i.e., the 80/20 rule) with skewed class distributions, where most vertices belong to a few majority classes and minority classes contain only a handful of instances. When presented with imbalanced class distributions, existing graph embedding learning tends to be biased toward nodes from majority classes, leaving nodes from minority classes under-trained. In this paper, we propose Dual-Regularized Graph Convolutional Networks (DR-GCN) to handle multi-class imbalanced graphs, where two types of regularization are imposed to tackle class-imbalanced representation learning. To ensure that all classes are equally represented, we propose a class-conditioned adversarial training process to facilitate the separation of labeled nodes. Meanwhile, to maintain training equilibrium (i.e., retaining quality of fit across all classes), we force unlabeled nodes to follow a latent distribution similar to that of the labeled nodes by minimizing their difference in the embedding space. Experiments on real-world imbalanced graphs demonstrate that DR-GCN outperforms state-of-the-art methods in node classification, graph clustering, and visualization.
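The abstract describes two regularizers on top of a standard GCN objective: a class-conditioned adversarial term that keeps labeled nodes of different classes separated, and a distribution-alignment term that pulls unlabeled-node embeddings toward the labeled-node embedding distribution. The sketch below is a minimal, hypothetical rendering of such a combined objective in PyTorch; the DenseGCN and ClassConditionedDiscriminator modules, the shuffled-condition discriminator target, and the moment-matching alignment penalty are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the authors' code): a GCN encoder trained with standard
# cross-entropy plus two illustrative regularizers in the spirit of DR-GCN:
# (1) a class-conditioned discriminator on labeled-node embeddings, and
# (2) an alignment penalty pulling unlabeled embeddings toward labeled ones.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseGCN(nn.Module):
    """Two-layer GCN over a dense, symmetrically normalized adjacency matrix."""

    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, num_classes)

    def forward(self, x, adj_norm):
        h = F.relu(adj_norm @ self.w1(x))   # first propagation + nonlinearity
        z = adj_norm @ self.w2(h)           # per-node class logits
        return h, z                         # embeddings and logits


class ClassConditionedDiscriminator(nn.Module):
    """Scores an embedding conditioned on a one-hot class vector (assumed design)."""

    def __init__(self, hid_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hid_dim + num_classes, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, 1))

    def forward(self, h, y_onehot):
        return self.net(torch.cat([h, y_onehot], dim=-1))


def dual_regularized_loss(model, disc, x, adj_norm, y, labeled_mask,
                          lam_adv=0.1, lam_align=0.1):
    h, logits = model(x, adj_norm)
    ce = F.cross_entropy(logits[labeled_mask], y[labeled_mask])

    # (1) Class-conditioned adversarial-style term: embeddings paired with their
    # true class condition vs. a shuffled class condition (a simplified stand-in).
    y_onehot = F.one_hot(y[labeled_mask], logits.size(1)).float()
    real = disc(h[labeled_mask], y_onehot)
    fake = disc(h[labeled_mask], y_onehot[torch.randperm(y_onehot.size(0))])
    adv = F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) + \
          F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake))

    # (2) Alignment term: match first/second moments of unlabeled vs. labeled
    # embeddings (an MMD-like surrogate for "similar latent distribution").
    hl, hu = h[labeled_mask], h[~labeled_mask]
    align = (hl.mean(0) - hu.mean(0)).pow(2).sum() + \
            (hl.var(0) - hu.var(0)).pow(2).sum()

    return ce + lam_adv * adv + lam_align * align
```

In a full adversarial setup the discriminator would be updated in alternation with the encoder using its own optimizer; both terms are folded into a single loss here only to keep the sketch compact.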
Award ID(s): 1828181, 1763452
NSF-PAR ID: 10199469
Journal Name: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)
Sponsoring Org: National Science Foundation
More Like this
  1. Node classification is of great importance among various graph mining tasks. In practice, real-world graphs generally follow a long-tail distribution, where a large number of classes consist of only a limited number of labeled nodes. Although Graph Neural Networks (GNNs) have achieved significant improvements in node classification, their performance decreases substantially in such a few-shot scenario. The main reason is the large generalization gap between meta-training and meta-testing, caused by the task variance arising from different node/class distributions across meta-tasks (i.e., node-level and class-level variance). To effectively alleviate the impact of task variance, we propose a task-adaptive node classification framework under the few-shot learning setting. Specifically, we first accumulate meta-knowledge across classes with abundant labeled nodes. Then we transfer such knowledge to classes with limited labeled nodes via our proposed task-adaptive modules. In particular, to accommodate the different node/class distributions among meta-tasks, we propose three essential modules that perform node-level, class-level, and task-level adaptation in each meta-task, respectively. In this way, our framework can adapt to different meta-tasks and thus improve generalization on meta-test tasks. Extensive experiments on four prevalent node classification datasets demonstrate the superiority of our framework over state-of-the-art baselines. Our code is provided at https://github.com/SongW-SW/TENT. (A generic sketch of the episodic setup this builds on appears after this list.)
  2. In traditional graph learning tasks, such as node classification, learning is carried out in a closed-world setting: the number of classes and their training samples are provided to train models, and the goal is to correctly classify unlabeled nodes into classes already known. In reality, due to limited labeling capacity and the dynamic evolution of networks, some nodes may not belong to any existing/seen class and therefore cannot be correctly classified by closed-world learning algorithms. In this paper, we propose a new open-world graph learning paradigm, where the goal is not only to classify nodes belonging to seen classes into the correct groups, but also to assign nodes that belong to no existing class to an unseen class. The essential challenges of open-world graph learning are that (1) the unseen class has no labeled samples and may take an arbitrary form different from the existing seen classes; and (2) both graph feature learning and prediction should differentiate whether a node belongs to an existing/seen class or to the unseen class. To tackle these challenges, we propose an uncertain node representation learning approach using constrained variational graph autoencoder networks, where label loss and class uncertainty loss constraints ensure that node representation learning is sensitive to the unseen class. As a result, node embeddings are represented by distributions rather than deterministic feature vectors. By sampling multiple versions of the feature vectors, we can test how certainly a node belongs to the seen classes and automatically determine a threshold for rejecting nodes that belong to none of them as unseen-class nodes. Experiments on real-world networks demonstrate the algorithm's performance compared to baselines. Case studies and ablation analysis also show the rationale of our design for open-world graph learning. (A minimal sketch of this sampling-and-rejection step appears after this list.)
  3. For real-world graph data, the node class distribution is inherently imbalanced and long-tailed, which naturally leads to a few-shot learning scenario with limited labeled nodes for newly emerging classes. Existing efforts are carefully designed to solve this few-shot learning problem via data augmentation or learning transferable initializations, to name a few. However, most, if not all, of them rest on the strong assumption that all test nodes come exclusively from novel classes, which is impractical in real-world applications. In this paper, we study a broader and more realistic problem named generalized few-shot node classification, where the test samples can come from both novel classes and base classes. Compared with standard few-shot node classification, this new problem imposes several unique challenges, including asymmetric classification and inconsistent preference. To counter these challenges, we propose a shot-aware graph neural network (STAGER) equipped with an uncertainty-based weight assigner module for adaptive propagation. To formulate the problem from a meta-learning perspective, we propose a new training paradigm named imbalanced episodic training, which ensures that the label distribution is consistent between the training and test scenarios. Experimental results on four real-world datasets demonstrate the efficacy of our model, with up to 14% accuracy improvement over baselines. (A minimal sketch of imbalanced episode construction appears after this list.)
  4. In many real-world classification applications, such as fake news detection, the training data can be extremely imbalanced, which challenges existing classifiers because the majority classes dominate their loss functions. Oversampling techniques such as SMOTE are effective approaches to tackling the class imbalance problem by producing more synthetic minority samples. Despite their success, most existing oversampling methods consider only local data distributions when generating minority samples, which can result in noisy minority samples that do not fit the global data distribution or that interleave with majority classes. Hence, in this paper, we study the class imbalance problem by simultaneously exploring local and global data information, since (i) the local data distribution gives detailed information for generating minority samples, and (ii) the global data distribution provides guidance for avoiding outliers or samples that interleave with majority classes. Specifically, we propose a novel framework, GL-GAN, which leverages the SMOTE method to explore the local distribution in a learned latent space and employs a GAN to capture the global information, so that synthetic minority samples can be generated even under extremely imbalanced scenarios. Experimental results on diverse real data sets demonstrate the effectiveness of our GL-GAN framework in producing realistic and discriminative minority samples that improve the classification performance of various classifiers on imbalanced training data. Our code is available at https://github.com/wentao-repo/GL-GAN. (A minimal sketch of SMOTE-style interpolation in a latent space appears after this list.)
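For item 1 (task-adaptive few-shot node classification), the following is a generic sketch of the episodic backbone such methods build on: sample an N-way K-shot meta-task over node classes, then classify query nodes against class prototypes computed from support-node embeddings. The sample_episode and prototype_loss helpers are illustrative assumptions and do not implement the paper's node-, class-, or task-level adaptation modules.

```python
# Minimal sketch (assumed, generic): N-way K-shot episode sampling over node
# classes and prototype-based classification of query nodes.
import random
import torch
import torch.nn.functional as F


def sample_episode(nodes_by_class, n_way=3, k_shot=5, q_query=10):
    """Pick n_way classes, then k_shot support and q_query query nodes per class."""
    classes = random.sample(list(nodes_by_class), n_way)
    support, query = [], []
    for local_y, c in enumerate(classes):
        nodes = random.sample(nodes_by_class[c], k_shot + q_query)
        support += [(n, local_y) for n in nodes[:k_shot]]
        query += [(n, local_y) for n in nodes[k_shot:]]
    return support, query


def prototype_loss(embeddings, support, query):
    """Cross-entropy of query nodes against class prototypes from the support set."""
    s_idx = torch.tensor([n for n, _ in support])
    s_y = torch.tensor([y for _, y in support])
    q_idx = torch.tensor([n for n, _ in query])
    q_y = torch.tensor([y for _, y in query])

    protos = torch.stack([embeddings[s_idx[s_y == c]].mean(0)
                          for c in range(int(s_y.max()) + 1)])
    logits = -torch.cdist(embeddings[q_idx], protos)  # nearer prototype => higher score
    return F.cross_entropy(logits, q_y)
```

The embeddings tensor can come from any GNN encoder; the adaptation modules described in the abstract would sit between the encoder and this prototype step.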
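For item 2 (open-world graph learning), the sketch below illustrates the rejection step the abstract describes, under assumed interfaces: a variational encoder provides per-node Gaussian parameters (mu, logvar), several embeddings are sampled, class probabilities over the seen classes are averaged, and nodes whose maximum probability falls below a threshold are rejected as the unseen class. The classifier interface, sample count, and fixed threshold are placeholders; the paper determines its threshold automatically.

```python
# Minimal sketch (assumed interfaces): sample several embeddings per node from
# Gaussian parameters produced by a variational graph autoencoder, average the
# class probabilities over seen classes, and reject low-confidence nodes as
# belonging to an unseen class.
import torch
import torch.nn.functional as F


@torch.no_grad()
def open_world_predict(mu, logvar, classifier, num_samples=20, threshold=0.5):
    probs = 0.0
    for _ in range(num_samples):
        eps = torch.randn_like(mu)
        z = mu + eps * torch.exp(0.5 * logvar)  # reparameterized sample
        probs = probs + F.softmax(classifier(z), dim=-1)
    probs = probs / num_samples

    conf, pred = probs.max(dim=-1)              # confidence over seen classes
    pred[conf < threshold] = -1                 # -1 marks the "unseen" class
    return pred, conf
```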
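For item 3 (generalized few-shot node classification), this sketch illustrates the idea behind imbalanced episodic training as described: construct training episodes whose label distribution mixes abundant base classes with few-shot novel classes so that episodes resemble the generalized test scenario. The class counts and shot sizes are arbitrary illustrative choices, not the paper's settings.

```python
# Minimal sketch (assumed): episodes that mix base classes (many labeled nodes)
# and novel classes (few labeled nodes), so the training label distribution
# mirrors the imbalanced generalized test setting.
import random


def imbalanced_episode(base_classes, novel_classes, nodes_by_class,
                       n_base=3, n_novel=2, base_shots=20, novel_shots=5):
    chosen = [(c, base_shots) for c in random.sample(base_classes, n_base)] + \
             [(c, novel_shots) for c in random.sample(novel_classes, n_novel)]
    support = []
    for c, shots in chosen:
        support += [(n, c) for n in random.sample(nodes_by_class[c], shots)]
    random.shuffle(support)
    return support  # (node_id, class) pairs with a deliberately skewed class mix
```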
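For item 4 (GL-GAN), the sketch below shows SMOTE-style interpolation carried out in a learned latent space, the local component the abstract describes. The encoder/decoder interfaces are assumed, and the GAN that enforces consistency with the global data distribution is omitted for brevity.

```python
# Minimal sketch (assumed interfaces): SMOTE-style oversampling in a learned
# latent space. Minority samples are encoded, each synthetic point interpolates
# toward a random one of its k nearest minority neighbors, and a decoder maps
# it back to input space.
import torch


@torch.no_grad()
def latent_smote(encoder, decoder, x_minority, n_new=100, k=5):
    z = encoder(x_minority)                        # (M, d) latent codes
    dists = torch.cdist(z, z)
    dists.fill_diagonal_(float("inf"))             # exclude self-matches
    knn = dists.topk(k, largest=False).indices     # (M, k) neighbor indices

    idx = torch.randint(0, z.size(0), (n_new,))    # anchor minority points
    nbr = knn[idx, torch.randint(0, k, (n_new,))]  # one random neighbor each
    alpha = torch.rand(n_new, 1)
    z_new = z[idx] + alpha * (z[nbr] - z[idx])     # linear interpolation
    return decoder(z_new)                          # synthetic minority samples
```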