Graph neural networks (GNNs) have shown remarkable performance on diverse graph mining tasks. While sharing the same message passing framework, our study shows that different GNNs learn distinct knowledge from the same graph. This implies potential performance improvement by distilling the complementary knowledge from multiple models. However, knowledge distillation (KD) transfers knowledge from high-capacity teachers to a lightweight student, which deviates from our scenario: GNNs are often shallow. To transfer knowledge effectively, we need to tackle two challenges: how to transfer knowledge from compact teachers to a student with the same capacity; and, how to exploit student GNN's own learning ability. In this paper, we propose a novel adaptive KD framework, called BGNN, which sequentially transfers knowledge from multiple GNNs into a student GNN. We also introduce an adaptive temperature module and a weight boosting module. These modules guide the student to the appropriate knowledge for effective learning. Extensive experiments have demonstrated the effectiveness of BGNN. In particular, we achieve up to 3.05% improvement for node classification and 6.35% improvement for graph classification over vanilla GNNs. 
                        more » 
                        « less   
                    
                            
                            RELIANT: Fair Knowledge Distillation for Graph Neural Networks
                        
                    
    
            Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility. Open-source code can be found at https://github.com/yushundong/RELIANT. 
        more » 
        « less   
        
    
    
                            - PAR ID:
- 10414123
- Date Published:
- Journal Name:
- Proceedings of the 2023 SIAM International Conference on Data Mining
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Graph Neural Networks (GNNs) have shown superior performance in analyzing attributed networks in various web-based applications such as social recommendation and web search. Nevertheless, in high-stake decision-making scenarios such as online fraud detection, there is an increasing societal concern that GNNs could make discriminatory decisions towards certain demographic groups. Despite recent explorations on fair GNNs, these works are tailored for a specific GNN model. However, myriads of GNN variants have been proposed for different applications, and it is costly to fine-tune existing debiasing algorithms for each specific GNN architecture. Different from existing works that debias GNN models, we aim to debias the input attributed network to achieve fairer GNNs through feeding GNNs with less biased data. Specifically, we propose novel definitions and metrics to measure the bias in an attributed network, which leads to the optimization objective to mitigate bias. We then develop a framework EDITS to mitigate the bias in attributed networks while maintaining the performance of GNNs in downstream tasks. EDITS works in a model-agnostic manner, i.e., it is independent of any specific GNN. Experiments demonstrate the validity of the proposed bias metrics and the superiority of EDITS on both bias mitigation and utility maintenance. Open-source implementation: https://github.com/yushundong/EDITS.more » « less
- 
            Graph Neural Networks (GNNs) have shown satisfying performance in various graph analytical problems. Hence, they have become the de facto solution in a variety of decision-making scenarios. However, GNNs could yield biased results against certain demographic subgroups. Some recent works have empirically shown that the biased structure of the input network is a significant source of bias for GNNs. Nevertheless, no studies have systematically scrutinized which part of the input network structure leads to biased predictions for any given node. The low transparency on how the structure of the input network influences the bias in GNN outcome largely limits the safe adoption of GNNs in various decision-critical scenarios. In this paper, we study a novel research problem of structural explanation of bias in GNNs. Specifically, we propose a novel post-hoc explanation framework to identify two edge sets that can maximally account for the exhibited bias and maximally contribute to the fairness level of the GNN prediction for any given node, respectively. Such explanations not only provide a comprehensive understanding of bias/fairness of GNN predictions but also have practical significance in building an effective yet fair GNN model. Extensive experiments on real-world datasets validate the effectiveness of the proposed framework towards delivering effective structural explanations for the bias of GNNs. Open-source code can be found at https://github.com/yushundong/REFEREE.more » « less
- 
            Knowledge Distillation (KD) (Hinton et al., 2015) is one of the most effective approaches for deploying large-scale pre-trained language models in low-latency environments by transferring the knowledge contained in the largescale models to smaller student models. Previous KD approaches use the soft labels and intermediate activations generated by the teacher to transfer knowledge to the student model parameters alone. In this paper, we show that having access to non-parametric memory in the form of a knowledge base with the teacher’s soft labels and predictions can further enhance student capacity and improve generalization. To enable the student to retrieve from the knowledge base effectively, we propose a new Retrieval-augmented KD framework with a loss function that aligns the relational knowledge in teacher and student embedding spaces. We show through extensive experiments that our retrieval mechanism can achieve state-of-the-art performance for taskspecific knowledge distillation on the GLUE benchmark (Wang et al., 2018a).more » « less
- 
            Brain-Computer interfaces (BCIs) are typically designed to be lightweight and responsive in real-time to provide users timely feedback. Classical feature engineering is computationally efficient but has low accuracy, whereas the recent neural networks (DNNs) improve accuracy but are computationally expensive and incur high latency. As a promising alternative, the low-dimensional computing (LDC) classifier based on vector symbolic architecture (VSA), achieves small model size yet higher accuracy than classical feature engineering methods. However, its accuracy still lags behind that of modern DNNs, making it challenging to process complex brain signals. To improve the accuracy of a small model, knowledge distillation is a popular method. However, maintaining a constant level of distillation between the teacher and student models may not be the best way for a growing student during its progressive learning stages. In this work, we propose a simple scheduled knowledge distillation method based on curriculum data order to enable the student to gradually build knowledge from the teacher model, controlled by an scheduler. Meanwhile, we employ the LDC/VSA as the student model to enhance the on-device inference efficiency for tiny BCI devices that demand low latency. The empirical results have demonstrated that our approach achieves better tradeoff between accuracy and hardware efficiency compared to other methods.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    