NSF PAR Search | NSF Public Access Repository

Clustering-Augmented Fraud Detection on Graphs Using Label-Aware Feature Aggregation

Jing, Shixiong; Chen, Lingwei; Wu, Dinghao (December 2024, The 16th Asian Conference on Machine Learning (Conference Track))

Fraud detection has emerged as a pivotal process in different fields (e.g., e-commerce, social networks). Since interactions among entities provide valuable insights into fraudulent activities, such behaviors can be naturally represented as graphs, where graph neural networks (GNNs) have been developed as prominent models to boost the efficacy of fraud detection. However, the application of GNNs in this domain encounters significant challenges, primarily due to class imbalance and a mixture of homophily and heterophily of fraud graphs. To address these challenges, in this paper, we propose LACA, which implements fraud detection on graphs using Label-Aware feature aggregation to advance GNN training, which is regularized by Clustering Augmented optimization. Specifically, label-aware feature aggregation simplifies adaptive aggregation in homophily-heterophily mixed neighborhoods, preventing gradient domination by legitimate nodes and mitigating class imbalance in message passing. Clustering-augmented optimization provides fine-grained subclass semantics to improve detection performance, and yields additional benefit in addressing class imbalance. Extensive experiments on four fraud datasets demonstrate that LACA can significantly improve fraud detection performance on graphs with different imbalance ratios and homophily ratios, outperforming state-of-the-art GNN models.
Full Text Available
DRILL: Dual-Reasoning Large Language Models for Phishing Email Detection with Limited Data

Greenewald, Calvin; Ashmore, Bradley; Poon, Chien-Sing; Chen, Lingwei (December 2024, International Conference on Neural Information Processing)

As phishing emails pose a growing threat to individuals and organizations alike, there is an urgent need to develop more accurate detection methods. Large Language Models (LLMs) have recently garnered major attention in this line of research; however, they often require large-scale data for fine-tuning, which is impractical in real-world application scenarios. This paper proposes DRILL, a new, simple, and efficient mechanism for dual-reasoning LLMs to detect phishing emails with extremely limited data. DRILL distills the reasoning ability from an LLM into a small target LM, while integrating trainable perturbations to manipulate the inputs, which in turn adaptively enhances the inference ability of the target LM. Extensive experiments are conducted on multiple real-world email datasets, and the evaluation results demonstrate that DRILL can benefit from dual LMs, significantly reducing the training parameters and data required while maintaining state-of-the-art performance in phishing email detection with limited data.
Full Text Available
Leveraging Homophily-Augmented Energy Propagation for Bot Detection on Graphs

Ashmore, Bradley; Chen, Lingwei (July 2024, International Conference on Database Systems for Advanced Applications, Springer Nature Singapore)

As the developers of malware continuously evolve their attacks and infection methods, so too must bot detection methods advance. Graph Neural Networks (GNNs) have emerged as a promising detection method. However, in most cases communication graphs reflecting bot-infected networks are plagued with class imbalance and a high level of heterophily. Graph oversampling techniques employed to tackle class imbalance on graphs have drawbacks, such as introducing noisy topological structures or exacerbating heterophily within the graph. Out-of-distribution detection (ODD) is considered an alternative solution to data imbalance issues, but when applied to graphs, it assumes that the underlying graph structure does not interfere with the learning of data distributions. In this paper, we present the first application of ODD methods for bot detection in a network. We propose a new energy-based ODD model, which surpasses existing ODD methods, including those tailored for ODD on graph data, and effectively mitigates performance degradation caused by graph heterophily. We substantiate our claims through extensive experiments on the TON IoT dataset, which comprises real captured bot data. The experimental results demonstrate that our model achieves state-of-the-art performance in bot detection on graphs with high graph heterophily and extreme class imbalance.
Full Text Available
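
For readers unfamiliar with energy-based out-of-distribution scoring, the following is a minimal sketch of the widely used logit-based energy score (the negative log-sum-exp of a classifier's logits, at temperature 1). It is an illustration only and is not the model proposed in the record above; the function name `energy_score` and the example logits are assumptions made for this sketch.

```python
import math


def energy_score(logits):
    """Return the energy -log(sum(exp(z))) for a list of class logits.

    Assumption: temperature is fixed at 1. Lower energy corresponds to a
    more confident (more in-distribution-looking) prediction.
    """
    if not logits:
        raise ValueError("logits must be non-empty")
    m = max(logits)  # subtract the max for numerical stability
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))


# Hypothetical logits: a confident prediction and a flat, uncertain one.
confident = energy_score([10.0, 0.0])
uncertain = energy_score([0.0, 0.0])
```

In this sketch a sample would be flagged as out-of-distribution when its energy exceeds a threshold chosen on validation data; the threshold itself is left out because the source does not specify one.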
H^2GNN: Graph Neural Networks with Homophilic and Heterophilic Feature Aggregations

Jing, Shixiong; Chen, Lingwei; Li, Quan; Wu, Dinghao (July 2024, International Conference on Database Systems for Advanced Applications, Springer Nature Singapore)

Graph neural networks (GNNs) rely on the assumption of graph homophily, which, however, does not hold in some real-world scenarios. Graph heterophily compromises them by smoothing node representations and degrading their discrimination capabilities. To address this limitation, we propose H^2GNN, which implements Homophilic and Heterophilic feature aggregations to advance GNNs in graphs with homophily or heterophily. H^2GNN proceeds by combining local feature separation and adaptive message aggregation, where each node separates local features into similar and dissimilar feature vectors, and aggregates similarities and dissimilarities from neighbors based on connection property. This allows both similar and dissimilar features for each node to be effectively preserved and propagated, and thus mitigates the impact of heterophily on the graph learning process. As dual feature aggregations introduce extra model complexity, we also offer a simplified implementation of H^2GNN to reduce training time. Extensive experiments on seven benchmark datasets have demonstrated that H^2GNN can significantly improve node classification performance in graphs with different homophily ratios, outperforming state-of-the-art GNN models.
Full Text Available
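
The "homophily ratio" mentioned in the record above is commonly measured as edge homophily: the fraction of edges whose two endpoints share a label. The sketch below computes that quantity on a toy graph; the function name `edge_homophily`, the edge list, and the labels are hypothetical, and the abstract does not state which homophily definition the authors use.

```python
def edge_homophily(edges, labels):
    """Fraction of edges connecting two nodes with the same label.

    edges:  iterable of (u, v) node-id pairs (undirected, listed once)
    labels: mapping from node id to class label
    """
    edges = list(edges)
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-node cycle with one "fraud" node (hypothetical data).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = {0: "fraud", 1: "legit", 2: "legit", 3: "legit"}
ratio = edge_homophily(edges, labels)
```

A ratio near 1 indicates a homophilous graph, while a ratio near 0 indicates a heterophilous one; here two of the four edges join same-label nodes.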
DOS-GNN: Dual-Feature Aggregations with Over-Sampling for Class-Imbalanced Fraud Detection On Graphs

https://doi.org/10.1109/IJCNN60899.2024.10650494

Jing, Shixiong; Chen, Lingwei; Li, Quan; Wu, Dinghao (June 2024, International Joint Conference on Neural Networks)

As fraudulent activities have increased manyfold, fraud detection has emerged as a pivotal process in different fields (e.g., e-commerce, online reviews, and social networks). Since interactions among entities provide valuable insights into fraudulent activities, such behaviors can be naturally represented as graph structures, where graph neural networks (GNNs) have been developed as prominent models to boost the efficacy of fraud detection. In graph-based fraud detection, handling imbalanced datasets poses a significant challenge, as the minority class often gets overshadowed, diminishing the performance of conventional GNNs. While oversampling has recently been adapted for imbalanced graphs, it contends with issues such as graph heterophily and noisy edge synthesis. To address these limitations, this paper introduces DOS-GNN, incorporating Dual-feature aggregation with Over-Sampling to advance GNNs for class-imbalanced fraud detection on graphs. This model exploits feature separation and dual-feature aggregation to mitigate the impact of heterophily and acquire refined node embeddings that facilitate fraud oversampling to balance class distribution without the need for edge synthesis. Extensive experiments on four large and real-world fraud datasets demonstrate that DOS-GNN can significantly improve fraud detection performance on graphs with different imbalance ratios and homophily ratios, outperforming state-of-the-art GNN models.
Full Text Available
Pseudo-Labeling with Graph Active Learning for Few-shot Node Classification

https://doi.org/10.1109/ICDM58522.2023.00133

Li, Quan; Chen, Lingwei; Jing, Shixiong; Wu, Dinghao (December 2023, IEEE International Conference on Data Mining)

Graphs have emerged as one of the most important and powerful data structures to perform content analysis in many fields. In this line of work, node classification is a classic task, which is generally performed using graph neural networks (GNNs). Unfortunately, regular GNNs do not generalize well to real-world application scenarios when the labeled nodes are few. To address this challenge, we propose a novel few-shot node classification model that leverages pseudo-labeling with graph active learning. We first provide a theoretical analysis to argue that extra unlabeled data benefit few-shot classification. Inspired by this, our model proceeds by performing multi-level data augmentation with consistency and contrastive regularizations for better semi-supervised pseudo-labeling, and further devising graph active learning to facilitate pseudo-label selection and improve model effectiveness. Extensive experiments on four public citation networks have demonstrated that our model can effectively improve node classification accuracy with very few labeled data, significantly outperforming all state-of-the-art baselines by large margins.
Full Text Available
HOVER: Homophilic Oversampling via Edge Removal for Class-Imbalanced Bot Detection on Graphs

https://doi.org/10.1145/3583780.3615264

Ashmore, Bradley; Chen, Lingwei (October 2023, ACM International Conference on Information and Knowledge Management)

As malicious bots reside in a network to disrupt network stability, graph neural networks (GNNs) have emerged as one of the most popular bot detection methods. However, in most cases these graphs are significantly class-imbalanced. To address this issue, graph oversampling has recently been proposed to synthesize nodes and edges, but it still suffers from graph heterophily, leading to suboptimal performance. In this paper, we propose HOVER, which implements Homophilic Oversampling Via Edge Removal for bot detection on graphs. Instead of oversampling nodes and edges within the initial graph structure, HOVER designs a simple edge removal method with heuristic criteria to mitigate heterophily and learn distinguishable node embeddings, which are then used to oversample minority bots to generate a balanced class distribution without edge synthesis. Experiments on TON IoT networks demonstrate the state-of-the-art performance of HOVER on bot detection with high graph heterophily and extreme class imbalance.
Full Text Available
