<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Federated Graph Anomaly Detection via Contrastive Self-Supervised Learning</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>05/01/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10634052</idno>
					<idno type="doi">10.1109/tnnls.2024.3414326</idno>
					<title level='j'>IEEE Transactions on Neural Networks and Learning Systems</title>
<idno type="issn">2162-237X</idno>
<biblScope unit="volume">36</biblScope>
<biblScope unit="issue">5</biblScope>					

					<author>Xiangjie Kong</author><author>Wenyi Zhang</author><author>Hui Wang</author><author>Mingliang Hou</author><author>Xin Chen</author><author>Xiaoran Yan</author><author>Sajal K. Das</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Federated Graph Anomaly Detection via Contrastive Self-Supervised Learning</head><p>Abstract-Attribute graph anomaly detection aims to identify nodes that deviate significantly from the majority of normal nodes, and it has received increasing attention owing to the ubiquity and complexity of graph-structured data in various real-world scenarios. However, current mainstream anomaly detection methods are designed primarily for centralized settings, which may pose privacy-leakage risks in sensitive situations. Although federated graph learning offers a promising solution by enabling collaborative model training in distributed systems while preserving data privacy, a practical challenge arises because each client typically possesses only a limited amount of graph data. Consequently, naively applying federated graph learning to anomaly detection tasks in distributed environments may lead to suboptimal performance. To address these challenges, we propose FedCAD, a federated graph anomaly detection framework based on contrastive self-supervised learning (CSSL). FedCAD exchanges anomaly-node information between clients via federated learning (FL) interactions. First, FedCAD uses pseudo-label discovery to preliminarily determine each client's anomalous nodes. Second, FedCAD employs a local anomaly neighbor embedding aggregation strategy, which enables the current client to aggregate the neighbor embeddings of anomalous nodes from other clients, thereby amplifying the distinction between anomalous nodes and their neighbors. Doing so sharpens the contrast between positive and negative instance pairs in contrastive learning, enhancing the efficacy and precision of anomaly detection under this learning paradigm. Finally, the effectiveness of FedCAD is demonstrated by experimental results on four real graph datasets.</p><p>Index Terms-Anomaly detection, attributed networks, contrastive self-supervised learning (CSSL), federated learning <ref type="bibr">(FL)</ref>.</p><p>Nomenclature: X ∈ R^{n×f}, attribute matrix; A ∈ R^{n×n}, adjacency matrix; x_i ∈ R^f, attribute vector of the ith node; P_i, contrastive instance pair; v_i, target node of P_i; G_i, local subgraph of P_i; y_i ∈ {0, 1}, label of P_i; A_i ∈ R^{c×c}, adjacency matrix of G_i; T_c, number of training epochs of FedAD; E_i ∈ R^{c×d}, embedding matrix of G_i; e^lg_i ∈ R^d, embedding vector of G_i; e^tn_i ∈ R^d, embedding vector of v_i; s_i, predicted score of P_i; W^{(ℓ)} ∈ R^{d^{(ℓ−1)}×d^{(ℓ)}}, learnable parameter of the ℓth layer of the GCN module; W^{(d)} ∈ R^{d×d}, learnable parameter of the discriminator module; M, number of clients; T_fc, rounds of communication between the client and the server; T_fcd, number of training epochs of the local discriminator module in FedCAD; U_{C_i}, neighbor embedding matrix of client C_i.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Anomaly detection, a fundamental research problem in machine learning, has been extensively explored across various domains, such as image and time series analysis <ref type="bibr">[1]</ref>. Its primary goal is to identify data points or entities that exhibit unexpected behavior or do not conform to typical patterns. Traditional anomaly detection methods have mainly focused on single-dimensional data or simple relationships. However, the importance of attribute graph anomaly detection has become increasingly evident with the advent of complex relational data in areas such as social networks, the IoT, and biological networks.</p><p>Attribute graphs represent complex data structures that include nodes, edges, and attribute information attached to each node and edge. In contrast to conventional graphs that only represent interactions between nodes, attribute graphs incorporate rich feature information for each node, allowing the modeling of more intricate interaction systems <ref type="bibr">[2]</ref>. Anomaly detection on attribute graphs involves identifying abnormal nodes, edges, or substructures within this more complex context, and it poses significantly greater challenges than anomaly detection on single-dimensional or tabular data. The detection of abnormal nodes in attribute graphs has significant implications for various security-related applications, such as social network analysis, financial fraud detection, and cybersecurity, and has become an urgent research topic in recent years. 
With the advancements in graph neural networks (GNNs), attribute graph anomaly detection has made significant progress and has demonstrated promising performance in many complex real-world scenarios <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref>.</p><p>Despite the progress made by conventional centralized anomaly detection methods, they often require direct access to potentially sensitive user data, raising serious privacy concerns <ref type="bibr">[3]</ref>. Consider a common scenario from the financial industry: for a variety of reasons, urban residents visit different banks, so their customer information, transaction networks, and default histories are stored only in the banks they visit. A robust anomaly detection model that performs efficient inference across the full customer network, which contains all subgraphs from the different banks, is a key requirement for banks wishing to collaborate on comprehensive credit assessments of their customers and on a common industry blacklist. However, owing to concerns about user privacy and conflicts of interest, it is difficult for the banks to share their customer networks for training anomaly detection models. Two challenges arise in this situation. The first is data privacy protection: how do we train anomaly detection models in a distributed system while protecting data privacy? The second is the small client subgraph: in a distributed system, the data scale of each client is small, and the data between clients may overlap, so how do we increase the data scale of each client to improve anomaly detection? Federated graph learning emerges as a compelling solution by facilitating collaborative model training across distributed systems without compromising the privacy of individual data owners <ref type="bibr">[4]</ref>. 
It does so by allowing clients to keep their data locally and share only model parameters or encrypted gradients. However, when directly applied to anomaly detection tasks in distributed environments, where each client holds a relatively small portion of the overall graph data, federated graph learning may suffer from performance degradation due to the lack of comprehensive data exposure during the learning process <ref type="bibr">[5]</ref>.</p><p>In this article, we propose FedCAD, a federated graph anomaly detection framework that integrates the ideas of federated graph learning and anomaly detection. For the first challenge, data privacy protection, a standard solution is to support computation over encrypted data using cryptography-based techniques, such as homomorphic encryption. While providing a solid security guarantee, these techniques incur substantial computational overhead, making them an inappropriate choice here. FedCAD addresses this challenge by designing federated parameter aggregation and anomaly information updates between clients based on federated learning (FL), without heavy cryptographic operations. First, FedCAD indirectly obtains the graph structure of other clients through federated parameter aggregation and integrates it into its own client graph structure. Second, FedCAD does not directly share node features; instead, it obtains the neighbor subgraph of the target node through contrastive learning sampling and embeds the subgraph into a low-dimensional embedding vector before sharing. Meanwhile, only the neighborhood subgraph embeddings of abnormal nodes are uploaded to the server, not those of normal nodes. These two points ensure data privacy.</p><p>For the second challenge, the small size of client-side subgraphs, an evident approach is to scale up the data per client, with the company or organization augmenting the dataset through other channels. 
However, expanding the scale of the dataset is costly. Another option is data augmentation, but most current augmentation methods are designed for the computer vision (CV) and natural language processing (NLP) fields; because graphs are complex and non-Euclidean, few augmentation techniques are available for graph data. In addition, the data generated by augmentation are mostly negative examples, which degrades model performance in anomaly detection tasks. FedCAD solves this challenge with a local anomaly neighbor embedding aggregation mechanism, which enables the current client to aggregate the anomaly node neighbor embeddings of other clients. This enlarges the difference between an anomalous node and its neighbors and makes the distinction between positive and negative instance pairs in contrastive learning more obvious, thereby improving the effectiveness of subsequent contrastive learning anomaly detection.</p><p>The following is a summary of our major contributions.</p><p>1) We propose FedCAD, a federated graph anomaly detection framework. FedCAD can collaborate on distributed graph data to train high-quality graph anomaly detection models while protecting data privacy. 2) We propose an anomaly information update strategy that enables each client to aggregate anomaly node neighbor embeddings from other clients during federated processing. Doing so enlarges the difference between an anomalous node and its neighbors and makes the distinction between positive and negative instance pairs in contrastive learning more obvious, improving the effectiveness of subsequent contrastive learning anomaly detection. 3) We evaluate the proposed FedCAD on four datasets; the results verify its performance and its superiority over baseline methods. The remainder of this article is structured as follows. We analyze the related work in Section II. 
The preliminary definitions, notations, and problem statements are explained in Section III. We illustrate the technical details of FedCAD in Section IV. We present and analyze the experimental results in Section V. We conclude this article in Section VI.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. RELATED WORK</head><p>In this section, we outline the most closely related work: anomaly detection on attributed networks, contrastive self-supervised learning (CSSL), and federated graph learning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Anomaly Detection on Attributed Networks</head><p>Owing to the wide application of attribute networks in complex system modeling, attribute network anomaly detection has gradually emerged in recent years. Several shallow methods have been applied to attribute network anomaly detection. Perozzi and Akoglu <ref type="bibr">[6]</ref> detect anomalies on the attribute network by extracting the ego-network information of each node. Peng et al. <ref type="bibr">[7]</ref> add CUR decomposition to residual analysis for anomaly detection on the attribute network, alleviating the adverse effect of noisy features. Unfortunately, because of the limitations of shallow mechanisms, these models cannot capture complex interactions between different information patterns, especially when the feature dimensionality is high <ref type="bibr">[8]</ref>.</p><p>GNNs are a class of deep neural networks <ref type="bibr">[9]</ref> specifically designed to model the relational patterns inherent in non-Euclidean graph-structured data. A succession of spectral-based GNN variants subsequently emerged, harnessing filters from a graph signal processing viewpoint <ref type="bibr">[10]</ref>. Among these, the graph convolutional network (GCN) stands out by implementing a localized first-order approximation of spectral graph convolution operations <ref type="bibr">[11]</ref>, enabling the efficient extraction of meaningful node representations. Owing to the rapid development of deep learning-based anomaly detection, researchers have also proposed anomaly detection methods on attribute networks. Ding et al. <ref type="bibr">[8]</ref> calculate anomaly scores from the reconstruction errors of the adjacency and attribute matrices using an autoencoder with GCN layers. Fan et al. 
<ref type="bibr">[12]</ref> propose a framework that learns the cross-modal interaction between the network structure and node attributes. Li et al. <ref type="bibr">[13]</ref> employ a spectral autoencoder to extract each node's latent embedding and a Gaussian mixture model (GMM) to detect anomalous nodes. Yu et al. <ref type="bibr">[14]</ref> employ random walk sampling and an autoencoder model to dynamically learn network embeddings and use a cluster-based detector to detect anomalies. Liu et al. <ref type="bibr">[15]</ref> overcome the shortcomings of autoencoders, which cannot fully utilize data and cannot operate on large-scale network data, by using a CSSL-based anomaly detection framework that captures the connection between a target node and its adjacent substructures. Anomalous nodes are detected statistically by measuring the consistency of each instance pair through its contrastive learning output score.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Contrastive Self-Supervised Learning</head><p>CSSL is a pivotal branch of self-supervised learning in which models are trained to discern similarities and dissimilarities among nodes without explicit labels <ref type="bibr">[16]</ref>. The remarkable achievements of CSSL in CV have spurred its progressive adoption within graph-based learning paradigms.</p><p>DGI constructs contrastive instance pairs by pairing a node's representation with a graph-level summary vector, exploiting graph corruption to generate negative samples and thereby enabling node representation learning in an unsupervised setting <ref type="bibr">[17]</ref>. Graph contrastive coding (GCC) initializes GNNs by pretraining on universal graph datasets, using a strategy that samples two subgraphs per node as a positive instance pair and utilizes the information noise contrastive estimation (InfoNCE) loss function for representation learning <ref type="bibr">[18]</ref>. Graphical mutual information (GMI) focuses on optimizing the agreement between a node's embeddings and the raw attributes of its neighbors, and between the embeddings of connected nodes <ref type="bibr">[19]</ref>. However, despite their efficacy in learning data representations, most CSSL approaches do not directly target anomaly detection. To adapt CSSL to anomaly detection, the CSSL framework for anomaly detection on attributed networks (CoLA) adapts the entire contrastive learning framework to calculate anomaly scores for individual nodes <ref type="bibr">[15]</ref>. Recognizing that conventional instance pair definitions might inadequately capture node anomalies, CoLA proposes a new form of contrastive instance pair tailored for graph contrastive learning. This novel design emphasizes the local information of each node over global properties. 
While extensive research has demonstrated the broad applicability of graph contrastive learning, the challenge of applying it to anomaly detection in attributed networks in distributed environments remains a novel and complex research problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Federated Graph Learning</head><p>Federated learning is a distributed learning paradigm in which multiple clients jointly train a model <ref type="bibr">[20]</ref>, <ref type="bibr">[21]</ref>. It has been extensively studied in many fields, for instance, image recognition <ref type="bibr">[22]</ref>, vehicle localization <ref type="bibr">[23]</ref>, machinery fault diagnosis <ref type="bibr">[24]</ref>, wireless communication <ref type="bibr">[25]</ref>, and edge computing <ref type="bibr">[26]</ref>. Federated graph learning, an emerging branch of federated learning that has received extensive attention, jointly trains machine learning models on graph datasets distributed across multiple clients while ensuring user privacy <ref type="bibr">[27]</ref>. Caldarola et al. <ref type="bibr">[28]</ref> use GCNs to model interdomain interactions and study cross-domain heterogeneity in FL. He et al. <ref type="bibr">[29]</ref> propose a general federation framework suitable for graph embedding algorithms. Jiang et al. <ref type="bibr">[30]</ref>, Chen et al. <ref type="bibr">[31]</ref>, and Wu et al. <ref type="bibr">[32]</ref> study the privacy issues of federated GNNs. Wang et al. <ref type="bibr">[33]</ref> introduce model-agnostic metalearning (MAML) into graph FL to maintain model generality while dealing with non-independent and identically distributed (non-IID) graph data. Zhang et al. <ref type="bibr">[34]</ref> study the missing-neighbor synthesis issue in the subgraph FL context. Xie et al. <ref type="bibr">[35]</ref> study graph classification to reduce heterogeneity in the structure and features of graphs owned by local systems. Peng et al. <ref type="bibr">[36]</ref> focus on strengthening the privacy protection of federated learning, studying the use of differential privacy for federated knowledge graph embedding.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. PRELIMINARIES A. Notations and Definitions</head><p>The notations are summarized in the Nomenclature. We use a bold lowercase letter for a vector (e.g., x), a bold uppercase letter for a matrix (e.g., X), and a calligraphic letter for a set (e.g., V).</p><p>Definition 1 (Attributed Network): G = (V, E, X) denotes an attributed network, where V is the node set, E is the edge set, and X ∈ R^{n×f} is the attribute matrix. The ith row vector x_i ∈ R^f of the attribute matrix represents the attribute vector of the ith node. The structure of the attributed network is represented by a binary adjacency matrix A ∈ R^{n×n}, where A_{i,j} = 1 if a link exists between nodes v_i and v_j, and A_{i,j} = 0 otherwise. An attributed network can also be denoted as G = (A, X).</p><p>Definition 2 (Federated Learning Client Subgraph): The FL system consists of a server and M clients, where each client C_k holds a local subgraph G_{C_k} of the overall attributed network. The number of nodes and the graph structure differ across clients, but some nodes overlap between clients. That is, for any graph G_{C_k}, there exists t ≠ k such that V_{C_k} ∩ V_{C_t} ≠ ∅. This setup conforms to the distribution of graph data in the real world.</p></div>
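As a concrete illustration of Definition 1, the following sketch (our own toy example, not from the article; all variable names are ours) builds a small attributed network G = (A, X) with NumPy:

```python
import numpy as np

# Toy attributed network G = (A, X) per Definition 1: n = 4 nodes, f = 3 attributes.
# Edges are undirected, so the binary adjacency matrix A is symmetric.
edges = [(0, 1), (1, 2), (2, 3)]
n, f = 4, 3

A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1  # A[i, j] = 1 iff a link exists between v_i and v_j

X = np.arange(n * f, dtype=float).reshape(n, f)  # attribute matrix; row i is x_i

print(A.sum())     # 6: each of the 3 undirected edges sets two entries of A
print(X[1].shape)  # (3,): each x_i lives in R^f
```

Note that, per Definition 2, each client would hold only such a subgraph, with some node ids shared across clients.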
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Problem Statement</head><p>1) Node Anomaly Detection in Distributed Systems: Suppose there are M clients, where G_{C_k} denotes the data owned by client C_k. The goal is to train an anomaly score function f in a distributed system; anomalous nodes can then be detected by ranking the nodes according to their anomaly scores. We employ an unsupervised anomaly detection method: in the training phase, only the attribute network G containing abnormal nodes is given, and the anomaly labels of the nodes are not provided.</p><p>In this work, we aim to propose a federated anomaly detection framework on attributed networks that can learn collaboratively on isolated subgraphs across clients to train a high-quality anomaly detection model while protecting data privacy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. DESIGN OF FEDCAD</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Overall Architecture</head><p>We propose FedCAD, a novel federated CSSL anomaly detection framework on attributed networks, as shown in Fig. <ref type="figure">1</ref>.</p><p>First, we adopt an anomaly detection framework on attributed networks that models the relationship between a target node and part of its neighboring substructure by training a contrastive learning model, revealing various anomalies in the graph. Second, we integrate the anomaly detection algorithm into the federated learning framework to implement a basic federated graph anomaly detection framework, called FedAD. FedAD simply combines the local anomaly detection model with the classic federated learning algorithm FedAvg [37]: each local client uploads the parameters of its anomaly detection model, the server averages the parameters to update them, and the updated parameters are then sent back to each client. Third, based on FedAD, we propose FedCAD. FedCAD employs federated learning to upload to the server the vector embeddings of the neighbor subgraphs of target nodes sampled by contrastive learning on each client. The server aggregates anomaly information between clients and sends the aggregated vector embedding sets back to the clients, and each client performs average aggregation. This amplifies the difference in the relationship between the target node and the neighbor subgraph in contrastive learning and improves the effectiveness of local anomaly detection.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Local Anomaly Detection Module</head><p>Inspired by CoLA <ref type="bibr">[15]</ref>, we adopt an anomaly detection module, which is shown in Fig. <ref type="figure">2</ref>. The overall procedure is depicted in Algorithm 1.</p><p>1) Instance Pair Sampling: We adopt a contrastive instance pair. The first element is a single node, called the "target node"; within one epoch, every node on the graph serves as the target node once, in random order. The second element is a subgraph sampled around the target node. Node anomalies in attribute networks are usually reflected as inconsistencies between a target node and its local neighbors.</p><p>2) GCN-Based Contrastive Learning Model: The instance pair P_i can be denoted as P_i = (v_i, G_i, y_i), where v_i is the target node, G_i = (A_i, X_i) is the subgraph sampled around v_i, and y_i ∈ {0, 1} is the label of P_i, with y_i = 1 for a positive pair (the target node and the subgraph are connected) and y_i = 0 for a negative pair (they are not connected).</p><p>Algorithm 1 (Local Anomaly Detection). Training stage: initialize the contrastive learning model parameters (W^(0), ..., W^(d)); for each training epoch, sample positive instance pairs (P^(+)_1, ..., P^(+)_B) and negative instance pairs (P^(−)_1, ..., P^(−)_B), calculate the scores (s^(+)_1, ..., s^(+)_B, s^(−)_1, ..., s^(−)_B) by Eq. 7, calculate L_cml by Eq. 8, and update the parameters (W^(0), ..., W^(d)) of the contrastive learning model via back propagation. Inference stage: for each v_i ∈ V, sample positive instance pairs (P^(+)_{i,1}, ..., P^(+)_{i,R}) and negative instance pairs (P^(−)_{i,1}, ..., P^(−)_{i,R}), calculate the scores (s^(+)_{i,1}, ..., s^(+)_{i,R}, s^(−)_{i,1}, ..., s^(−)_{i,R}) by Eq. 7, and calculate the anomaly score f(v_i) by Eq. 10.</p><p>First, the local subgraph is fed into a GCN to obtain the local subgraph embedding. The GCN layer can be written as E_i = φ(D̃_i^{−1/2} Ã_i D̃_i^{−1/2} X_i W^(ℓ)), where D̃_i is the degree matrix of the local subgraph, Ã_i is its adjacency matrix with self-loops, X_i is the attribute matrix, φ is a nonlinear activation, and E_i is the matrix of subgraph node embeddings. Then, the subgraph node embeddings E_i are transformed into a local subgraph embedding vector e^lg_i by average pooling: e^lg_i = (1/n_i) Σ_{k=1}^{n_i} (E_i)_k, where n_i is the number of nodes in the subgraph G_i. We also map the target node to the same embedding space for comparison: e^tn_i = φ(x_{v_i} W^(0)), where W^(0) is the mapping weight matrix of the GCN, x_{v_i} is the attribute vector of the target node, and e^tn_i is the target node embedding. We evaluate the final prediction score by comparing e^lg_i and e^tn_i with a bilinear discriminator: s_i = sigmoid(e^tn_i W^(d) (e^lg_i)^T).</p><p>3) Anomaly Score Computation: Typically, well-trained GNN models capture the matching patterns of normal samples more effectively, as these make up a large proportion of the training data and exhibit distinct regularities and features. Conversely, abnormal samples are harder to fit because of their variety and irregularity. Ideally, for a normal node, the predicted score of its positive pair s^(+)_i should approach 1, while that of its negative pair s^(−)_i approaches 0. For an abnormal node, the gap between the predicted scores of its positive and negative pairs is smaller, since the model cannot distinguish its matching pattern well. Given these properties, we define the anomaly score of each node as simply the difference between its negative and positive scores, f(v_i) = s^(−)_i − s^(+)_i. This score is negative (close to −1) for normal nodes, whereas it is close to 0 for abnormal nodes; hence, the higher the anomaly score, the greater the probability that the node is anomalous. However, a single sampled local subgraph cannot describe the entire neighborhood distribution of the target node, which degrades performance. To solve this problem, we run multiple sampling rounds and average the prediction scores. The final anomaly score of each node is f(v_i) = (1/R) Σ_{r=1}^{R} (s^(−)_{i,r} − s^(+)_{i,r}), where R is the number of sampling rounds.</p></div>
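The bilinear scoring and multi-round anomaly score described above can be sketched as follows. This is a minimal illustrative example under our own assumptions (random vectors stand in for the GCN-produced embeddings, and all variable names are ours); the bilinear discriminator and the R-round averaged score follow the description in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
d, R = 8, 4  # embedding dimension and number of sampling rounds (our choices)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bilinear_score(e_tn, e_lg, W):
    # s_i = sigmoid(e_tn^T W e_lg): bilinear discriminator comparing the
    # target-node embedding with a sampled local-subgraph embedding
    return sigmoid(e_tn @ W @ e_lg)

def anomaly_score(e_tn, pos_embs, neg_embs, W):
    # f(v_i) = (1/R) * sum_r (s_r^(-) - s_r^(+)); higher means more anomalous
    s_pos = [bilinear_score(e_tn, e, W) for e in pos_embs]
    s_neg = [bilinear_score(e_tn, e, W) for e in neg_embs]
    return float(np.mean(s_neg) - np.mean(s_pos))

# Random stand-ins for what the GCN pipeline would produce.
W = rng.normal(size=(d, d))                   # discriminator parameter W^(d)
e_tn = rng.normal(size=d)                     # target node embedding e^tn_i
pos = [rng.normal(size=d) for _ in range(R)]  # e^lg_i from R positive pairs
neg = [rng.normal(size=d) for _ in range(R)]  # e^lg_i from R negative pairs

score = anomaly_score(e_tn, pos, neg, W)
assert -1.0 <= score <= 1.0  # a difference of two mean sigmoid outputs
```

For a well-trained model, positive-pair scores of normal nodes cluster near 1 and negative-pair scores near 0, so the averaged difference separates anomalies from normal nodes.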
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. FedAD Algorithm</head><p>McMahan et al. <ref type="bibr">[37]</ref> proposed FedAvg, the first basic FL algorithm. We integrate the anomaly detection algorithm into the federated learning framework to implement a simple federated graph anomaly detection framework, called FedAD, as shown in Fig. <ref type="figure">3</ref>. Each client updates its local model parameters W^t_{C_k} by gradient descent, where η is the learning rate, and the updated local model parameters W^t_{C_k} are uploaded to the server. 3) Global Model Parameters Update: The server aggregates the parameters of the updated local models uploaded by the clients to generate the updated global model parameters W^{t+1}_G = (1/M) Σ_{k=1}^{M} W^t_{C_k}.</p><p>FedAD combines the local anomaly detection model with the classic federated learning algorithm FedAvg. First, FedAD is an integral part of our proposed federated anomaly detection framework FedCAD, providing FedCAD with the anomaly node scores, the target node embedding vectors, and the neighbor node embedding matrix. Second, FedAD is an ablation version of FedCAD, used to verify the effectiveness of pseudo-label discovery, local anomaly neighbor embedding aggregation, and the local discriminator module.</p></div>
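The global parameter update of FedAD, i.e., equal-weight FedAvg averaging of the client parameters W^t_{C_k} into W^{t+1}_G, can be sketched as follows (a toy example with our own variable names, not the authors' implementation):

```python
import numpy as np

def fedavg(client_params):
    # W_G^{t+1} = (1/M) * sum_k W_{C_k}^t, applied layer by layer across the
    # M clients' parameter lists (equal-weight averaging, as in FedAD).
    return [np.stack(layers).mean(axis=0) for layers in zip(*client_params)]

# Two clients, each holding a two-"layer" parameter list.
client_1 = [np.ones((2, 2)), np.zeros(3)]
client_2 = [3 * np.ones((2, 2)), 2 * np.ones(3)]

global_params = fedavg([client_1, client_2])
assert np.allclose(global_params[0], 2.0)  # mean of 1s and 3s
assert np.allclose(global_params[1], 1.0)  # mean of 0s and 2s
```

In the full FedAD loop, the server would distribute `global_params` back to every client before the next communication round.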
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Anomaly Information Update Between Clients Based on FL</head><p>In graphs, connections generally exist between nodes and their neighbors. Conversely, an anomaly usually occurs when there is a mismatch between a target node and its neighbors, violating the network's original matching pattern.</p><p>Contrastive learning, with its intrinsic discriminative mechanism, is adept at capturing abnormal mismatch patterns. However, the limited scale of data within each client presents practical challenges. In banking operations, for example, bank branches are typically distributed throughout the country, and some branches in sparsely populated areas hold relatively little user data. This scarcity of graph data leads to an overabundance of duplicate nodes when forming positive and negative instance pairs: when contrastive learning samples negative instance pairs from such limited data, the neighboring nodes in these pairs are highly likely to overlap with those in the positive instance pairs. As a result, the distinctions between positive and negative instance pairs become inconspicuous, ultimately diminishing the effectiveness of contrastive learning. Despite our attempt to mitigate this issue through repeated execution of R rounds of contrastive instance pair sampling in the local anomaly detection module, this approach does not address the problem comprehensively.</p><p>To solve the problem, we propose FedCAD. FedCAD additionally aggregates the neighbor node embeddings of anomalous nodes from other clients during the federated learning process. 
This strategy enlarges the difference between abnormal nodes and their neighboring counterparts, rendering the contrast between positive and negative instance pairs more apparent and thus improving the effectiveness of contrastive anomaly detection. We first design pseudo-label discovery on the clients and a local anomaly neighbor embedding update on the server. Then, we feed the obtained results into the discriminator to get the final abnormal node score S. The FedCAD framework is depicted in Algorithm 3.</p><p>Algorithm 3 (FedCAD). Input: the rounds of communication between the client and the server T_fc, the number of training epochs of the local discriminator module T_fcd, and the confidence threshold for determining the pseudo label λ. Output: the anomaly score function f(·). The server initializes the global model parameters W^0_G, distributes them to each client, and runs Algorithm 2. While not converged, for t_fc ∈ {1, 2, ..., T_fc}: (Client) run the inference stage of Algorithm 1, calculate the pseudo labels Ȳ_{C_k} by Eq. 13, extract the neighbor embedding matrix U_{C_k}, and upload Ȳ_{C_k} and U_{C_k} to the server. (Server) aggregate the local anomaly neighbor embedding matrices by Eq. 14 and distribute the aggregated sets to the clients. (Client, training) average the aggregated neighbor embedding matrices by Eq. 15; while not converged, for t_fcd ∈ {1, 2, ..., T_fcd}, calculate the predicted scores s_i of v_i by Eq. 16, calculate L_fin by Eq. 17, and update the local discriminator module parameters W^{t_fc}_{C_k} via back propagation. (Client, inference) calculate the predicted scores s_i of v_i by Eq. 16 and the anomaly score f(v_i) by Eq. 9.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Pseudo-Label Discovery</head><p>Based on the anomaly node scores of FedAD, we discover pseudo labels for the anomalous nodes. To obtain the pseudo labels, we extract the prediction scores from the final anomalous node score S: if the predicted score of the kth node of client C_i is higher than the threshold, the node receives an anomalous pseudo label, as in (13), where Ŷ^k_Ci is the pseudo label of the kth target node of client C_i and λ ∈ (-1, 0) is the confidence threshold for determining the pseudo label. We then extract the neighbor node embedding matrix U_Ci of all target nodes, which consists of the local subgraph embedding vectors e^lg_i; e^lg_i is produced by the readout module in FedAD, as shown in <ref type="bibr">(5)</ref>. We set the vectors in U^k_Ci with Ŷ^k_Ci = 0 to the zero vector, for two main reasons. First, it effectively protects data privacy: only the neighbor embedding vectors of abnormal nodes leave the client. Second, it greatly reduces the communication overhead of federated learning: since the proportion of abnormal nodes is small, uploading only their neighbor embedding vectors saves substantial communication. Finally, each client uploads U and Ŷ to the server.</p><p>1) Local Anomaly Neighbor Embedding Aggregation: After the server receives U and Ŷ from each client, it determines the pseudo-labeled abnormal nodes that coincide between each client and the other clients. The server then aggregates the anomaly neighbor embeddings of all clients according to the pseudo labels of these coincident nodes. The process of local anomaly neighbor embedding aggregation is shown in Fig. <ref type="figure">4</ref>. For the sake of illustration, consider an arbitrary client C_i among a total of M clients. 
Here, C_j, with j ≠ i, denotes any other client.</p><p>1) The pseudo label Ŷ_Ci is intersected with the pseudo label Ŷ_Cj of every other client to obtain the set of coincident abnormal node pseudo labels Ŷ^Cj_Ci. 2) We make M copies of the neighbor embedding matrix U_Ci of the target nodes and then, according to the obtained coincident abnormal node pseudo-label set Ŷ^Cj_Ci, replace the corresponding rows of U_Ci with those of the neighbor embedding matrix U_Cj of the other client, as written in (14), where I is the identity matrix and ⊙ is the Hadamard product, i.e., the elementwise multiplication of corresponding positions of two matrices. We thereby obtain the aggregated neighbor embedding matrix set. 3) The server repeats the above process for the neighbor embeddings uploaded by every client.</p><p>2) Local Discriminator Module: Fig. <ref type="figure">5</ref> shows the process of the local discriminator module. The server sends the aggregated neighbor embedding matrix sets back to their respective clients. Each client averages its set of aggregated neighbor embedding matrices, which enlarges the difference between an anomalous target node and its neighbors, as in (15), where Ū_Ci is the averaged neighbor embedding matrix and ū^lg_i denotes the averaged neighbor embedding vector of the ith node. We then obtain the target node embedding vector e^tn_i, generated by <ref type="bibr">(6)</ref>. Next, we feed ū^lg_i and e^tn_i into the local discriminator module for training. The local discriminator module also uses a simple bilinear scoring function, written as (16). Here, we attempt to maximize the similarity between the ground-truth label y_i and the prediction result s_i. 
Therefore, we adopt the standard binary cross-entropy (BCE) loss as our objective function, given in (17).</p><p>Finally, we compute the local anomaly score using the anomaly score calculation formula, as shown in <ref type="bibr">(9)</ref>.</p></div>
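The bilinear scoring and BCE objective referenced in (16) and (17) can be sketched in NumPy as follows. The function names and the exact bilinear parameterization s_i = sigmoid(u_i^T W e_i) are illustrative assumptions, since the equations themselves are only referenced, not reproduced, in this excerpt.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_scores(target_emb, neighbor_emb, W):
    """Simple bilinear scoring in the spirit of (16).

    target_emb:   target node embeddings e_tn, shape [n, d]
    neighbor_emb: averaged neighbor embeddings u_lg, shape [n, d]
    W:            learnable bilinear weight matrix, shape [d, d]
    Returns one score in (0, 1) per node.
    """
    # Per-node bilinear form u_i^T W e_i, squashed by a sigmoid.
    return sigmoid(np.einsum('nd,de,ne->n', neighbor_emb, W, target_emb))

def bce_loss(y, s, eps=1e-12):
    """Standard binary cross-entropy between labels y and scores s."""
    s = np.clip(s, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(s) + (1.0 - y) * np.log(1.0 - s)))
```

In training, W would be updated by backpropagation through this loss, as the algorithm's training phase describes.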
<div xmlns="http://www.tei-c.org/ns/1.0"><head>F. Complexity Analysis</head><p>We analyze the time complexity of FedCAD in terms of its two main components: FedAD and the anomaly information update between clients based on FL. FedAD itself comprises the local anomaly detection module and FedAvg. For the local anomaly detection module, the time complexity of each RWR subgraph sampling is O(c·d̄), where c is the number of sampled neighbor nodes and d̄ is the mean degree of the network. Since we run R rounds of sampling for each node, the total sampling complexity becomes O(c·n·d̄·R), where n is the number of graph nodes on the client. The time complexity of the GCN module is O(c²·n·R), so the time complexity of the local anomaly detection module is O(c·n·R·(c + d̄)). FedAvg consists of a client part and a server part. On the client side, all clients run in parallel in each epoch, with the same time complexity as the local anomaly detection module. On the server side, only the average aggregation of the local parameters is performed, with time complexity O(M). With T_fc rounds of communication between the clients and the server, the overall time complexity of FedAD is O(T_fc·(c·n·R·(c + d̄) + M)). For the anomaly information update between clients based on FL, the time complexity is dominated by the local anomaly node embedding update.</p><p>To reduce the communication complexity, when a client uploads the neighbor node embedding matrix, FedCAD uploads only the neighbor embedding vectors of abnormal nodes; those of normal nodes are not uploaded. Since the proportion of abnormal nodes is small, this saves substantial communication overhead.</p></div>
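The per-module costs above combine into the overall bound by simple addition; writing \(\bar{d}\) for the mean degree, c for the subgraph size, R for the sampling rounds, n for the client's node count, and M for the number of clients:

```latex
% Local anomaly detection module: RWR sampling plus GCN forward passes
\mathcal{O}(c\,n\,R\,\bar{d}) + \mathcal{O}(c^{2}\,n\,R)
  = \mathcal{O}\bigl(c\,n\,R\,(c+\bar{d})\bigr)
% With T_{fc} communication rounds and the O(M) server-side average:
\Rightarrow\quad
\mathcal{O}\bigl(T_{fc}\,(c\,n\,R\,(c+\bar{d}) + M)\bigr)
```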
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. EXPERIMENTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Datasets</head><p>We employ four datasets that are commonly used for anomaly detection in attributed networks; their basic statistics are shown in Table <ref type="table">I</ref>.</p><p>1) Cora: Cora is an academic citation network. Each node represents a published paper, and each edge represents a citation relationship between papers. The attribute vectors are bag-of-words representations.</p><p>2) Wiki: Wiki is a web page link network derived from the English Wikipedia. Each node is a web page explaining a term, and each edge is a hyperlink between web pages. The content of each web page is transformed into a bag-of-words representation as the initial node features.</p><p>3) BlogCatalog: BlogCatalog is a blog sharing website. Each node represents a user of the website, and each edge represents a following relationship between users. Node attributes consist of a list of user tags.</p><p>4) Flickr: Flickr is an image hosting and sharing website. Each node represents a user of the website, and each edge represents a following relationship between users. Node attributes consist of a list of user tags and photo tags.</p><p>The aforementioned datasets lack ground-truth anomalies; hence, artificial anomalies must be injected into the attributed networks for evaluation. We follow the node attribute anomaly injection method used in previous research <ref type="bibr">[15]</ref> to generate node attribute anomalies for each dataset. We randomly pick a node v_i as the target node and select another n nodes V = (v_1, v_2, ..., v_n) as a candidate set, with n = 50. For each v_j ∈ V, we compute the Euclidean distance between the attribute vectors x_i and x_j. Then, we pick the node v_j with the largest Euclidean distance from node v_i. 
We change x i to x j .</p><p>We ultimately obtain the perturbed network using the abovementioned injection technique, the number of abnormal nodes is shown in Table <ref type="table">I</ref>. In our experiments, all class labels are removed and the anomaly labels y i are only visible in the inference stage.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Experimental Settings</head><p>In this section, we outline the critical aspects of our experimental setup: the comparative baselines, the evaluation metrics, and the parameter settings of our framework.</p><p>1) Baselines: We compare our proposed FedCAD framework with the following commonly used anomaly detection methods.</p><p>1) DOMINANT <ref type="bibr">[8]</ref>: DOMINANT reconstructs the adjacency matrix and the attribute matrix concurrently with an autoencoder and evaluates the abnormality of each node by its reconstruction error. 2) AnomalyDAE <ref type="bibr">[12]</ref>: AnomalyDAE is an anomaly detection framework based on dual autoencoders, which learns cross-modal interactions during node attribute reconstruction by taking attribute embeddings as inputs to the attribute decoder. It detects anomalies by measuring the reconstruction error. 3) CoLA <ref type="bibr">[15]</ref>: CoLA is an unsupervised anomaly detection framework that captures the relations between a target node and its adjacent substructures via CSSL. The abnormality of each node is estimated statistically by measuring the consistency of each instance pair through its contrastive output score. 4) FedAD: FedAD combines the local models via weighted averaging to obtain the global model. This method is an ablation version of FedCAD used to verify the effectiveness of pseudo-label discovery, the local anomaly neighbor embedding aggregation, and the local discriminator module.</p><p>2) Evaluation Metrics: We adopt the receiver operating characteristic-area under the curve (ROC-AUC) as the evaluation metric, as it is commonly used to evaluate anomaly detection frameworks <ref type="bibr">[7]</ref>, <ref type="bibr">[8]</ref>. 
The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR), both computed from the ground-truth labels and the anomaly detection results. The closer the AUC is to 1, the better the performance of the method.</p><p>3) Parameter Settings: We set the number of nodes in each sampled subgraph to 4, the embedding dimension to 64, and the batch size to B = 256. We use the Adam optimizer to train the model, with 400 training epochs on BlogCatalog and Flickr and 100 on Cora and Wiki. The learning rate is 0.001 on Cora, Wiki, and Flickr and 0.003 on BlogCatalog; the learning rate of the local discriminator module in FedCAD is 0.05. For Cora, Wiki, BlogCatalog, and Flickr, the confidence threshold for determining the pseudo label is -0.3, -0.5, -0.5, and -0.4, respectively. For each dataset, we sample 256 rounds during the inference phase to obtain accurate detection results.</p></div>
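ROC-AUC can be computed without external libraries via its rank-statistic interpretation: the AUC equals the probability that a randomly chosen anomalous node scores higher than a randomly chosen normal one. A minimal sketch:

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC-AUC via pairwise comparisons (sketch; ties count half).

    y_true: ground-truth anomaly labels (1 = anomaly, 0 = normal)
    scores: anomaly scores from the detector
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Count positive/negative pairs where the positive outranks the negative.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

The O(|pos|·|neg|) pairwise form is fine at the scale of these datasets; a rank-based implementation would be used for very large graphs.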
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Anomaly Detection Results</head><p>The graph data of each client are obtained by randomly sampling the experimental datasets at various scales to imitate real-world graph data distributions. We set up four clients whose data proportions are [50%, 60%, 70%, 80%], respectively, and repeat each experiment five times to ensure that the results are reliable.</p><p>The ROC curves of the clients containing 50%, 60%, 70%, and 80% of the data are shown in Figs. <ref type="figure">6</ref>, <ref type="figure">7</ref>, <ref type="figure">8</ref>, and <ref type="figure">9</ref>, respectively. Table <ref type="table">II</ref> reports the AUC results on the four benchmark datasets, computed as the area under the ROC curves. FedCAD significantly outperforms FedAD and also clearly surpasses the other baselines. We draw the following conclusions from the experimental results.</p><p>FedCAD performs better than the other baselines for two main reasons.</p><p>1) Federated learning lets multiple clients learn cooperatively, effectively increasing the data available for model training; a model learned cooperatively performs better than one trained on each client's data alone.</p><p>2) The proposed local anomaly node embedding update enlarges the difference between abnormal nodes and their neighbors, which improves contrastive learning and thus the overall performance of FedCAD. Compared with FedAD, FedCAD first uses pseudo-label discovery to infer abnormal nodes and then uses the pseudo labels to perform local anomaly node embedding updates, enlarging the difference between abnormal nodes and their neighbors, improving the quality of the instance pairs, and yielding a high-quality anomaly detection model.</p></div>
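The client setup above can be simulated as follows; `partition_clients` is an illustrative name, and sampling is independent per client, so node sets may overlap, mimicking overlapping subgraphs of one larger graph.

```python
import numpy as np

def partition_clients(num_nodes, ratios, rng):
    """Simulate per-client node sets by random sampling (sketch).

    ratios: fraction of the global node set each client holds, e.g.
    [0.5, 0.6, 0.7, 0.8] as in the four-client experiment. Each client
    draws its nodes independently and without replacement.
    """
    nodes = np.arange(num_nodes)
    return [rng.choice(nodes, size=int(round(r * num_nodes)), replace=False)
            for r in ratios]
```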
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Parameter Study</head><p>In this section, we explore the effect of three critical parameters on the performance of the proposed framework: the data size of the clients, the number of clients, and the confidence threshold. We perform these experiments on the four datasets.</p><p>1) Data Size of the Clients: In this experiment, our goal is to explore the effect of client data sizes on FedCAD. Keeping all other experimental parameters constant, we investigated two client configurations. First, we explored a setup with three clients and designed three experimental schemes in which the proportion of data nodes held by each client (the sampling ratio) was set to [20%, 30%, 40%], [20%, 40%, 60%], and [30%, 50%, 70%], respectively. Subsequently, we increased the number of clients to five, keeping all conditions unrelated to the sampling ratio unchanged, and assigned the five clients sampling ratios of [20%, 40%, 50%, 70%, 90%]. These adjustments allow us to systematically observe the impact of the number of clients and their data proportions on overall performance. The experimental results are shown in Table <ref type="table">III</ref>, from which we draw the following conclusions.</p><p>1) As the [20%, 30%, 40%] setting shows, FedCAD still achieves the best results even when the data size of each client is small. This is mainly because the local anomaly neighbor embedding aggregation module in FedCAD enlarges the difference between an anomalous target node and its neighbors. 
2) FedCAD still performs better than FedAD in most cases, especially under [20%, 40%, 50%, 70%, 90%], which indicates that the proposed local anomaly neighbor embedding aggregation effectively enlarges the difference between anomaly nodes and their neighbors and learns a high-quality FedCAD model. 3) Compared with Table <ref type="table">II</ref>, FedCAD still achieves good results and maintains its advantages despite the different numbers of clients and data sizes.</p><p>2) Confidence Threshold: Next, we study the impact of the confidence threshold λ on our proposed FedCAD framework. The confidence threshold λ in (<ref type="formula">13</ref>) controls the number of pseudo labels. To study how λ affects the performance of FedCAD, we tune λ from -0.1 to -0.9 with step size 0.1. The results are shown in Fig. <ref type="figure">10</ref>. As can be seen, choosing a suitable λ consistently improves the final AUC value; [-0.5, -0.3] is a desirable interval.</p></div>
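The threshold sweep above can be reproduced in miniature: evaluate each candidate λ on a grid and keep the best. A hypothetical sketch; `best_threshold` and the use of pseudo-label accuracy as the selection criterion are illustrative stand-ins for the downstream AUC the paper actually reports.

```python
import numpy as np

def best_threshold(scores, y_true, grid=None):
    """Grid-search the pseudo-label confidence threshold (sketch).

    For each candidate lambda in (-1, 0), nodes with score > lambda
    become pseudo anomalies; we return the lambda whose pseudo labels
    best agree with the reference labels y_true.
    """
    if grid is None:
        grid = np.arange(-0.9, 0.0, 0.1)  # lambda from -0.9 to -0.1
    best_lam, best_acc = None, -1.0
    for lam in grid:
        pseudo = (np.asarray(scores) > lam).astype(int)
        acc = float(np.mean(pseudo == np.asarray(y_true)))
        if acc > best_acc:
            best_lam, best_acc = float(lam), acc
    return best_lam, best_acc
```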
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3) Number of Clients:</head><p>In the FedCAD framework, the number of clients is a crucial parameter that significantly affects performance and scalability. In this experiment, we investigate its impact on FedCAD. We evaluate FedCAD on the four datasets, keeping other parameters constant while varying the number of clients from 4 to 20, and include FedAD as a baseline for comparison. As illustrated in Fig. <ref type="figure">11</ref>, FedCAD consistently and substantially outperforms FedAD on all datasets, demonstrating robustness to variations in the number of clients. However, as the number of clients grows large, the average performance degrades somewhat for both FedCAD and FedAD. This trend arises primarily from the growing disparity in graph data distribution across clients, which leads to more pronounced non-IID challenges. Nevertheless, FedCAD exhibits smaller performance fluctuations than FedAD, demonstrating the scalability of the proposed method.</p><p>TABLE II: AUC VALUES COMPARISON ON FOUR BENCHMARK DATASETS WITH FOUR CLIENTS. TABLE III: AUC VALUES COMPARISON ON FOUR BENCHMARK DATASETS UNDER DIFFERENT NUMBERS OF CLIENTS AND DATA SIZES.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSION</head><p>In this article, we design FedCAD for distributed graph anomaly detection. First, FedCAD utilizes federated learning and does not require centralizing client data for training; moreover, the sampled neighbor subgraph of each target node obtained by contrastive learning is compressed into a low-dimensional embedding vector before being uploaded to the server. Second, FedCAD proposes a client-side abnormal neighbor aggregation mechanism based on contrastive learning: each client averages in the sampled neighbor subgraph embedding vectors of the other clients' abnormal nodes, improving local anomaly detection performance. This mechanism achieves a data augmentation effect and addresses the challenge of each client holding only a small amount of data. Experimental results demonstrate that FedCAD outperforms a range of baseline methods.</p><p>TABLE IV: AUC VALUES COMPARISON FOR GLOBAL GRAPH ANOMALY DETECTION ON FOUR BENCHMARK DATASETS.</p><p>Although FedCAD exhibits superior performance, several areas can still be improved. The first is the additional communication cost: frequent parameter transfers between the federated learning clients and the server are expensive, and in FedCAD the transfer of embedding vectors further increases communication costs. The second is extending FedCAD to heterogeneous graphs. FedCAD currently applies only to homogeneous graphs; because heterogeneous graphs contain different types of nodes and links, the relationships between nodes carry various meanings, so FedCAD cannot be applied to them directly. Future work can focus on reducing communication costs and extending the framework to heterogeneous graphs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>APPENDIX GLOBAL GRAPH ANOMALY DETECTION WITH FEDERATED LEARNING</head><p>We performed global graph anomaly detection with federated learning on the server and obtained the results of a model trained on the global graph. Note that gathering data centrally is often very difficult in the real world due to privacy concerns and industry competition; this setting is designed specifically as an academic experiment to test the performance of global graph anomaly detection with federated learning. The global graph data are collected from all clients. Global graph anomaly detection with federated learning can approach or even exceed the performance of centrally trained models by collecting data from the clients.</p><p>Table <ref type="table">IV</ref> reports the AUC results on the four benchmark datasets, computed as the area under the ROC curve. Our proposed FedAD achieves the best anomaly detection performance on all four datasets. The main reason is that, due to the particularity of graph data, each client's graph can be regarded as a sample of a larger graph, so there are overlapping nodes between clients. Each client trains a local model on its sampled graph data, which amounts to data augmentation during training and therefore outperforms the baselines that use a single graph.</p></div>
		</body>
		</text>
</TEI>
