<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Unveiling Anomalous Edges and Nominal Connectivity of Attributed Networks</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>11/01/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10273952</idno>
					<idno type="doi">10.1109/IEEECONF51394.2020.9443464</idno>
					<title level='j'>Asilomar Conference on Signals Systems and Computers</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Konstantinos D. Polyzos</author><author>Costas Mavromatis</author><author>Vassilis N. Ioannidis</author><author>Georgios B. Giannakis</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Uncovering anomalies in attributed networks has recently gained popularity due to its importance in unveiling outliers and flagging adversarial behavior in a gamut of data and network science applications including the Internet of Things (IoT), finance, security, to list a few. The present work deals with uncovering anomalous edges in attributed graphs using two distinct formulations with complementary strengths, which can be easily distributed, and hence efficient. The first relies on decomposing the graph data matrix into low rank plus sparse components to markedly improve performance. The second broadens the scope of the first by performing robust recovery of the unperturbed graph, which enhances the anomaly identification performance. The novel methods not only capture anomalous edges linking nodes of different communities, but also spurious connections between any two nodes with different features. Experiments conducted on real and synthetic data corroborate the effectiveness of both methods in the anomaly identification task.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Unveiling rare or anomalous data is of paramount importance in numerous applications such as recommender systems, finance, security, health care, autonomous driving and insurance <ref type="bibr">[1]</ref>. Anomaly identification methods can spot network intrusions and failures, spam reviews, credit card frauds, malware and rare disease outbreaks.</p><p>The present work focuses on anomalies that appear in data adhering to a graph structure. Consider a graph that consists of nodes and edges, where each node is associated with an attributed vector. In an IoT network for example, the nodes represent mobile devices and the edges capture the communication among devices. The graph connectivity may be perturbed e.g. by an adversary who adds edges possibly at random. Examples of such anomalous edges include links among nodes belonging to different communities or among nodes with uncorrelated attributes. Leveraging the graph structure and the correlation of nodal features will enable identification of anomalous edges.</p><p>Anomalies can emerge in either static or dynamic graphs. Dynamic ones comprise sequences of graphs and the corresponding methods leverage temporal correlation too <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref>,</p><p>The work in this paper was supported by the NSF grants 1901134, 171141, and 1500713.</p><p>Emails: {polyz003, mavro016, ioann006, georgios}@umn.edu * Equal Contribution. <ref type="bibr">[4]</ref>, <ref type="bibr">[5]</ref>, <ref type="bibr">[6]</ref>, <ref type="bibr">[7]</ref>, <ref type="bibr">[8]</ref>. The focus here will be on static graphs, but dynamic graphs are also included in our future research agenda.</p><p>Most anomaly detection methods for static graphs pursue community-based approaches. They primarily rely on matrix factorization techniques that decompose the graph matrix into a low rank component and a residual matrix <ref type="bibr">[9]</ref>, <ref type="bibr">[10]</ref>. Matrix factorization is widely employed for dimensionality reduction <ref type="bibr">[11]</ref>, <ref type="bibr">[12]</ref> and (graph) clustering <ref type="bibr">[13]</ref>. The approach in <ref type="bibr">[10]</ref> spots anomalous connections if they appear in the residual matrix, after the non-negative matrix factorization.</p><p>Albeit interesting, the aforementioned methods do not account for attributes. On the other hand, approaches have been reported that incorporate nodal features to unveil anomalous nodes <ref type="bibr">[14]</ref>, <ref type="bibr">[15]</ref>, <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref>, <ref type="bibr">[18]</ref>, <ref type="bibr">[19]</ref>, <ref type="bibr">[20]</ref>. These are suitable to spot anomalous nodes within a certain community, but not anomalous edges across communities <ref type="bibr">[21]</ref>.</p><p>Contribution. The present work proposes two novel approaches that rely on low rank and sparse factorization, and judiciously account for nodal attributes to spot anomalous edges in a given perturbed graph. The second approach further recovers the ground-truth topology, which turns out to markedly enhance identification performance. Accounting for smoothness across nodal features also boosts identification performance. Different from existing approaches, the novel methods not only identify anomalous links between different communities, but also mendacious edges between any two nodes based on the nodal features. Finally, two optimization methods with complementary strengths are developed to unveil the anomalous edges and robustly recover the underlying nominal topology. Although the first method is computationally more efficient, the second unveils more accurately the perturbed edges. Complexity analysis of the developed methods is also provided.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. UNVEILING ANOMALOUS EDGES</head><p>Consider a graph that consists of N nodes and E edges connecting pairs of nodes. Connectivity is captured by the N &#215; N adjacency matrix A. The initial (a.k.a. nominal) graph is perturbed by inserting (anomalous) edges linking nodes with correlated or uncorrelated nodal features. Let &#195; denote the perturbed adjacency, and &#7868; the set of perturbed edges. The N &#215; N Laplacian matrix is where 1 is the all-ones vector and diag(&#8226;) is a diagonal matrix. Each node n also holds an F &#215; 1 feature vector x n , and all feature vectors are collected in the N &#215; F feature matrix X. For example, in a network that consists of voters, attributes may correspond to each voter's age and state of residency. Goal. Given the perturbed Laplacian matrix L &#8712; R N &#215;N and the feature matrix X, this paper aims to unveil the anomalous edges in &#7868; and recover the unperturbed Laplacian matrix L. Fig. <ref type="figure">1</ref> shows a perturbed social voting network.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Anomalous edges across communities</head><p>Anomalous edges often connect nodes belonging to different communities <ref type="bibr">[1]</ref>, <ref type="bibr">[7]</ref>, <ref type="bibr">[9]</ref>, <ref type="bibr">[10]</ref>. For example, in a typical social voting network, each node is either Republican or Democrat, and connections between nodes across the isle are rare <ref type="bibr">[1]</ref>.</p><p>Factorizing the Laplacian matrix of a graph is known to reveal the community structure <ref type="bibr">[7]</ref>, <ref type="bibr">[13]</ref>, <ref type="bibr">[10]</ref>, <ref type="bibr">[22]</ref>. Specifically, the low-rank approximation of the Laplacian yields the communities, while the low rank captures the number of communities. To retrieve anomalous edges connecting different clusters, it is possible to augment the decomposition with a sparse matrix containing edges that connect different communities, and hence do not obey the low-rank model <ref type="bibr">[3]</ref>, <ref type="bibr">[7]</ref>, <ref type="bibr">[10]</ref>. Upon accounting for the low-rank approximation matrix R and the sparse matrix S, our formulation amounts to</p><p>where &#8226; F denotes the Frobenius norm; &#8226; 1 the 1 norm;</p><p>&#8226; * the nuclear norm; and, &#955;, &#181; &gt; 0 are properly chosen regularization parameters. The 1 norm controls sparsity of S, while the nuclear norm equals the 1 norm applied to the vector of singular values of R, which promotes R to be low rank <ref type="bibr">[23]</ref>. The nonzero entries of S reveal the anomalous edges that do not adhere to the low rank model.</p><p>Albeit interesting, the approach in (2) does not leverage the feature matrix X. In the social voting network paradigm, two individuals with opposing political views may be friends because they live in the same state or belong to the same age group (cf. Fig. <ref type="figure">1</ref>). The next section outlines our modeling approach that incorporates nodal features when available.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Smoothness over features</head><p>Smoothness of signal models has been widely adopted by several learning methods in the form of regularization for e.g., (semi-)supervised learning over graphs <ref type="bibr">[24]</ref>. Intuitively, X is deemed smooth if features of connected vertices have similar values. This principle suggests, for instance, estimating one person's age by looking at their friends' age <ref type="bibr">[25]</ref>, <ref type="bibr">[26]</ref>.</p><p>Smooth features among neighboring nodes are captured by the so-termed Laplacian regularizer <ref type="bibr">[25]</ref>, <ref type="bibr">[26]</ref>, <ref type="bibr">[27]</ref> </p><p>where a ij = 0 only if nodes i and j are connected, Tr denotes matrix trace, and T stands for transposition. The value of ( <ref type="formula">3</ref>) is small if neighboring vertices have similar signal values and high otherwise. Thus, minimizing (3) penalizes differences between features of connected nodes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. LEVERAGING SPARSITY, SMOOTHNESS AND LOW</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RANK</head><p>Incorporating feature smoothness, (2) boils down to (cf. ( <ref type="formula">3</ref>))</p><p>where &#947; &gt; 0 tunes the degree of smoothness. The nonzero entries of S reveal edges that do not adhere to the low rank model with smooth features. The next subsections provide alternative formulations of (4), and corresponding optimization solvers with complementary benefits.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Anomalous edge identification via low rank factorization</head><p>As both the nuclear and 1 norms are non-differentiable at the origin, the optimization problem (4) is convex but nonsmooth. Although iterative solvers of (4) are available, they rely on performing a costly singular value decomposition per iteration, which incurs prohibitive computational complexity, especially for large-scale graphs <ref type="bibr">[2]</ref>, <ref type="bibr">[28]</ref>.</p><p>A prudent strategy to bypass this hurdle relies on the factorization R = UV T , where U, V are N &#215;R matrices with R denoting an upper bound on the rank of R. Specifically, it holds that <ref type="bibr">[29]</ref>, <ref type="bibr">[30]</ref> </p><p>Combining ( <ref type="formula">5</ref>) with ( <ref type="formula">4</ref>), the optimization in (4) reduces to</p><p>Although ( <ref type="formula">6</ref>) is non-convex, alternating minimization solvers can be employed with closed-form updates. Different from (4), problem ( <ref type="formula">6</ref>) can be solved in a distributed fashion.</p><p>We developed an alternating least-squares (ALS) solver of (6) whose steps are listed under Algorithm 1. Our ALS is provably convergent, but the proof is omitted due to space limitations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Anomalous edge identification aided by Laplacian recovery</head><p>The formulation in (4), focused solely on retrieving the anomalous edges. However, one can also approximate the unperturbed graph structure, and thus aid the identification of anomalous edges. This subsection deals with robust estimation of the nominal graph topology too, and further improving identification of the anomalous edges.</p><p>Graph connections capture correlations among nodal feature vectors; see <ref type="bibr">[31]</ref> for a comprehensive survey. Learning the graph structure by leveraging the statistical properties of the features will endow estimation with increased robustness to anomalies. Popular approaches to estimating graph structure are closely related to the sparse inverse covariance matrix estimation <ref type="bibr">[32]</ref>, <ref type="bibr">[33]</ref>, <ref type="bibr">[34]</ref>. The graph Laplacian can be approximated similarly by estimating the sparse inverse covariance matrix <ref type="bibr">[35]</ref>, <ref type="bibr">[32]</ref>, <ref type="bibr">[36]</ref>. Specifically, the sparse inverse covariance selection problem <ref type="bibr">[35]</ref> is formulated as follows</p><p>where Tr(XX T &#920;) -log det(&#920;) minimizes the negative log-likelihood, while &#920; 1 promotes sparsity. The term Tr(XX T &#920;) is closely related to the smoothing term Tr(X T RX) of ( <ref type="formula">4</ref>), and holds the feature correlation information. Although &#920; captures feature correlations (&#920; ij = 0 if nodes i and j have conditionally correlated features), it is not necessarily a valid Laplacian.</p><p>To cope with this challenge, we couple ( <ref type="formula">2</ref>) and ( <ref type="formula">7</ref>), and introduce due constraints to recover the unperturbed graph Laplacian and the anomalies min R,S,&#920; 0</p><p>where (8b), (8c), (8d) constrain the learned R to be positive semidefinite and thus a valid Laplacian, while R -&#920; 2 F seeks a Laplacian that stays close to &#920; but also couples ( <ref type="formula">2</ref>) with ( <ref type="formula">7</ref>) (closeness is tuned by the scalar &#954; &gt; 0). The number of edges in the sought Laplacian is benchmarked by m (cf. (8b)), while &#945; &gt; 0 weighs the degree of smoothness in ( <ref type="formula">2</ref>) and <ref type="bibr">(7)</ref>.</p><p>The problem in ( <ref type="formula">8</ref>) is convex and can be solved using either well known optimization solvers (e.g interior point schemes), or, the alternating direction method of multipliers (ADMM) method that can also afford for distributed implementation; we will henceforth refere to this ADMM solver of ( <ref type="formula">8</ref>)) as Algorithm 2. Note also that upon recovering the Laplacian, the adjacency matrix can be readily obtained.</p><p>Algorithm 1 ALS for <ref type="bibr">(6)</ref> 1: Initialize l = 0, U (l) , V (l) , S (l) 2: while not converged do 3:</p><p>6: l = l + 1 7: end while Computational complexity. The U and V updates under Algorithm 1 are available in closed form, while S is updated entry wise using the soft threshold operator <ref type="bibr">[37]</ref>. Considering R F N , the most computationally expensive operation is matrix multiplication between N &#215; N and N &#215; R matrices. Therefore, the overall complexity per step is O(N 2 R) <ref type="bibr">[38]</ref>.</p><p>Algorithm 2 minimizes a convex function under (8b), (8c), and (8d) that can be expressed as linear constraints <ref type="bibr">[39]</ref>. This implies that (8) can be efficiently solved in O(log( <ref type="formula">1</ref>)) interior point sub-iterations; see e.g., <ref type="bibr">[40]</ref>, <ref type="bibr">[41]</ref>, <ref type="bibr">[42]</ref>, <ref type="bibr">[43]</ref>, <ref type="bibr">[44]</ref>. Accounting also for the SVD computations O(N 3 ), and matrix multiplications, the overall complexity is O(log( 1 )N 3 ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. NUMERICAL TESTS</head><p>This section evaluates the anomaly identification performance of the novel methods with real and synthetic graph datasets.</p><p>Stochastic block model (SBM). SBMs are synthetic networks that consist of C communities. The feature matrix is formed by the F eigenvectors of the Laplacian matrix that correspond to the F smallest eigenvalues. Such features are smooth with respect to the graph <ref type="bibr">[25]</ref>. The generated stochastic block model networks are denoted as SBM(C,F ).</p><p>Aarhus network. The Aarhus dataset is a social network that consists of 32 employees (nodes) from the Computer Science Department of Aarhus University and represents their lunch relations (edges) <ref type="bibr">[45]</ref>. The 32&#215;26 feature matrix X is a binary matrix that represents the lunch relationships between the 32 employees from Computer Science department and another group of 26 people.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Evaluation results</head><p>In the SBM, anomalous edges are randomly added between nodes that belong to different communities, while in the Aarhus dataset, anomalous edges are inserted randomly between any two nodes. After the optimization, edges with negative (nonzero) values in S are collected in the set &#202;. Edges in &#202; are sorted in ascending order so that possible anomalous edges are ordered by probability of correct detection. The tests are repeated 10 times and the average percentage values of the following metrics are reported.  Hit rate@10 (H@10), which indicates how many edges are correctly identified as anomalous, given the first 10 candidate anomalous edges of &#202;.</p><p>Mean reciprocal rank (MRR). Each edge of &#202; receives a reciprocal rank score (if correctly identified as anomalous) based on its position in &#202;: 1 for first place, 1/2 for second place, 1/3 for third place, and so on. MRR is the average of reciprocal ranks of all correctly identified anomalous edges.</p><p>Table <ref type="table">I</ref> illustrates the best evaluation results for different methods after systematic parameter search. The following baseline methods are considered. The method in (2) that is employed by various approaches, e.g <ref type="bibr">[7]</ref>, and does not consider feature smoothness and a random-guess method that selects a random subset of the edges as anomalous.</p><p>Both proposed methods significantly outperform the baseline, which speaks for the importance of smoothness constraints for anomaly identification. The improvements on H@10 accuracy metric compared to the baseline ranges from 1.9 times higher (Aarhus, Algorithm 1) to 3.8 times higher (SBM(8,4), Algorithm 2). Improvements on MRR ranges from 1.2 times (SBM <ref type="bibr">(8,</ref><ref type="bibr">4)</ref>, Algorithm 1) to 3.5 times higher (Aarhus, Algorithm 2). The highest values of H@10 and MRR are observed in the Aarhus dataset (&#8764; 72% and 88.4% respectively). In all cases, our second algorithm outperforms the first one in the MRR metric due to its ability to approximate the exact topology of the original graph, which enhances the anomaly identification performance. On the other hand, although Algorithm 2 achieves better anomaly identification performance, it is less computationally efficient compared to Algorithm 1.</p><p>Finally, Figure <ref type="figure">2</ref> illustrates the estimated graph topology by the second method, compared to the ground truth topology on two datasets. As a baseline method, method (7) is used, which is employed by various approaches, e.g. <ref type="bibr">[46]</ref>. These qualitative results confirm that the second method captures the main topological structure of the original graph, while its topology estimation is closer to the ground truth topology than the baseline method.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. CONCLUSIONS AND FUTURE WORK</head><p>The present work aims at identifying anomalous edges on attributed networks. Two novel approaches are formulated to jointly leverage the low rank plus sparse decomposition of the Laplacian of the perturbed graph, along with feature smoothness. Extensive numerical tests corroborate that the novel adaptation of feature smoothness markedly improves the anomaly identification performance and the proposed methods outperform the baseline. Future research directions include robust topology identification, robust community detection by employing ego-tensors <ref type="bibr">[47]</ref>, and anomaly identification on dynamic and knowledge graphs.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Authorized licensed use limited to: University of Minnesota. Downloaded on July 09,2021 at 00:39:30 UTC from IEEE Xplore. Restrictions apply.</p></note>
		</body>
		</text>
</TEI>
