This content will become publicly available on April 25, 2023

Graph Sanitation with Application to Node Classification
The past decades have witnessed the prosperity of graph mining, with a multitude of sophisticated models and algorithms designed for various mining tasks, such as ranking, classification, clustering and anomaly detection. Generally speaking, the vast majority of the existing works aim to answer the following question, that is, given a graph, what is the best way to mine it? In this paper, we introduce the graph sanitation problem, to an- swer an orthogonal question. That is, given a mining task and an initial graph, what is the best way to improve the initially provided graph? By learning a better graph as part of the input of the mining model, it is expected to benefit graph mining in a variety of settings, ranging from denoising, imputation to defense. We formulate the graph sanitation problem as a bilevel optimization problem, and fur- ther instantiate it by semi-supervised node classification, together with an effective solver named GaSoliNe. Extensive experimental results demonstrate that the proposed method is (1) broadly appli- cable with respect to various graph neural network models and flexible graph modification strategies, (2) effective in improving the node classification accuracy on both the original and contaminated graphs in various perturbation scenarios. In particular, more »
Authors:
; ;
Award ID(s):
Publication Date:
NSF-PAR ID:
10332504
Journal Name:
Graph Sanitation with Application to Node Classification
Page Range or eLocation-ID:
1136 to 1147
2. Attributed subgraph matching is a powerful tool for explorative mining of large attributed networks. In many applications (e.g., network science of teams, intelligence analysis, finance informatics), the user might not know what exactly s/he is looking for, and thus require the user to constantly revise the initial query graph based on what s/he finds from the current matching results. A major bottleneck in such an interactive matching scenario is the efficiency, as simply rerunning the matching algorithm on the revised query graph is computationally prohibitive. In this paper, we propose a family of effective and efficient algorithms (FIRST) to support interactive attributed subgraph matching. There are two key ideas behind the proposed methods. The first is to recast the attributed subgraph matching problem as a cross-network node similarity problem, whose major computation lies in solving a Sylvester equation for the query graph and the underlying data graph. The second key idea is to explore the smoothness between the initial and revised queries, which allows us to solve the new/updated Sylvester equation incrementally, without re-solving it from scratch. Experimental results show that our method can achieve (1) up to 16x speed-up when applying on networks with 6M$+$ nodes; (2) preserving moremore »