
Title: N2N: Network Derivative Mining
Network mining plays a pivotal role in many high-impact application domains, including information retrieval, healthcare, social network analysis, security, and recommender systems. The state of the art offers a wealth of sophisticated network mining algorithms, many of which have been widely adopted in real-world applications with superior empirical performance. Nonetheless, these algorithms often lack effective and efficient ways to characterize how the results of a given mining task relate to the underlying network structure. In this paper, we introduce the network derivative mining problem. Given an input network and a specific mining algorithm, network derivative mining finds a derivative network whose edges measure the influence of the corresponding edges of the input network on the mining results. We envision that network derivative mining could benefit a variety of scenarios, ranging from explainable network mining, adversarial network mining, sensitivity analysis of network structure, active learning, and learning with side information to counterfactual learning on networks. We propose a generic framework for network derivative mining from an optimization perspective and provide various instantiations for three classic network mining tasks: ranking, clustering, and matrix completion. For each mining task, we develop an effective algorithm for constructing the derivative network based on influence function analysis, with numerous optimizations to ensure linear complexity in both time and space. Extensive experimental evaluation on real-world datasets demonstrates the efficacy of the proposed framework and algorithms.
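The abstract does not spell out its instantiations. As a hedged illustration of what a derivative network could look like for the ranking task, the sketch below rests on several assumptions:

- Personalized PageRank, r = (1 - c)(I - cA)^{-1} e, is the ranking.
- A is a fixed, column-normalized matrix.
- The mining result is summarized by the assumed scalar f(r) = ||r||^2.

Differentiating gives df/dA_ij = c * g_i * r_j, where g = (I - cA)^{-T} grad f(r). One extra linear solve therefore scores every edge at once. The function names are hypothetical, and this is not presented as the paper's algorithm. Dense solves are used for clarity; a linear-complexity version would need sparse iterative solvers.

```python
import numpy as np


def personalized_pagerank(A, e, c=0.85):
    """r = (1 - c) (I - c A)^{-1} e, treating the normalized matrix A as fixed."""
    n = A.shape[0]
    return (1 - c) * np.linalg.solve(np.eye(n) - c * A, e)


def ranking_derivative_network(A, e, c=0.85):
    """Hypothetical helper: D[i, j] = df/dA[i, j] on existing edges,
    where f(r) = ||r||_2^2 is an assumed scalar summary of the ranking r."""
    n = A.shape[0]
    Q = np.eye(n) - c * A
    r = (1 - c) * np.linalg.solve(Q, e)   # ranking vector
    grad_f = 2.0 * r                      # gradient of f(r) = ||r||^2
    g = np.linalg.solve(Q.T, grad_f)      # single adjoint solve: g = Q^{-T} grad_f
    D = c * np.outer(g, r)                # df/dA[i, j] = c * g[i] * r[j]
    return D * (A != 0)                   # keep only entries on existing edges


if __name__ == "__main__":
    # Toy 3-node graph with a column-stochastic transition matrix.
    A = np.array([[0.0, 0.5, 1.0],
                  [0.5, 0.0, 0.0],
                  [0.5, 0.5, 0.0]])
    e = np.array([1.0, 0.0, 0.0])         # restart at node 0
    print(ranking_derivative_network(A, e))
```

A finite-difference check on each existing edge is a cheap way to confirm that the closed-form entries match the actual sensitivity of f.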
Award ID(s):
1947135 1715385 1651203 2003924
Author(s) / Creator(s):
Date Published:
Journal Name:
Page Range / eLocation ID:
861 to 870
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Network embedding, which learns low-dimensional representations of nodes, has gained significant research attention. Despite its superior empirical success, often measured by prediction performance on downstream tasks (e.g., multi-label classification), it is unclear why a given embedding algorithm outputs specific node representations and how those representations relate to the structure of the input network. In this paper, we propose discerning edge influence as a first step toward understanding skip-gram based network embedding methods. To this end, we propose an auditing framework, NEAR, whose core consists of two algorithms (NEAR-ADD and NEAR-DEL) that effectively and efficiently quantify the influence of each edge. Building on these algorithms, we further identify highly influential edges by exploiting the link between edge influence and network structure. Experimental results demonstrate that the proposed algorithms (NEAR-ADD and NEAR-DEL) are significantly faster (up to 2,000×) than straightforward methods with little loss in quality. Moreover, the proposed framework can efficiently identify the most influential edges for network embedding in the context of downstream prediction tasks and adversarial attacks.
  2. Jihe Wang, Yi He (Ed.)
    Influence propagation is a network phenomenon governing how information diffuses through a network. With the advent of deep learning, there has been growing interest in applying graph neural networks to extract salient feature representations of nodes for a variety of network mining tasks, such as forecasting the virality of information cascades. Given the importance of social influence, this paper presents a novel deep learning framework called IP-GNN for simulating the information propagation process in a complex network and learning node representations that embed information about the diffusion process under the linear threshold model. Our framework employs a modified graph convolutional network architecture with an adaptive diffusion kernel to capture long-range propagation of information, along with an entropy-regularized mixture of loss functions to ensure accurate prediction and faster convergence of the learning algorithm. Experimental results on 4 real-world datasets show that the model accurately mimics the output of the linear threshold model, achieving an average accuracy exceeding 90% on all datasets.
  3. Ranking on networks plays an important role in many high-impact applications, including recommender systems, social network analysis, bioinformatics, and many more. In the age of big data, a recent trend is to address the variety aspect of network ranking. Two representative lines of research are (1) heterogeneous information networks with different types of nodes and edges, and (2) networks of networks with edges at different resolutions. In this paper, we propose a new network model named Network of Heterogeneous Information Networks (NeoHIN for short) that can simultaneously model both different types of nodes/edges and different edge resolutions. We further propose two new ranking algorithms on NeoHIN based on the cross-domain consistency principle. Experiments on synthetic and real-world networks show that our proposed algorithms are (1) effective, outperforming existing methods, and (2) efficient, incurring no additional time cost per iteration compared with their counterparts.
  4. Multi-sourced networks naturally appear in many application domains, ranging from bioinformatics, social networks, and neuroscience to management. The state of the art offers rich models and algorithms for finding various patterns in given input networks. However, how vulnerable the mining results are to adversarial attacks has remained largely unexplored. In this paper, we address the problem of attacking multi-network mining by deliberately perturbing the networks to alter the mining results. The key idea of the proposed method (ADMIRING) is to use effective influence functions on the Sylvester equation defined over the input networks, which plays a central and unifying role in various multi-network mining tasks. The proposed algorithms bear two main advantages: (1) effectiveness, being able to accurately quantify the rate of change of the mining results in response to attacks; and (2) generality, being applicable to a variety of multi-network mining tasks (e.g., graph kernel, network alignment, cross-network node similarity) with different attacking strategies (e.g., edge/node removal, attribute alteration).
  5. Nowadays, large-scale graph data is generated in a variety of real-world applications, from social networks to co-authorship networks and from protein-protein interaction networks to road traffic networks. Many existing works on graph mining focus on vertices and edges, with the first-order Markov chain as the underlying model. They fail to explore high-order network structures, which are of key importance in many high-impact domains. For example, in bank customer personally identifiable information (PII) networks, star structures often correspond to sets of synthetic identities. In financial transaction networks, loop structures may indicate money laundering. In this paper, we focus on mining user-specified high-order network structures. We aim to find a structure-rich subgraph that breaks few such structures when it is separated from the rest of the graph. A key challenge in finding such a subgraph is the prohibitive computational cost. Local graph clustering algorithms efficiently identify a low-conductance cut without exploring the entire graph. Inspired by this family of algorithms, we generalize their key idea to high-order network structures. In particular, we start with a generic definition of high-order conductance. We then define the high-order diffusion core, which is based on a high-order random walk induced by the user-specified high-order network structure. Next, we propose a novel High-Order Structure-Preserving LOcal Cut (HOSPLOC) algorithm, which runs in polylogarithmic time with respect to the number of edges in the graph. It starts with a seed vertex and iteratively explores its neighborhood until it finds a subgraph with small high-order conductance. Furthermore, we analyze its performance in terms of both effectiveness and efficiency. Experimental results on both synthetic and real graphs demonstrate the effectiveness and efficiency of the proposed HOSPLOC algorithm.
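Item 5 relies on a high-order conductance whose exact definition is given in the HOSPLOC paper, not in this abstract. As a rough illustration of the quantity such a local cut tries to keep small, the sketch below computes a triangle-based conductance. It assumes a definition common in motif-based clustering:

- A triangle is cut if it has nodes on both sides of S.
- vol(S) counts the triangle memberships of nodes in S.
- The conductance is cut / min(vol(S), vol(rest)).

The helper names are hypothetical. This is not the HOSPLOC algorithm itself, and the paper's definition may differ in detail.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def list_triangles(adj):
    """Return each triangle once as a tuple (u, v, w) with u < v < w."""
    tris = []
    for u, nbrs in adj.items():
        for v, w in combinations(sorted(nbrs), 2):   # v < w by construction
            if u < v and w in adj[v]:
                tris.append((u, v, w))
    return tris


def triangle_conductance(adj, S):
    """Assumed triangle conductance: cut(S) / min(vol(S), vol(complement))."""
    S = set(S)
    cut = vol_in = vol_out = 0
    for tri in list_triangles(adj):
        inside = sum(1 for x in tri if x in S)
        vol_in += inside          # triangle memberships of nodes in S
        vol_out += 3 - inside     # triangle memberships of nodes outside S
        if 0 < inside < 3:        # triangle straddles the cut
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else float("inf")


# Two triangles {0, 1, 2} and {3, 4, 5} bridged by the triangle {2, 3, 4}.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3), (2, 4)]
adj = build_adjacency(edges)
print(triangle_conductance(adj, {0, 1, 2}))  # 0.25
```

Here S = {0, 1, 2} cuts only the bridging triangle. Its triangle volume is 4, against 5 for the rest of the graph, so the conductance is 1/4.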