skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, May 23 until 2:00 AM ET on Friday, May 24 due to maintenance. We apologize for the inconvenience.


Search for: All records

Award ID contains: 2003924

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Subgraph matching is a core primitive across a number of disciplines, ranging from data mining, databases, information retrieval, computer vision to natural language processing. Despite decades of efforts, it is still highly challenging to balance between the matching accuracy and the computational efficiency, especially when the query graph and/or the data graph are large. In this paper, we propose an index-based algorithm (G-FINDER) to find the top-k approximate matching subgraphs. At the heart of the proposed algorithm are two techniques, including (1) a novel auxiliary data structure (LOOKUP-TABLE) in conjunction with a neighborhood expansion method to effectively and efficiently index candidate vertices, and (2) a dynamic filtering and refinement strategy to prune the false candidates at an early stage. The proposed G-FINDER bears some distinctive features, including (1) generality, being able to handle different types of inexact matching (e.g., missing nodes, missing edges, intermediate vertices) on node attributed and/or edge attributed graphs or multigraphs; (2) effectiveness, achieving up to 30% F1-Score improvement over the best known competitor; and (3) efficiency, scaling near-linearly w.r.t. the size of the data graph as well as the query graph. 
    more » « less
  2. Network alignment is a fundamental task in many high-impact applications. Most of the existing approaches either explicitly or implicitly consider the alignment matrix as a linear transformation to map one network to another, and might overlook the complicated alignment relationship across networks. On the other hand, node representation learning based alignment methods are hampered by the incomparability among the node representations of different networks. In this paper, we propose a unified semi-supervised deep model (ORIGIN) that simultaneously finds the non-rigid network alignment and learns node representations in multiple networks in a mutually beneficial way. The key idea is to learn node representations by the effective graph convolutional networks, which subsequently enable us to formulate network alignment as a point set alignment problem. The proposed method offers two distinctive advantages. First (node representations), unlike the existing graph convolutional networks that aggregate the node information within a single network, we can effectively aggregate the auxiliary information from multiple sources, achieving far-reaching node representations. Second (network alignment), guided by the highquality node representations, our proposed non-rigid point set alignment approach overcomes the bottleneck of the linear transformation assumption. We conduct extensive experiments that demonstrate the proposed non-rigid alignment method is (1) effective, outperforming both the state-of-the-art linear transformation-based methods and node representation based methods, and (2) efficient, with a comparable computational time between the proposed multi-network representation learning component and its single-network counterpart. 
    more » « less
  3. Most existing feature selection methods select the top-ranked features according to certain criterion. However, without considering the redundancy among the features, the selected ones are frequently highly correlated with each other, which is detrimental to the performance. To tackle this problem, we propose a framework regarding adaptive redundancy minimization (ARM) for the feature selection. Unlike other feature selection methods, the proposed model has the following merits: (1) The redundancy matrix is adaptively constructed instead of presetting it as the priori information. (2) The proposed model could pick out the discriminative and nonredundant features via minimizing the global redundancy of the features. (3) ARM can reduce the redundancy of the features from both supervised and unsupervised perspectives. 
    more » « less
  4. Network embedding has become the cornerstone of a variety of mining tasks, such as classification, link prediction, clustering, anomaly detection and many more, thanks to its superior ability to encode the intrinsic network characteristics in a compact low-dimensional space. Most of the existing methods focus on a single network and/or a single resolution, which generate embeddings of different network objects (node/subgraph/network) from different networks separately. A fundamental limitation with such methods is that the intrinsic relationship across different networks (e.g., two networks share same or similar subgraphs) and that across different resolutions (e.g., the node-subgraph membership) are ignored, resulting in disparate embeddings. Consequentially, it leads to sub-optimal performance or even becomes inapplicable for some downstream mining tasks (e.g., role classification, network alignment. etc.). In this paper, we propose a unified framework MrMine to learn the representations of objects from multiple networks at three complementary resolutions (i.e., network, subgraph and node) simultaneously. The key idea is to construct the cross-resolution cross-network context for each object. The proposed method bears two distinctive features. First, it enables and/or boosts various multi-network downstream mining tasks by having embeddings at different resolutions from different networks in the same embedding space. Second, Our method is efficient and scalable, with a O(nlog(n)) time complexity for the base algorithm and a linear time complexity w.r.t. the number of nodes and edges of input networks for the accelerated version. Extensive experiments on real-world data show that our methods (1) are able to enable and enhance a variety of multi-network mining tasks, and (2) scale up to million-node networks. 
    more » « less
  5. Network embedding, which learns the low-dimensional representations of nodes, has gained significant research attention. Despite its superior empirical success, often measured by the prediction performance of downstream tasks (e.g., multi-label classification), it is unclear why a given embedding algorithm outputs the specific node representations, and how the resulting node representations relate to the structure of the input network. In this paper, we propose to discern the edge influence as the first step towards understanding skip-gram basd network embedding methods. For this purpose, we propose an auditing framework NEAR, whose key part includes two algorithms (NEAR-ADD and NEAR-DEL) to effectively and efficiently quantify the influence of each edge. Based on the algorithms, we further identify high-influential edges by exploiting the linkage between edge influence and the network structure. Experimental results demonstrate that the proposed algorithms (NEAR-ADD and NEAR-DEL) are significantly faster (up to 2, 000×) than straightforward methods with little quality loss. Moreover, the proposed framework can efficiently identify the most influential edges for network embedding in the context of downstream prediction task and adversarial attacking. 
    more » « less
  6. Network mining plays a pivotal role in many high-impact application domains, including information retrieval, healthcare, social network analysis, security and recommender systems. State-of-the-art offers a wealth of sophisticated network mining algorithms, many of which have been widely adopted in real-world with superior empirical performance. Nonetheless, they often lack effective and efficient ways to characterize how the results of a given mining task relate to the underlying network structure. In this paper, we introduce network derivative mining problem. Given the input network and a specific mining algorithm, network derivative mining finds a derivative network whose edges measure the influence of the corresponding edges of the input network on the mining results. We envision that network derivative mining could be beneficial in a variety of scenarios, ranging from explainable network mining, adversarial network mining, sensitivity analysis on network structure, active learning, learning with side information to counterfactual learning on networks. We propose a generic framework for network derivative mining from the optimization perspective and provide various instantiations for three classic network mining tasks, including ranking, clustering, and matrix completion. For each mining task, we develop effective algorithm for constructing the derivative network based on influence function analysis, with numerous optimizations to ensure a linear complexity in both time and space. Extensive experimental evaluation on real-world datasets demonstrates the efficacy of the proposed framework and algorithms. 
    more » « less
  7. Deep neural network clustering is superior to the conventional clustering methods due to deep feature extraction and nonlinear dimensionality reduction. Nevertheless, deep neural network leads to a rough representation regarding the inherent relationship of the data points. Therefore, it is still difficult for deep neural network to exploit the effective structure for direct clustering. To address this issue,we propose a robust embedded deep K-means clustering (REDKC) method. The proposed RED-KC approach utilizes the δ-norm metric to constrain the feature mapping process of the auto-encoder network, so that data are mapped to a latent feature space, which is more conducive to the robust clustering. Compared to the existing auto-encoder networks with the fixed prior, the proposed RED-KC is adaptive during the process of feature mapping. More importantly, the proposed RED-KC embeds the clustering process with the autoencoder network, such that deep feature extraction and clustering can be performed simultaneously. Accordingly, a direct and efficient clustering could be obtained within only one step to avoid the inconvenience of multiple separate stages, namely, losing pivotal information and correlation. Consequently, extensive experiments are provided to validate the effectiveness of the proposed approach. 
    more » « less