skip to main content

Search for: All records

Award ID contains: 1715385

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Algorithmic bias and fairness in the context of graph mining have largely remained nascent. The sparse literature on fair graph mining has almost exclusively focused on group-based fairness notation. However, the notion of individual fairness, which promises the fairness notion at a much finer granularity, has not been well studied. This paper presents the first principled study of Individual Fairness on gRaph Mining (InFoRM). First, we present a generic definition of individual fairness for graph mining which naturally leads to a quantitative measure of the potential bias in graph mining results. Second, we propose three mutually complementary algorithmic frameworks to mitigate the proposed individual bias measure, namely debiasing the input graph, debiasing the mining model and debiasing the mining results. Each algorithmic framework is formulated from the optimization perspective, using effective and efficient solvers, which are applicable to multiple graph mining tasks. Third, accommodating individual fairness is likely to change the original graph mining results without the fairness consideration. We conduct a thorough analysis to develop an upper bound to characterize the cost (i.e., the difference between the graph mining results with and without the fairness consideration). We perform extensive experimental evaluations on real-world datasets to demonstrate the efficacy andmore »generality of the proposed methods.« less
  2. Finding node associations across different networks is the cornerstone behind a wealth of high-impact data mining applications. Traditional approaches are often, explicitly or implicitly, built upon the linearity and/or consistency assumptions. On the other hand, the recent network embedding based methods promise a natural way to handle the non-linearity, yet they could suffer from the disparate node embedding space of different networks. In this paper, we address these limitations and tackle cross-network node associations from a new angle, i.e., cross-network transformation. We ask a generic question: Given two different networks, how can we transform one network to another? We propose an end-to-end model that learns a composition of nonlinear operations so that one network can be transformed to another in a hierarchical manner. The proposed model bears three distinctive advantages. First (composite transformation), it goes beyond the linearity/consistency assumptions and performs the cross-network transformation through a composition of nonlinear computations. Second (representation power), it can learn the transformation of both network structures and node attributes at different resolutions while identifying the cross-network node associations. Third (generality), it can be applied to various tasks, including network alignment, recommendation, cross-layer dependency inference. Extensive experiments on different tasks validate and verify the effectivenessmore »of the proposed model.« less
  3. Bug localization plays an important role in software quality control. Many supervised machine learning models have been developed based on historical bug-fix information. Despite being successful, these methods often require sufficient historical data (i.e., labels), which is not always available especially for newly developed software projects. In response, cross-project bug localization techniques have recently emerged whose key idea is to transferring knowledge from label-rich source project to locate bugs in the target project. However, a major limitation of these existing techniques lies in that they fail to capture the specificity of each individual project, and are thus prone to negative transfer.To address this issue, we propose an adversarial transfer learning bug localization approach, focusing on only transferring the common characteristics (i.e., public information) across projects. Specifically, our approach (CooBa) learns the indicative public information from cross-project bug reports through a shared encoder, and extracts the private information from code files by an individual feature extractor for each project. CooBa further incorporates adversarial learning mechanism to ensure that public information shared between multiple projects could be effectively extracted. Extensive experiments on four large-scale real-world data sets demonstrate that the proposed CooBa significantly outperforms the state of the art techniques.

  4. Teams can be often viewed as a dynamic system where the team configuration evolves over time (e.g., new members join the team; existing members leave the team; the skills of the members improve over time). Consequently, the performance of the team might be changing due to such team dynamics. A natural question is how to plan the (re-)staffing actions (e.g., recruiting a new team member) at each time step so as to maximize the expected cumulative performance of the team. In this paper, we address the problem of real-time team optimization by intelligently selecting the best candidates towards increasing the similarity between the current team and the high-performance teams according to the team configuration at each time-step. The key idea is to formulate it as a Markov Decision process (MDP) problem and leverage recent advances in reinforcement learning to optimize the team dynamically. The proposed method bears two main advantages, including (1) dynamics, being able to model the dynamics of the team to optimize the initial team towards the direction of a high-performance team via performance feedback; (2) efficacy, being able to handle the large state/action space via deep reinforcement learning based value estimation. We demonstrate the effectiveness of themore »proposed method through extensive empirical evaluations.« less
  5. Subgraph matching is a core primitive across a number of disciplines, ranging from data mining, databases, information retrieval, computer vision to natural language processing. Despite decades of efforts, it is still highly challenging to balance between the matching accuracy and the computational efficiency, especially when the query graph and/or the data graph are large. In this paper, we propose an index-based algorithm (G-FINDER) to find the top-k approximate matching subgraphs. At the heart of the proposed algorithm are two techniques, including (1) a novel auxiliary data structure (LOOKUP-TABLE) in conjunction with a neighborhood expansion method to effectively and efficiently index candidate vertices, and (2) a dynamic filtering and refinement strategy to prune the false candidates at an early stage. The proposed G-FINDER bears some distinctive features, including (1) generality, being able to handle different types of inexact matching (e.g., missing nodes, missing edges, intermediate vertices) on node attributed and/or edge attributed graphs or multigraphs; (2) effectiveness, achieving up to 30% F1-Score improvement over the best known competitor; and (3) efficiency, scaling near-linearly w.r.t. the size of the data graph as well as the query graph.
  6. Network alignment is a fundamental task in many high-impact applications. Most of the existing approaches either explicitly or implicitly consider the alignment matrix as a linear transformation to map one network to another, and might overlook the complicated alignment relationship across networks. On the other hand, node representation learning based alignment methods are hampered by the incomparability among the node representations of different networks. In this paper, we propose a unified semi-supervised deep model (ORIGIN) that simultaneously finds the non-rigid network alignment and learns node representations in multiple networks in a mutually beneficial way. The key idea is to learn node representations by the effective graph convolutional networks, which subsequently enable us to formulate network alignment as a point set alignment problem. The proposed method offers two distinctive advantages. First (node representations), unlike the existing graph convolutional networks that aggregate the node information within a single network, we can effectively aggregate the auxiliary information from multiple sources, achieving far-reaching node representations. Second (network alignment), guided by the highquality node representations, our proposed non-rigid point set alignment approach overcomes the bottleneck of the linear transformation assumption. We conduct extensive experiments that demonstrate the proposed non-rigid alignment method is (1) effective, outperformingmore »both the state-of-the-art linear transformation-based methods and node representation based methods, and (2) efficient, with a comparable computational time between the proposed multi-network representation learning component and its single-network counterpart.« less
  7. Deep neural network clustering is superior to the conventional clustering methods due to deep feature extraction and nonlinear dimensionality reduction. Nevertheless, deep neural network leads to a rough representation regarding the inherent relationship of the data points. Therefore, it is still difficult for deep neural network to exploit the effective structure for direct clustering. To address this issue,we propose a robust embedded deep K-means clustering (REDKC) method. The proposed RED-KC approach utilizes the δ-norm metric to constrain the feature mapping process of the auto-encoder network, so that data are mapped to a latent feature space, which is more conducive to the robust clustering. Compared to the existing auto-encoder networks with the fixed prior, the proposed RED-KC is adaptive during the process of feature mapping. More importantly, the proposed RED-KC embeds the clustering process with the autoencoder network, such that deep feature extraction and clustering can be performed simultaneously. Accordingly, a direct and efficient clustering could be obtained within only one step to avoid the inconvenience of multiple separate stages, namely, losing pivotal information and correlation. Consequently, extensive experiments are provided to validate the effectiveness of the proposed approach.
  8. Network embedding, which learns the low-dimensional representations of nodes, has gained significant research attention. Despite its superior empirical success, often measured by the prediction performance of downstream tasks (e.g., multi-label classification), it is unclear why a given embedding algorithm outputs the specific node representations, and how the resulting node representations relate to the structure of the input network. In this paper, we propose to discern the edge influence as the first step towards understanding skip-gram basd network embedding methods. For this purpose, we propose an auditing framework NEAR, whose key part includes two algorithms (NEAR-ADD and NEAR-DEL) to effectively and efficiently quantify the influence of each edge. Based on the algorithms, we further identify high-influential edges by exploiting the linkage between edge influence and the network structure. Experimental results demonstrate that the proposed algorithms (NEAR-ADD and NEAR-DEL) are significantly faster (up to 2, 000×) than straightforward methods with little quality loss. Moreover, the proposed framework can efficiently identify the most influential edges for network embedding in the context of downstream prediction task and adversarial attacking.