skip to main content


Title: Origraph: Interactive Network Wrangling
Networks are a natural way of thinking about many datasets. The data on which a network is based, however, is rarely collected in a form that suits the analysis process, making it necessary to create and reshape networks. Data wrangling is widely acknowledged to be a critical part of the data analysis pipeline, yet interactive network wrangling has received little attention in the visualization research community. In this paper, we discuss a set of operations that are important for wrangling network datasets and introduce a visual data wrangling tool, Origraph, that enables analysts to apply these operations to their datasets. Key operations include creating a network from source data such as tables, reshaping a network by introducing new node or edge classes, filtering nodes or edges, and deriving new node or edge attributes. Our tool, Origraph, enables analysts to execute these operations with little to no programming, and to immediately visualize the results. Origraph provides views to investigate the network model, a sample of the network, and node and edge attributes. In addition, we introduce interfaces designed to aid analysts in specifying arguments for sensible network wrangling operations. We demonstrate the usefulness of Origraph in two Use Cases: first, we investigate gender bias in the film industry, and then the influence of money on the political support for the war in Yemen.  more » « less
Award ID(s):
1751238 1835904
NSF-PAR ID:
10148288
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE Conference on Visual Analytics Science and Technology (VAST)
Page Range / eLocation ID:
81 to 92
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Edges in many real-world social/information networks are associated with rich text information (e.g., user-user communications or user-product reviews). However, mainstream network representation learning models focus on propagating and aggregating node attributes, lacking specific designs to utilize text semantics on edges. While there exist edge-aware graph neural networks, they directly initialize edge attributes as a feature vector, which cannot fully capture the contextualized text semantics of edges. In this paper, we propose Edgeformers, a framework built upon graph-enhanced Transformers, to perform edge and node representation learning by modeling texts on edges in a contextualized way. Specifically, in edge representation learning, we inject network information into each Transformer layer when encoding edge texts; in node representation learning, we aggregate edge representations through an attention mechanism within each node’s ego-graph. On five public datasets from three different domains, Edgeformers consistently outperform state-of-the-art baselines in edge classification and link prediction, demonstrating the efficacy in learning edge and node representations, respectively. 
    more » « less
  2. null (Ed.)
    Network embedding aims to automatically learn the node representations in networks. The basic idea of network embedding is to first construct a network to describe the neighborhood context for each node, and then learn the node representations by designing an objective function to preserve certain properties of the constructed context network. The vast majority of the existing methods, explicitly or implicitly, follow a pointwise design principle. That is, the objective can be decomposed into the summation of the certain goodness function over each individual edge of the context network. In this paper, we propose to go beyond such pointwise approaches, and introduce the ranking-oriented design principle for network embedding. The key idea is to decompose the overall objective function into the summation of a goodness function over a set of edges to collectively preserve their relative rankings on the context network. We instantiate the ranking-oriented design principle by two new network embedding algorithms, including a pairwise network embedding method PaWine which optimizes the relative weights of edge pairs, and a listwise method LiWine which optimizes the relative weights of edge lists. Both proposed algorithms bear a linear time complexity, making themselves scalable to large networks. We conduct extensive experimental evaluations on five real datasets with a variety of downstream learning tasks, which demonstrate that the proposed approaches consistently outperform the existing methods. 
    more » « less
  3. Abstract

    A key problem in social network analysis is to identify nonhuman interactions. State‐of‐the‐art bot‐detection systems like Botometer train machine‐learning models on user‐specific data. Unfortunately, these methods do not work on data sets in which only topological information is available. In this paper, we propose a new, purely topological approach. Our method removes edges that connect nodes exhibiting strong evidence of non‐human activity from publicly available electronic‐social‐network datasets, including, for example, those in the Stanford Network Analysis Project repository (SNAP). Our methodology is inspired by classic work in evolutionary psychology by Dunbar that posits upper bounds on the total strength of the set of social connections in which a single human can be engaged. We model edge strength with Easley and Kleinberg's topological estimate; label nodes as “violators” if the sum of these edge strengths exceeds a Dunbar‐inspired bound; and then remove the violator‐to‐violator edges. We run our algorithm on multiple social networks and show that our Dunbar‐inspired bound appears to hold for social networks, but not for nonsocial networks. Our cleaning process classifies 0.04% of the nodes of the Twitter‐2010 followers graph as violators, and we find that more than 80% of these violator nodes have Botometer scores of 0.5 or greater. Furthermore, after we remove the roughly 15 million violator‐violator edges from the 1.2‐billion‐edge Twitter‐2010 follower graph, 34% of the violator nodes experience a factor‐of‐two decrease in PageRank. PageRank is a key component of many graph algorithms such as node/edge ranking and graph sparsification. Thus, this artificial inflation would bias algorithmic output, and result in some incorrect decisions based on this output.

     
    more » « less
  4. null (Ed.)
    Finding node associations across different networks is the cornerstone behind a wealth of high-impact data mining applications. Traditional approaches are often, explicitly or implicitly, built upon the linearity and/or consistency assumptions. On the other hand, the recent network embedding based methods promise a natural way to handle the non-linearity, yet they could suffer from the disparate node embedding space of different networks. In this paper, we address these limitations and tackle cross-network node associations from a new angle, i.e., cross-network transformation. We ask a generic question: Given two different networks, how can we transform one network to another? We propose an end-to-end model that learns a composition of nonlinear operations so that one network can be transformed to another in a hierarchical manner. The proposed model bears three distinctive advantages. First (composite transformation), it goes beyond the linearity/consistency assumptions and performs the cross-network transformation through a composition of nonlinear computations. Second (representation power), it can learn the transformation of both network structures and node attributes at different resolutions while identifying the cross-network node associations. Third (generality), it can be applied to various tasks, including network alignment, recommendation, cross-layer dependency inference. Extensive experiments on different tasks validate and verify the effectiveness of the proposed model. 
    more » « less
  5. Graph neural networks (GNNs) have emerged as a powerful tool for modeling graph data due to their ability to learn a concise representation of the data by integrating the node attributes and link information in a principled fashion. However, despite their promise, there are several practical challenges that must be overcome to effectively use them for node classification problems. In particular, current approaches are vulnerable to different kinds of biases inherent in the graph data. First, if the class distribution is imbalanced, then the GNNs' loss function is biased towards classifying the majority class correctly rather than the minority class, which hurts the performance of the latter class. Second, due to homophily effect, the learned representation and subsequent downstream tasks may favor certain demographic groups over others when applied to social network data. To mitigate such biases, we propose a novel framework called Fairness-Aware Cost Sensitive Graph Convolutional Network (FACS-GCN) for classifying nodes in networks with skewed class distributions. Our approach combines a cost-sensitive exponential loss with an adversarial learning component to alleviate the ill-effects of both biases. The framework employs a stagewise additive modeling approach to ensure there is no significant loss in accuracy when imparting fairness into the GNN. Experimental results on 6 benchmark graph data demonstrate the effectiveness of FACS-GCN against comparable baseline methods in terms of promoting fairness while maintaining a high model accuracy on the majority of the datasets. 
    more » « less