We consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete 2x2 biclique, the simplest cohesive higher-order structure in a bipartite graph. Our main contribution is a suite of randomized algorithms that can quickly approximate the number of butterflies in a graph with a provable guarantee on accuracy. An experimental evaluation on large real-world networks shows that our algorithms return accurate estimates within a few seconds, even for networks with trillions of butterflies and hundreds of millions of edges.
more »
« less
Interactive Bicluster Aggregation in Bipartite Graphs
Exploring coordinated relationships is important for sense making of data in various fields, such as intelligence analysis. To support such investigations, visual analysis tools use biclustering to mine relationships in bipartite graphs and visualize the resulting biclusters with standard graph visualization techniques. Due to overlaps among biclusters, such visualizations can be cluttered (e.g., with many edge crossings), when there are a large number of biclusters. Prior work attempted to resolve this problem by automatically ordering nodes in a bipartite graph. However, visual clutter is still a serious problem, since the number of displayed biclusters remains unchanged. We propose bicluster aggregation as an alternative approach, and have developed two methods of interactively merging biclusters. These interactive bicluster aggregations help organize similar biclusters and reduce the number of displayed biclusters. Initial expert feedback indicates potential usefulness of these techniques in practice.
more »
« less
- Award ID(s):
- 1850036
- PAR ID:
- 10133204
- Date Published:
- Journal Name:
- 2019 IEEE Visualization Conference (VIS)
- Page Range / eLocation ID:
- 246 to 250
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We consider how to generate graphs of arbitrary size whose chromatic numbers can be chosen (or are well-bounded) for testing graph coloring algorithms on parallel computers. For the distance-1 graph coloring problem, we identify three classes of graphs with this property. The first is the Erdős-Rényi random graph with prescribed expected degree, where the chromatic number is known with high probability. It is also known that the Greedy algorithm colors this graph using at most twice the number of colors as the chromatic number. The second is a random geometric graph embedded in hyperbolic space where the size of the maximum clique provides a tight lower bound on the chromatic number. The third is a deterministic graph described by Mycielski, where the graph is recursively constructed such that its chromatic number is known and increases with graph size, although the size of the maximum clique remains two. For Jacobian estimation, we bound the distance-2 chromatic number of random bipartite graphs by considering its equivalence to distance-1 coloring of an intersection graph. We use a “balls and bins” probabilistic analysis to establish a lower bound and an upper bound on the distance-2 chromatic number. The regimes of graph sizes and probabilities that we consider are chosen to suit the Jacobian estimation problem, where the number of columns and rows are asymptotically nearly equal, and have number of nonzeros linearly related to the number of columns. Computationally we verify the theoretical predictions and show that the graphs are often be colored optimally by the serial and parallel Greedy algorithms.more » « less
-
Shaharudin, Shazlin (Ed.)Objective To apply biclustering, a methodology originally developed for analysis of gene expression data, to simultaneously cluster observations and clinical features to explore candidate phenotypes of knee osteoarthritis (KOA) for the first time. Methods Data from the baseline Osteoarthritis Initiative (OAI) visit were cleaned, transformed, and standardized as indicated (leaving 6461 knees with 86 features). Biclustering produced submatrices of the overall data matrix, representing similar observations across a subset of variables. Statistical validation was determined using the novel SigClust procedure. After identifying biclusters, relationships with key outcome measures were assessed, including progression of radiographic KOA, total knee arthroplasty, loss of joint space width, and worsening Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) scores, over 96 months of follow-up. Results The final analytic set included 6461 knees from 3330 individuals (mean age 61 years, mean body mass index 28 kg/m 2 , 57% women and 86% White). We identified 6 mutually exclusive biclusters characterized by different feature profiles at baseline, particularly related to symptoms and function. Biclusters represented overall better (#1), similar (#2, 3, 6), and poorer (#4, 5) prognosis compared to the overall cohort of knees, respectively. In general, knees in biclusters #4 and 5 had more structural progression (based on Kellgren-Lawrence grade, total knee arthroplasty, and loss of joint space width) but tended to have an improvement in WOMAC pain scores over time. In contrast, knees in bicluster #1 had less incident and progressive KOA, fewer total knee arthroplasties, less loss of joint space width, and stable pain scores compared with the overall cohort. Significance We identified six biclusters within the baseline OAI dataset which have varying relationships with key outcomes in KOA. Such biclusters represent potential phenotypes within the larger cohort and may suggest subgroups at greater or lesser risk of progression over time.more » « less
-
Scene graph generation refers to the task of automatically mapping an image into a semantic structural graph, which requires correctly labeling each extracted object and their interaction relationships. Despite the recent success in object detection using deep learning techniques, inferring complex contextual relationships and structured graph representations from visual data remains a challenging topic. In this study, we propose a novel Attentive Relational Network that consists of two key modules with an object detection backbone to approach this problem. The first module is a semantic transformation module utilized to capture semantic embedded relation features, by translating visual features and linguistic features into a common semantic space. The other module is a graph self-attention module introduced to embed a joint graph representation through assigning various importance weights to neighboring nodes. Finally, accurate scene graphs are produced by the relation inference module to recognize all entities and the corresponding relations. We evaluate our proposed method on the widely-adopted Visual Genome dataset, and the results demonstrate the effectiveness and superiority of our model.more » « less
-
Bipartite graphs are a powerful tool for modeling the interactions between two distinct groups. These bipartite relationships often feature small, recurring structural patterns called motifs which are building blocks for community structure. One promising structure is the induced 6-cycle which consists of three nodes on each node set forming a cycle where each node has exactly two edges. In this paper, we study the problem of counting and utilizing induced 6-cycles in large bipartite networks. We first consider two adaptations inspired by previous works for cycle counting in bipartite networks. Then, we introduce a new approach for node triplets which offer a systematic way to count the induced 6-cycles, used in BATCHTRIPLETJOIN. Our experimental evaluation shows that BATCHTRIPLETJOIN is significantly faster than the other algorithms while being scalable to large graph sizes and number of cores. On a network with 112M edges, BATCHTRIPLETJOIN is able to finish the computation in 78 mins by using 52 threads. In addition, we provide a new way to identify anomalous node triplets by comparing and contrasting the butterfly and induced 6-cycle counts of the nodes. We showcase several case studies on real-world networks from Amazon Kindle ratings, Steam game reviews, and Yelp ratings.more » « less
An official website of the United States government

