We consider the community search problem defined upon a large graph G: given a query vertex q in G, to find as output all the densely connected subgraphs of G, each of which contains the query v. As an online, query-dependent variant of the well-known community detection problem, community search enables personalized community discovery that has found widely varying applications in real-world, large-scale graphs. In this paper, we study the community search problem in the truss-based model aimed at discovering all dense and cohesive k-truss communities to which the query vertex q belongs. We introduce a novel equivalence relation, k-truss equivalence, to model the intrinsic density and cohesiveness of edges in k-truss communities. Consequently, all the edges of G can be partitioned to a series of k-truss equivalence classes that constitute a space-efficient, truss-preserving index structure, EquiTruss. Community search can be henceforth addressed directly upon EquiTruss without repeated, time-demanding accesses to the original graph, G, which proves to be theoretically optimal. In addition, EquiTruss can be efficiently updated in a dynamic fashion when G evolves with edge insertion and deletion. Experimental studies in real-world, large-scale graphs validate the efficiency and effectiveness of EquiTruss, which has achieved at least an order of magnitude speedup in community search over the state-of-the-art method, TCP-Index.
more »
« less
Exploring temporal community evolution: algorithmic approaches and parallel optimization for dynamic community detection
Abstract Dynamic (temporal) graphs are a convenient mathematical abstraction for many practical complex systems including social contacts, business transactions, and computer communications. Community discovery is an extensively used graph analysis kernel with rich literature for static graphs. However, community discovery in a dynamic setting is challenging for two specific reasons. Firstly, the notion of temporal community lacks a widely accepted formalization, and only limited work exists on understanding how communities emerge over time. Secondly, the added temporal dimension along with the sheer size of modern graph data necessitates new scalable algorithms. In this paper, we investigate how communities evolve over time based on several graph metrics under a temporal formalization. We compare six different algorithmic approaches for dynamic community detection for their quality and runtime. We identify that a vertex-centric (local) optimization method works as efficiently as the classical modularity-based methods. To its advantage, such local computation allows for the efficient design of parallel algorithms without incurring a significant parallel overhead. Based on this insight, we design a shared-memory parallel algorithmDyComPar, which demonstrates between 4 and 18 fold speed-up on a multi-core machine with 20 threads, for several real-world and synthetic graphs from different domains.
more »
« less
- Award ID(s):
- 2323533
- PAR ID:
- 10494206
- Publisher / Repository:
- Springer Nature
- Date Published:
- Journal Name:
- Applied Network Science
- Volume:
- 8
- Issue:
- 1
- ISSN:
- 2364-8228
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Many algorithms for analyzing parallel programs, for example to detect deadlocks or data races or to calculate the execution cost, are based on a model variously known as a cost graph, computation graph or dependency graph, which captures the parallel structure of threads in a program. In modern parallel programs, computation graphs are highly dynamic and depend greatly on the program inputs and execution details. As such, most analyses that use these graphs are either dynamic analyses or are specialized static analyses that gather a subset of dependency information for a specific purpose. This paper introduces graph types, which compactly represent all of the graphs that could arise from program execution. Graph types are inferred from a parallel program using a graph type system and inference algorithm, which we present drawing on ideas from Hindley-Milner type inference, affine logic and region type systems. We have implemented the inference algorithm over a subset of OCaml, extended with parallelism primitives, and we demonstrate how graph types can be used to accelerate the development of new graph-based static analyses by presenting proof-of-concept analyses for deadlock detection and cost analysis.more » « less
-
A fundamental building block in any graph algorithm is agraph container -- a data structure used to represent the graph. Ideally, a graph container enables efficient access to the underlying graph, has low space usage, and supports updating the graph efficiently. In this paper, we conduct an extensive empirical evaluation of graph containers designed to support running algorithms on large graphs. To our knowledge, this is the firstapples-to-applescomparison of graph containers rather than overall systems, which include confounding factors such as differences in algorithm implementations and infrastructure. We measure the running time of 10 highly-optimized algorithms across over 20 different containers and 10 graphs. Somewhat surprisingly, we find that the average algorithm running time does not differ much across containers, especially those that support dynamic updates. Specifically, a simple container based on an off-the-shelf B-tree is only 1.22× slower on average than a highly optimized static one. Moreover, we observe that simplifying a graph-container Application Programming Interface (API) to only a few simple functions incurs a mere 1.16× slowdown compared to a complete API. Finally, we also measure batch-insert throughput in dynamic-graph containers for a full picture of their performance. To perform the benchmarks, we introduce BYO, a unified framework that standardizes evaluations of graph-algorithm performance across different graph containers. BYO extends the Graph Based Benchmark Suite (Dhulipala et al. 18), a state-of-the-art graph algorithm benchmark, to easily plug into different dynamic graph containers and enable fair comparisons between them on a large suite of graph algorithms. While several graph algorithm benchmarks have been developed to date, to the best of our knowledge, BYO is the first system designed to benchmark graph containers.more » « less
-
An evolving graph maintains the history of changes of graph topology and attribute values over time. Such a graph has a specific temporal and structural resolution. It is often useful to modify this resolution during analysis, for example, to consider communities rather than individual nodes, or to quantify changes at the level of days rather than hours. We propose attribute-based zoom and temporal window-based zoom -- two operators that support exploratory analysis of an evolving graph at different levels of resolution. We develop several alternative physical representations of an evolving property graph -- a temporal generalization of a property graph --- and detail how to implement the proposed zoom operators using dataflow operations. These different physical representations allow us to explore the trade-offs in temporal and structural locality with respect to the performance of the zoom operators. We implement the operators in Apache Spark, evaluate them on real evolving graph datasets, and demonstrate scalability to billion-edge graphs.more » « less
-
Networks (or graphs) are used to model the dyadic relations between entities in complex systems. Analyzing the properties of the networks reveal important characteristics of the underlying system. However, in many disciplines, including social sciences, bioinformatics, and technological systems, multiple relations exist between entities. In such cases, a simple graph is not sufficient to model these multiple relations, and a multilayer network is a more appropriate model. In this paper, we explore community detection in multilayer networks. Specifically, we propose a novel network decoupling strategy for efficiently combining the communities in the different layers using the Boolean primitives AND, OR, and NOT. Our proposed method, network decoupling, is based on analyzing the communities in each network layer individually and then aggregating the analysis results. We (i) describe our network decoupling algorithms for finding communities, (ii) present how network decoupling can be used to express different types of communities in multilayer networks, and (iii) demonstrate the effectiveness of using network decoupling for detecting communities in real-world and synthetic data sets. Compared to other algorithms for detecting communities in multilayer networks, our proposed network decoupling method requires significantly lower computation time while producing results of high accuracy. Based on these results, we anticipate that our proposed network decoupling technique will enable a more detailed analysis of multilayer networks in an efficient manner.more » « less
An official website of the United States government

