

This content will become publicly available on August 23, 2024

Title: Efficient community detection in multilayer networks using boolean compositions
Networks (or graphs) are used to model the dyadic relations between entities in complex systems. Analyzing the properties of the networks reveals important characteristics of the underlying system. However, in many disciplines, including social sciences, bioinformatics, and technological systems, multiple relations exist between entities. In such cases, a simple graph is not sufficient to model these multiple relations, and a multilayer network is a more appropriate model. In this paper, we explore community detection in multilayer networks. Specifically, we propose a novel network decoupling strategy for efficiently combining the communities in the different layers using the Boolean primitives AND, OR, and NOT. Our proposed method, network decoupling, is based on analyzing the communities in each network layer individually and then aggregating the analysis results. We (i) describe our network decoupling algorithms for finding communities, (ii) present how network decoupling can be used to express different types of communities in multilayer networks, and (iii) demonstrate the effectiveness of using network decoupling for detecting communities in real-world and synthetic data sets. Compared to other algorithms for detecting communities in multilayer networks, our proposed network decoupling method requires significantly lower computation time while producing results of high accuracy. Based on these results, we anticipate that our proposed network decoupling technique will enable a more detailed analysis of multilayer networks in an efficient manner.
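To make the "analyze each layer individually, then aggregate" idea concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes that each layer's communities have already been found by some single-layer method, that an AND composition can be approximated by intersecting communities pairwise, and that an OR composition can be approximated by merging communities that share a node. The function names `and_compose` and `or_compose`, the `min_size` parameter, and the toy layers are all hypothetical, and the NOT primitive is not covered.

```python
from itertools import product


def and_compose(layer_a, layer_b, min_size=2):
    """Assumed AND: keep node groups that sit together in BOTH layers.

    Implemented as pairwise intersections of the per-layer communities;
    intersections smaller than `min_size` are discarded.
    """
    result = []
    for ca, cb in product(layer_a, layer_b):
        common = ca & cb
        if len(common) >= min_size:
            result.append(frozenset(common))
    return result


def or_compose(layer_a, layer_b):
    """Assumed OR: group nodes that sit together in EITHER layer.

    Implemented by merging communities that share at least one node,
    so the merge is transitive across layers.
    """
    merged = []
    for community in list(layer_a) + list(layer_b):
        community = set(community)
        # Absorb every existing group that overlaps the incoming community.
        overlapping = [g for g in merged if g & community]
        for g in overlapping:
            community |= g
            merged.remove(g)
        merged.append(community)
    return [frozenset(g) for g in merged]


# Hypothetical per-layer results, e.g. from running any single-layer
# community detection method on each layer separately.
friends = [{"a", "b", "c"}, {"d", "e"}]
coworkers = [{"a", "b"}, {"c", "d", "e"}]

both = and_compose(friends, coworkers)
either = or_compose(friends, coworkers)
```

The point of the sketch is that each layer is processed once and only the (much smaller) community lists are combined afterwards, which is the intuition behind the lower computation time the abstract reports.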
Award ID(s):
1955971 1956373
NSF-PAR ID:
10463611
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Frontiers in Big Data
Volume:
6
ISSN:
2624-909X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fu, Feng (Ed.)

    With the recent availability of tissue-specific gene expression data, e.g., provided by the GTEx Consortium, there is interest in comparing gene co-expression patterns across tissues. One promising approach to this problem is to use a multilayer network analysis framework and perform multilayer community detection. Communities in gene co-expression networks reveal groups of genes similarly expressed across individuals, potentially involved in related biological processes responding to specific environmental stimuli or sharing common regulatory variations. We construct a multilayer network in which each of the four layers is an exocrine gland tissue-specific gene co-expression network. We develop methods for multilayer community detection with correlation matrix input and an appropriate null model. Our correlation matrix input method identifies five groups of genes that are similarly co-expressed in multiple tissues (a community that spans multiple layers, which we call a generalist community) and two groups of genes that are co-expressed in just one tissue (a community that lies primarily within just one layer, which we call a specialist community). We further found gene co-expression communities where the genes physically cluster across the genome significantly more than expected by chance (on chromosomes 1 and 11). This clustering hints at underlying regulatory elements determining similar expression patterns across individuals and cell types. We suggest that KRTAP3-1, KRTAP3-3, and KRTAP3-5 share regulatory elements in skin and pancreas. Furthermore, we find that CELA3A and CELA3B share associated expression quantitative trait loci in the pancreas. The results indicate that our multilayer community detection method for correlation matrix input extracts biologically interesting communities of genes.

     
  2. We study the problem of representation learning for multiple types of entities in a co-ordered network where order relations exist among entities of the same type, and association relations exist across entities of different types. The key challenge in learning co-ordered network embedding is to preserve order relations among entities of the same type while leveraging the general consistency in order relations between different entity types. In this paper, we propose an embedding model, CO2Vec, that addresses this challenge using mutually reinforced order dependencies. Specifically, CO2Vec explores indirect order dependencies as supplementary evidence to enhance order representation learning across different types of entities. We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the robustness and effectiveness of CO2Vec against several strong baselines in the link prediction task. We also design a comprehensive evaluation framework to study the performance of CO2Vec under different settings. In particular, our results show the robustness of CO2Vec to the removal of order relations from the original networks.
  3. Understanding who blames or supports whom in news text is a critical research question in computational social science. Traditional methods and datasets for sentiment analysis are, however, not suitable for the domain of political text as they do not consider the direction of sentiments expressed between entities. In this paper, we propose a novel NLP task of identifying directed sentiment relationships between political entities from a given news document, which we call directed sentiment extraction. From a million-scale news corpus, we construct a dataset of news sentences where sentiment relations of political entities are manually annotated. We present a simple but effective approach for utilizing a pretrained transformer, which infers the target class by predicting multiple question-answering tasks and combining the outcomes. We demonstrate the utility of our proposed method for social science research questions by analyzing positive and negative opinions between political entities in two major events: the 2016 U.S. presidential election and COVID-19. The newly proposed problem, data, and method will facilitate future studies on interdisciplinary NLP methods and applications.
  4. Many machine learning problems come in the form of networks with relational data between entities, and one of the key unsupervised learning tasks is to detect communities in such a network. We adopt the mixed-membership stochastic blockmodel as the underlying probabilistic model, and give conditions under which the memberships of a subset of nodes can be uniquely identified. Our method starts by constructing a second-order graph moment, which can be shown to converge to a specific product of the true parameters as the size of the network increases. To correctly recover the true membership parameters, we formulate an optimization problem using insights from convex geometry. We show that if the true memberships satisfy a so-called sufficiently scattered condition, then solving the proposed problem correctly identifies the ground truth. We also propose an efficient algorithm for detecting communities, which is significantly faster than prior work and with better convergence properties. Experiments on synthetic and real data justify the validity of the proposed learning framework for network data. 
  5. The complex relationships in an urban environment can be captured through multiple interrelated sources of data. These relationships form multilayer networks that are also spatially embedded in an area and could be used to identify latent patterns. In this work, we propose a low-dimensional representation learning approach that considers multiple layers of a multiplex network simultaneously and is able to encode similarities between nodes across different layers. In particular, we introduce a novel neural network architecture to jointly learn low-dimensional representations of each network node from multiple layers of a network. This process simultaneously fuses knowledge of various data sources to better capture the characteristics of the nodes. To showcase the proposed method, we focus on the problem of identifying the functionality of an urban region. Using a variety of public data sources for New York City, we design a multilayer network and evaluate our approach. Our results indicate that our proposed approach can improve the accuracy of traditional approaches in an unsupervised task.