Title: Learning low-rank latent mesoscale structures in networks
Abstract Researchers in many fields use networks to represent interactions between entities in complex systems. To study the large-scale behavior of complex systems, it is useful to examine mesoscale structures in networks as building blocks that influence such behavior. In this paper, we present an approach to describe low-rank mesoscale structures in networks. We find that many real-world networks possess a small set of latent motifs that effectively approximate most subgraphs at a fixed mesoscale. Such low-rank mesoscale structures allow one to reconstruct networks by approximating subgraphs of a network using combinations of latent motifs. Employing subgraph sampling and nonnegative matrix factorization enables the discovery of these latent motifs. The ability to encode and reconstruct networks using a small set of latent motifs has many applications in network analysis, including network comparison, network denoising, and edge inference.
Award ID(s):
2023239 2232241 1922952
PAR ID:
10484080
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Nature Communications
Volume:
15
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
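The abstract above names two ingredients, subgraph sampling and nonnegative matrix factorization (NMF). The following is a minimal sketch of how they could fit together, not the authors' implementation. It rests on several assumptions: the toy graph is hypothetical, subgraphs are sampled with a plain self-avoiding random walk, NMF is done with batch multiplicative updates, and the names `sample_path_subgraph` and `nmf` are illustrative only.

```python
import random
import numpy as np


def sample_path_subgraph(adj, k, rng):
    """Walk to k distinct nodes and return the k x k adjacency matrix of the
    subgraph they induce (None if the walk gets stuck). Assumed sampler."""
    path = [rng.choice(list(adj))]
    while len(path) < k:
        candidates = [v for v in adj[path[-1]] if v not in path]
        if not candidates:
            return None
        path.append(rng.choice(candidates))
    return np.array([[1.0 if v in adj[u] else 0.0 for v in path] for u in path])


def nmf(X, r, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative updates for X ~ W @ H with W, H >= 0 (Frobenius loss)."""
    gen = np.random.default_rng(seed)
    W = gen.random((X.shape[0], r)) + eps
    H = gen.random((r, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy graph (hypothetical data): a ring of 12 nodes plus three chords.
n = 12
adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
for u, v in [(0, 6), (2, 9), (4, 10)]:
    adj[u].add(v)
    adj[v].add(u)

k, r = 4, 3                      # subgraph size (mesoscale) and number of motifs
rng = random.Random(0)
samples = []
while len(samples) < 200:
    A = sample_path_subgraph(adj, k, rng)
    if A is not None:
        samples.append(A.ravel())   # flatten each k x k matrix to length k*k
X = np.array(samples).T             # shape (k*k, number of samples)

W, H = nmf(X, r)
motifs = [W[:, j].reshape(k, k) for j in range(r)]   # candidate latent motifs
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Each column of `W`, reshaped to `k` by `k`, plays the role of one latent motif, and each column of `H` gives the nonnegative weights that combine the motifs to approximate one sampled subgraph. `rel_error` indicates how well `r` motifs approximate the samples.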
More Like this
  1. Network data has become widespread, larger, and more complex over the years. Traditional network data is dyadic, capturing the relations among pairs of entities. With the need to model interactions among more than two entities, significant research has focused on higher-order networks and ways to represent, analyze, and learn from them. There are two main directions in studying higher-order networks. One direction has focused on capturing higher-order patterns in traditional (dyadic) graphs by changing the basic unit of study from nodes to small, frequently observed subgraphs, called motifs. As most existing network data comes in the form of pairwise dyadic relationships, studying higher-order structures within such graphs may uncover new insights. The second direction aims to directly model higher-order interactions using new and more complex representations such as simplicial complexes or hypergraphs. Some of these models have long been proposed, but improvements in computational power and the advent of new computational techniques have increased their popularity. Our goal in this paper is to provide a succinct yet comprehensive summary of advanced higher-order network analysis techniques. We provide a systematic review of the foundations and algorithms, along with use cases and applications of higher-order networks in various scientific domains.
  2. Abstract When people are asked to recall their social networks, theoretical and empirical work tells us that they rely on shortcuts, or heuristics. Cognitive social structures (CSSs) are multilayer social networks where each layer corresponds to an individual’s perception of the network. With multiple perceptions of the same network, CSSs contain rich information about how these heuristics manifest, motivating the question, Can we identify people who share the same heuristics? In this work, we propose a method for identifying cognitive structure across multiple network perceptions, analogous to how community detection aims to identify social structure in a network. To simultaneously model the joint latent social and cognitive structure, we study CSSs as three-dimensional tensors, employing low-rank nonnegative Tucker decompositions (NNTuck) to approximate the CSS, a procedure closely related to estimating a multilayer stochastic block model (SBM) from such data. We propose the resulting latent cognitive space as an operationalization of the sociological theory of social cognition by identifying individuals who share relational schema. In addition to modeling cognitively independent, dependent, and redundant networks, we propose a specific model instance and related statistical test for testing when there is social-cognitive agreement in a network: when the social and cognitive structures are equivalent. We use our approach to analyze four different CSSs and give insights into the latent cognitive structures of those networks.
  3. Summary Latent space models are frequently used for modelling single-layer networks and include many popular special cases, such as the stochastic block model and the random dot product graph. However, they are not well developed for more complex network structures, which are becoming increasingly common in practice. In this article we propose a new latent space model for multiplex networks, i.e., multiple heterogeneous networks observed on a shared node set. Multiplex networks can represent a network sample with shared node labels, a network evolving over time, or a network with multiple types of edges. The key feature of the proposed model is that it learns from data how much of the network structure is shared between layers and pools information across layers as appropriate. We establish identifiability, develop a fitting procedure using convex optimization in combination with a nuclear-norm penalty, and prove a guarantee of recovery for the latent positions provided there is sufficient separation between the shared and the individual latent subspaces. We compare the model with competing methods in the literature on simulated networks and on a multiplex network describing the worldwide trade of agricultural products. 
  4. Abstract Background: Given a collection of coexpression networks over a set of genes, identifying subnetworks that appear frequently is an important research problem known as mining frequent subgraphs. Maximal frequent subgraphs are a representative set of frequent subgraphs; a frequent subgraph is maximal if it does not have a super-graph that is frequent. In the bioinformatics discipline, methodologies for mining frequent and/or maximal frequent subgraphs can be used to discover interesting network motifs that elucidate complex interactions among genes, reflected through the edges of the frequent subnetworks. Further study of frequent coexpression subnetworks enhances the discovery of biological modules and biological signatures for gene expression and disease classification. Results: We propose a reverse search algorithm, called RASMA, for mining frequent and maximal frequent subgraphs in a given collection of graphs. A key innovation in RASMA is a connected subgraph enumerator that uses a reverse-search strategy to enumerate connected subgraphs of an undirected graph. Using this enumeration strategy, RASMA obtains all maximal frequent subgraphs very efficiently. To overcome the computationally prohibitive task of enumerating all frequent subgraphs while mining for the maximal frequent subgraphs, RASMA employs several pruning strategies that substantially improve its overall runtime performance. Experimental results show that on large gene coexpression networks, the proposed algorithm efficiently mines biologically relevant maximal frequent subgraphs. Conclusion: Extracting recurrent gene coexpression subnetworks from multiple gene expression experiments enables the discovery of functional modules and subnetwork biomarkers. We have proposed a reverse search algorithm for mining maximal frequent subnetworks. Enrichment analysis of the extracted maximal frequent subnetworks reveals that subnetworks that are frequent are highly enriched with known biological ontologies.
  5. Leveraging protein-protein interaction (PPI) networks to identify groups of proteins and their common functionality is an important problem in bioinformatics. Systems-level analysis of protein-protein interactions is made possible through network science and modeling of high-throughput data. From these analyses, small protein complexes are traditionally represented graphically as complete graphs or dense clusters of nodes. However, there are certain graph theoretic properties that have not been extensively studied in PPI networks, especially as they pertain to cluster discovery, such as planarity. Planarity of graphs has been used to reflect the physical constraints of real-world systems outside of bioinformatics, in areas such as mapping and imaging. Here, we investigate the planarity property in network models of protein complexes. We hypothesize that complexes represented as PPI subgraphs will tend to be planar, reflecting the actual physical interface and limits of components in the complex. When testing the planarity of known complex subgraphs in S. cerevisiae and selected mammalian PPIs, we find that a majority of validated complexes possess this planar property. We discuss the biological motivation of planar versus nonplanar subgraphs, observing that planar subgraphs tend to have longer protein components. Functional classification of planar versus nonplanar complex subgraphs reveals differences in annotation of these groups relating to cellular component organization, structural molecule activity, catalytic activity, and nucleic acid binding. These results provide a new quantitative and biologically motivated measure of real protein complexes in the network model, important for the development of future complex-finding algorithms in PPIs. Accounting for this property paves the way to new means of discovering protein complexes and uncovering the functionality of unknown or novel proteins.
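Item 5 tests whether subgraphs are planar. A full planarity test is more involved, so the sketch below only applies Euler's edge bound, which is a necessary condition: a simple planar graph with V >= 3 vertices has at most 3V - 6 edges. This is an assumed pre-screening step for illustration, not the procedure used in that work, and `passes_euler_bound` and `complete_graph` are hypothetical helper names.

```python
from itertools import combinations


def passes_euler_bound(nodes, edges):
    """Return False when a simple graph has too many edges to be planar.

    Passing this check does NOT prove planarity; it only rules out graphs
    that violate the bound E <= 3V - 6.
    """
    v = len(set(nodes))
    e = len({frozenset(edge) for edge in edges})   # ignore duplicate edges
    return v < 3 or e <= 3 * v - 6


def complete_graph(n):
    """Nodes and edges of the complete graph on n vertices."""
    nodes = list(range(n))
    return nodes, list(combinations(nodes, 2))


k4_ok = passes_euler_bound(*complete_graph(4))   # K4: 6 edges, bound is 6
k5_ok = passes_euler_bound(*complete_graph(5))   # K5: 10 edges, bound is 9
```

Here `k4_ok` is `True` and `k5_ok` is `False`, which matches the fact that a complete graph on five nodes cannot be planar. Graphs that pass the bound still need a proper planarity test, since the bound alone admits some nonplanar graphs.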