Identification of community structures in complex network is of crucial importance for understanding the system’s function, organization, robustness and security. Here, we present a novel Ollivier-Ricci curvature (ORC) inspired approach to community identification in complex networks. We demonstrate that the intrinsic geometric underpinning of the ORC offers a natural approach to discover inherent community structures within a network based on interaction among entities. We develop an ORC-based community identification algorithm based on the idea of sequential removal of negatively curved edges symptomatic of high interactions (e.g., traffic, attraction). To illustrate and compare the performance with other community identification methods, we examine the ORC-based algorithm with stochastic block model artificial networks and real-world examples ranging from social to drug-drug interaction networks. The ORC-based algorithm is able to identify communities with either better or comparable performance accuracy and to discover finer hierarchical structures of the network. This opens new geometric avenues for analysis of complex networks dynamics.
Many complex networks in the real world have community structures – groups of well-connected nodes with important functional roles. It has been well recognized that the identification of communities bears numerous practical applications. While existing approaches mainly apply statistical or graph theoretical/combinatorial methods for community detection, in this paper, we present a novel geometric approach which enables us to borrow powerful classical geometric methods and properties. By considering networks as geometric objects and communities in a network as a geometric decomposition, we apply curvature and discrete Ricci flow, which have been used to decompose smooth manifolds with astonishing successes in mathematics, to break down communities in networks. We tested our method on networks with ground-truth community structures, and experimentally confirmed the effectiveness of this geometric approach.more » « less
- NSF-PAR ID:
- Publisher / Repository:
- Nature Publishing Group
- Date Published:
- Journal Name:
- Scientific Reports
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Neural networks (NN) have become an important tool for prediction tasks—both regression and classification—in environmental science. Since many environmental-science problems involve life-or-death decisions and policy making, it is crucial to provide not only predictions but also an estimate of the uncertainty in the predictions. Until recently, very few tools were available to provide uncertainty quantification (UQ) for NN predictions. However, in recent years the computer-science field has developed numerous UQ approaches, and several research groups are exploring how to apply these approaches in environmental science. We provide an accessible introduction to six of these UQ approaches, then focus on tools for the next step, namely, to answer the question:
Once we obtain an uncertainty estimate (using any approach), how do we know whether it is good or bad?To answer this question, we highlight four evaluation graphics and eight evaluation scores that are well suited for evaluating and comparing uncertainty estimates (NN based or otherwise) for environmental-science applications. We demonstrate the UQ approaches and UQ-evaluation methods for two real-world problems: 1) estimating vertical profiles of atmospheric dewpoint (a regression task) and 2) predicting convection over Taiwan based on Himawari-8satellite imagery (a classification task). We also provide Jupyter notebooks with Python code for implementing the UQ approaches and UQ-evaluation methods discussed herein. This article provides the environmental-science community with the knowledge and tools to start incorporating the large number of emerging UQ methods into their research. Significance Statement
Neural networks are used for many environmental-science applications, some involving life-or-death decision-making. In recent years new methods have been developed to provide much-needed uncertainty estimates for NN predictions. We seek to accelerate the adoption of these methods in the environmental-science community with an accessible introduction to 1) methods for computing uncertainty estimates in NN predictions and 2) methods for evaluating such estimates.
Cellular biological networks represent the molecular interactions that shape function of living cells. Uncovering the organization of a biological network requires efficient and accurate algorithms to determine the components, termed communities, underlying specific processes. Detecting functional communities is challenging because reconstructed biological networks are always incomplete due to technical bias and biological complexity, and the evaluation of putative communities is further complicated by a lack of known ground truth. To address these challenges, we developed a geometric-based detection framework based on Ollivier-Ricci curvature to exploit information about network topology to perform community detection from partially observed biological networks. We further improved this approach by integrating knowledge of gene function, termed side information, into the Ollivier-Ricci curvature algorithm to aid in community detection. This approach identified essential conserved and varied biological communities from partially observed
Arabidopsisprotein interaction datasets better than the previously used methods. We show that Ollivier-Ricci curvature with side information identified an expanded auxin community to include an important protein stability complex, the Cop9 signalosome, consistent with previous reported links to auxin response and root development. The results show that community detection based on Ollivier-Ricci curvature with side information can uncover novel components and novel communities in biological networks, providing novel insight into the organization and function of complex networks.
Understanding the underlying structure of a gene regulatory network is crucial to understand the biological functions of genes or groups of genes. A common strategy to investigate it is to find community structure of these networks. However, methods of finding these communities are often sensitive to noise in the gene expression data and the inherent stochasticity of the community detection algorithms. Here we introduce an approach for identifying functional groups and their hierarchical organization in gene co-expression networks from expression data. A network describing the relatedness in the expression profiles of genes is first inferred using an information theoretic approach. Community structure within the inferred network is found by using
modularity maximization. This community structure is further refined using three-body structural correlations to robustly identify important functional gene communities. We apply this approach to the expression data of E. coligenes and identify 25 robust groups, many of which show key associations with important biological functions as demonstrated by gene ontology term enrichment analysis. Thus, our approach makes specific and novel predictions about the function of these genes.
A major goal of community ecology is understanding the processes responsible for generating biodiversity patterns along spatial and environmental gradients. In stream ecosystems, system‐specific conceptual frameworks have dominated research describing biodiversity change along longitudinal gradients of river networks. However, support for these conceptual frameworks has been mixed, mainly applicable to specific stream ecosystems and biomes, and these frameworks have placed less emphasis on general mechanisms driving biodiversity patterns. Rethinking biodiversity patterns and processes in stream ecosystems with a focus on the overarching mechanisms common across ecosystems will provide a more holistic understanding of why biodiversity patterns vary along river networks. In this study, we apply the theory of ecological communities (TEC) conceptual framework to stream ecosystems to focus explicitly on the core ecological processes structuring communities: dispersal, speciation, niche selection, and ecological drift. Using a unique case study from high‐elevation networks of connected lakes and streams, we sampled stream invertebrate communities in the Sierra Nevada, California, USA to test established stream ecology frameworks and compared them with the TEC framework. Local diversity increased and β‐diversity decreased moving downstream from the headwaters, consistent with the river continuum concept and the small but mighty framework of mountain stream biodiversity. Local diversity was also structured by distance below upstream lakes, where diversity increased with distance below upstream lakes, in support of the serial discontinuity concept. Despite some support for the biodiversity patterns predicted from the stream ecology frameworks, no single framework was fully supported, suggesting “context dependence.” By framing our results under the TEC, we found that species diversity was structured by niche selection, where local diversity was highest in environmentally favorable sites. Local diversity was also highest in sites with small community sizes, countering the predicted effects of ecological drift. Moreover, higher β‐diversity in the headwaters was influenced by dispersal and niche selection, where environmentally harsh and spatially isolated sites exhibit higher community variation. Taken together our results suggest that combining system‐specific ecological frameworks with the TEC provides a powerful approach for inferring the mechanisms driving biodiversity patterns and provides a path toward generalization of biodiversity research across ecosystems.