Abstract Background Given a collection of coexpression networks over a set of genes, identifying subnetworks that appear frequently is an important research problem known as mining frequent subgraphs. Maximal frequent subgraphs are a representative set of frequent subgraphs; A frequent subgraph is maximal if it does not have a super-graph that is frequent. In the bioinformatics discipline, methodologies for mining frequent and/or maximal frequent subgraphs can be used to discover interesting network motifs that elucidate complex interactions among genes, reflected through the edges of the frequent subnetworks. Further study of frequent coexpression subnetworks enhances the discovery of biological modules and biological signatures for gene expression and disease classification. Results We propose a reverse search algorithm, called RASMA, for mining frequent and maximal frequent subgraphs in a given collection of graphs. A key innovation in RASMA is a connected subgraph enumerator that uses a reverse-search strategy to enumerate connected subgraphs of an undirected graph. Using this enumeration strategy, RASMA obtains all maximal frequent subgraphs very efficiently. To overcome the computationally prohibitive task of enumerating all frequent subgraphs while mining for the maximal frequent subgraphs, RASMA employs several pruning strategies that substantially improve its overall runtime performance. Experimental results show that on large gene coexpression networks, the proposed algorithm efficiently mines biologically relevant maximal frequent subgraphs. Conclusion Extracting recurrent gene coexpression subnetworks from multiple gene expression experiments enables the discovery of functional modules and subnetwork biomarkers. We have proposed a reverse search algorithm for mining maximal frequent subnetworks. Enrichment analysis of the extracted maximal frequent subnetworks reveals that subnetworks that are frequent are highly enriched with known biological ontologies.
more »
« less
Edge-colored directed subgraph enumeration on the connectome
Abstract Following significant advances in image acquisition, synapse detection, and neuronal segmentation in connectomics, researchers have extracted an increasingly diverse set of wiring diagrams from brain tissue. Neuroscientists frequently represent these wiring diagrams as graphs with nodes corresponding to a single neuron and edges indicating synaptic connectivity. The edges can contain “colors” or “labels”, indicating excitatory versus inhibitory connections, among other things. By representing the wiring diagram as a graph, we can begin to identify motifs, the frequently occurring subgraphs that correspond to specific biological functions. Most analyses on these wiring diagrams have focused on hypothesized motifs—those we expect to find. However, one of the goals of connectomics is to identify biologically-significant motifs that we did not previously hypothesize. To identify these structures, we need large-scale subgraph enumeration to find the frequencies of all unique motifs. Exact subgraph enumeration is a computationally expensive task, particularly in the edge-dense wiring diagrams. Furthermore, most existing methods do not differentiate between types of edges which can significantly affect the function of a motif. We propose a parallel, general-purpose subgraph enumeration strategy to count motifs in the connectome. Next, we introduce a divide-and-conquer community-based subgraph enumeration strategy that allows for enumeration per brain region. Lastly, we allow for differentiation of edges by types to better reflect the underlying biological properties of the graph. We demonstrate our results on eleven connectomes and publish for future analyses extensive overviews for the 26 trillion subgraphs enumerated that required approximately 9.25 years of computation time.
more »
« less
- PAR ID:
- 10350721
- Date Published:
- Journal Name:
- Scientific Reports
- Volume:
- 12
- Issue:
- 1
- ISSN:
- 2045-2322
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Graph Neural Networks (GNNs) have been widely deployed in various real-world applications. However, most GNNs are black-box models that lack explanations. One strategy to explain GNNs is through counterfactual explanation, which aims to find minimum perturbations on input graphs that change the GNN predictions. Existing works on GNN counterfactual explanations primarily concentrate on the local-level perspective (i.e., generating counterfactuals for each individual graph), which suffers from information overload and lacks insights into the broader cross-graph relationships. To address such issues, we propose GlobalGCE, a novel global-level graph counterfactual explanation method. GlobalGCE aims to identify a collection of subgraph mapping rules as counterfactual explanations for the target GNN. According to these rules, substituting certain significant subgraphs with their counterfactual subgraphs will change the GNN prediction to the desired class for most graphs (i.e., maximum coverage). Methodologically, we design a significant subgraph generator and a counterfactual subgraph autoencoder in our GlobalGCE, where the subgraphs and the rules can be effectively generated. Extensive experiments demonstrate the superiority of our GlobalGCE compared to existing baselines.more » « less
-
Recent advances in high-resolution connectomics provide researchers access to accurate reconstructions of vast neuronal circuits and brain networks for the first time. Neuroscientists anticipate analyzing these networks to gain a better understanding of information processing in the brain. In particular, scientists are interested in identifying specific network motifs, i.e., repeating subgraphs of the larger brain network that are believed to be neuronal building blocks. To analyze these motifs, it is crucial to review instances of a motif in the brain network and then map the graph structure to the detailed 3D reconstructions of the involved neurons and synapses. We present Vimo, an interactive visual approach to analyze neuronal motifs and motif chains in large brain networks. Experts can sketch network motifs intuitively in a visual interface and specify structural properties of the involved neurons and synapses to query large connectomics datasets. Motif instances (MIs) can be explored in high-resolution 3D renderings of the involved neurons and synapses. To reduce visual clutter and simplify the analysis of MIs, we designed a continuous focus&context metaphor inspired by continuous visual abstractions [MAAB∗18] that allows the user to transition from the highly-detailed rendering of the anatomical structure to views that emphasize the underlying motif structure and synaptic connectivity. Furthermore, Vimo supports the identification of motif chains where a motif is used repeatedly to form a longer synaptic chain. We evaluate Vimo in a user study with seven domain experts and an in-depth case study on motifs in the central complex (CX) of the fruit fly brain.more » « less
-
Connectomics, a subfield of neuroscience, reconstructs structural and functional brain maps at synapse-level resolution. These complex spatial maps consist of tree-like neurons interconnected by synapses. Motif analysis is a widely used method for identifying recurring subgraph patterns in connectomes. These motifs, thus, potentially represent fundamental units of information processing. However, existing computational tools often oversimplify neurons as mere nodes in a graph, disregarding their intricate morphologies. In this paper, we introduceMoMo, a novel interactive visualization framework for analyzingneuron morphology-aware motifsin large connectome graphs. First, we propose an advanced graph data structure that integrates both neuronal morphology and synaptic connectivity. This enables highly efficient, parallel subgraph isomorphism searches, allowing for interactive morphological motif queries. Second, we develop a sketch-based interface that facilitates the intuitive exploration of morphology-based motifs within our new data structure. Users can conduct interactive motif searches on state-of-the-art connectomes and visualize results as interactive 3D renderings. We present a detailed goal and task analysis for motif exploration in connectomes, incorporating neuron morphology. Finally, we evaluateMoMothrough case studies with four domain experts, who asses the tool’s usefulness and effectiveness in motif exploration, and relevance to real-world neuroscience research. The source code forMoMois availablehere.more » « less
-
Subgraph matching is a core primitive across a number of disciplines, ranging from data mining, databases, information retrieval, computer vision to natural language processing. Despite decades of efforts, it is still highly challenging to balance between the matching accuracy and the computational efficiency, especially when the query graph and/or the data graph are large. In this paper, we propose an index-based algorithm (G-FINDER) to find the top-k approximate matching subgraphs. At the heart of the proposed algorithm are two techniques, including (1) a novel auxiliary data structure (LOOKUP-TABLE) in conjunction with a neighborhood expansion method to effectively and efficiently index candidate vertices, and (2) a dynamic filtering and refinement strategy to prune the false candidates at an early stage. The proposed G-FINDER bears some distinctive features, including (1) generality, being able to handle different types of inexact matching (e.g., missing nodes, missing edges, intermediate vertices) on node attributed and/or edge attributed graphs or multigraphs; (2) effectiveness, achieving up to 30% F1-Score improvement over the best known competitor; and (3) efficiency, scaling near-linearly w.r.t. the size of the data graph as well as the query graph.more » « less
An official website of the United States government

