
Title: Automating parameter selection to avoid implausible biological pathway models

A common way to integrate and analyze large amounts of biological “omic” data is through pathway reconstruction: using condition-specific omic data to create a subnetwork of a generic background network that represents some process or cellular state. A challenge in pathway reconstruction is that adjusting pathway reconstruction algorithms’ parameters produces pathways with drastically different topological properties and biological interpretations. Due to the exploratory nature of pathway reconstruction, there is no ground truth for direct evaluation, so parameter tuning methods typically used in statistics and machine learning are inapplicable. We developed the pathway parameter advising algorithm to tune pathway reconstruction algorithms to minimize biologically implausible predictions. We leverage background knowledge in pathway databases to select pathways whose high-level structure resembles that of manually curated biological pathways. At the core of this method is a graphlet decomposition metric, which measures topological similarity to curated biological pathways. In order to evaluate pathway parameter advising, we compare its performance in avoiding implausible networks and reconstructing pathways from the NetPath database with other parameter selection methods across four pathway reconstruction algorithms. We also demonstrate how pathway parameter advising can guide reconstruction of an influenza host factor network. Pathway parameter advising is method agnostic; it is applicable to any pathway reconstruction algorithm with tunable parameters.
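
As a rough illustration of scoring candidate pathways by their graphlet-based similarity to curated pathways, the sketch below counts only 3-node graphlets (open wedges and triangles) and ranks parameter settings by their mean distance to a set of reference pathways. The function names, the restriction to 3-node graphlets, and the L1 distance are assumptions made for this example; they are not taken from the paper, whose metric may differ in all three respects.

```python
from itertools import combinations


def graphlet_profile(edges):
    """Return (open-wedge fraction, triangle fraction) for an undirected graph.

    Only 3-node graphlets are counted here; this is a simplification
    chosen for the example, not the metric used in the paper.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    wedges = 0      # pairs of neighbours around a centre node
    triangles = 0   # closed wedges, counted once per corner for now
    for nbrs in adj.values():
        k = len(nbrs)
        wedges += k * (k - 1) // 2
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                triangles += 1
    triangles //= 3                  # each triangle was seen from 3 corners
    open_wedges = wedges - 3 * triangles
    total = open_wedges + triangles
    if total == 0:
        return (0.0, 0.0)
    return (open_wedges / total, triangles / total)


def profile_distance(p, q):
    """L1 distance between two graphlet profiles (an assumed choice)."""
    return sum(abs(x - y) for x, y in zip(p, q))


def rank_candidates(candidates, references):
    """Order parameter settings from most to least reference-like.

    candidates: dict mapping a parameter-setting label to an edge list.
    references: list of edge lists standing in for curated pathways.
    """
    ref_profiles = [graphlet_profile(edges) for edges in references]
    scores = {}
    for label, edges in candidates.items():
        prof = graphlet_profile(edges)
        scores[label] = sum(profile_distance(prof, r) for r in ref_profiles) / len(ref_profiles)
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy data: a chain-like reference and two candidate outputs.
    chain = [("a", "b"), ("b", "c"), ("c", "d")]
    clique = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
    print(rank_candidates({"dense": clique, "sparse": chain}, [chain]))
```

In this toy setting a densely connected candidate is ranked below a chain-like one because its graphlet profile is further from the reference profile, which mirrors the idea of steering away from topologies that do not resemble curated pathways.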

Publication Date:
Journal Name:
npj Systems Biology and Applications
Nature Publishing Group
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Although knowledge of biological pathways is essential for interpreting results from computational biology studies, the growing number of pathway databases complicates efforts to efficiently perform pathway analysis due to high redundancies among pathways from different databases, and inconsistencies in how pathways are created and named. We introduce the PAthway Communities (PAC) framework, which reconciles pathways from different databases and reduces pathway redundancy by revealing informative groups with distinct biological functions. Uniquely applying the Louvain community detection algorithm to a network of 4847 pathways from KEGG, REACTOME and Gene Ontology databases, we identify 35 distinct and automatically annotated communities of pathways and show that they are consistent with expert-curated pathway categories. Further, we demonstrate that our pathway community network can be queried with new gene sets to provide biological context in terms of related pathways and communities. Our approach, combined with an interpretable web tool we provide, will help computational biologists more efficiently contextualize and interpret their biological findings.

  2. Abstract Motivation

    Reconstruction of genome-scale networks from gene expression data is an actively studied problem. A wide range of methods that differ between the types of interactions they uncover with varying trade-offs between sensitivity and specificity have been proposed. To leverage benefits of multiple such methods, ensemble network methods that combine predictions from resulting networks have been developed, promising results better than or as good as the individual networks. Perhaps owing to the difficulty in obtaining accurate training examples, these ensemble methods hitherto are unsupervised.


    In this article, we introduce EnGRaiN, the first supervised ensemble learning method to construct gene networks. The supervision for training is provided by small training datasets of true edge connections (positives) and edges known to be absent (negatives) among gene pairs. We demonstrate the effectiveness of EnGRaiN using simulated datasets as well as a curated collection of Arabidopsis thaliana datasets we created from microarray datasets available from public repositories. EnGRaiN shows better results not only in terms of receiver operating characteristic and PR characteristics for both real and simulated datasets compared with unsupervised methods for ensemble network construction, but also generates networks that can be mined for elucidating complex biological interactions.

    Availability and implementation

    EnGRaiN software and the datasets used in the study are publicly available at the GitHub repository:

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  3. Abstract

    The non-random interaction pattern of a protein–protein interaction network (PIN) is biologically informative, but its potential has not been fully utilized in omics studies. Here, we propose a network-permutation-based association study (NetPAS) method that gauges the observed interactions between two sets of genes by comparing the empirical network with permutation null models. This enables NetPAS to evaluate relationships, constrained by network topology, between gene sets related to different phenotypes. We demonstrated the utility of NetPAS on 50 well-curated gene sets and compared association results obtained with Z-scores, modified Zʹ-scores, p-values and Jaccard indices. Using NetPAS, a weighted human disease network was generated from the association scores of 19 gene sets from OMIM. We also applied NetPAS to gene sets derived from gene ontology and pathway annotations and showed that NetPAS uncovered functional terms missed by DAVID and WebGestalt. Overall, we show that NetPAS can take topological constraints of molecular networks into account and offer perspectives that existing methods do not.

  4. Abstract Motivation

    Spectral unmixing methods attempt to determine the concentrations of different fluorophores present at each pixel location in an image by analyzing a set of measured emission spectra. Unmixing algorithms have shown great promise for applications where samples contain many fluorescent labels; however, existing methods perform poorly when confronted with autofluorescence-contaminated images.


    We propose an unmixing algorithm designed to separate fluorophores with overlapping emission spectra from contamination by autofluorescence and background fluorescence. First, we formally define a generalization of the linear mixing model, called the affine mixture model (AMM), that specifically accounts for background fluorescence. Second, we use the AMM to derive an affine nonnegative matrix factorization method for estimating fluorophore endmember spectra from reference images. Lastly, we propose a semi-blind sparse affine spectral unmixing (SSASU) algorithm that uses knowledge of the estimated endmembers to learn the autofluorescence and background fluorescence spectra on a per-image basis. When unmixing real-world spectral images contaminated by autofluorescence, SSASU greatly improved proportion indeterminacy as compared to existing methods for a given relative reconstruction error.

    Availability and implementation

    The source code used for this paper was written in Julia and is available with the test data at

  5. We investigate the problem of simultaneous parameter identification and mapping of a spatially distributed field using a mobile sensor network. We first develop a parametrized model that represents the spatially distributed field. Based on the model, a recursive least squares algorithm is developed to achieve online parameter identification. Next, we design a global state observer, which uses the estimated parameters, together with data collected by the mobile sensor network, to reconstruct the whole spatially and temporally varying field in real time. Since the performance of the parameter identification and map reconstruction algorithms depends on the trajectories of the mobile sensors, we further develop a Lyapunov-redesign-based online trajectory planning algorithm for the mobile sensor network so that the mobile sensors can use local real-time information to move along information-rich paths that improve the performance of the parameter identification and map construction. Lastly, a cooperative filtering scheme is developed to provide the state estimates of the spatially distributed field, which enables the recursive least squares method. To test the proposed algorithms in realistic scenarios, we first build a CO2 diffusion field in a lab and construct a sensor network to measure the field concentration over time. We then validate the algorithms in the reconstructed CO2 field in simulation. Simulation results demonstrate the efficiency of the proposed method.
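
The recursive least squares step mentioned in item 5 can be sketched generically as follows. The abstract does not specify the field model, so this example assumes a plain linear-in-parameters measurement y = phi · theta; the function names, the forgetting factor, and the large initial covariance are illustrative choices, not details from that paper.

```python
def rls_update(theta, P, phi, y, forgetting=1.0):
    """One recursive least squares step for the assumed model y = phi . theta.

    theta: current parameter estimate (list of floats)
    P:     current covariance-like matrix (list of lists, symmetric)
    phi:   regressor vector for this measurement
    y:     scalar measurement
    """
    n = len(theta)
    # P @ phi
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    # Prediction error with the current estimate
    error = y - sum(phi[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so phi^T P equals (P phi)^T
    new_P = [
        [(P[i][j] - gain[i] * P_phi[j]) / forgetting for j in range(n)]
        for i in range(n)
    ]
    return new_theta, new_P


def identify(samples, n_params, initial_covariance=1e6):
    """Run RLS over (phi, y) samples, starting from a zero estimate."""
    theta = [0.0] * n_params
    P = [
        [initial_covariance if i == j else 0.0 for j in range(n_params)]
        for i in range(n_params)
    ]
    for phi, y in samples:
        theta, P = rls_update(theta, P, phi, y)
    return theta


if __name__ == "__main__":
    # Hypothetical noiseless data generated from theta = [2.0, -1.0].
    data = [([1.0, float(t)], 2.0 - 1.0 * t) for t in range(10)]
    print(identify(data, n_params=2))
```

Each measurement updates the estimate in constant time per sample, which is what makes this kind of estimator suitable for online use as sensors move and collect data.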