Abstract
Motivation: Gene network reconstruction from gene expression profiles is a compute- and data-intensive problem. Numerous methods based on diverse approaches, including mutual information, random forests, Bayesian networks, and correlation measures, as well as their transforms and filters such as the data processing inequality, have been proposed. However, an effective gene network reconstruction method that performs well in all three aspects of computational efficiency, data size scalability, and output quality remains elusive. Simple techniques such as Pearson correlation are fast to compute but ignore indirect interactions, while more robust methods such as Bayesian networks are prohibitively time-consuming to apply to tens of thousands of genes.
Results: We developed the maximum capacity path (MCP) score, a novel maximum-capacity-path-based metric to quantify the relative strengths of direct and indirect gene–gene interactions. We further present MCPNet, an efficient, parallelized gene network reconstruction software package based on the MCP score, which reverse engineers networks in unsupervised and ensemble manners. Using synthetic and real Saccharomyces cerevisiae datasets as well as real Arabidopsis thaliana datasets, we demonstrate that MCPNet produces better-quality networks as measured by AUPRC, is significantly faster than all other gene network reconstruction software, and also scales well to tens of thousands of genes and hundreds of CPU cores. Thus, MCPNet represents a new gene network reconstruction tool that simultaneously achieves quality, performance, and scalability requirements.
Availability and implementation: Source code is freely available for download at https://doi.org/10.5281/zenodo.6499747 and https://github.com/AluruLab/MCPNet, implemented in C++ and supported on Linux.
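To make the maximum-capacity-path idea concrete, here is a minimal sketch in Python (NumPy): a path's capacity is the weight of its weakest edge, and a direct similarity that exceeds the best indirect capacity is less likely to be explained by intermediaries. The hop limit, the use of absolute correlation as the similarity, and the final ratio score are illustrative assumptions, not MCPNet's exact formulation.

```python
import numpy as np

def indirect_capacity(W, hops=3):
    """Best max-min capacity over indirect paths of 2 to `hops` edges.

    A path's capacity is its weakest edge weight; one max-min "product"
    C2[i, j] = max_k min(C[i, k], W[k, j]) extends every path by one edge.
    """
    C = np.minimum(W[:, :, None], W[None, :, :]).max(axis=1)  # 2-hop paths
    best = C.copy()
    for _ in range(hops - 2):
        C = np.minimum(C[:, :, None], W[None, :, :]).max(axis=1)
        best = np.maximum(best, C)
    return best

# Toy similarity matrix from expression data: 100 samples x 6 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
W = np.abs(np.corrcoef(X, rowvar=False))  # illustrative similarity measure
np.fill_diagonal(W, 0.0)

# Ratio of direct similarity to the best indirect capacity: values well
# above 1 suggest an interaction not explained by intermediate genes.
score = W / np.maximum(indirect_capacity(W), 1e-12)
```

The max-min "product" plays the role of matrix multiplication in ordinary shortest-path dynamic programming, which also suggests why this style of computation lends itself to dense, parallel implementation.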
EnGRaiN: a supervised ensemble learning method for recovery of large-scale gene regulatory networks
Abstract
Motivation: Reconstruction of genome-scale networks from gene expression data is an actively studied problem. A wide range of methods has been proposed, differing in the types of interactions they uncover and in their trade-offs between sensitivity and specificity. To leverage the benefits of multiple such methods, ensemble network methods that combine predictions from the resulting networks have been developed, promising results better than or as good as those of the individual networks. Perhaps owing to the difficulty of obtaining accurate training examples, these ensemble methods have hitherto been unsupervised.
Results: In this article, we introduce EnGRaiN, the first supervised ensemble learning method for constructing gene networks. The supervision for training is provided by small training datasets of true edge connections (positives) and edges known to be absent (negatives) among gene pairs. We demonstrate the effectiveness of EnGRaiN using simulated datasets as well as a curated collection of Arabidopsis thaliana datasets we created from microarray datasets available in public repositories. Compared with unsupervised methods for ensemble network construction, EnGRaiN shows better results in terms of receiver operating characteristic (ROC) and precision-recall (PR) characteristics on both real and simulated datasets, and it also generates networks that can be mined to elucidate complex biological interactions.
Availability and implementation: The EnGRaiN software and the datasets used in the study are publicly available at the GitHub repository: https://github.com/AluruLab/EnGRaiN.
Supplementary information: Supplementary data are available at Bioinformatics online.
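The supervised ensemble idea can be sketched in a few lines: treat each gene pair as an example whose features are the scores assigned by individual reconstruction methods, and fit a classifier on the small labeled set of known-present and known-absent edges. The feature set and the logistic-regression classifier below are placeholders, not EnGRaiN's actual model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy setup: scores for each gene pair (edge) from several base
# reconstruction methods, stacked as a feature vector per edge.
rng = np.random.default_rng(1)
n_edges = 1000
edge_features = rng.random((n_edges, 4))  # e.g., MI, correlation, ... (illustrative)
labels = rng.integers(0, 2, n_edges)      # 1 = known true edge, 0 = known absent

# Train on a small labeled subset, then score all candidate edges.
train = rng.choice(n_edges, size=100, replace=False)
clf = LogisticRegression().fit(edge_features[train], labels[train])
edge_scores = clf.predict_proba(edge_features)[:, 1]  # ensemble edge confidence
```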
- PAR ID: 10362569
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Bioinformatics
- Volume: 38
- Issue: 5
- ISSN: 1367-4803
- Page Range / eLocation ID: p. 1312-1319
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract
Motivation: Accurately representing biological networks in a low-dimensional space, also known as network embedding, is a critical step in network-based machine learning and is carried out widely using node2vec, an unsupervised method based on biased random walks. However, while many networks, including functional gene interaction networks, are dense, weighted graphs, node2vec is fundamentally limited in its ability to use edge weights during the biased random walk generation process, thus under-using the information in the network.
Results: Here, we present node2vec+, a natural extension of node2vec that accounts for edge weights when calculating walk biases and reduces to node2vec in the cases of unweighted graphs or unbiased walks. Using two synthetic datasets, we empirically show that node2vec+ is more robust to additive noise than node2vec in weighted graphs. Then, using genome-scale functional gene networks to solve a wide range of gene function and disease prediction tasks, we demonstrate the superior performance of node2vec+ over node2vec in the case of weighted graphs. Notably, due to the limited amount of training data in the gene classification tasks, graph neural networks such as GCN and GraphSAGE are outperformed by both node2vec and node2vec+.
Availability and implementation: The data and code are available on GitHub at https://github.com/krishnanlab/node2vecplus_benchmarks. All additional data underlying this article are available on Zenodo at https://doi.org/10.5281/zenodo.7007164.
Supplementary information: Supplementary data are available at Bioinformatics online.
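For intuition, here is a sketch of one step of a second-order biased walk in which edge weights enter the transition probabilities. node2vec+'s contribution is a more careful rule for how the in-out bias itself responds to weights, so treat this only as an illustration of the mechanism; the toy graph and parameter values are assumptions.

```python
import numpy as np

def next_node(G, prev, curr, p=1.0, q=1.0, rng=np.random.default_rng()):
    """One step of a second-order biased random walk (node2vec-style).

    G[u] maps a node to {neighbor: edge_weight}. The in-out bias alpha is
    combined multiplicatively with the edge weight; node2vec+ refines how
    the bias itself depends on weights (see the paper for the exact rule).
    """
    nbrs = list(G[curr])
    probs = []
    for x in nbrs:
        if x == prev:          # return to the previous node
            alpha = 1.0 / p
        elif prev in G[x]:     # x is also a neighbor of prev: stay close
            alpha = 1.0
        else:                  # move outward, away from prev
            alpha = 1.0 / q
        probs.append(alpha * G[curr][x])  # weight-aware unnormalized prob
    probs = np.array(probs) / sum(probs)
    return nbrs[rng.choice(len(nbrs), p=probs)]

# Tiny weighted graph: 0-1 strong, 0-2 medium, 1-2 weak.
G = {0: {1: 1.0, 2: 0.5}, 1: {0: 1.0, 2: 0.1}, 2: {0: 0.5, 1: 0.1}}
walk = [0, 1]
for _ in range(5):
    walk.append(next_node(G, walk[-2], walk[-1], p=1.0, q=0.5))
```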
Abstract
Motivation: Elucidating the topology of gene regulatory networks (GRNs) from large single-cell RNA sequencing datasets, while effectively capturing their inherent cell-cycle heterogeneity and dropouts, is currently one of the most pressing problems in computational systems biology. Recently, graph learning (GL) approaches based on graph signal processing have been developed to infer graph topology from signals defined on graphs. However, existing GL methods are not suitable for learning signed graphs, a characteristic feature of GRNs that accounts for both activating and inhibitory relationships in the gene network. They are also incapable of handling the high proportion of zero values present in single-cell datasets.
Results: To this end, we propose a novel signed GL approach, scSGL, that learns GRNs based on the assumption of smoothness and non-smoothness of gene expression over activating and inhibitory edges, respectively. scSGL is then extended with kernels to account for the non-linearity of co-expression and to effectively handle the frequently occurring zero values. The proposed approach is formulated as a non-convex optimization problem and solved using an efficient ADMM framework. Performance assessment using simulated datasets demonstrates the superior performance of kernelized scSGL over existing state-of-the-art methods in GRN recovery. The performance of scSGL is further investigated using human and mouse embryonic datasets.
Availability and implementation: The scSGL code and analysis scripts are available at https://github.com/Single-Cell-Graph-Learning/scSGL.
Supplementary information: Supplementary data are available at Bioinformatics online.
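A small numerical illustration of the smoothness assumption: with genes as graph nodes and cells as observations, the Laplacian quadratic form tr(X^T L X) is small when expression co-varies across an edge (activating) and large when it anti-varies (inhibitory). The toy data and quadratic forms below convey the intuition only, not scSGL's actual kernelized objective.

```python
import numpy as np

def smoothness(X, L):
    """Graph quadratic form tr(X^T L X): small when the signal varies
    little across edges (smooth), large when it varies a lot."""
    return np.trace(X.T @ L @ X)

def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

# Toy: 4 genes (nodes) observed across 50 cells (signal samples).
rng = np.random.default_rng(2)
X = rng.normal(size=(4, 50))
X[1] = X[0] + 0.1 * rng.normal(size=50)   # genes 0,1 co-vary (activating)
X[3] = -X[2] + 0.1 * rng.normal(size=50)  # genes 2,3 oppose (inhibitory)

A_act = np.zeros((4, 4)); A_act[0, 1] = A_act[1, 0] = 1.0
A_inh = np.zeros((4, 4)); A_inh[2, 3] = A_inh[3, 2] = 1.0

# Expression is smooth over the activating edge (small quadratic form)
# and non-smooth over the inhibitory edge (large quadratic form).
print(smoothness(X, laplacian(A_act)))  # small
print(smoothness(X, laplacian(A_inh)))  # large
```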
Abstract
Summary: With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. Typically, the impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start from an assumed simulated count matrix, ignoring the effects that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcode (CB) and UMI selection, and sequence fragmentation and sequencing. It also closely matches the gene-level ambiguity characteristics that are observed in real scRNA-seq experiments. Using minnow, we explore the performance of some common processing pipelines for producing gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification, and show a typical use-case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment.
Supplementary information: Supplementary data are available at Bioinformatics online.
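The point about UMI resolution can be seen in a toy example: the same raw reads yield different count matrices depending on how duplicates and gene-level ambiguity are resolved. The records and counting policies below are invented for illustration and do not correspond to any particular pipeline.

```python
from collections import Counter

# Toy aligned records: (cell_barcode, UMI, gene). The same UMI seen twice
# for one gene is a PCR duplicate; the same UMI mapping to two genes is
# gene-level ambiguity.
records = [
    ("CB1", "AAAC", "geneA"),
    ("CB1", "AAAC", "geneA"),  # PCR duplicate of the read above
    ("CB1", "AAAC", "geneB"),  # same UMI, different gene: ambiguous
    ("CB1", "GGTT", "geneB"),
]

# Policy 1: naive read counting (no deduplication) inflates counts.
naive = Counter((cb, g) for cb, _, g in records)

# Policy 2: count distinct UMIs per (cell, gene); ambiguity still
# lets one molecule contribute to two genes.
dedup = Counter({k: len({u for cb, u, g in records if (cb, g) == k})
                 for k in naive})

print(naive)  # geneA: 2, geneB: 2
print(dedup)  # geneA: 1, geneB: 2 -- different matrices from the same reads
```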
Abstract
Motivation: Cryo-electron tomography (cryo-ET) is a 3D imaging technology that enables the visualization of subcellular structures in situ at near-atomic resolution. Cellular cryo-ET images help in resolving the structures of macromolecules and determining their spatial relationships within a single cell, which has broad significance in cell and structural biology. Subtomogram classification and recognition constitute a primary step in the systematic recovery of these macromolecular structures. Supervised deep learning methods have been proven to be highly accurate and efficient for subtomogram classification, but suffer from limited applicability due to the scarcity of annotated data. While generating simulated data for training supervised models is a potential solution, a sizeable difference between the image intensity distributions of generated and real experimental data will cause the trained models to perform poorly in predicting classes on real subtomograms.
Results: In this work, we present Cryo-Shift, a fully unsupervised domain adaptation and randomization framework for deep learning-based cross-domain subtomogram classification. We use unsupervised multi-adversarial domain adaptation to reduce the domain shift between features of simulated and experimental data. We develop a network-driven domain randomization procedure with 'warp' modules to alter the simulated data and help the classifier generalize better on experimental data. We do not use any labeled experimental data to train our model, whereas some existing alternative approaches require labeled experimental samples for cross-domain classification. Nevertheless, Cryo-Shift outperforms these alternatives in cross-domain subtomogram classification in the extensive evaluation studies demonstrated herein using both simulated and experimental data.
Availability and implementation: https://github.com/xulabs/aitom.
Supplementary information: Supplementary data are available at Bioinformatics online.
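As a generic illustration of adversarial domain adaptation (not Cryo-Shift's multi-adversarial architecture), the sketch below uses a gradient reversal layer: a domain classifier learns to tell simulated from experimental features, while the reversed gradient pushes the feature extractor toward domain-invariant representations. The toy networks, shapes, and single-step training loop are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated gradient in the backward pass.
    Training the domain classifier through this layer adversarially trains
    the feature extractor to produce domain-invariant features."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

features = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU())
label_head = nn.Linear(64, 10)   # trained on labeled simulated data only
domain_head = nn.Linear(64, 2)   # simulated vs. experimental

sim = torch.randn(8, 1, 32, 32)  # toy labeled "simulated" batch
real = torch.randn(8, 1, 32, 32) # toy unlabeled "experimental" batch
y_sim = torch.randint(0, 10, (8,))

f_sim, f_real = features(sim), features(real)
cls_loss = nn.functional.cross_entropy(label_head(f_sim), y_sim)
dom_feats = GradReverse.apply(torch.cat([f_sim, f_real]))
dom_labels = torch.cat([torch.zeros(8, dtype=torch.long),
                        torch.ones(8, dtype=torch.long)])
dom_loss = nn.functional.cross_entropy(domain_head(dom_feats), dom_labels)
(cls_loss + dom_loss).backward()  # one adversarial adaptation step (sketch)
```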