NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

MCPNet: a parallel maximum capacity-based genome-scale gene network construction framework

https://doi.org/10.1093/bioinformatics/btad373

Pan, Tony C.; Chockalingam, Sriram P.; Aluru, Maneesha; Aluru, Srinivas; Cowen, ed., Lenore (June 2023, Bioinformatics)

Abstract MotivationGene network reconstruction from gene expression profiles is a compute- and data-intensive problem. Numerous methods based on diverse approaches including mutual information, random forests, Bayesian networks, correlation measures, as well as their transforms and filters such as data processing inequality, have been proposed. However, an effective gene network reconstruction method that performs well in all three aspects of computational efficiency, data size scalability, and output quality remains elusive. Simple techniques such as Pearson correlation are fast to compute but ignore indirect interactions, while more robust methods such as Bayesian networks are prohibitively time consuming to apply to tens of thousands of genes. ResultsWe developed maximum capacity path (MCP) score, a novel maximum-capacity-path-based metric to quantify the relative strengths of direct and indirect gene–gene interactions. We further present MCPNet, an efficient, parallelized gene network reconstruction software based on MCP score, to reverse engineer networks in unsupervised and ensemble manners. Using synthetic and real Saccharomyces cervisiae datasets as well as real Arabidopsis thaliana datasets, we demonstrate that MCPNet produces better quality networks as measured by AUPRC, is significantly faster than all other gene network reconstruction software, and also scales well to tens of thousands of genes and hundreds of CPU cores. Thus, MCPNet represents a new gene network reconstruction tool that simultaneously achieves quality, performance, and scalability requirements. Availability and implementationSource code freely available for download at https://doi.org/10.5281/zenodo.6499747 and https://github.com/AluruLab/MCPNet, implemented in C++ and supported on Linux.
more » « less
EnGRaiN : a supervised ensemble learning method for recovery of large-scale gene regulatory networks

https://doi.org/10.1093/bioinformatics/btab829

Aluru, Maneesha; Shrivastava, Harsh; Chockalingam, Sriram P.; Shivakumar, Shruti; Aluru, Srinivas; Martelli, ed., Pier Luigi (December 2021, Bioinformatics)

Abstract MotivationReconstruction of genome-scale networks from gene expression data is an actively studied problem. A wide range of methods that differ between the types of interactions they uncover with varying trade-offs between sensitivity and specificity have been proposed. To leverage benefits of multiple such methods, ensemble network methods that combine predictions from resulting networks have been developed, promising results better than or as good as the individual networks. Perhaps owing to the difficulty in obtaining accurate training examples, these ensemble methods hitherto are unsupervised. ResultsIn this article, we introduce EnGRaiN, the first supervised ensemble learning method to construct gene networks. The supervision for training is provided by small training datasets of true edge connections (positives) and edges known to be absent (negatives) among gene pairs. We demonstrate the effectiveness of EnGRaiN using simulated datasets as well as a curated collection of Arabidopsis thaliana datasets we created from microarray datasets available from public repositories. EnGRaiN shows better results not only in terms of receiver operating characteristic and PR characteristics for both real and simulated datasets compared with unsupervised methods for ensemble network construction, but also generates networks that can be mined for elucidating complex biological interactions. Availability and implementationEnGRaiN software and the datasets used in the study are publicly available at the github repository: https://github.com/AluruLab/EnGRaiN. Supplementary informationSupplementary data are available at Bioinformatics online.
more » « less
A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery

https://doi.org/10.1109/TPDS.2023.3244135

Srivastava, Ankit; Chockalingam, Sriram P.; Aluru, Srinivas (June 2023, IEEE Transactions on Parallel and Distributed Systems)

Full Text Available
Parallel construction of module networks

https://doi.org/10.1145/3458817.3476207

Srivastava, Ankit; Chockalingam, Sriram P.; Aluru, Maneesha; Aluru, Srinivas (November 2021, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC))

Full Text Available
A comprehensive evaluation of long read error correction methods

https://doi.org/10.1186/s12864-020-07227-0

Zhang, Haowen; Jain, Chirag; Aluru, Srinivas (December 2020, BMC Genomics)
null (Ed.)
Abstract Background Third-generation single molecule sequencing technologies can sequence long reads, which is advancing the frontiers of genomics research. However, their high error rates prohibit accurate and efficient downstream analysis. This difficulty has motivated the development of many long read error correction tools, which tackle this problem through sampling redundancy and/or leveraging accurate short reads of the same biological samples. Existing studies to asses these tools use simulated data sets, and are not sufficiently comprehensive in the range of software covered or diversity of evaluation measures used. Results In this paper, we present a categorization and review of long read error correction methods, and provide a comprehensive evaluation of the corresponding long read error correction tools. Leveraging recent real sequencing data, we establish benchmark data sets and set up evaluation criteria for a comparative assessment which includes quality of error correction as well as run-time and memory usage. We study how trimming and long read sequencing depth affect error correction in terms of length distribution and genome coverage post-correction, and the impact of error correction performance on an important application of long reads, genome assembly. We provide guidelines for practitioners for choosing among the available error correction tools and identify directions for future research. Conclusions Despite the high error rate of long reads, the state-of-the-art correction tools can achieve high correction quality. When short reads are available, the best hybrid methods outperform non-hybrid methods in terms of correction quality and computing resource usage. When choosing tools for use, practitioners are suggested to be careful with a few correction tools that discard reads, and check the effect of error correction tools on downstream analysis. Our evaluation code is available as open-source at https://github.com/haowenz/LRECE .
more » « less
Full Text Available
ParRefCom: Parallel Reference-based Compression of Paired-end Genomics Read Datasets

https://doi.org/10.1145/3307339.3342171

Jammula, Nagakishore; Aluru, Srinivas (September 2019, Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (ACM-BCB))

Full Text Available

Search for: All records