NSF PAR Search | NSF Public Access Repository

Identification of condition-specific regulatory mechanisms in normal and cancerous human lung tissue

https://doi.org/10.1186/s12864-022-08591-9

Hang, Yuqing; Burns, Josh; Shealy, Benjamin T.; Pauly, Rini; Ficklin, Stephen P.; Feltus, Frank A. (December 2022, BMC Genomics)

Abstract Background Lung cancer is the leading cause of cancer death in both men and women. The most common lung cancer subtype is non-small cell lung carcinoma (NSCLC) comprising about 85% of all cases. NSCLC can be further divided into three subtypes: adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), and large cell lung carcinoma. Specific genetic mutations and epigenetic aberrations play an important role in the developmental transition to a specific tumor subtype. The elucidation of normal lung versus lung tumor gene expression patterns and regulatory targets yields biomarker systems that discriminate lung phenotypes (i.e., biomarkers) and provide a foundation for the discovery of normal and aberrant gene regulatory mechanisms. Results We built condition-specific gene co-expression networks (csGCNs) for normal lung, LUAD, and LUSC conditions. Then, we integrated normal lung tissue-specific gene regulatory networks (tsGRNs) to elucidate control-target biomarker systems for normal and cancerous lung tissue. We characterized co-expressed gene edges, possibly under common regulatory control, for relevance in lung cancer. Conclusions Our approach demonstrates the ability to elucidate csGCN:tsGRN merged biomarker systems based on gene expression correlation and regulation. The biomarker systems we describe can be used to classify and further describe lung specimens. Our approach is generalizable and can be used to discover and interpret complex gene expression patterns for any condition or species.
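To make the csGCN:tsGRN merge more concrete, here is a minimal Python sketch of one plausible reading of it: keep a condition-specific co-expression edge when both genes share at least one regulator in the tissue-specific GRN. The function name, the edge-list inputs, and the gene and regulator identifiers are hypothetical. This is an illustration, not the authors' pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def merge_csgcn_tsgrn(
    coexp_edges: Iterable[Edge],
    grn_edges: Iterable[Edge],
) -> Dict[Edge, Set[str]]:
    """Keep co-expression edges whose two genes share a regulator.

    coexp_edges: (gene_a, gene_b) pairs from a condition-specific GCN.
    grn_edges:   (regulator, target) pairs from a tissue-specific GRN.
    Returns each retained edge mapped to the set of shared regulators.
    """
    # Index the GRN by target gene: target -> set of regulators.
    regulators: Dict[str, Set[str]] = defaultdict(set)
    for regulator, target in grn_edges:
        regulators[target].add(regulator)

    merged: Dict[Edge, Set[str]] = {}
    for gene_a, gene_b in coexp_edges:
        # .get avoids creating empty entries for genes absent from the GRN.
        shared = regulators.get(gene_a, set()) & regulators.get(gene_b, set())
        if shared:
            merged[(gene_a, gene_b)] = shared
    return merged


# Hypothetical toy inputs (not data from the paper).
luad_edges = [("GENE1", "GENE2"), ("GENE2", "GENE3")]
lung_grn = [("TF_A", "GENE1"), ("TF_A", "GENE2"), ("TF_B", "GENE3")]
result = merge_csgcn_tsgrn(luad_edges, lung_grn)
print(result)  # {('GENE1', 'GENE2'): {'TF_A'}}
```

Only the GENE1:GENE2 edge survives, because those two genes share TF_A; GENE2 and GENE3 are co-expressed but have no regulator in common in this toy GRN.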
Intelligent Resource Provisioning for Scientific Workflows and HPC

https://doi.org/10.1109/WORKS54523.2021.00007

Shealy, Benjamin T.; Feltus, F. Alex; Smith, Melissa C. (November 2021, IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS))

GEMmaker: process massive RNA-seq datasets on heterogeneous computational infrastructure

https://doi.org/10.1186/s12859-022-04629-7

Hadish, John A.; Biggs, Tyler D.; Shealy, Benjamin T.; Bender, M. Reed; McKnight, Coleman B.; Wytko, Connor; Smith, Melissa C.; Feltus, F. Alex; Honaas, Loren; Ficklin, Stephen P. (December 2022, BMC Bioinformatics)

Abstract Background Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analyses such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are growing larger, and combining multiple experiments from sequence repositories can result in datasets with thousands of samples. Processing hundreds to thousands of RNA-seq samples can raise challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility. Processing of larger and deeper RNA-seq experiments will become more common as sequencing technology matures. Results GEMmaker is an nf-core-compliant Nextflow workflow that quantifies gene expression from small to massive RNA-seq datasets. GEMmaker ensures that results are highly reproducible through the use of versioned, containerized software that can be executed on a single workstation, an institutional compute cluster, a Kubernetes platform, or the cloud. GEMmaker supports popular alignment and quantification tools and provides results in raw and normalized formats. GEMmaker is unique in that it can scale to process thousands of locally or remotely stored samples without exceeding available data storage. Conclusions Workflows that quantify gene expression are not new, and many already address issues of portability, reusability, and scale in terms of access to CPUs. GEMmaker provides these benefits and adds the ability to scale despite limited data storage infrastructure. This allows users to process hundreds to thousands of RNA-seq samples even when data storage resources are limited. GEMmaker is freely available and fully documented with step-by-step setup and execution instructions.
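GEMmaker itself is a Nextflow workflow. The Python sketch below only illustrates the general idea behind scaling "without exceeding available data storage": stage, quantify, and delete samples one batch at a time so that at most a fixed number of raw inputs are on disk at once. The function, the `max_local` batch size, the sample names, and the fake fetch/quantify helpers are hypothetical stand-ins, not GEMmaker parameters.

```python
from typing import Callable, Dict, List


def quantify_with_bounded_storage(
    sample_ids: List[str],
    max_local: int,
    fetch: Callable[[str], str],
    quantify: Callable[[str], float],
    cleanup: Callable[[str], None],
) -> Dict[str, float]:
    """Quantify samples in batches so at most `max_local` raw inputs sit on disk."""
    if max_local < 1:
        raise ValueError("max_local must be at least 1")
    results: Dict[str, float] = {}
    for start in range(0, len(sample_ids), max_local):
        batch = sample_ids[start:start + max_local]
        paths = [fetch(s) for s in batch]        # stage one batch of raw reads
        for sample, path in zip(batch, paths):
            results[sample] = quantify(path)     # keep only the small result
        for path in paths:
            cleanup(path)                        # free space before the next batch
    return results


# Simulated "disk": tracks which raw files are present and the peak count.
disk: set = set()
peak = 0


def fake_fetch(sample: str) -> str:
    global peak
    path = f"{sample}.fastq"
    disk.add(path)
    peak = max(peak, len(disk))
    return path


def fake_quantify(path: str) -> float:
    return float(len(path))  # stand-in for a real expression estimate


samples = [f"sample{i}" for i in range(10)]
results = quantify_with_bounded_storage(samples, 3, fake_fetch, fake_quantify, disk.discard)
print(len(results), peak, len(disk))  # 10 3 0
```

All ten samples are quantified, yet no more than three raw files ever coexist on the simulated disk, and none remain afterward. The trade-off is that downstream steps can only see the retained results, not the raw reads.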
Addressing noise in co-expression network construction

https://doi.org/10.1093/bib/bbab495

Burns, Joshua J; Shealy, Benjamin T; Greer, Mitchell S; Hadish, John A; McGowan, Matthew T; Biggs, Tyler; Smith, Melissa C; Feltus, F Alex; Ficklin, Stephen P (January 2022, Briefings in Bioinformatics)

Abstract Gene co-expression networks (GCNs) provide multiple benefits to molecular research, including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly large studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. Such experiments with larger numbers of variables confound the discovery of true network edges, exclude edges, and inhibit the discovery of context- (or condition-) specific network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pairwise comparison is a function of multiple confounding variables. The ‘one-size-fits-all’ approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet assumptions and confounding variables are addressed. Additionally, it can associate experimental context with each edge of the network, resulting in context-specific GCNs (csGCNs). To help researchers recognize such challenges in GCN construction and in the creation of csGCNs, we provide a review of the workflow.
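A toy example of the confounding problem described above: two genes that are uncorrelated within each experimental condition can look strongly correlated once samples from both conditions are pooled, because the condition shifts both genes together. The expression values are invented for illustration, and correlating within each condition is shown only as the simplest way to expose the artifact, not as the toolkit's actual procedure.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up expression values for two genes under two conditions.
# Within each condition the genes are uncorrelated; condition B simply
# shifts both genes upward by 10 units (the confounding variable).
gene_x_a, gene_y_a = [1, 2, 3, 4], [2, 4, 1, 3]
gene_x_b = [v + 10 for v in gene_x_a]
gene_y_b = [v + 10 for v in gene_y_a]

r_a = pearson(gene_x_a, gene_y_a)
r_b = pearson(gene_x_b, gene_y_b)
r_pooled = pearson(gene_x_a + gene_x_b, gene_y_a + gene_y_b)
print(round(r_a, 3), round(r_b, 3), round(r_pooled, 3))  # 0.0 0.0 0.952
```

A single "one-size-fits-all" correlation over all eight samples would report a strong edge (r ≈ 0.95) that exists only because of the condition shift.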
NetExtractor: Extracting a Cerebellar Tissue Gene Regulatory Network Using Differentially Expressed High Mutual Information Binary RNA Profiles

https://doi.org/10.1534/g3.120.401067

Husain, Benafsh; Hickman, Allison R.; Hang, Yuqing; Shealy, Benjamin T.; Sapra, Karan; Feltus, F. Alex (September 2020, G3: Genes|Genomes|Genetics)
Bigenic expression relationships are conventionally defined based on metrics such as Pearson or Spearman correlation, which typically cannot detect latent, non-linear dependencies or which require the relationship to be monotonic. Further, the combination of intrinsic and extrinsic noise, as well as embedded relationships between sample sub-populations, reduces the probability of extracting biologically relevant edges during the construction of gene co-expression networks (GCNs). In this report, we address these problems via our NetExtractor algorithm. NetExtractor examines all pairwise gene expression profiles first with Gaussian mixture models (GMMs) to identify sample sub-populations, followed by mutual information (MI) analysis that is capable of detecting non-linear differential bigenic expression relationships. We applied NetExtractor to brain tissue RNA profiles from the Genotype-Tissue Expression (GTEx) project to obtain a brain-tissue-specific gene expression relationship network centered on cerebellar and cerebellar hemisphere enriched edges. We leveraged the PsychENCODE pre-frontal cortex (PFC) gene regulatory network (GRN) to construct a cerebellar cortex (cerebellar) GRN associated with transcriptionally active regions in cerebellar tissue. Thus, we demonstrate the utility of our NetExtractor approach to detect biologically relevant and novel non-linear binary gene relationships.
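The reason for using MI instead of correlation can be shown in a few lines. This sketch skips the GMM step entirely and assumes each sample has already been assigned a discrete expression state; the three-state gene A (NetExtractor itself works with binary profiles), the labels, and the helper names are hypothetical. Gene B is fully determined by gene A's state, but the dependence is not monotonic, so Pearson correlation on the same labels is zero while MI is clearly positive.

```python
from collections import Counter
from math import log2, sqrt
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Mutual information (in bits) between two discrete label sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-sample states: gene A is low (0), mid (1) or high (2);
# gene B is "on" (1) whenever gene A is at either extreme.
gene_a_state = [0, 1, 2, 0, 1, 2]
gene_b_state = [1, 0, 1, 1, 0, 1]

mi = mutual_information(gene_a_state, gene_b_state)
r = pearson(gene_a_state, gene_b_state)
print(round(mi, 3), round(r, 3))  # 0.918 0.0
```

Here the MI equals the full entropy of gene B (log2 3 − 2/3 bits), i.e. knowing gene A's state removes all uncertainty about gene B, a relationship a linear or monotonic metric would miss.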
Cellular State Transformations Using Deep Learning for Precision Medicine Applications

https://doi.org/10.1016/j.patter.2020.100087

Targonski, Colin; Bender, M. Reed; Shealy, Benjamin T.; Husain, Benafsh; Paseman, Bill; Smith, Melissa C.; Feltus, F. Alex (September 2020, Patterns)
Uncovering biomarker genes with enriched classification potential from Hallmark gene sets

https://doi.org/10.1038/s41598-019-46059-1

Targonski, Colin A.; Shearer, Courtney A.; Shealy, Benjamin T.; Smith, Melissa C.; Feltus, F. Alex (July 2019, Scientific Reports)

Abstract Given the complex relationship between gene expression and phenotypic outcomes, computationally efficient approaches are needed to sift through large high-dimensional datasets in order to identify biologically relevant biomarkers. In this report, we describe a method of identifying the most salient biomarker genes in a dataset, which we call “candidate genes”, by evaluating the ability of gene combinations to classify samples from a dataset, which we call “classification potential”. Our algorithm, Gene Oracle, uses a neural network to test user-defined gene sets for polygenic classification potential and then uses a combinatorial approach to further decompose selected gene sets into candidate and non-candidate biomarker genes. We tested this algorithm on curated gene sets from the Molecular Signatures Database (MSigDB) quantified in RNA-seq gene expression matrices obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data repositories. First, we identified which MSigDB Hallmark subsets have significant classification potential for both the TCGA and GTEx datasets. Then, we identified the most discriminatory candidate biomarker genes in each Hallmark gene set and provide evidence that the improved biomarker potential of these genes may be due to reduced functional complexity.
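Gene Oracle itself trains a neural network. To keep this sketch short and dependency-free, the code below swaps in a leave-one-out nearest-centroid classifier as the scorer, so it illustrates only the combinatorial step: score every k-gene subset of a gene set and count which genes recur in the best-scoring subsets. The gene names, expression values, and function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def loo_nearest_centroid_accuracy(X: List[List[float]], y: Sequence[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier (swapped-in scorer)."""
    n = len(X)
    correct = 0
    for i in range(n):
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[j] for j in range(n) if j != i and y[j] == label]
            if rows:  # skip a class whose only sample was held out
                centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(
            centroids,
            key=lambda lab: sum((a - c) ** 2 for a, c in zip(X[i], centroids[lab])),
        )
        correct += int(pred == y[i])
    return correct / n


def candidate_genes(
    expr: Dict[str, List[float]], labels: Sequence[int], k: int
) -> Tuple[float, Counter]:
    """Score every k-gene subset; count gene occurrences in the top-scoring subsets."""
    best, winners = -1.0, []
    for subset in combinations(sorted(expr), k):
        X = [[expr[g][s] for g in subset] for s in range(len(labels))]
        acc = loo_nearest_centroid_accuracy(X, labels)
        if acc > best:
            best, winners = acc, [subset]
        elif acc == best:
            winners.append(subset)
    return best, Counter(g for subset in winners for g in subset)


# Hypothetical toy data: GENE1 separates the two classes; GENE2/GENE3 are noise.
expr = {
    "GENE1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    "GENE2": [3.0, 1.0, 2.0, 2.0, 3.0, 1.0],
    "GENE3": [2.0, 2.0, 3.0, 3.0, 2.0, 2.0],
}
labels = [0, 0, 0, 1, 1, 1]
best, counts = candidate_genes(expr, labels, k=2)
print(best, counts.most_common(1))  # 1.0 [('GENE1', 2)]
```

Both pairs containing GENE1 classify perfectly, while the GENE2/GENE3 pair does not, so GENE1 emerges as the candidate gene. Exhaustive enumeration grows combinatorially with the gene-set size, which is why this step is applied to already-selected gene sets.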
GPU Implementation of Pairwise Gaussian Mixture Models for Multi-Modal Gene Co-Expression Networks

https://doi.org/10.1109/ACCESS.2019.2951284

Shealy, Benjamin T.; Burns, Josh J. R.; Smith, Melissa C.; Feltus, F. Alex; Ficklin, Stephen P. (January 2019, IEEE Access)
