Identification of condition-specific biomarker systems in uterine cancer

https://doi.org/10.1093/g3journal/jkab392

Hickman, Allison R; Hang, Yuqing; Pauly, Rini; Feltus, Frank A (November 2021, G3 Genes|Genomes|Genetics)
Andrews, B (Ed.)

Abstract Uterine cancer is the fourth most common cancer among women, projected to affect 66,000 US women in 2021. Uterine cancer often arises in the inner lining of the uterus, known as the endometrium, but can present as several different types of cancer, including endometrioid cancer, serous adenocarcinoma, and uterine carcinosarcoma. Previous studies have analyzed the genetic changes between normal and cancerous uterine tissue to identify specific genes of interest, including TP53 and PTEN. Here we used Gaussian Mixture Models to build condition-specific gene coexpression networks for endometrial cancer, uterine carcinosarcoma, and normal uterine tissue. We then incorporated uterine regulatory edges and investigated potential coregulation relationships. These networks were further validated using differential expression analysis, functional enrichment, and a statistical analysis comparing the expression of transcription factors and their target genes across cancerous and normal uterine samples. These networks allow for a more comprehensive look into the biological networks and pathways affected in uterine cancer compared with previous single-gene analyses. We hope the findings of this study can be incorporated into existing knowledge of uterine cancer genetics and that the identified networks can soon serve as clinical biomarkers for better prognosis and treatment.
Exploration into biomarker potential of region-specific brain gene co-expression networks

https://doi.org/10.1038/s41598-020-73611-1

Hang, Yuqing; Aburidi, Mohammed; Husain, Benafsh; Hickman, Allison R.; Poehlman, William L.; Feltus, F. Alex (October 2020, Scientific Reports)

Abstract The human brain is a complex organ that consists of several regions each with a unique gene expression pattern. Our intent in this study was to construct a gene co-expression network (GCN) for the normal brain using RNA expression profiles from the Genotype-Tissue Expression (GTEx) project. The brain GCN contains gene correlation relationships that are broadly present in the brain or specific to thirteen brain regions, which we later combined into six overarching brain mini-GCNs based on the brain’s structure. Using the expression profiles of brain region-specific GCN edges, we determined how well the brain region samples could be discriminated from each other, visually with t-SNE plots or quantitatively with the Gene Oracle deep learning classifier. Next, we tested these gene sets on their relevance to human tumors of brain and non-brain origin. Interestingly, we found that genes in the six brain mini-GCNs showed markedly higher mutation rates in tumors relative to matched sets of random genes. Further, we found that cortex genes subdivided Head and Neck Squamous Cell Carcinoma (HNSC) tumors and Pheochromocytoma and Paraganglioma (PCPG) tumors into distinct groups. The brain GCN and mini-GCNs are useful resources for the classification of brain regions and identification of biomarker genes for brain related phenotypes.
Identification of condition-specific regulatory mechanisms in normal and cancerous human lung tissue

https://doi.org/10.1186/s12864-022-08591-9

Hang, Yuqing; Burns, Josh; Shealy, Benjamin T.; Pauly, Rini; Ficklin, Stephen P.; Feltus, Frank A. (December 2022, BMC Genomics)

Abstract Background: Lung cancer is the leading cause of cancer death in both men and women. The most common lung cancer subtype is non-small cell lung carcinoma (NSCLC) comprising about 85% of all cases. NSCLC can be further divided into three subtypes: adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), and large cell lung carcinoma. Specific genetic mutations and epigenetic aberrations play an important role in the developmental transition to a specific tumor subtype. The elucidation of normal lung versus lung tumor gene expression patterns and regulatory targets yields biomarker systems that discriminate lung phenotypes (i.e., biomarkers) and provide a foundation for the discovery of normal and aberrant gene regulatory mechanisms. Results: We built condition-specific gene co-expression networks (csGCNs) for normal lung, LUAD, and LUSC conditions. Then, we integrated normal lung tissue-specific gene regulatory networks (tsGRNs) to elucidate control-target biomarker systems for normal and cancerous lung tissue. We characterized co-expressed gene edges, possibly under common regulatory control, for relevance in lung cancer. Conclusions: Our approach demonstrates the ability to elucidate csGCN:tsGRN merged biomarker systems based on gene expression correlation and regulation. The biomarker systems we describe can be used to classify and further describe lung specimens. Our approach is generalizable and can be used to discover and interpret complex gene expression patterns for any condition or species.
GEMmaker: process massive RNA-seq datasets on heterogeneous computational infrastructure

https://doi.org/10.1186/s12859-022-04629-7

Hadish, John A.; Biggs, Tyler D.; Shealy, Benjamin T.; Bender, M. Reed; McKnight, Coleman B.; Wytko, Connor; Smith, Melissa C.; Feltus, F. Alex; Honaas, Loren; Ficklin, Stephen P. (December 2022, BMC Bioinformatics)

Abstract Background: Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analysis such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are growing larger, and combining multiple experiments from sequence repositories can result in datasets with thousands of samples. Processing hundreds to thousands of RNA-seq samples can result in challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility. Processing of larger and deeper RNA-seq experiments will become more common as sequencing technology matures. Results: GEMmaker is an nf-core compliant Nextflow workflow that quantifies gene expression from small to massive RNA-seq datasets. GEMmaker ensures results are highly reproducible through the use of versioned containerized software that can be executed on a single workstation, institutional compute cluster, Kubernetes platform or the cloud. GEMmaker supports popular alignment and quantification tools providing results in raw and normalized formats. GEMmaker is unique in that it can scale to process thousands of local or remote stored samples without exceeding available data storage. Conclusions: Workflows that quantify gene expression are not new, and many already address issues of portability, reusability, and scale in terms of access to CPUs. GEMmaker provides these benefits and adds the ability to scale despite low data storage infrastructure. This allows users to process hundreds to thousands of RNA-seq samples even when data storage resources are limited. GEMmaker is freely available and fully documented with step-by-step setup and execution instructions.
EdgeCrafting: mining embedded, latent, nonlinear patterns to construct gene relationship networks

https://doi.org/10.1093/g3journal/jkac042

Husain, Benafsh; Bender, M. Reed; Feltus, F. Alex (February 2022, G3 Genes|Genomes|Genetics)
Cherry, J M (Ed.)
Abstract The mechanisms that coordinate cellular gene expression are highly complex and intricately interconnected. Thus, it is necessary to move beyond a fully reductionist approach to understanding genetic information flow and begin focusing on the networked connections between genes that organize cellular function. Continued advancements in computational hardware, coupled with the development of gene correlation network algorithms, provide the capacity to study networked interactions between genes rather than their isolated functions. For example, gene coexpression networks are used to construct gene relationship networks using linear metrics such as Spearman or Pearson correlation. Recently, there have been tools designed to deepen these analyses by differentiating between intrinsic vs extrinsic noise within gene expression values, identifying different modules based on tissue phenotype, and capturing potential nonlinear relationships. In this report, we introduce an algorithm with a novel application of image-based segmentation modalities that uses blob detection techniques to detect bigenic edges in a gene expression matrix. We applied this algorithm, called EdgeCrafting, to a bulk RNA-sequencing gene expression matrix comprising healthy and cancerous kidney data. We then compared EdgeCrafting against 4 other RNA expression analysis techniques: Weighted Gene Correlation Network Analysis, Knowledge Independent Network Construction, NetExtractor, and differential gene expression analysis.
Addressing noise in co-expression network construction

https://doi.org/10.1093/bib/bbab495

Burns, Joshua J; Shealy, Benjamin T; Greer, Mitchell S; Hadish, John A; McGowan, Matthew T; Biggs, Tyler; Smith, Melissa C; Feltus, F Alex; Ficklin, Stephen P (January 2022, Briefings in Bioinformatics)

Abstract Gene co-expression networks (GCNs) provide multiple benefits to molecular research including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly larger studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. Such experiments with larger numbers of variables confound discovery of true network edges, exclude edges and inhibit discovery of context (or condition) specific network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pairwise comparison is a function of multiple confounding variables. The ‘one-size-fits-all’ approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet assumptions and confounding variables are addressed. Additionally, it can associate experimental context for each edge of the network resulting in context-specific GCNs (csGCNs). To help researchers recognize such challenges in GCN construction, and the creation of csGCNs, we provide a review of the workflow.
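The confounding problem described in this abstract can be illustrated with a minimal sketch. It is not the Knowledge Independent Network Construction workflow. The gene names, condition labels, and expression values below are hypothetical, chosen only to show how pooling samples from two conditions can produce a correlation that neither condition supports on its own.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical expression values for two genes under two conditions.
# Within each condition the genes are perfectly anti-correlated, but
# condition B shifts both genes upward by the same amount.
cond_a = {"gene1": [1.0, 2.0, 3.0, 4.0], "gene2": [4.0, 3.0, 2.0, 1.0]}
cond_b = {"gene1": [11.0, 12.0, 13.0, 14.0], "gene2": [14.0, 13.0, 12.0, 11.0]}

# Correlation computed separately within each condition.
r_a = pearson(cond_a["gene1"], cond_a["gene2"])
r_b = pearson(cond_b["gene1"], cond_b["gene2"])

# Correlation computed on the pooled samples, ignoring the condition label.
r_pooled = pearson(
    cond_a["gene1"] + cond_b["gene1"],
    cond_a["gene2"] + cond_b["gene2"],
)

print(f"condition A: {r_a:.3f}, condition B: {r_b:.3f}, pooled: {r_pooled:.3f}")
```

In this toy data the pooled coefficient is strongly positive while both per-condition coefficients are negative, so a single test over all samples would report an edge with the wrong sign.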
Intelligent Resource Provisioning for Scientific Workflows and HPC

https://doi.org/10.1109/WORKS54523.2021.00007

Shealy, Benjamin T.; Feltus, F. Alex; Smith, Melissa C. (November 2021, IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS))

Transcriptomic analysis of peripheral leukocytes in dairy cows with and without evidence of metritis and associated early postpartum disease

https://doi.org/10.15232/aas.2020-02092

McConnel, Craig; Crisp, Sierra; Biggs, Tyler; Parrish, Lindsay; Sischo, William; Adams-Progar, Amber; Ficklin, Stephen (December 2020, Applied Animal Science)
NetExtractor: Extracting a Cerebellar Tissue Gene Regulatory Network Using Differentially Expressed High Mutual Information Binary RNA Profiles

https://doi.org/10.1534/g3.120.401067

Husain, Benafsh; Hickman, Allison R.; Hang, Yuqing; Shealy, Benjamin T.; Sapra, Karan; Feltus, F. Alex (September 2020, G3: Genes|Genomes|Genetics)
Bigenic expression relationships are conventionally defined based on metrics such as Pearson or Spearman correlation that cannot typically detect latent, non-linear dependencies or require the relationship to be monotonic. Further, the combination of intrinsic and extrinsic noise as well as embedded relationships between sample sub-populations reduces the probability of extracting biologically relevant edges during the construction of gene co-expression networks (GCNs). In this report, we address these problems via our NetExtractor algorithm. NetExtractor examines all pairwise gene expression profiles first with Gaussian mixture models (GMMs) to identify sample sub-populations followed by mutual information (MI) analysis that is capable of detecting non-linear differential bigenic expression relationships. We applied NetExtractor to brain tissue RNA profiles from the Genotype-Tissue Expression (GTEx) project to obtain a brain tissue specific gene expression relationship network centered on cerebellar and cerebellar hemisphere enriched edges. We leveraged the PsychENCODE pre-frontal cortex (PFC) gene regulatory network (GRN) to construct a cerebellar cortex (cerebellar) GRN associated with transcriptionally active regions in cerebellar tissue. Thus, we demonstrate the utility of our NetExtractor approach to detect biologically relevant and novel non-linear binary gene relationships.
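The limitation of correlation metrics noted in this abstract can be sketched as follows. This is not the NetExtractor implementation: the Gaussian mixture step is omitted, and the equal-width three-bin discretization, the function names, and the U-shaped toy data are assumptions made only for illustration. The sketch compares Pearson correlation with mutual information on a non-monotonic relationship.

```python
from collections import Counter
from math import log2, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def discretize(values, bins=3):
    """Assign each value to one of `bins` equal-width bins (an assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Hypothetical U-shaped dependency: gene_y is fully determined by gene_x,
# yet the relationship is neither linear nor monotonic.
gene_x = [float(v) for v in range(-4, 5)]
gene_y = [v * v for v in gene_x]

r = pearson(gene_x, gene_y)
mi = mutual_information(discretize(gene_x), discretize(gene_y))

print(f"Pearson r = {r:.3f}, mutual information = {mi:.3f} bits")
```

Here the Pearson coefficient is essentially zero even though one profile determines the other, while the mutual information of the binned profiles is clearly positive. The bin count affects the estimate, so it should be treated as a tunable assumption.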
Cellular State Transformations Using Deep Learning for Precision Medicine Applications

https://doi.org/10.1016/j.patter.2020.100087

Targonski, Colin; Bender, M. Reed; Shealy, Benjamin T.; Husain, Benafsh; Paseman, Bill; Smith, Melissa C.; Feltus, F. Alex (September 2020, Patterns)