Title: Inferring gene-pathway associations from consolidated transcriptome datasets: an interactive gene network explorer for Tetrahymena thermophila
Abstract: Although an established model organism, Tetrahymena thermophila remains comparatively inaccessible to high-throughput screens, and alternative bioinformatic approaches still rely on unconnected datasets and outdated algorithms. Here, we report a new approach to consolidating RNA-seq and microarray data based on a systematic exploration of parameters and computational controls, enabling us to infer functional gene associations from co-expression patterns. To illustrate the power of this approach, we took advantage of new data regarding a previously studied pathway: the biogenesis of a secretory organelle called the mucocyst. Our untargeted clustering approach recovered over 80% of the genes previously verified to play a role in mucocyst biogenesis. Furthermore, we tested four new genes that we predicted to be mucocyst-associated based on their co-expression and found that knocking out each of them results in mucocyst secretion defects. We also found that our approach succeeds in clustering genes associated with several other cellular pathways that we evaluated based on prior literature. We present the Tetrahymena Gene Network Explorer (TGNE) as an interactive tool for genetic hypothesis generation and functional annotation in this organism and as a framework for building similar tools for other systems.
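The abstract does not spell out the TGNE pipeline. As a minimal, hypothetical illustration of the general idea (put data from different platforms on a common scale, then group genes by co-expression), the sketch below z-scores each gene within each dataset, concatenates the profiles across datasets, and joins genes whose Pearson correlation meets a threshold (single-linkage clustering via union-find). The function names, the z-score normalization, the 0.9 threshold, and the toy data are all assumptions for illustration, not the authors' method or parameters.

```python
import math
from statistics import mean, pstdev


def zscore(profile):
    """Standardize one gene's profile to mean 0, SD 1 within a single dataset."""
    mu, sd = mean(profile), pstdev(profile)
    if sd == 0:
        return [0.0] * len(profile)
    return [(x - mu) / sd for x in profile]


def consolidate(datasets):
    """Hypothetical consolidation: z-score per dataset, then concatenate.

    datasets: list of dicts {gene_id: [expression values]}.
    Only genes present in every dataset are kept.
    """
    shared = set.intersection(*(set(d) for d in datasets))
    return {g: [v for d in datasets for v in zscore(d[g])] for g in sorted(shared)}


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def coexpression_clusters(profiles, threshold=0.9):
    """Single-linkage clusters: genes are linked when r >= threshold."""
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())


# Toy data on two very different scales (made up for illustration).
rnaseq = {"geneA": [1.0, 5.0, 9.0], "geneB": [2.0, 6.0, 10.5], "geneC": [9.0, 4.0, 1.0]}
microarray = {"geneA": [0.2, 0.4, 0.9], "geneB": [0.1, 0.5, 0.8], "geneC": [0.9, 0.5, 0.1]}
merged = consolidate([rnaseq, microarray])
print(coexpression_clusters(merged, threshold=0.9))  # prints [['geneA', 'geneB'], ['geneC']]
```

Per-dataset standardization is one simple way to keep a platform's raw scale (read counts versus array intensities) from dominating the correlation; the abstract's "systematic exploration of parameters and computational controls" suggests the real choices were tuned empirically rather than fixed as here.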
Award ID(s):
1937326
PAR ID:
10593278
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
NAR Genomics and Bioinformatics
Volume:
7
Issue:
2
ISSN:
2631-9268
Sponsoring Org:
National Science Foundation
More Like This
  1. The proper balance of gene expression is essential for cellular health, organismal development, and the maintenance of homeostasis. In response to complex internal and external signals, the cell must modulate gene expression to maintain proteostasis and establish cellular identity within its niche. At the genome level, single-celled prokaryotic microbes display clustering of co-expressed genes that are regulated as a polycistronic RNA. This phenomenon is largely absent from eukaryotic microbes, although Saccharomyces cerevisiae shows extensive clustering of co-expressed genes as functional pairs spread throughout the genome. While initial analysis demonstrated conservation of clustering in divergent fungal lineages, a comprehensive analysis has yet to be performed. Here we report on the prevalence, conservation, and significance of the functional clustering of co-regulated genes within the opportunistic human pathogen Candida albicans. Our analysis reveals extensive clustering within this organism, although the gene pairs differ from those found in S. cerevisiae, indicating that this genomic arrangement evolved after these lineages diverged rather than being inherited from an ancestral arrangement. We report a clustered arrangement in gene families that participate in diverse molecular functions; these clusters are not the result of a divergent orientation with a shared promoter. This arrangement coordinates the transcription of the clustered genes with that of their neighboring genes, with the clusters congregating at genomic loci that are conducive to transcriptional regulation at a distance.
  2. Identifying genes that interact to confer a biological function on an organism is one of the main goals of functional genomics. High‐throughput technologies for assessing and quantifying genome‐wide gene expression patterns have enabled systems‐level analyses that infer pathways or networks of genes involved in different functions under many different conditions. Here, we leveraged the publicly available, information‐rich RNA‐Seq datasets of the model plant Arabidopsis thaliana to construct a gene co‐expression network, which was partitioned into clusters, or modules, of genes correlated by expression. Gene ontology and pathway enrichment analyses were performed to assess the functional terms and pathways enriched within the different gene modules. By interrogating the co‐expression network for genes in different modules that associate with a gene of interest, diverse functional roles of that gene can be deciphered. By mapping genes differentially expressed under a given condition in Arabidopsis onto the co‐expression network, we demonstrate the network's ability to uncover novel genes that are likely transcriptionally active but prone to being missed by standard statistical approaches because they fall outside the confidence zone of detection. To our knowledge, this is the first A. thaliana co‐expression network constructed using all of the mRNA‐Seq datasets (>20,000) available in the NCBI SRA database. The network can serve as a useful resource for the Arabidopsis research community to interrogate specific genes of interest, retrieve their interactomes, identify gene modules that are transcriptionally altered under a certain condition or at a certain stage, and gain understanding of gene functions.
  3. Misteli, Tom (Ed.)
    Endogenous RNA interference (RNAi) pathways regulate a wide range of cellular processes in diverse eukaryotes, yet in the ciliated eukaryote Tetrahymena thermophila, the cellular purpose of RNAi pathways that generate ∼23–24 nucleotide (nt) small (s)RNAs has remained unknown. Here, we investigated the phenotypic and gene expression impacts on vegetatively growing cells when genes involved in ∼23–24 nt sRNA biogenesis are disrupted. We observed slower proliferation and increased expression of genes involved in DNA metabolism and chromosome organization and maintenance in the sRNA biogenesis mutants RSP1Δ, RDN2Δ, and RDF2Δ. In addition, RSP1Δ and RDN2Δ cells frequently exhibited enlarged chromatin extrusion bodies, which are nonnuclear, DNA-containing structures that may be akin to mammalian micronuclei. Expression of the homologous recombination factor Rad51 was specifically elevated in RSP1Δ and RDN2Δ strains, with Rad51 and the double-stranded DNA break marker γ-H2A.X localized to discrete macronuclear foci. An increase in Rad51 and γ-H2A.X foci was also found in knockouts of TWI8, a macronucleus-localized PIWI protein. Together, our findings suggest that an evolutionarily conserved role for RNAi pathways in maintaining genome integrity may extend even to the early-branching eukaryotic lineage that gave rise to Tetrahymena thermophila.
  4. Fu, Feng (Ed.)
    With the recent availability of tissue-specific gene expression data, e.g., provided by the GTEx Consortium, there is interest in comparing gene co-expression patterns across tissues. One promising approach to this problem is to use a multilayer network analysis framework and perform multilayer community detection. Communities in gene co-expression networks reveal groups of genes similarly expressed across individuals, potentially involved in related biological processes responding to specific environmental stimuli or sharing common regulatory variations. We construct a multilayer network in which each of the four layers is an exocrine gland tissue-specific gene co-expression network. We develop methods for multilayer community detection with correlation matrix input and an appropriate null model. Our correlation matrix input method identifies five groups of genes that are similarly co-expressed in multiple tissues (communities that span multiple layers, which we call generalist communities) and two groups of genes that are co-expressed in just one tissue (communities that lie primarily within a single layer, which we call specialist communities). We also found gene co-expression communities in which the genes physically cluster in the genome significantly more than expected by chance (on chromosomes 1 and 11). This clustering hints at underlying regulatory elements determining similar expression patterns across individuals and cell types. We suggest that KRTAP3-1, KRTAP3-3, and KRTAP3-5 share regulatory elements in skin and pancreas. Furthermore, we find that CELA3A and CELA3B share associated expression quantitative trait loci in the pancreas. The results indicate that our multilayer community detection method for correlation matrix input extracts biologically interesting communities of genes.
  5. Recent technologies such as spatial transcriptomics enable the measurement of gene expression at the single-cell level along with the spatial locations of these cells in the tissue. Spatial clustering of the cells provides valuable insight into the functional organization of the tissue. However, most such clustering methods involve some dimension reduction that discards the inherent dependency structure among genes at any spatial location in the tissue. This loses valuable information about gene co-expression patterns and may also affect spatial clustering performance. In spatial transcriptomics, the matrix-variate gene expression data, along with the spatial coordinates of the single cells, provides information on both gene expression dependencies and cell spatial dependencies through its row and column covariances. In this work, we propose a joint Bayesian approach to simultaneously estimate these gene and spatial cell correlations. These estimates provide data summaries for downstream analyses. We illustrate our method with simulations and analysis of several real spatial transcriptomic datasets. Our work elucidates gene co-expression networks as well as clear spatial clustering patterns of the cells. Furthermore, our analysis reveals that downstream spatial-differential analysis may aid in the discovery of unknown cell types from known marker genes.
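Item 4 in the related-abstracts list describes community detection on correlation-matrix input with a purpose-built null model. That null model is not specified in the abstract, so the sketch below instead uses the familiar Newman-Girvan (configuration-model) null on a single layer, after clipping negative correlations to zero and zeroing the diagonal. Both choices are assumptions; the sketch only shows how a modularity score compares observed within-community weight against a null expectation, $Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$, and is not the authors' multilayer method.

```python
def modularity(corr, labels):
    """Newman-Girvan modularity of a partition on a correlation-weighted graph.

    corr:   symmetric matrix (list of lists) of gene-gene correlations.
    labels: community label for each gene, in the same order as corr.
    Assumption: negative correlations and self-correlations are set to 0.
    """
    n = len(corr)
    # Weighted adjacency: keep only positive off-diagonal correlations.
    A = [[max(corr[i][j], 0.0) if i != j else 0.0 for j in range(n)] for i in range(n)]
    k = [sum(row) for row in A]  # weighted degree (strength) of each gene
    two_m = sum(k)               # total edge weight, counted in both directions
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed weight minus configuration-model expectation.
                q += A[i][j] - k[i] * k[j] / two_m
    return q / two_m


# Toy correlation matrix: genes 0-1 and genes 2-3 are strongly co-expressed.
corr = [
    [1.0, 0.9, -0.2, -0.2],
    [0.9, 1.0, -0.2, -0.2],
    [-0.2, -0.2, 1.0, 0.9],
    [-0.2, -0.2, 0.9, 1.0],
]
print(round(modularity(corr, [0, 0, 1, 1]), 3))  # prints 0.5
```

Placing all four genes in one community gives $Q = 0$, and the "wrong" split `[0, 1, 0, 1]` gives $Q = -0.5$, so the score rewards partitions that follow the correlation blocks. Clipping negative correlations discards information that a correlation-specific null model could use, which is presumably one motivation for developing such a null model.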
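Item 5 in the related-abstracts list refers to the row and column covariances of matrix-variate expression data without stating a model. One standard way to formalize this, assumed here and not confirmed by the abstract, is the matrix normal distribution for a $p \times n$ matrix $X$ ($p$ genes by $n$ cells) with mean $M$, gene (row) covariance $U$ ($p \times p$), and cell (column) covariance $V$ ($n \times n$):

```latex
X \sim \mathcal{MN}_{p \times n}(M, U, V)
\iff
\operatorname{vec}(X) \sim \mathcal{N}_{pn}\!\left(\operatorname{vec}(M),\; V \otimes U\right)

p(X \mid M, U, V) =
\frac{\exp\!\left(-\tfrac{1}{2}\operatorname{tr}\!\left[V^{-1}(X - M)^{\top} U^{-1}(X - M)\right]\right)}
     {(2\pi)^{pn/2}\, |V|^{p/2}\, |U|^{n/2}}
```

Because $(V/c) \otimes (cU) = V \otimes U$ for any $c > 0$, the two covariances are identifiable only up to a scale factor, so some normalization is needed in practice; in a spatial setting $V$ would typically be tied to the cells' coordinates. How the authors parameterize, constrain, or place priors on $U$ and $V$ is not stated in the abstract.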