The proper balance of gene expression is essential for cellular health, organismal development, and maintaining homeostasis. In response to complex internal and external signals, the cell needs to modulate gene expression to maintain proteostasis and establish cellular identity within its niche. On a genome level, single-celled prokaryotic microbes display clustering of co-expressed genes that are regulated as a polycistronic RNA. This phenomenon is largely absent from eukaryotic microbes, although there is extensive clustering of co-expressed genes as functional pairs spread throughout the genome in Saccharomyces cerevisiae. While initial analysis demonstrated conservation of clustering in divergent fungal lineages, a comprehensive analysis has yet to be performed. Here we report on the prevalence, conservation, and significance of the functional clustering of co-regulated genes within the opportunistic human pathogen, Candida albicans. Our analysis reveals that there is extensive clustering within this organism—although the identity of the gene pairs is unique compared with those found in S. cerevisiae—indicating that this genomic arrangement evolved after these microbes diverged evolutionarily, rather than being the result of an ancestral arrangement. We report a clustered arrangement in gene families that participate in diverse molecular functions and are not the result of a divergent orientation with a shared promoter. This arrangement coordinates the transcription of the clustered genes to their neighboring genes, with the clusters congregating to genomic loci that are conducive to transcriptional regulation at a distance.
Inferring gene-pathway associations from consolidated transcriptome datasets: an interactive gene network explorer for Tetrahymena thermophila
Abstract Although an established model organism, Tetrahymena thermophila remains comparatively inaccessible to high-throughput screens, and alternative bioinformatic approaches still rely on unconnected datasets and outdated algorithms. Here, we report a new approach to consolidating RNA-seq and microarray data based on a systematic exploration of parameters and computational controls, enabling us to infer functional gene associations from their co-expression patterns. To illustrate the power of this approach, we took advantage of new data regarding a previously studied pathway, the biogenesis of a secretory organelle called the mucocyst. Our untargeted clustering approach recovered over 80% of the genes that were previously verified to play a role in mucocyst biogenesis. Furthermore, we tested four new genes that we predicted to be mucocyst-associated based on their co-expression and found that knocking out each of them results in mucocyst secretion defects. We also found that our approach succeeds in clustering genes associated with several other cellular pathways that we evaluated based on prior literature. We present the Tetrahymena Gene Network Explorer (TGNE) as an interactive tool for genetic hypothesis generation and functional annotation in this organism and as a framework for building similar tools for other systems.
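The idea of inferring gene associations from co-expression can be illustrated with a minimal sketch. This is not the TGNE pipeline, whose normalization, clustering algorithm, and parameters are not described in this abstract. It only assumes that each gene has an expression profile across the same set of conditions and that genes with highly correlated profiles are candidate functional partners. The gene names, expression values, and the `min_r` threshold are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return sum(a * b for a, b in zip(dx, dy)) / denom


def coexpression_partners(profiles, query, min_r=0.9):
    """Return (gene, r) pairs whose profile correlates with the query gene.

    min_r is an illustrative cutoff, not a value taken from the paper.
    """
    q = profiles[query]
    hits = [(g, pearson(q, p)) for g, p in profiles.items() if g != query]
    kept = [(g, r) for g, r in hits if r >= min_r]
    return sorted(kept, key=lambda t: -t[1])


# Hypothetical profiles: expression of four genes across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 8.0, 3.0],
    "geneB": [2.0, 4.0, 16.0, 6.0],  # same shape as geneA, scaled by 2
    "geneC": [8.0, 3.0, 1.0, 2.0],   # roughly opposite trend
    "geneD": [1.1, 2.1, 7.5, 3.2],   # noisy copy of geneA
}

partners = coexpression_partners(profiles, "geneA")
for gene, r in partners:
    print(f"{gene}\t{r:.3f}")
```

Because correlation ignores scale, `geneB` is recovered as a partner of `geneA` even though its absolute expression is twice as high; a real analysis would also need the consolidation and control steps the abstract refers to.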
- Award ID(s): 1937326
- PAR ID: 10637366
- Editor(s): Notredame, Cedric
- Publisher / Repository: Oxford Academic
- Date Published:
- Journal Name: NAR Genomics and Bioinformatics
- Volume: 7
- Issue: 2
- ISSN: 2631-9268
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Misteli, Tom (Ed.) Endogenous RNA interference (RNAi) pathways regulate a wide range of cellular processes in diverse eukaryotes, yet in the ciliated eukaryote, Tetrahymena thermophila, the cellular purpose of RNAi pathways that generate ∼23–24 nucleotide (nt) small (s)RNAs has remained unknown. Here, we investigated the phenotypic and gene expression impacts on vegetatively growing cells when genes involved in ∼23–24 nt sRNA biogenesis are disrupted. We observed slower proliferation and increased expression of genes involved in DNA metabolism and chromosome organization and maintenance in sRNA biogenesis mutants RSP1Δ, RDN2Δ, and RDF2Δ. In addition, RSP1Δ and RDN2Δ cells frequently exhibited enlarged chromatin extrusion bodies, which are nonnuclear, DNA-containing structures that may be akin to mammalian micronuclei. Expression of homologous recombination factor Rad51 was specifically elevated in RSP1Δ and RDN2Δ strains, with Rad51 and double-stranded DNA break marker γ-H2A.X localized to discrete macronuclear foci. In addition, an increase in Rad51 and γ-H2A.X foci was also found in knockouts of TWI8, a macronucleus-localized PIWI protein. Together, our findings suggest that an evolutionarily conserved role for RNAi pathways in maintaining genome integrity may be extended even to the early branching eukaryotic lineage that gave rise to Tetrahymena thermophila.
- Abstract Recent technologies such as spatial transcriptomics enable the measurement of gene expression at the single-cell level along with the spatial locations of these cells in the tissue. Spatial clustering of the cells provides valuable insights into the functional organization of the tissue. However, most such clustering methods involve some dimension reduction that leads to a loss of the inherent dependency structure among genes at any spatial location in the tissue. This destroys valuable insights into gene co-expression patterns, apart from possibly impacting spatial clustering performance. In spatial transcriptomics, the matrix-variate gene expression data, along with spatial coordinates of the single cells, provides information on both gene expression dependencies and cell spatial dependencies through its row and column covariances. In this work, we propose a joint Bayesian approach to simultaneously estimate these gene and spatial cell correlations. These estimates provide data summaries for downstream analyses. We illustrate our method with simulations and analysis of several real spatial transcriptomic datasets. Our work elucidates gene co-expression networks as well as clear spatial clustering patterns of the cells. Furthermore, our analysis reveals that downstream spatial-differential analysis may aid in the discovery of unknown cell types from known marker genes.
- Barbash, Daniel (Ed.) Abstract To understand the relative importance of cis and trans effects on regulation, we crossed multi-parent recombinant-inbred-lines (RILs) to a common tester and measured allele-specific gene expression in the offspring. Testing the difference in allelic imbalance between two RIL x Tester crosses is a test of cis or trans depending on the RIL alleles compared. The study design also makes it possible to separate two sources of trans variation, genetic and environmental, detected via interactions with cis effects. We demonstrate the effectiveness of this approach in a long-read RNA-seq experiment in female abdominal tissue at two time points in Drosophila melanogaster. Among the 40% of all loci that show evidence of genetic variation in cis, trans effects due to environment are detectable in 31% of loci and trans effects due to genetic background in 19%, with little overlap in sources of trans variation. The genes identified in this study are associated with genes previously reported to exhibit genetic variation in gene expression. Eleven genes in a QTL for thermotolerance, previously shown to differ in expression based on temperature, have evidence for regulation of gene expression regardless of the environment, including the cuticular protein Cpr67B, suggesting a functional role for standing variation in gene expression. This study provides a blueprint for identifying regulatory variation in gene expression, as the tester design maximizes cis variation and enables the efficient assessment of all pairs of RIL alleles relative to the tester, a much smaller study compared to the pairwise direct assessment.
- Fu, Feng (Ed.) With the recent availability of tissue-specific gene expression data, e.g., provided by the GTEx Consortium, there is interest in comparing gene co-expression patterns across tissues. One promising approach to this problem is to use a multilayer network analysis framework and perform multilayer community detection. Communities in gene co-expression networks reveal groups of genes similarly expressed across individuals, potentially involved in related biological processes responding to specific environmental stimuli or sharing common regulatory variations. We construct a multilayer network in which each of the four layers is an exocrine gland tissue-specific gene co-expression network. We develop methods for multilayer community detection with correlation matrix input and an appropriate null model. Our correlation matrix input method identifies five groups of genes that are similarly co-expressed in multiple tissues (a community that spans multiple layers, which we call a generalist community) and two groups of genes that are co-expressed in just one tissue (a community that lies primarily within just one layer, which we call a specialist community). We further found gene co-expression communities where the genes physically cluster across the genome significantly more than expected by chance (on chromosomes 1 and 11). This clustering hints at underlying regulatory elements determining similar expression patterns across individuals and cell types. We suggest that KRTAP3-1, KRTAP3-3, and KRTAP3-5 share regulatory elements in skin and pancreas. Furthermore, we find that CELA3A and CELA3B share associated expression quantitative trait loci in the pancreas. The results indicate that our multilayer community detection method for correlation matrix input extracts biologically interesting communities of genes.

