Title: An atlas of gene expression variation across the Caenorhabditis elegans species
Phenotypic variation in diverse organism-level traits has been studied in Caenorhabditis elegans wild strains, but differences in gene expression and the underlying variation in regulatory mechanisms are largely unknown. Here, we use natural variation in gene expression to connect genetic variants to differences in organismal-level traits, including drug and toxicant responses. We performed transcriptomic analysis on 207 genetically distinct C. elegans wild strains to study natural regulatory variation of gene expression. Using this massive dataset, we performed genome-wide association mappings to investigate the genetic basis underlying gene expression variation and revealed complex genetic architectures. We found a large collection of hotspots enriched for expression quantitative trait loci across the genome. We further used mediation analysis to understand how gene expression variation could underlie organism-level phenotypic variation for a variety of complex traits. These results reveal the natural diversity in gene expression and possible regulatory mechanisms in this keystone model organism, highlighting the promise of gene expression variation in shaping phenotypic diversity.
Award ID(s):
1764421
NSF-PAR ID:
10336414
Journal Name:
bioRxiv
ISSN:
2692-8205
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Phenotypic variation in organism-level traits has been studied in Caenorhabditis elegans wild strains, but the impacts of differences in gene expression and the underlying regulatory mechanisms are largely unknown. Here, we use natural variation in gene expression to connect genetic variants to differences in organismal-level traits, including drug and toxicant responses. We perform transcriptomic analyses on 207 genetically distinct C. elegans wild strains to study natural regulatory variation of gene expression. Using this massive dataset, we perform genome-wide association mappings to investigate the genetic basis underlying gene expression variation and reveal complex genetic architectures. We find a large collection of hotspots enriched for expression quantitative trait loci across the genome. We further use mediation analysis to understand how gene expression variation could underlie organism-level phenotypic variation for a variety of complex traits. These results reveal the natural diversity in gene expression and possible regulatory mechanisms in this keystone model organism, highlighting the promise of using gene expression variation to understand how phenotypic diversity is generated.

     
  2. Larracuente, Amanda (Ed.)
    Abstract Short tandem repeats (STRs) have orders of magnitude higher mutation rates than single nucleotide variants (SNVs) and have been proposed to accelerate evolution in many organisms. However, only a few studies have addressed the impact of STR variation on phenotypic variation at both the organismal and molecular levels. Potential driving forces underlying the high mutation rates of STRs also remain largely unknown. Here, we leverage the recently generated expression and STR variation data among wild Caenorhabditis elegans strains to conduct a genome-wide analysis of how STRs affect gene expression variation. We identify thousands of expression STRs (eSTRs) showing regulatory effects and demonstrate that they explain missing heritability beyond SNV-based expression quantitative trait loci. We illustrate specific regulatory mechanisms such as how eSTRs affect splicing sites and alternative splicing efficiency. We also show that differential expression of antioxidant genes and oxidative stresses might affect STR mutations systematically using both wild strains and mutation accumulation lines. Overall, we reveal the interplay between STRs and gene expression variation by providing novel insights into regulatory mechanisms of STRs and highlighting that oxidative stress could lead to higher STR mutation rates.
  3. Short tandem repeats (STRs) represent an important class of genetic variation that can contribute to phenotypic differences. Although millions of single nucleotide variants (SNVs) and short indels have been identified among wild Caenorhabditis elegans strains, the natural diversity in STRs remains unknown. Here, we characterized the distribution of 31,991 STRs with motif lengths of 1–6 bp in the reference genome of C. elegans. Of these STRs, 27,667 harbored polymorphisms across 540 wild strains and only 9691 polymorphic STRs (pSTRs) had complete genotype data for more than 90% of the strains. Compared with the reference genome, the pSTRs showed more contraction than expansion. We found that STRs with different motif lengths were enriched in different genomic features, among which coding regions showed the lowest STR diversity and constrained STR mutations. STR diversity also showed similar genetic divergence and selection signatures among wild strains as in previous studies using SNVs. We further identified STR variation in two mutation accumulation line panels that were derived from two wild strains and found background-dependent and fitness-dependent STR mutations. We also performed the first genome-wide association analyses between natural variation in STRs and organismal phenotypic variation among wild C. elegans strains. Overall, our results delineate the first large-scale characterization of STR variation in wild C. elegans strains and highlight the effects of selection on STR mutations.
  4. Macdonald, S (Ed.)
    Abstract Quantitative genetics in Caenorhabditis elegans seeks to identify naturally segregating genetic variants that underlie complex traits. Genome-wide association studies scan the genome for individual genetic variants that are significantly correlated with phenotypic variation in a population, or quantitative trait loci. Genome-wide association studies are a popular choice for quantitative genetic analyses because the quantitative trait loci that are discovered segregate in natural populations. Despite numerous successful mapping experiments, the empirical performance of genome-wide association studies has not, to date, been formally evaluated in C. elegans. We developed an open-source genome-wide association study pipeline called NemaScan and used a simulation-based approach to provide benchmarks of mapping performance in collections of wild C. elegans strains. Simulated trait heritability and complexity determined the spectrum of quantitative trait loci detected by genome-wide association studies. Power to detect smaller-effect quantitative trait loci increased with the number of strains sampled from the C. elegans Natural Diversity Resource. Population structure was a major driver of variation in mapping performance, with populations shaped by recent selection exhibiting significantly lower false discovery rates than populations composed of more divergent strains. We also recapitulated previous genome-wide association studies of experimentally validated quantitative trait variants. Our simulation-based evaluation of performance provides the community with critical context to pursue quantitative genetic studies using the C. elegans Natural Diversity Resource to elucidate the genetic basis of complex traits in C. elegans natural populations.
  5. Hug, Laura A. (Ed.)
    ABSTRACT Natural microbial communities consist of closely related taxa that may exhibit phenotypic differences and inhabit distinct niches. However, connecting genetic diversity to ecological properties remains a challenge in microbial ecology due to the lack of pure cultures across the microbial tree of life. “Candidatus Accumulibacter phosphatis” (Accumulibacter) is a polyphosphate-accumulating organism that contributes to the enhanced biological phosphorus removal (EBPR) biotechnological process for removing excess phosphorus from wastewater and preventing eutrophication from downstream receiving waters. Distinct Accumulibacter clades often coexist in full-scale wastewater treatment plants and laboratory-scale enrichment bioreactors and have been hypothesized to inhabit distinct ecological niches. However, since individual strains of the Accumulibacter lineage have not been isolated in pure culture to date, these predictions have been made solely on genome-based comparisons and enrichments with varying strain compositions. Here, we used genome-resolved metagenomics and metatranscriptomics to explore the activity of coexisting Accumulibacter strains in an engineered bioreactor environment. We obtained four high-quality genomes of Accumulibacter strains that were present in the bioreactor ecosystem, one of which is a completely contiguous draft genome scaffolded with long Nanopore reads. We identified core and accessory genes to investigate how gene expression patterns differed among the dominating strains. Using this approach, we were able to identify putative pathways and functions that may confer distinct functions to Accumulibacter strains and provide key functional insights into this biotechnologically significant microbial lineage.
IMPORTANCE “Candidatus Accumulibacter phosphatis” is a model polyphosphate-accumulating organism that has been studied using genome-resolved metagenomics, metatranscriptomics, and metaproteomics to understand the EBPR process. Within the Accumulibacter lineage, several similar but diverging clades are defined by the shared sequence identity of the polyphosphate kinase (ppk1) locus. These clades are predicted to have key functional differences in acetate uptake rates, phage defense mechanisms, and nitrogen-cycling capabilities. However, such hypotheses have largely been made based on gene content comparisons of sequenced Accumulibacter genomes, some of which were obtained from different systems. Here, we performed time series genome-resolved metatranscriptomics to explore gene expression patterns of coexisting Accumulibacter clades in the same bioreactor ecosystem. Our work provides an approach for elucidating ecologically relevant functions based on gene expression patterns between closely related microbial populations.