Title: Natural variation in C. elegans short tandem repeats
Short tandem repeats (STRs) represent an important class of genetic variation that can contribute to phenotypic differences. Although millions of single nucleotide variants (SNVs) and short indels have been identified among wild Caenorhabditis elegans strains, the natural diversity in STRs remains unknown. Here, we characterized the distribution of 31,991 STRs with motif lengths of 1–6 bp in the reference genome of C. elegans. Of these STRs, 27,667 harbored polymorphisms across 540 wild strains and only 9691 polymorphic STRs (pSTRs) had complete genotype data for more than 90% of the strains. Compared with the reference genome, the pSTRs showed more contraction than expansion. We found that STRs with different motif lengths were enriched in different genomic features, among which coding regions showed the lowest STR diversity and constrained STR mutations. STR diversity also showed similar genetic divergence and selection signatures among wild strains as in previous studies using SNVs. We further identified STR variation in two mutation accumulation line panels that were derived from two wild strains and found background-dependent and fitness-dependent STR mutations. We also performed the first genome-wide association analyses between natural variation in STRs and organismal phenotypic variation among wild C. elegans strains. Overall, our results delineate the first large-scale characterization of STR variation in wild C. elegans strains and highlight the effects of selection on STR mutations.
Award ID(s):
1764421
NSF-PAR ID:
10427891
Journal Name:
Genome Research
Volume:
32
Issue:
19
ISSN:
1088-9051
Page Range / eLocation ID:
1852-1861
Sponsoring Org:
National Science Foundation
More Like this
  1. Larracuente, Amanda (Ed.)
    Abstract Short tandem repeats (STRs) have orders of magnitude higher mutation rates than single nucleotide variants (SNVs) and have been proposed to accelerate evolution in many organisms. However, only a few studies have addressed the impact of STR variation on phenotypic variation at both the organismal and molecular levels. Potential driving forces underlying the high mutation rates of STRs also remain largely unknown. Here, we leverage the recently generated expression and STR variation data among wild Caenorhabditis elegans strains to conduct a genome-wide analysis of how STRs affect gene expression variation. We identify thousands of expression STRs (eSTRs) showing regulatory effects and demonstrate that they explain missing heritability beyond SNV-based expression quantitative trait loci. We illustrate specific regulatory mechanisms such as how eSTRs affect splicing sites and alternative splicing efficiency. We also show that differential expression of antioxidant genes and oxidative stresses might affect STR mutations systematically using both wild strains and mutation accumulation lines. Overall, we reveal the interplay between STRs and gene expression variation by providing novel insights into regulatory mechanisms of STRs and highlighting that oxidative stress could lead to higher STR mutation rates.
  2. Kim, J (Ed.)
    Abstract Though natural systems harbor genetic and phenotypic variation, research in model organisms is often restricted to a reference strain. Focusing on a reference strain yields a great depth of knowledge but potentially at the cost of breadth of understanding. Furthermore, tools developed in the reference context may introduce bias when applied to other strains, posing challenges to defining the scope of variation within model systems. Here, we evaluate how genetic differences among 5 wild Caenorhabditis elegans strains affect gene expression and its quantification, in general and after induction of the RNA interference (RNAi) response. Across strains, 34% of genes were differentially expressed in the control condition, including 411 genes that were not expressed at all in at least 1 strain; 49 of these were unexpressed in reference strain N2. Reference genome mapping bias caused limited concern: despite hyperdiverse hotspots throughout the genome, 92% of variably expressed genes were robust to mapping issues. The transcriptional response to RNAi was highly strain- and target-gene-specific and did not correlate with RNAi efficiency, as the 2 RNAi-insensitive strains showed more differentially expressed genes following RNAi treatment than the RNAi-sensitive reference strain. We conclude that gene expression, generally and in response to RNAi, differs across C. elegans strains such that the choice of strain may meaningfully influence scientific inferences. Finally, we introduce a resource for querying gene expression variation in this dataset at https://wildworm.biosci.gatech.edu/rnai/. 
  3. Macdonald, S (Ed.)
    Abstract Quantitative genetics in Caenorhabditis elegans seeks to identify naturally segregating genetic variants that underlie complex traits. Genome-wide association studies scan the genome for individual genetic variants that are significantly correlated with phenotypic variation in a population, or quantitative trait loci. Genome-wide association studies are a popular choice for quantitative genetic analyses because the quantitative trait loci that are discovered segregate in natural populations. Despite numerous successful mapping experiments, the empirical performance of genome-wide association study has not, to date, been formally evaluated in C. elegans. We developed an open-source genome-wide association study pipeline called NemaScan and used a simulation-based approach to provide benchmarks of mapping performance in collections of wild C. elegans strains. Simulated trait heritability and complexity determined the spectrum of quantitative trait loci detected by genome-wide association studies. Power to detect smaller-effect quantitative trait loci increased with the number of strains sampled from the C. elegans Natural Diversity Resource. Population structure was a major driver of variation in mapping performance, with populations shaped by recent selection exhibiting significantly lower false discovery rates than populations composed of more divergent strains. We also recapitulated previous genome-wide association studies of experimentally validated quantitative trait variants. Our simulation-based evaluation of performance provides the community with critical context to pursue quantitative genetic studies using the C. elegans Natural Diversity Resource to elucidate the genetic basis of complex traits in C. elegans natural populations. 
  4. Phenotypic variation in diverse organism-level traits has been studied in Caenorhabditis elegans wild strains, but differences in gene expression and the underlying variation in regulatory mechanisms are largely unknown. Here, we use natural variation in gene expression to connect genetic variants to differences in organismal-level traits, including drug and toxicant responses. We performed transcriptomic analysis on 207 genetically distinct C. elegans wild strains to study natural regulatory variation of gene expression. Using this massive dataset, we performed genome-wide association mappings to investigate the genetic basis underlying gene expression variation and revealed complex genetic architectures. We found a large collection of hotspots enriched for expression quantitative trait loci across the genome. We further used mediation analysis to understand how gene expression variation could underlie organism-level phenotypic variation for a variety of complex traits. These results reveal the natural diversity in gene expression and possible regulatory mechanisms in this keystone model organism, highlighting the promise of gene expression variation in shaping phenotypic diversity.
  5. Abstract Phenotypic variation in organism-level traits has been studied in Caenorhabditis elegans wild strains, but the impacts of differences in gene expression and the underlying regulatory mechanisms are largely unknown. Here, we use natural variation in gene expression to connect genetic variants to differences in organismal-level traits, including drug and toxicant responses. We perform transcriptomic analyses on 207 genetically distinct C. elegans wild strains to study natural regulatory variation of gene expression. Using this massive dataset, we perform genome-wide association mappings to investigate the genetic basis underlying gene expression variation and reveal complex genetic architectures. We find a large collection of hotspots enriched for expression quantitative trait loci across the genome. We further use mediation analysis to understand how gene expression variation could underlie organism-level phenotypic variation for a variety of complex traits. These results reveal the natural diversity in gene expression and possible regulatory mechanisms in this keystone model organism, highlighting the promise of using gene expression variation to understand how phenotypic diversity is generated.