
Title: Testing for fitness epistasis in a transplant experiment identifies a candidate adaptive locus in Timema stick insects
Identifying the genetic basis of adaptation is a central goal of evolutionary biology. However, identifying genes and mutations affecting fitness remains challenging because a large number of traits and variants can influence fitness. Selected phenotypes can also be difficult to know a priori, complicating top–down genetic approaches for trait mapping that involve crosses or genome-wide association studies. In such cases, experimental genetic approaches, where one maps fitness directly and attempts to infer the traits involved afterwards, can be valuable. Here, we re-analyse data from a transplant experiment involving Timema stick insects, where five physically clustered single-nucleotide polymorphisms associated with cryptic body coloration were shown to interact to affect survival. Our analysis covers a larger genomic region than past work and revealed a locus previously not identified as associated with survival. This locus resides near a gene, Punch (Pu), involved in pteridine pigment production, implying that it could be associated with an unmeasured coloration trait. However, by combining previous and newly obtained phenotypic data, we show that this trait is not eye or body coloration. We discuss the implications of our results for the discovery of traits, genes and mutations associated with fitness in other systems, as well as for supergene evolution. This article is part of the theme issue ‘Genetic basis of adaptation and speciation: from loci to causative mutations’.
Award ID(s):
1638768
NSF-PAR ID:
10386347
Journal Name:
Philosophical Transactions of the Royal Society B: Biological Sciences
Volume:
377
Issue:
1855
ISSN:
0962-8436
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Genome-wide association studies (GWASs) have identified thousands of human genetic variants associated with diverse diseases and traits, and most of these variants map to noncoding loci with unknown target genes and function. Current approaches to understand which GWAS loci harbor causal variants and to map these noncoding regulators to target genes suffer from low throughput. With newer multiancestry GWASs from individuals of diverse ancestries, there is a pressing and growing need to scale experimental assays to connect GWAS variants with molecular mechanisms. Here, we combined biobank-scale GWASs, massively parallel CRISPR screens, and single-cell sequencing to discover target genes of noncoding variants for blood trait loci with systematic targeting and inhibition of noncoding GWAS loci with single-cell sequencing (STING-seq). RATIONALE Blood traits are highly polygenic, and GWASs have identified thousands of noncoding loci that map to candidate cis-regulatory elements (CREs). By combining CRE-silencing CRISPR perturbations and single-cell readouts, we targeted hundreds of GWAS loci in a single assay, revealing target genes in cis and in trans. For select CREs that regulate target genes, we performed direct variant insertion. Although silencing the CRE can identify the target gene, direct variant insertion can identify magnitude and direction of effect on gene expression for the GWAS variant. In select cases in which the target gene was a transcription factor or microRNA, we also investigated the gene-regulatory networks altered upon CRE perturbation and how these networks differ across blood cell types. RESULTS We inhibited candidate CREs from fine-mapped blood trait GWAS variants (from ~750,000 individuals of diverse ancestries) in human erythroid progenitors.
In total, we targeted 543 variants (254 loci) mapping to candidate CREs, generating multimodal single-cell data including transcriptome, direct CRISPR gRNA capture, and cell surface proteins. We identified target genes in cis (within 500 kb) for 134 CREs. In most cases, we found that the target gene was the closest gene and that specific enhancer-associated biochemical hallmarks (H3K27ac and accessible chromatin) are essential for CRE function. Using multiple perturbations at the same locus, we were able to distinguish causal variants from noncausal variants in linkage disequilibrium. For a subset of validated CREs, we also inserted specific GWAS variants using base-editing STING-seq (beeSTING-seq) and quantified the effect size and direction of GWAS variants on gene expression. Given our transcriptome-wide data, we examined dosage effects in cis and trans in cases in which the cis target is a transcription factor or microRNA. We found that trans target genes are also enriched for GWAS loci, and identified gene clusters within trans gene networks with distinct biological functions and expression patterns in primary human blood cells. CONCLUSION In this work, we investigated noncoding GWAS variants at scale, identifying target genes in single cells. These methods can help to address the variant-to-function challenges that are a barrier for translation of GWAS findings (e.g., drug targets for diseases with a genetic basis) and greatly expand our ability to understand mechanisms underlying GWAS loci. Identifying causal variants and their target genes with STING-seq. Uncovering causal variants and their target genes or function is a major challenge for GWASs. STING-seq combines perturbation of noncoding loci with multimodal single-cell sequencing to profile hundreds of GWAS loci in parallel. This approach can identify target genes in cis and trans, measure dosage effects, and decipher gene-regulatory networks.
  2. ABSTRACT Genome-wide association studies (GWAS) can identify genetic variants responsible for naturally occurring and quantitative phenotypic variation. Association studies therefore provide a powerful complement to approaches that rely on de novo mutations for characterizing gene function. Although bacteria should be amenable to GWAS, few GWAS have been conducted on bacteria, and the extent to which nonindependence among genomic variants (e.g., linkage disequilibrium [LD]) and the genetic architecture of phenotypic traits will affect GWAS performance is unclear. We apply association analyses to identify candidate genes underlying variation in 20 biochemical, growth, and symbiotic phenotypes among 153 strains of Ensifer meliloti. For 11 traits, we find genotype-phenotype associations that are stronger than expected by chance, with the candidates in relatively small linkage groups, indicating that LD does not preclude resolving association candidates to relatively small genomic regions. The significant candidates show an enrichment for nucleotide polymorphisms (SNPs) over gene presence-absence variation (PAV), and for five traits, candidates are enriched in large linkage groups, a possible signature of epistasis. Many of the variants most strongly associated with symbiosis phenotypes were in genes previously identified as being involved in nitrogen fixation or nodulation. For other traits, apparently strong associations were not stronger than the range of associations detected in permuted data. In sum, our data show that GWAS in bacteria may be a powerful tool for characterizing genetic architecture and identifying genes responsible for phenotypic variation. However, careful evaluation of candidates is necessary to avoid false signals of association. IMPORTANCE Genome-wide association analyses are a powerful approach for identifying gene function.
These analyses are becoming commonplace in studies of humans, domesticated animals, and crop plants but have rarely been conducted in bacteria. We applied association analyses to 20 traits measured in Ensifer meliloti, an agriculturally and ecologically important bacterium because it fixes nitrogen when in symbiosis with leguminous plants. We identified candidate alleles and gene presence-absence variants underlying variation in symbiosis traits, antibiotic resistance, and use of various carbon sources; some of these candidates are in genes previously known to affect these traits whereas others were in genes that have not been well characterized. Our results point to the potential power of association analyses in bacteria, but also to the need to carefully evaluate the potential for false associations.
  3. INTRODUCTION Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons. RATIONALE Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types. RESULTS To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved.
More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated. CONCLUSION TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. 
TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution. Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic]
  4. High rates of dispersal can break down coadapted gene complexes. However, concentrated genomic architecture (i.e., genomic islands of divergence) can suppress recombination to allow evolution of local adaptations despite high gene flow. Pacific lamprey (Entosphenus tridentatus) is a highly dispersive anadromous fish. Observed trait diversity and evidence for a genetic basis of traits suggest it may be locally adapted. We addressed whether concentrated genomic architecture could influence local adaptation for Pacific lamprey. Using two new whole genome assemblies and genotypes from 7,716 single nucleotide polymorphism (SNP) loci in 518 individuals from across the species range, we identified four genomic islands of divergence (on chromosomes 01, 02, 04, and 22). We determined robust phenotype-by-genotype relationships by testing multiple traits across geographic sites. These trait associations probably explain genomic divergence across the species’ range. We genotyped a subset of 302 broadly distributed SNPs in 2,145 individuals for association testing for adult body size, sexual maturity, migration distance and timing, adult swimming ability, and larval growth. Body size traits were strongly associated with SNPs on chromosomes 02 and 04. Moderate associations also implicated SNPs on chromosome 01 as being associated with variation in female maturity. Finally, we used candidate SNPs to extrapolate a heterogeneous spatiotemporal distribution of these predicted phenotypes based on independent data sets of larval and adult collections. These maturity and body size results guide future elucidation of factors driving regional optimization of these traits for fitness. Pacific lamprey is culturally important and imperiled. This research addresses biological uncertainties that challenge restoration efforts.
  5. Abstract

    Microgeographic adaptation provides a particularly interesting context for understanding the genetic basis of phenotypic divergence and may also present unique empirical challenges. In particular, plant adaptation to extreme soil mosaics may generate barriers to gene flow or shifts in mating system that confound simple genomic scans for adaptive loci. Here, we combine three approaches – quantitative trait locus (QTL) mapping of candidate intervals in controlled crosses, population resequencing (PoolSeq) and analyses of wild recombinant individuals – to investigate one trait associated with Mimulus guttatus (yellow monkeyflower) adaptation to geothermal soils in Yellowstone National Park. We mapped a major QTL causing dense leaf trichomes in thermally adapted plants to a <50‐kb region of linkage Group 14 (Tr14) previously implicated in trichome divergence between independent M. guttatus populations. A PoolSeq scan of Tr14 region revealed a cluster of six genes, coincident with the inferred QTL peak, with high allele frequency differences sufficient to explain observed phenotypic differentiation. One of these, the R2R3 MYB transcription factor Migut.N02661, is a plausible functional candidate and was also strongly associated (r² = 0.27) with trichome phenotype in analyses of wild‐collected admixed individuals. Although functional analyses will be necessary to definitively link molecular variants in Tr14 with trichome divergence, our analyses are a major step in that direction. They point to a simple, and parallel, genetic basis for one axis of Mimulus guttatus adaptation to an extreme habitat, suggest a broadly conserved genetic basis for trichome variation across flowering plants and pave the way for further investigations of this challenging case of microgeographic incipient speciation.
