Title: Drosophila Evolution over Space and Time (DEST): A New Population Genomics Resource
Abstract Drosophila melanogaster is a leading model in population genetics and genomics, and a growing number of whole-genome data sets from natural populations of this species have been published in recent years. A major challenge is the integration of disparate data sets, often generated using different sequencing technologies and bioinformatic pipelines, which hampers our ability to address questions about the evolution of this species. Here we address these issues by developing a bioinformatics pipeline that maps pooled sequencing (Pool-Seq) reads from D. melanogaster to a hologenome consisting of fly and symbiont genomes and estimates allele frequencies using either a heuristic (PoolSNP) or a probabilistic variant caller (SNAPE-pooled). We use this pipeline to generate the largest data repository of genomic data available for D. melanogaster to date, encompassing 271 previously published and unpublished population samples from over 100 locations in >20 countries on four continents. Several of these locations have been sampled in different seasons across multiple years. This data set, which we call Drosophila Evolution over Space and Time (DEST), is coupled with sampling and environmental metadata. A web-based genome browser and web portal provide easy access to the SNP data set. We further provide guidelines on how to use Pool-Seq data for model-based demographic inference. Our aim is to provide this scalable platform as a community resource that can be easily extended via future efforts for an even more extensive cosmopolitan data set. Our resource will enable population geneticists to analyze spatiotemporal genetic patterns and evolutionary dynamics of D. melanogaster populations in unprecedented detail.
Award ID(s):
1737824
NSF-PAR ID:
10389606
Author(s) / Creator(s):
Editor(s):
Nielsen, Rasmus
Date Published:
Journal Name:
Molecular Biology and Evolution
Volume:
38
Issue:
12
ISSN:
1537-1719
Page Range / eLocation ID:
5782 to 5805
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Learning and memory are critical functions for all animals, giving individuals the ability to respond to changes in their environment. Within populations, individuals vary; however, the mechanisms underlying this variation in performance are largely unknown. Thus, it remains to be determined what genetic factors cause an individual to have high learning ability and what factors determine how well an individual will remember what they have learned. To genetically dissect learning and memory performance, we used the Drosophila synthetic population resource (DSPR), a multiparent mapping resource in the model system Drosophila melanogaster, consisting of a large set of recombinant inbred lines (RILs) that naturally vary in these and other traits. Fruit flies can be trained in a “heat box” to learn to remain on one side of a chamber (place learning) and can remember this (place memory) over short timescales. Using this paradigm, we measured place learning and memory for ~49 000 individual flies from over 700 DSPR RILs. We identified 16 different loci across the genome that significantly affect place learning and/or memory performance, with 5 of these loci affecting both traits. To identify transcriptomic differences associated with performance, we performed RNA‐Seq on pooled samples of seven high-performing and seven low-performing RILs for both learning and memory and identified hundreds of genes with differences in expression between the two sets. Integrating our transcriptomic results with the mapping results allowed us to identify nine promising candidate genes, advancing our understanding of the genetic basis underlying natural variation in learning and memory performance.

     
  2. Abstract

    Understanding the molecular basis of repeated evolution improves our ability to predict evolution across the tree of life. Only in the last decade has high‐throughput sequencing enabled comparative genome scans to thoroughly examine the repeatability of genetic changes driving repeated phenotypic evolution. The Asian corn borer (ACB), Ostrinia furnacalis (Guenée), and the European corn borer (ECB), Ostrinia nubilalis (Hübner), are two closely related moths displaying repeatable phenological adaptation to a wide range of climates on two separate continents, largely manifesting as changes in the timing of diapause induction and termination across latitude. Candidate genes underlying diapause variation in North American ECB have been previously identified. Here, we sampled seven ACB populations across 23 degrees of latitude in China to elucidate the genetic basis of diapause variation and evolutionary mechanisms driving parallel clinal responses in the two species. Using pooled whole‐genome sequencing (Pool‐seq) data, population genomic analyses revealed hundreds of single nucleotide polymorphisms (SNPs) whose allele frequencies covaried with mean diapause phenotypes along the cline. Genes involved in circadian rhythm were over‐represented among candidate genes with strong signatures of spatially varying selection. Only one of two circadian clock genes associated with diapause evolution in ECB showed evidence of reuse in ACB (period [per]), but per alleles were not shared between species nor with their outgroup, implicating independent mutational paths. Nonetheless, evidence of adaptive introgression was discovered at putative diapause loci located elsewhere in the genome, suggesting that de novo mutations and introgression might both underlie the repeated phenological evolution.

     
  3. Abstract Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is widely used to identify factor binding to genomic DNA and chromatin modifications. ChIP-seq data analysis is affected by genomic regions that generate ultra-high artifactual signals. To remove these signals from ChIP-seq data, the Encyclopedia of DNA Elements (ENCODE) project developed comprehensive sets of regions defined by low mappability and ultra-high signals called blacklists for human, mouse (Mus musculus), nematode (Caenorhabditis elegans), and fruit fly (Drosophila melanogaster). However, blacklists are not currently available for many model and nonmodel species. Here, we describe an alternative approach for removing false-positive peaks called greenscreen. Greenscreen is easy to implement, requires few input samples, and uses analysis tools frequently employed for ChIP-seq. Greenscreen removes artifactual signals as effectively as blacklists in Arabidopsis thaliana and human ChIP-seq datasets while covering less of the genome and dramatically improves ChIP-seq peak calling and downstream analyses. Greenscreen filtering reveals true factor binding overlap and occupancy changes in different genetic backgrounds or tissues. Because it is effective with as few as two inputs, greenscreen is readily adaptable for use in any species or genome build. Although developed for ChIP-seq, greenscreen also identifies artifactual signals from other genomic datasets including Cleavage Under Targets and Release Using Nuclease. We present an improved ChIP-seq pipeline incorporating greenscreen that detects more true peaks than other methods.
  4. Abstract A long-standing enigma concerns the geographic and ecological origins of the intensively studied vinegar fly, Drosophila melanogaster. This globally distributed human commensal is thought to originate from sub-Saharan Africa, yet until recently, it had never been reported from undisturbed wilderness environments that could reflect its precommensal niche. Here, we document the collection of 288 D. melanogaster individuals from multiple African wilderness areas in Zambia, Zimbabwe, and Namibia. The presence of D. melanogaster in these remote woodland environments is consistent with an ancestral range in southern-central Africa, as opposed to equatorial regions. After sequencing the genomes of 17 wilderness-collected flies collected from Kafue National Park in Zambia, we found reduced genetic diversity relative to town populations, elevated chromosomal inversion frequencies, and strong differences at specific genes including known insecticide targets. Combining these genomes with existing data, we probed the history of this species’ geographic expansion. Demographic estimates indicated that expansion from southern-central Africa began ∼10,000 years ago, with a Saharan crossing soon after, but expansion from the Middle East into Europe did not begin until roughly 1,400 years ago. This improved model of demographic history will provide an important resource for future evolutionary and genomic studies of this key model organism. Our findings add context to the history of D. melanogaster, while opening the door for future studies on the biological basis of adaptation to human environments. 
  5. Annotating the genomes of multiple species allows us to analyze the evolution of their genes. While many eukaryotic genome assemblies already include computational gene predictions, these predictions can benefit from review and refinement through manual gene annotation. The Genomics Education Partnership (GEP; https://thegep.org/) developed a structural annotation protocol for protein-coding genes that enables undergraduate student and faculty researchers to create high-quality gene annotations that can be utilized in subsequent scientific investigations. For example, this protocol has been utilized by the GEP faculty to engage undergraduate students in the comparative annotation of genes involved in the insulin signaling pathway in 27 Drosophila species, using D. melanogaster as the reference genome. Students construct gene models using multiple lines of computational and empirical evidence including expression data (e.g., RNA-Seq), sequence similarity (e.g., BLAST and multiple sequence alignment), and computational gene predictions. Quality control measures require each gene be annotated by at least two students working independently, followed by reconciliation of the submitted gene models by a more experienced student. This article provides an overview of the annotation protocol and describes how discrepancies in student-submitted gene models are resolved to produce a final, high-quality gene set suitable for subsequent analyses. The protocol can be adapted to other scientific questions (e.g., expansion of the Drosophila Muller F element) and species (e.g., parasitoid wasps) to provide additional opportunities for undergraduate students to participate in genomics research. These student annotation efforts can substantially improve the quality of gene annotations in publicly available genomic databases.