Title: Model‐based genotype and ancestry estimation for potential hybrids with mixed‐ploidy
Abstract

Non‐random mating among individuals can lead to spatial clustering of genetically similar individuals and population stratification. This deviation from panmixia is commonly observed in natural populations. Consequently, individuals can have parentage in single populations or involving hybridization between differentiated populations. Accounting for this mixture and structure is important when mapping the genetics of traits and learning about the formative evolutionary processes that shape genetic variation among individuals and populations. Stratified genetic relatedness among individuals is commonly quantified using estimates of ancestry that are derived from a statistical model. Development of these models for polyploid and mixed‐ploidy individuals and populations has lagged behind those for diploids. Here, we extend and test a hierarchical Bayesian model, called entropy, which can use low‐depth sequence data to estimate genotype and ancestry parameters in autopolyploid and mixed‐ploidy individuals (including sex chromosomes and autosomes within individuals). Our analysis of simulated data illustrated the trade‐off between sequencing depth and genome coverage and found lower error associated with low‐depth sequencing across a larger fraction of the genome than with high‐depth sequencing across a smaller fraction of the genome. The model has high accuracy and sensitivity as verified with simulated data and through analysis of admixture among populations of diploid and tetraploid Arabidopsis arenosa.

 
Award ID(s):
1638602
NSF-PAR ID:
10451001
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Molecular Ecology Resources
Volume:
21
Issue:
5
ISSN:
1755-098X
Page Range / eLocation ID:
p. 1434-1451
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Whole‐genome sequencing data allow survey of variation from across the genome, reducing the constraint of balancing genome sub‐sampling with estimating recombination rates and linkage between sampled markers and target loci. As sequencing costs decrease, low‐coverage whole‐genome sequencing of pooled or indexed‐individual samples is commonly utilized to identify loci associated with phenotypes or environmental axes in non‐model organisms. There are, however, relatively few publicly available bioinformatic pipelines designed explicitly to analyse these types of data, and fewer still that process the raw sequencing data, provide useful metrics of quality control and then execute analyses. Here, we present an updated version of a bioinformatics pipeline called PoolParty2 that can effectively handle either pooled or indexed DNA samples and includes new features to improve computational efficiency. Using simulated data, we demonstrate the ability of our pipeline to recover segregating variants, estimate their allele frequencies accurately, and identify genomic regions harbouring loci under selection. Based on the simulated data set, we benchmark the efficacy of our pipeline against another bioinformatic suite, angsd, and illustrate the compatibility and complementarity of these suites using angsd to generate genotype likelihoods as input for identifying linkage outlier regions using alignment files and variants provided by PoolParty2. Finally, we apply our updated pipeline to an empirical dataset of low‐coverage whole genomic data from population samples of Columbia River steelhead trout (Oncorhynchus mykiss), results from which demonstrate the genomic impacts of decades of artificial selection in a prominent hatchery stock. Thus, we not only demonstrate the utility of PoolParty2 for genomic studies that combine sequencing data from multiple individuals, but also illustrate how it complements other bioinformatics resources such as angsd.

     
  2. Abstract

    Discovery and analysis of genetic variants underlying agriculturally important traits are key to molecular breeding of crops. Reduced representation approaches have provided cost‐efficient genotyping using next‐generation sequencing. However, accurate genotype calling from next‐generation sequencing data is challenging, particularly in polyploid species due to their genome complexity. Recently developed Bayesian statistical methods implemented in available software packages, polyRAD, EBG, and updog, incorporate error rates and population parameters to accurately estimate allelic dosage across any ploidy. We used empirical and simulated data to evaluate the three Bayesian algorithms and demonstrated their impact on the power of genome‐wide association study (GWAS) analysis and the accuracy of genomic prediction. We further incorporated uncertainty in allelic dosage estimation by testing continuous genotype calls and comparing their performance to discrete genotypes in GWAS and genomic prediction. We tested the genotype‐calling methods using data from two autotetraploid species, Miscanthus sacchariflorus and Vaccinium corymbosum, and performed GWAS and genomic prediction. In the empirical study, the tested Bayesian genotype‐calling algorithms differed in their downstream effects on GWAS and genomic prediction, with some showing advantages over others. Through subsequent simulation studies, we observed that at low read depth, polyRAD was advantageous in its effect on GWAS power and limit of false positives. Additionally, we found that continuous genotypes increased the accuracy of genomic prediction, by reducing genotyping error, particularly at low sequencing depth. Our results indicate that by using the Bayesian algorithm implemented in polyRAD and continuous genotypes, we can accurately and cost‐efficiently implement GWAS and genomic prediction in polyploid crops.

     
  3. Abstract Objectives

    Body size and composition vary widely among individuals and populations, and long‐term research in diverse contexts informs our understanding of genetic, cultural, and environmental impacts on this variation. We analyze longitudinal measures of height, weight, and body mass index (BMI) from a Caribbean village, estimating the extent to which these anthropometrics are shaped by genetic variance in a small‐scale population of mixed ancestry.

    Materials and Methods

    Longitudinal data from a traditionally horticultural village in Dominica document height and weight in a non‐Western population that is transitioning to increasingly Westernized lifestyles, and an 11‐generation pedigree enables us to estimate the proportions of phenotypic variation in height, weight, and BMI attributed to genetic variation. We assess within‐individual variation across growth curves as well as heritabilities of these traits for 260 individuals using Bayesian variance component estimation.

    Results

    Age, sex, and secular trends account for the majority of anthropometric variation in these longitudinal data. Independent of age, sex, and secular trends, our analyses show high repeatabilities for the remaining variation in height, weight, and BMI growth curves (>0.75), and moderate heritabilities (h²_height = 0.68, h²_weight = 0.64, h²_BMI = 0.49) reveal clear genetic signals that account for large proportions of the variation in body size observed between families. Secular trends show increases of 6.5% in height and 16.0% in weight from 1997 to 2017.

    Discussion

    This horticultural Caribbean population has transitioned to include more Westernized foods and technologies over the decades captured in this analysis. BMI varies widely between individuals and is significantly shaped by genetic variation, warranting future exploration with other physiological correlates and associated genetic variants.

     
  4. Abstract

    Crenate broomrape (Orobanche crenata Forsk.) is a serious long‐standing parasitic weed problem in Algeria, mainly affecting legumes but also vegetable crops. Unresolved questions for parasitic weeds revolve around the extent to which these plants undergo local adaptation, especially with respect to host specialization, which would be expected to be a strong selective factor for obligate parasitic plants. In the present study, the genotyping‐by‐sequencing (GBS) approach was used to analyze genetic diversity and population structure of 10 Northern Algerian O. crenata populations with different geographical origins and host species (faba bean, pea, chickpea, carrot, and tomato). In total, 8004 high‐quality single‐nucleotide polymorphisms (5% missingness) were obtained and used across the study. Genetic diversity and relationships of 95 individuals from 10 populations were studied using model‐based ancestry analysis, principal components analysis, discriminant analysis of principal components, and phylogeny approaches. The genetic differentiation (FST) between pairs of populations was lower between adjacent populations and higher between geographically separated ones, but no support was found for isolation by distance. Further analyses identified four genetic clusters and revealed evidence of structuring among populations and, although confounded with location, among hosts. In the clearest example, O. crenata growing on pea had a SNP profile that was distinct from other host/location combinations. These results illustrate the importance and potential of GBS to reveal the dynamics of parasitic weed dispersal and population structure.

     
  5. Abstract Background

    Given the economic and environmental importance of allopolyploids and other species with highly duplicated genomes, there is a need for methods to distinguish paralogs, i.e. duplicate sequences within a genome, from Mendelian loci, i.e. single copy sequences that pair at meiosis. The ratio of observed to expected heterozygosity is an effective tool for filtering loci but requires genotyping to be performed first at a high computational cost, whereas counting the number of sequence tags detected per genotype is computationally quick but very ineffective in inbred or polyploid populations. Therefore, new methods are needed for filtering paralogs.

    Results

    We introduce a novel statistic, Hind/HE, that uses the probability that two reads sampled from a genotype will belong to different alleles, instead of observed heterozygosity. The expected value of Hind/HE is the same across all loci in a dataset, regardless of read depth or allele frequency. In contrast to methods based on observed heterozygosity, it can be estimated and used for filtering loci prior to genotype calling. In addition to filtering paralogs, it can be used to filter loci with null alleles or high overdispersion, and identify individuals with unexpected ploidy and hybrid status. We demonstrate that the statistic is useful at read depths as low as five to 10, well below the depth needed for accurate genotype calling in polyploid and outcrossing species.
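    As a rough illustration of the idea behind the statistic, the sketch below computes a read-based Hind for one individual at one locus and divides it by the expected heterozygosity HE. It is not the polyRAD implementation: the function names are hypothetical, sampling two reads without replacement is an assumption, and the allele frequencies are taken as given rather than estimated from the data.

```python
from typing import Sequence


def h_ind(depths: Sequence[int]) -> float:
    """Probability that two reads drawn without replacement from one
    individual at one locus carry different alleles (assumed definition)."""
    total = sum(depths)
    if total < 2:
        raise ValueError("at least two reads are needed")
    # Number of ordered read pairs that share the same allele.
    same_pairs = sum(d * (d - 1) for d in depths)
    return 1.0 - same_pairs / (total * (total - 1))


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """HE = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def hind_he(depths: Sequence[int], freqs: Sequence[float]) -> float:
    """Ratio Hind/HE for one individual at one locus."""
    he = expected_heterozygosity(freqs)
    if he == 0.0:
        raise ValueError("locus is monomorphic; HE is zero")
    return h_ind(depths) / he


if __name__ == "__main__":
    # Hypothetical read depths for two alleles, with assumed frequencies.
    print(hind_he([5, 5], [0.5, 0.5]))
    print(hind_he([10, 0], [0.5, 0.5]))
```

    In practice the ratio would be averaged over individuals at each locus and compared against a threshold; the per-individual form is kept here only to show the arithmetic.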

    Conclusions

    Our methodology for estimating Hind/HE across loci and individuals, as well as determining reasonable thresholds for filtering loci, is implemented in polyRAD v1.6, available at https://github.com/lvclark/polyRAD. In large sequencing datasets, we anticipate that the ability to filter markers and identify problematic individuals prior to genotype calling will save researchers considerable computational time.

     