

Title: Impact of genotype‐calling methodologies on genome‐wide association and genomic prediction in polyploids
Abstract

Discovery and analysis of genetic variants underlying agriculturally important traits are key to molecular breeding of crops. Reduced representation approaches have provided cost‐efficient genotyping using next‐generation sequencing. However, accurate genotype calling from next‐generation sequencing data is challenging, particularly in polyploid species due to their genome complexity. Recently developed Bayesian statistical methods implemented in available software packages (polyRAD, EBG, and updog) incorporate error rates and population parameters to accurately estimate allelic dosage across any ploidy. We used empirical and simulated data to evaluate the three Bayesian algorithms and demonstrated their impact on the power of genome‐wide association study (GWAS) analysis and the accuracy of genomic prediction. We further incorporated uncertainty in allelic dosage estimation by testing continuous genotype calls and comparing their performance to discrete genotypes in GWAS and genomic prediction. We tested the genotype‐calling methods using data from two autotetraploid species, Miscanthus sacchariflorus and Vaccinium corymbosum, and performed GWAS and genomic prediction. In the empirical study, the tested Bayesian genotype‐calling algorithms differed in their downstream effects on GWAS and genomic prediction, with some showing advantages over others. Through subsequent simulation studies, we observed that at low read depth, polyRAD was advantageous in its effect on GWAS power and control of false positives. Additionally, we found that continuous genotypes increased the accuracy of genomic prediction by reducing genotyping error, particularly at low sequencing depth. Our results indicate that by using the Bayesian algorithm implemented in polyRAD and continuous genotypes, we can accurately and cost‐efficiently implement GWAS and genomic prediction in polyploid crops.
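To illustrate the distinction between discrete and continuous genotype calls described above, the following sketch computes both from a genotype posterior distribution. The posterior values here are made up for illustration; the Bayesian callers (polyRAD, EBG, updog) each produce such posteriors from read counts in their own way.

```python
import numpy as np

# Hypothetical sketch: a "continuous genotype" is the posterior mean allelic
# dosage, which carries calling uncertainty into GWAS and genomic prediction
# instead of forcing a hard call. Posterior values below are illustrative only.
ploidy = 4
dosages = np.arange(ploidy + 1)                       # possible alt-allele copies 0..4
posterior = np.array([0.02, 0.10, 0.60, 0.25, 0.03])  # from a Bayesian caller

discrete_call = int(dosages[np.argmax(posterior)])    # hard call: most probable dosage
continuous_call = float(dosages @ posterior)          # posterior mean dosage

print(discrete_call, round(continuous_call, 2))       # → 2 2.17
```

At low read depth the posterior spreads across several dosages, so the continuous call (here 2.17) retains information that the hard call (2) discards; this is the mechanism behind the accuracy gain the abstract reports.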

 
NSF-PAR ID:
10471941
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
The Plant Genome
Volume:
16
Issue:
4
ISSN:
1940-3372
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Low or uneven read depth is a common limitation of genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), resulting in high missing data rates, heterozygotes miscalled as homozygotes, and uncertainty of allele copy number in heterozygous polyploids. Bayesian genotype calling can mitigate these issues, but previously has only been implemented in software that requires a reference genome or uses priors that may be inappropriate for the population. Here we present several novel Bayesian algorithms that estimate genotype posterior probabilities, all of which are implemented in a new R package, polyRAD. Appropriate priors can be specified for mapping populations, populations in Hardy-Weinberg equilibrium, or structured populations, and in each case can be informed by genotypes at linked markers. The polyRAD software imports read depth from several existing pipelines, and outputs continuous or discrete numerical genotypes suitable for analyses such as genome-wide association and genomic prediction. 
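The general idea behind Bayesian genotype calling from read depth can be sketched as a binomial read-count likelihood combined with a population prior. This is an illustrative simplification, not polyRAD's actual implementation, which additionally models overdispersion, linked markers, and several prior types.

```python
from math import comb

def genotype_posterior(alt_reads, total_reads, ploidy, alt_freq, err=0.01):
    """Posterior over allelic dosage 0..ploidy from read counts at one locus.

    Sketch of the generic Bayesian approach: a binomial likelihood for the
    alt-read count times a Hardy-Weinberg prior on dosage. polyRAD, EBG, and
    updog each refine this (error models, overdispersion, structured priors),
    so this is illustrative only."""
    post = []
    for g in range(ploidy + 1):
        # expected alt-read fraction for dosage g, allowing for sequencing error
        p = (g / ploidy) * (1 - err) + (1 - g / ploidy) * err
        like = comb(total_reads, alt_reads) * p**alt_reads * (1 - p)**(total_reads - alt_reads)
        # Hardy-Weinberg prior on dosage given the population allele frequency
        prior = comb(ploidy, g) * alt_freq**g * (1 - alt_freq)**(ploidy - g)
        post.append(like * prior)
    z = sum(post)
    return [x / z for x in post]

# e.g. 4 alt reads out of 8 at a tetraploid locus with allele frequency 0.4
probs = genotype_posterior(4, 8, 4, 0.4)
print(max(range(5), key=lambda g: probs[g]))  # most probable dosage → 2
```

The posterior can be reported as a hard call (the argmax) or as a continuous genotype (the posterior mean), matching the two output modes polyRAD offers.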
  2. Abstract

    Background

    Given the economic and environmental importance of allopolyploids and other species with highly duplicated genomes, there is a need for methods to distinguish paralogs, i.e. duplicate sequences within a genome, from Mendelian loci, i.e. single copy sequences that pair at meiosis. The ratio of observed to expected heterozygosity is an effective tool for filtering loci but requires genotyping to be performed first at a high computational cost, whereas counting the number of sequence tags detected per genotype is computationally quick but very ineffective in inbred or polyploid populations. Therefore, new methods are needed for filtering paralogs.

    Results

    We introduce a novel statistic, Hind/HE, that uses the probability that two reads sampled from a genotype will belong to different alleles, instead of observed heterozygosity. The expected value of Hind/HE is the same across all loci in a dataset, regardless of read depth or allele frequency. In contrast to methods based on observed heterozygosity, it can be estimated and used for filtering loci prior to genotype calling. In addition to filtering paralogs, it can be used to filter loci with null alleles or high overdispersion, and identify individuals with unexpected ploidy and hybrid status. We demonstrate that the statistic is useful at read depths as low as five to 10, well below the depth needed for accurate genotype calling in polyploid and outcrossing species.

    Conclusions

    Our methodology for estimating Hind/HE across loci and individuals, as well as determining reasonable thresholds for filtering loci, is implemented in polyRAD v1.6, available at https://github.com/lvclark/polyRAD. In large sequencing datasets, we anticipate that the ability to filter markers and identify problematic individuals prior to genotype calling will save researchers considerable computational time.
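The definition of Hind/HE given above can be sketched directly from read-count data. This is an illustrative estimator over a toy biallelic example, not polyRAD's actual code; the read depths are made up.

```python
import numpy as np

def hind_he(read_counts):
    """Sketch of the Hind/HE statistic for one locus (illustrative only).

    read_counts: (individuals x alleles) array of read depths. Hind is the
    probability that two reads drawn without replacement from an individual
    belong to different alleles; HE is the expected heterozygosity computed
    from population-level allele frequencies."""
    counts = np.asarray(read_counts, dtype=float)
    n = counts.sum(axis=1)
    keep = n >= 2                        # need at least two reads to draw a pair
    counts, n = counts[keep], n[keep]
    # per-individual probability that two sampled reads are different alleles
    hind = 1 - (counts * (counts - 1)).sum(axis=1) / (n * (n - 1))
    freqs = counts.sum(axis=0) / counts.sum()
    he = 1 - (freqs ** 2).sum()
    return hind.mean() / he

# toy biallelic locus: rows are individuals, columns are ref/alt read depths
depths = [[6, 2], [3, 5], [8, 0], [4, 4], [1, 7]]
print(round(hind_he(depths), 3))
```

Because both numerator and denominator are computed from raw read counts, the statistic needs no genotype calls, which is what makes it usable as a pre-genotyping filter; loci whose value departs strongly from the population-wide expectation are candidate paralogs or null-allele loci.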

  3. Abstract

    The development of next-generation sequencing (NGS) enabled a shift from array-based genotyping to directly sequencing genomic libraries for high-throughput genotyping. Even though whole-genome sequencing was initially too costly for routine analysis in large populations such as breeding or genetic studies, continued advancements in genome sequencing and bioinformatics have provided the opportunity to capitalize on whole-genome information. As new sequencing platforms can routinely provide high-quality sequencing data for sufficient genome coverage to genotype various breeding populations, a limitation comes in the time and cost of library construction when multiplexing a large number of samples. Here we describe a high-throughput whole-genome skim-sequencing (skim-seq) approach that can be utilized for a broad range of genotyping and genomic characterization. Using optimized low-volume Illumina Nextera chemistry, we developed a skim-seq method and combined up to 960 samples in one multiplex library using dual index barcoding. With the dual-index barcoding, the number of samples for multiplexing can be adjusted depending on the amount of data required, and could be extended to 3,072 samples or more. Panels of doubled haploid wheat lines (Triticum aestivum, CDC Stanley × CDC Landmark), wheat-barley (T. aestivum × Hordeum vulgare) and wheat-wheatgrass (Triticum durum × Thinopyrum intermedium) introgression lines as well as known monosomic wheat stocks were genotyped using the skim-seq approach. Bioinformatics pipelines were developed for various applications where sequencing coverage ranged from 1× down to 0.01× per sample. Using reference genomes, we detected chromosome dosage, identified aneuploidy, and karyotyped introgression lines from the skim-seq data.
Leveraging the recent advancements in genome sequencing, skim-seq provides an effective and low-cost tool for routine genotyping and genetic analysis, which can track and identify introgressions and genomic regions of interest in genetics research and applied breeding programs.
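Chromosome-dosage detection from skim-seq data, as described above, amounts to comparing each chromosome's share of reads in a test line against a euploid reference. The sketch below uses made-up counts and a median-normalization step that is one reasonable choice, not necessarily the pipeline the authors used.

```python
import numpy as np

def chromosome_dosage(sample_counts, control_counts, base_copies=2):
    """Hypothetical sketch of dosage estimation from skim-seq read counts.

    Each chromosome's fraction of reads in the test sample is compared with a
    euploid control; the ratio is anchored so the typical chromosome sits at
    the expected copy number (2 for a disomic wheat stock)."""
    sample = np.asarray(sample_counts, dtype=float)
    control = np.asarray(control_counts, dtype=float)
    ratio = (sample / sample.sum()) / (control / control.sum())
    ratio /= np.median(ratio)            # anchor the typical chromosome at 1
    return base_copies * ratio

control = [100, 120, 90, 110]            # euploid reads per chromosome (made up)
monosomic = [100, 60, 90, 110]           # chromosome 2 present in one copy
dosage = chromosome_dosage(monosomic, control)
print(np.round(dosage, 2))               # → [2. 1. 2. 2.]
```

Even at 0.01× coverage, aggregating reads per chromosome yields enough counts for this kind of dosage comparison, which is why aneuploidy and whole-chromosome introgressions remain detectable at depths far too low for genotype calling.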

  4. Abstract

    Non‐random mating among individuals can lead to spatial clustering of genetically similar individuals and population stratification. This deviation from panmixia is commonly observed in natural populations. Consequently, individuals can have parentage in single populations or involving hybridization between differentiated populations. Accounting for this mixture and structure is important when mapping the genetics of traits and learning about the formative evolutionary processes that shape genetic variation among individuals and populations. Stratified genetic relatedness among individuals is commonly quantified using estimates of ancestry that are derived from a statistical model. Development of these models for polyploid and mixed‐ploidy individuals and populations has lagged behind those for diploids. Here, we extend and test a hierarchical Bayesian model, called entropy, which can use low‐depth sequence data to estimate genotype and ancestry parameters in autopolyploid and mixed‐ploidy individuals (including sex chromosomes and autosomes within individuals). Our analysis of simulated data illustrated the trade‐off between sequencing depth and genome coverage and found lower error associated with low‐depth sequencing across a larger fraction of the genome than with high‐depth sequencing across a smaller fraction of the genome. The model has high accuracy and sensitivity as verified with simulated data and through analysis of admixture among populations of diploid and tetraploid Arabidopsis arenosa.

  5. Abstract

    Low‐coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost‐effective approach for population genomic studies in both model and nonmodel species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data, and how the distribution of sequencing effort between the number of samples analysed and per‐sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate how the per‐sample cost for lcWGS is now comparable to RAD‐seq and Pool‐seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency, genetic diversity, and linkage disequilibrium estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference, with a few notable exceptions. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in nonmodel species, and discuss current limitations and future perspectives for lcWGS‐based population genomics research. With this overview, we hope to make lcWGS more approachable and stimulate its broader adoption.
