The identification of genomic regions affected by selection is one of the most important goals in population genetics. If temporal data are available, allele frequency changes at SNP positions are often used for this purpose. Here we provide a new testing approach that uses haplotype frequencies instead of allele frequencies.
Using simulated data, we show that compared to SNP based test, our approach has higher power, especially when the number of candidate haplotypes is small or moderate. To improve power when the number of haplotypes is large, we investigate methods to combine them with a moderate number of haplotype subsets. Haplotype frequencies can often be recovered with less noise than SNP frequencies, especially under pool sequencing, giving our test an additional advantage. Furthermore, spurious outlier SNPs may lead to false positives, a problem usually not encountered when working with haplotypes. Post hoc tests for the number of selected haplotypes and for differences between their selection coefficients are also provided for a better understanding of the underlying selection dynamics. An application on a real data set further illustrates the performance benefits.
Due to less multiple testing correction and noise reduction, haplotype based testing is able to outperform SNP based tests in terms of power in most scenarios.