Microhaplotypes provide increased power from short‐read DNA sequences for relationship inference

Baetscher, Diana S.; Clemento, Anthony J.; Ng, Thomas C.; Anderson, Eric C. (ORCID:0000000313260840); Garza, John C. (ORCID:0000000273256803)

doi:10.1111/1755-0998.12737

Abstract

The accelerating rate at which DNA sequence data are now generated by high‐throughput sequencing instruments provides both opportunities and challenges for population genetic and ecological investigations of animals and plants. We show here how the common practice of calling genotypes from a single SNP per sequenced region ignores substantial additional information in the phased short‐read sequences that are provided by these sequencing instruments. We target sequenced regions with multiple SNPs in kelp rockfish (Sebastes atrovirens) to determine “microhaplotypes” and then call these microhaplotypes as alleles at each locus. We then demonstrate how these multi‐allelic marker data from such loci dramatically increase power for relationship inference. The microhaplotype approach decreases false‐positive rates by several orders of magnitude, relative to calling bi‐allelic SNPs, for two challenging analytical procedures, full‐sibling and single parent–offspring pair identification. We also show how the identification of half‐sibling pairs requires so much data that physical linkage becomes a consideration, and that most published studies that attempt to do so are dramatically underpowered. The advent of phased short‐read DNA sequence data, in conjunction with emerging analytical tools for their analysis, promises to improve efficiency by reducing the number of loci necessary for a particular level of statistical confidence, thereby lowering the cost of data collection and reducing the degree of physical linkage amongst markers used for relationship estimation. Such advances will facilitate collaborative research and management for migratory and other widespread species.
