Title: Microhaplotypes provide increased power from short‐read DNA sequences for relationship inference
Abstract

The accelerating rate at which DNA sequence data are now generated by high‐throughput sequencing instruments provides both opportunities and challenges for population genetic and ecological investigations of animals and plants. We show here how the common practice of calling genotypes from a single SNP per sequenced region ignores substantial additional information in the phased short‐read sequences that are provided by these sequencing instruments. We target sequenced regions with multiple SNPs in kelp rockfish (Sebastes atrovirens) to determine “microhaplotypes” and then call these microhaplotypes as alleles at each locus. We then demonstrate how these multi‐allelic marker data from such loci dramatically increase power for relationship inference. The microhaplotype approach decreases false‐positive rates by several orders of magnitude, relative to calling bi‐allelic SNPs, for two challenging analytical procedures, full‐sibling and single parent–offspring pair identification. We also show how the identification of half‐sibling pairs requires so much data that physical linkage becomes a consideration, and that most published studies that attempt to do so are dramatically underpowered. The advent of phased short‐read DNA sequence data, in conjunction with emerging analytical tools for their analysis, promises to improve efficiency by reducing the number of loci necessary for a particular level of statistical confidence, thereby lowering the cost of data collection and reducing the degree of physical linkage amongst markers used for relationship estimation. Such advances will facilitate collaborative research and management for migratory and other widespread species.

 
NSF-PAR ID:
10056074
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Molecular Ecology Resources
Volume:
18
Issue:
2
ISSN:
1755-098X
Page Range / eLocation ID:
p. 296-305
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Next‐generation sequencing technologies now allow researchers of non‐model systems to perform genome‐based studies without the requirement of a (often unavailable) closely related genomic reference. We evaluated the role of restriction endonuclease (RE) selection in double‐digest restriction‐site‐associated DNA sequencing (ddRADseq) by generating reduced representation genome‐wide data using four different RE combinations. Our expectation was that RE selections targeting longer, more complex restriction sites would recover fewer loci than RE with shorter, less complex sites. We sequenced a diverse sample of non‐model arachnids, including five congeneric pairs of harvestmen (Opiliones) and four pairs of spiders (Araneae). Sample pairs consisted of either conspecifics or closely related congeneric taxa, and in total 26 sample pair analyses were tested. Sequence demultiplexing, read clustering and variant calling were performed in the pyRAD program. The 6‐base pair cutter EcoRI combined with methylated site‐specific 4‐base pair cutter MspI produced, on average, the greatest numbers of intra‐individual loci and shared loci per sample pair. As expected, the number of shared loci recovered for a sample pair covaried with the degree of genetic divergence, estimated with cytochrome oxidase I sequences, although this relationship was non‐linear. Our comparative results will prove useful in guiding protocol selection for ddRADseq experiments on many arachnid taxa where reference genomes, even from closely related species, are unavailable.

     
  2. Abstract

    The development of high‐throughput sequencing technologies is dramatically increasing the use of single nucleotide polymorphisms (SNPs) across the field of genetics, but most parentage studies of wild populations still rely on microsatellites. We developed a bioinformatic pipeline for identifying SNP panels that are informative for parentage analysis from restriction site‐associated DNA sequencing (RADseq) data. This pipeline includes options for analysis with or without a reference genome, and provides methods to maximize genotyping accuracy and select sets of unlinked loci that have high statistical power. We test this pipeline on small populations of Mexican gray wolf and bighorn sheep, for which parentage analyses are expected to be challenging due to low genetic diversity and the presence of many closely related individuals. We compare the results of parentage analysis across SNP panels generated with or without the use of a reference genome, and between SNPs and microsatellites. For Mexican gray wolf, we conducted parentage analyses for 30 pups from a single cohort where samples were available from 64% of possible mothers and 53% of possible fathers, and the accuracy of parentage assignments could be estimated because true identities of parents were known a priori based on field data. For bighorn sheep, we conducted maternity analyses for 39 lambs from five cohorts where 77% of possible mothers were sampled, but true identities of parents were unknown. Analyses with and without a reference genome produced SNP panels with ≥95% parentage assignment accuracy for Mexican gray wolf, outperforming microsatellites at 78% accuracy. Maternity assignments were completely consistent across all SNP panels for the bighorn sheep, and were 74.4% consistent with assignments from microsatellites. Accuracy and consistency of parentage analysis were not reduced when using as few as 284 SNPs for Mexican gray wolf and 142 SNPs for bighorn sheep, indicating our pipeline can be used to develop SNP genotyping assays for parentage analysis with relatively small numbers of loci.

     
  3. Abstract

    Molecular ecologists seek to genotype hundreds to thousands of loci from hundreds to thousands of individuals at minimal cost per sample. Current methods, such as restriction‐site‐associated DNA sequencing (RADseq) and sequence capture, are constrained by costs associated with inefficient use of sequencing data and sample preparation. Here, we introduce RADcap, an approach that combines the major benefits of RADseq (low cost with specific start positions) with those of sequence capture (repeatable sequencing of specific loci) to significantly increase efficiency and reduce costs relative to current approaches. RADcap uses a new version of dual‐digest RADseq (3RAD) to identify candidate SNP loci for capture bait design and subsequently uses custom sequence capture baits to consistently enrich candidate SNP loci across many individuals. We combined this approach with a new library preparation method for identifying and removing PCR duplicates from 3RAD libraries, which allows researchers to process RADseq data using traditional pipelines, and we tested the RADcap method by genotyping sets of 96–384 Wisteria plants. Our results demonstrate that our RADcap method: (i) methodologically reduces (to <5%) and allows computational removal of PCR duplicate reads from data, (ii) achieves 80–90% reads on target in 11 of 12 enrichments, (iii) returns consistent coverage (≥4×) across >90% of individuals at up to 99.8% of the targeted loci, (iv) produces consistently high occupancy matrices of genotypes across hundreds of individuals and (v) costs significantly less than current approaches.

     
  4. Abstract

    The Cyclophyllidea is the most diverse order of tapeworms, encompassing species that infect all classes of terrestrial tetrapods including humans and domesticated animals. Available phylogenetic reconstructions based either on morphology or molecular data lack the resolution to allow scientists to either propose a solid taxonomy or infer evolutionary associations. Molecular markers available for the Cyclophyllidea mostly include ribosomal DNA and mitochondrial loci. In this study, we identified 3641 single‐copy nuclear coding loci by comparing the genomes of Hymenolepis microstoma, Echinococcus granulosus and Taenia solium. We designed RNA baits based on the sequence of H. microstoma, and applied target enrichment and Illumina sequencing to test the utility of those baits to recover loci useful for phylogenetic analyses. We captured DNA from five species of tapeworms representing two families of cyclophyllideans. We obtained an average of 3284 (90%) of the targets from the test samples and then used captured sequences (2 181 361 bp in total; fragment size ranging from 301 to 6969 bp) to reconstruct a phylogeny for the five test species plus the three species for which genomic data are available. The results were consistent with the current consensus regarding cyclophyllidean relationships. To assess the potential for our method to yield informative genetic variation at intraspecific scales, we extracted 14 074 single nucleotide polymorphisms (SNPs) from alignments of four Arostrilepis macrocirrosa and two A. cooki and successfully inferred their relationships. The results showed that our target gene tools yield data sets that provide robust inferences at a range of taxonomic scales in the Cyclophyllidea.

     
  5. Abstract

    Background

    Co‐occurrence of two genetic diseases is challenging for accurate diagnosis and genetic counseling. The recent availability of whole exome sequencing (WES) has dramatically improved the molecular diagnosis of rare genetic diseases in particular in consanguineous populations.

    Methods

    We report here on a consanguineous family from Southern Tunisia including three members affected with congenital ichthyosis. The index case had a hearing loss (HL) and ichthyosis and was primarily suspected as suffering from keratitis‐ichthyosis‐deafness (KID) syndrome. WES was performed for the index case, and all members of the nuclear family were sequenced (Sanger method).

    Results

    The WES approach allowed the identification of two strong candidate variants in two different genes: a missense mutation c.1334T>G (p.Leu445Trp) in exon 11 of the SLC26A4 gene, associated with isolated HL, and a novel missense mutation c.728G>T (p.Arg243Leu) in exon 8 of the CYP4F22 gene, likely responsible for ichthyosis. These two mutations were predicted to be pathogenic by three pathogenicity prediction tools (Scale‐Invariant Feature Transform [SIFT], Polymorphism Phenotyping [PolyPhen], Mutation Taster) and to underlie the HL and ichthyosis, respectively.

    Conclusions

    The present study raises awareness about the importance of familial history for accurate diagnosis of syndromic genetic diseases and differential diagnosis with co‐occurrence of two distinct clinical entities. In addition, in countries with limited resources, WES sequencing for a single individual provides a cost effective tool for molecular diagnosis confirmation and genetic counseling.

     