Title: Direct sequencing of haplotypes from diploid individuals through a modified emulsion PCR‐based single‐molecule sequencing approach
Award ID(s):
1046372
PAR ID:
10337046
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Molecular Ecology Resources
Volume:
13
Issue:
1
ISSN:
1755-098X
Page Range / eLocation ID:
135 to 143
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract High‐throughput sequencing has changed many aspects of population genetics, molecular ecology and related fields, affecting both experimental design and data analysis. The software package angsd allows users to perform a number of population genetic analyses on high‐throughput sequencing data. angsd uses probabilistic approaches which can directly make use of genotype likelihoods; thus, SNP calling is not required for comparative analyses. This takes advantage of all the sequencing data and produces more accurate results for samples with low sequencing depth. Here, we present angsd‐wrapper, a set of wrapper scripts that provides a user‐friendly interface for running angsd and visualizing results. angsd‐wrapper supports multiple types of analyses including estimates of nucleotide sequence diversity, neutrality tests, principal component analysis, estimation of admixture proportions for individual samples and calculation of statistics that quantify recent introgression. angsd‐wrapper also provides interactive graphing of angsd results to enhance data exploration. We demonstrate the usefulness of angsd‐wrapper by analysing resequencing data from populations of wild and domesticated Zea. angsd‐wrapper is freely available from https://github.com/mojaveazure/angsd-wrapper.
  2. Abstract Many applications in molecular ecology require the ability to match specific DNA sequences from single‐ or mixed‐species samples with a diagnostic reference library. Widely used methods for DNA barcoding and metabarcoding employ PCR and amplicon sequencing to identify taxa based on target sequences, but the target‐specific enrichment capabilities of CRISPR‐Cas systems may offer advantages in some applications. We identified 54,837 CRISPR‐Cas guide RNAs that may be useful for enriching chloroplast DNA across phylogenetically diverse plant species. We tested a subset of 17 guide RNAs in vitro to enrich plant DNA strands ranging in size from diagnostic DNA barcodes of 1,428 bp to entire chloroplast genomes of 121,284 bp. We used an Oxford Nanopore sequencer to evaluate sequencing success based on both single‐ and mixed‐species samples, which yielded mean chloroplast sequence lengths of 2,530–11,367 bp, depending on the experiment. In comparison to mixed‐species experiments, single‐species experiments yielded more on‐target sequence reads and greater mean pairwise identity between contigs and the plant species' reference genomes. Nevertheless, the mixed‐species experiments yielded sufficient data to provide a ≥48‐fold increase in sequence length and better estimates of relative abundance for a commercially prepared mixture of plant species compared to DNA metabarcoding based on the chloroplast trnL‐P6 marker. Prior work developed CRISPR‐based enrichment protocols for long‐read sequencing, and our experiments pioneered their use for plant DNA barcoding and chloroplast assemblies that may have advantages over workflows that require PCR and short‐read sequencing. Future work would benefit from continuing to develop in vitro and in silico methods for CRISPR‐based analyses of mixed‐species samples, especially when the appropriate reference genomes for contig assembly cannot be known a priori.
  3. Abstract Next‐generation sequencing technologies now allow researchers of non‐model systems to perform genome‐based studies without the requirement of a (often unavailable) closely related genomic reference. We evaluated the role of restriction endonuclease (RE) selection in double‐digest restriction‐site‐associated DNA sequencing (ddRADseq) by generating reduced representation genome‐wide data using four different RE combinations. Our expectation was that RE selections targeting longer, more complex restriction sites would recover fewer loci than RE with shorter, less complex sites. We sequenced a diverse sample of non‐model arachnids, including five congeneric pairs of harvestmen (Opiliones) and four pairs of spiders (Araneae). Sample pairs consisted of either conspecifics or closely related congeneric taxa, and in total 26 sample pair analyses were tested. Sequence demultiplexing, read clustering and variant calling were performed in the pyRAD program. The 6‐base pair cutter EcoRI combined with methylated site‐specific 4‐base pair cutter MspI produced, on average, the greatest numbers of intra‐individual loci and shared loci per sample pair. As expected, the number of shared loci recovered for a sample pair covaried with the degree of genetic divergence, estimated with cytochrome oxidase I sequences, although this relationship was non‐linear. Our comparative results will prove useful in guiding protocol selection for ddRADseq experiments on many arachnid taxa where reference genomes, even from closely related species, are unavailable.
  4. SUMMARY The application of high‐throughput sequencing to cellular transcriptome profiling (RNA‐seq) has enabled significant advances in our understanding of gene expression in plants. However, conventional RNA‐seq data reports mainly cytoplasmic transcript abundance rather than actual transcription rates. As a result, it is less sensitive to detect unstable and low‐abundance nuclear RNA species, such as long non‐coding RNAs, and is less directly connected to chromatin features and processes such as DNA replication. To bridge this gap, several protocols have been established to profile newly synthesized RNA in plants and other eukaryotes. These protocols can be technically challenging and present their own difficulties and limitations. Here we analyze newly synthesized nuclear RNA metabolically labeled in vivo with 5‐ethynyl uridine (EU‐nuclear RNA) in maize (Zea mays L.) root tips and compare it with the entire nuclear RNA population. We also compare both nuclear RNA preparations to conventional RNA‐seq analysis of cellular RNA. The transcript abundance profiles of protein‐coding genes in nuclear RNA and EU‐nuclear RNA were tightly correlated with each other (R² = 0.767), but quite distinct from that of cellular RNA (R² = 0.170 or 0.293). Nuclear and EU‐nuclear RNA reads are frequently mapped across entire genes, including introns, while cellular reads are predominantly mapped to mature transcripts. Both nuclear and EU‐nuclear RNA exhibited a greater ability to detect both protein‐coding and non‐coding expressed genes.
  5. Summary Universal primers for SSU rRNA genes allow profiling of natural communities by simultaneously amplifying templates from Bacteria, Archaea, and Eukaryota in a single PCR reaction. Despite the potential to show relative abundance for all rRNA genes, universal primers are rarely used, due to various concerns including amplicon length variation and its effect on bioinformatic pipelines. We thus developed 16S and 18S rRNA mock communities and a bioinformatic pipeline to validate this approach. Using these mocks, we show that universal primers (515Y/926R) outperformed eukaryote‐specific V4 primers in observed versus expected abundance correlations (slope = 0.88 vs. 0.67–0.79), and mock community members with single mismatches to the primer were strongly underestimated (threefold to eightfold). Using field samples, both primers yielded similar 18S beta‐diversity patterns (Mantel test, p < 0.001) but differences in relative proportions of many rarer taxa. To test for length biases, we mixed mock communities (16S + 18S) before PCR and found a twofold underestimation of 18S sequences due to sequencing bias. Correcting for the twofold underestimation, we estimate that, in Southern California field samples (1.2–80 μm), there were averages of 35% 18S, 28% chloroplast 16S, and 37% prokaryote 16S rRNA genes. These data demonstrate the potential for universal primers to generate comprehensive microbiome profiles.