Title: ISSRseq: An extensible method for reduced representation sequencing
Abstract

The capability to generate densely sampled single nucleotide polymorphism (SNP) data is essential in diverse subdisciplines of biology, including crop breeding, pathology, forensics, forestry, ecology, evolution and conservation. However, the wet‐laboratory expertise and bioinformatics training required to conduct genome‐scale variant discovery remain limiting factors for investigators with limited resources.

Here we present ISSRseq, a PCR‐based method for reduced representation of genomic variation using simple sequence repeats as priming sites to sequence inter simple sequence repeat (ISSR) regions. Briefly, ISSR regions are amplified with single primers, pooled, used to construct sequencing libraries with a commercially available kit, and sequenced on the Illumina platform. We also present a flexible bioinformatic pipeline that assembles ISSR loci, calls and hard filters variants, outputs data matrices in common formats, and conducts population analyses using R.

Using three angiosperm species as case studies, we demonstrate that ISSRseq is highly repeatable, requires only simple wet‐laboratory skills and commonplace instrumentation, is flexible in the number of single primers used, and can achieve genomic‐scale variant discovery on par with existing reduced representation sequencing (RRS) methods, which require more complex wet‐laboratory procedures.

ISSRseq represents a straightforward approach to SNP genotyping in any organism, and we predict that this method will be particularly useful for those studying population genomics and phylogeography of non‐model organisms. Furthermore, the ease of ISSRseq relative to other RRS methods should prove useful to those lacking advanced expertise in wet‐laboratory methods or bioinformatics.
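The single-primer amplification described above can be illustrated in silico. The following is a minimal sketch, not the ISSRseq pipeline itself: it assumes exact primer matching (no mismatches), and the function names, the toy primer, and the length limits are hypothetical choices made for illustration. A single primer yields a product when one copy of the primer site is followed, within amplifiable distance, by an inverted copy (its reverse complement) on the same strand.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def find_issr_amplicons(genome, primer, min_len=50, max_len=2000):
    """List (start, end) spans that a single primer could amplify.

    A span starts at an exact primer match and ends at the end of an exact
    match to the primer's reverse complement further downstream.
    min_len and max_len are illustrative limits, not values from the paper.
    """
    genome = genome.upper()
    primer = primer.upper()
    rc = reverse_complement(primer)
    # Lookahead patterns so that overlapping repeat matches are all reported.
    starts = [m.start() for m in re.finditer(f"(?={re.escape(primer)})", genome)]
    ends = [m.start() + len(rc) for m in re.finditer(f"(?={re.escape(rc)})", genome)]
    amplicons = []
    for start in starts:
        for end in ends:
            length = end - start
            # The two primer sites must not overlap each other.
            if length >= 2 * len(primer) and min_len <= length <= max_len:
                amplicons.append((start, end))
    return amplicons

# Toy example: one (AG)3C site, a 10 bp spacer, then the inverted site.
toy_genome = "TT" + "AGAGAGC" + "ACGTACGTAC" + "GCTCTCT" + "GG"
toy_amplicons = find_issr_amplicons(toy_genome, "AGAGAGC", min_len=10, max_len=100)
```

Real primer annealing tolerates mismatches and depends on PCR conditions, so a sketch like this only conveys why inter simple sequence repeat regions are the ones sampled.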

 
Award ID(s):
2044259 1920858
NSF-PAR ID:
10446826
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Methods in Ecology and Evolution
Volume:
13
Issue:
3
ISSN:
2041-210X
Page Range / eLocation ID:
p. 668-681
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Genetic transformation is a powerful means for the improvement of crop plants, but requires labor‐ and resource‐intensive methods. An efficient method for identifying single‐copy transgene insertion events from a population of independent transgenic lines is desirable. Currently, transgene copy number is estimated by either Southern blot hybridization analyses or quantitative polymerase chain reaction (qPCR) experiments. Southern hybridization is a convincing and reliable method, but it also is expensive, time‐consuming and often requires a large amount of genomic DNA and radioactively labeled probes. Alternatively, qPCR requires less DNA and is potentially simpler to perform, but its results can lack the accuracy and precision needed to confidently distinguish between one‐ and two‐copy events in transgenic plants with large genomes. To address this need, we developed a droplet digital PCR‐based method for transgene copy number measurement in an array of crops: rice, citrus, potato, maize, tomato and wheat. The method utilizes specific primers to amplify target transgenes, and endogenous reference genes in a single duplexed reaction containing thousands of droplets. Endpoint amplicon production in the droplets is detected and quantified using sequence‐specific fluorescently labeled probes. The results demonstrate that this approach can generate confident copy number measurements in independent transgenic lines in these crop species. This method and the compendium of probes and primers will be a useful resource for the plant research community, enabling the simple and accurate determination of transgene copy number in these six important crop species.
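The droplet counting described above is typically converted to a copy number with a Poisson correction. The following is a minimal sketch of that general calculation, not the authors' analysis: the function names and droplet counts are hypothetical, and it assumes the endogenous reference gene is present at two copies per genome and that target and reference are read from the same droplets of a duplexed reaction.

```python
import math

def copies_per_droplet(positive, total):
    """Poisson-corrected mean copies per droplet: lambda = -ln(negative fraction)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    return -math.log((total - positive) / total)

def transgene_copy_number(target_positive, reference_positive, total, reference_copies=2):
    """Estimate transgene copies per genome from a duplexed droplet reaction.

    reference_copies is an assumption (two copies per diploid genome).
    """
    if reference_positive <= 0:
        raise ValueError("reference assay has no positive droplets")
    target = copies_per_droplet(target_positive, total)
    reference = copies_per_droplet(reference_positive, total)
    return reference_copies * target / reference

# Hypothetical counts: 15000 accepted droplets, 6000 reference-positive,
# 3381 transgene-positive.
estimate = transgene_copy_number(3381, 6000, 15000)
```

With these made-up counts the estimate is close to one copy per genome, which is the kind of hemizygous single-copy event the abstract aims to identify.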

     
  2. Summary

    Humans have domesticated diverse species from across the plant kingdom, yet much of our foundational knowledge of domestication has come from studies investigating relatively few of the most important annual food crops. Here, we examine the impacts of domestication on genetic diversity in a tropical perennial fruit species, mango (Mangifera indica).

    We used restriction site associated DNA sequencing to generate genomic single nucleotide polymorphism (SNP) data from 106 mango cultivars from seven geographical regions along with 52 samples of closely related species and unidentified cultivars to identify centers of mango genetic diversity and examine how post‐domestication dispersal shaped the geographical distribution of diversity.

    We identify two gene pools of cultivated mango, representing Indian and Southeast Asian germplasm. We found no significant genetic bottleneck associated with the introduction of mango into new regions of the world. By contrast, we show that mango populations in introduced regions have elevated levels of diversity.

    Our results suggest that mango has a more complex history of domestication than previously supposed, perhaps including multiple domestication events, hybridization and regional selection. Our work has direct implications for mango breeding and genebank management, and also builds on recent efforts to understand how woody perennial crops respond to domestication.

     
  3. INTRODUCTION One of the central applications of the human reference genome has been to serve as a baseline for comparison in nearly all human genomic studies. Unfortunately, many difficult regions of the reference genome have remained unresolved for decades and are affected by collapsed duplications, missing sequences, and other issues. Relative to the current human reference genome, GRCh38, the Telomere-to-Telomere CHM13 (T2T-CHM13) genome closes all remaining gaps, adds nearly 200 million base pairs (Mbp) of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for scientific inquiry.

    RATIONALE We demonstrate how the T2T-CHM13 reference genome universally improves read mapping and variant identification in a globally diverse cohort. This cohort includes all 3202 samples from the expanded 1000 Genomes Project (1KGP), sequenced with short reads, as well as 17 globally diverse samples sequenced with long reads. By applying state-of-the-art methods for calling single-nucleotide variants (SNVs) and structural variants (SVs), we document the strengths and limitations of T2T-CHM13 relative to its predecessors and highlight its promise for revealing new biological insights within technically challenging regions of the genome.

    RESULTS Across the 1KGP samples, we found more than 1 million additional high-quality variants genome-wide using T2T-CHM13 than with GRCh38. Within previously unresolved regions of the genome, we identified hundreds of thousands of variants per sample, a promising opportunity for evolutionary and biomedical discovery. T2T-CHM13 improves the Mendelian concordance rate among trios and eliminates tens of thousands of spurious SNVs per sample, including a reduction of false positives in 269 challenging, medically relevant genes by up to a factor of 12. These corrections are in large part due to improvements to 70 protein-coding genes in >9 Mbp of inaccurate sequence caused by falsely collapsed or duplicated regions in GRCh38. Using the T2T-CHM13 genome also yields a more comprehensive view of SVs genome-wide, with a greatly improved balance of insertions and deletions. Finally, by providing numerous resources for T2T-CHM13 (including 1KGP genotypes, accessibility masks, and prominent annotation databases), our work will facilitate the transition to T2T-CHM13 from the current reference genome.

    CONCLUSION The vast improvements in variant discovery across samples of diverse ancestries position T2T-CHM13 to succeed as the next prevailing reference for human genetics. T2T-CHM13 thus offers a model for the construction and study of high-quality reference genomes from globally diverse individuals, such as is now being pursued through collaboration with the Human Pangenome Reference Consortium. As a foundation, our work underscores the benefits of an accurate and complete reference genome for revealing diversity across human populations.

    Genomic features and resources available for T2T-CHM13. Comparisons to GRCh38 reveal broad improvements in SNVs, indels, and SVs discovered across diverse human populations by means of short-read (1KGP) and long-read sequencing (LRS). These improvements are due to resolution of complex genomic loci (nonsyntenic and previously unresolved), duplication errors, and discordant haplotypes, including those in medically relevant genes.
  4. The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold’s purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5′ and 3′ untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
  5. Abstract

    Genome editing technologies have revolutionized genetic studies in the life sciences community in recent years. The application of these technologies allows researchers to conveniently generate mutations in almost any gene of interest. This is very useful for species such as maize that have complex genomes and lack comprehensive mutant collections. With the improvement of genome editing tools and transformation methods, these technologies are also widely used to assist breeding research and implementation in maize. However, the detection and genotyping of genomic edits rely on low‐throughput, high‐cost methods, such as traditional agarose gel electrophoresis and Sanger sequencing. This article describes a method to barcode the target regions of genomic edits from many individuals by low‐cost polymerase chain reaction (PCR) amplification. It also employs next‐generation sequencing (NGS) to genotype the genome‐edited plants at high throughput and low cost. This protocol can be used for initial screening of genomic edits as well as derived population genotyping on a small or large scale, at high efficiency and low cost. © 2021 Wiley Periodicals LLC.

    Basic Protocol 1: A fast genomic DNA preparation method from genome edited plants

    Basic Protocol 2: Barcoding the amplicons of edited regions from each individual by two rounds of PCR

    Basic Protocol 3: Bioinformatics analysis
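The barcode-then-sequence idea in the protocols above can be sketched as a demultiplexing step that sorts pooled reads back to individuals. This is a minimal illustration under stated assumptions, not the protocol's actual bioinformatics: it assumes a single barcode at the start of each read with exact matching, and the barcode sequences, plant names, and function name are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, barcodes):
    """Group reads by the barcode found at their 5' end.

    reads: iterable of read sequences (strings).
    barcodes: dict mapping barcode sequence -> sample name (hypothetical values).
    Returns a dict of sample name -> list of reads with the barcode trimmed off.
    Reads with no matching barcode are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)

# Hypothetical barcodes and pooled reads.
example_barcodes = {"ACGT": "plant_1", "TGCA": "plant_2"}
example_reads = ["ACGTGGGAAA", "TGCACCCTTT", "ACGTGGGAAT", "NNNNGGGAAA"]
example_groups = demultiplex(example_reads, example_barcodes)
```

Once reads are grouped per individual, each group can be compared with the unedited target sequence to call the edit genotype; production pipelines usually also tolerate barcode mismatches and use barcodes at both read ends.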

     