Title: The state of Medusozoa genomics: current evidence and future challenges
Abstract

Medusozoa is a widely distributed ancient lineage that harbors one-third of Cnidaria diversity, divided into 4 classes. This clade is characterized by the succession of stages and modes of reproduction during metagenic life cycles, and includes some of the most plastic body plans and life cycles among animals. The characterization of traditional genomic features, such as chromosome numbers and genome sizes, has been rather overlooked in Medusozoa, and many evolutionary questions remain unanswered. Modern genomic DNA sequencing in this group started in 2010 with the publication of the Hydra vulgaris genome and has increased exponentially in the past 3 years. Therefore, an update on the state of Medusozoa genomics is warranted. We reviewed different sources of evidence, including cytogenetic records and high-throughput sequencing projects. We focused on 4 main topics relevant to the broad Cnidaria research community: (i) taxonomic coverage of genomic information; (ii) continuity, quality, and completeness of high-throughput sequencing datasets; (iii) overview of Medusozoa-specific research questions approached with genomics; and (iv) accessibility of data and metadata. We highlight a lack of standardization in genomic projects and their reports, and reinforce a series of recommendations to enhance future collaborative research.
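As a concrete illustration of the continuity metrics referred to in topic (ii), the short Python sketch below computes the number of contigs, total assembly length, and contig N50 from a FASTA file. It is a minimal sketch, not part of any reviewed pipeline, and the file name is a placeholder.

```python
# Minimal sketch of a continuity metric (contig N50); "assembly.fasta" is a placeholder path.

def contig_lengths(fasta_path):
    """Yield the length of each sequence in a FASTA file."""
    length = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if length:
                    yield length
                length = 0
            else:
                length += len(line.strip())
    if length:
        yield length

def n50(lengths):
    """Return N50: the contig length L such that contigs of length >= L cover half the assembly."""
    lengths = sorted(lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for l in lengths:
        running += l
        if running >= half:
            return l
    return 0

if __name__ == "__main__":
    lengths = list(contig_lengths("assembly.fasta"))
    print(f"contigs: {len(lengths)}, total: {sum(lengths)} bp, N50: {n50(lengths)} bp")
```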

 
Award ID(s):
1935672
PAR ID:
10367121
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
GigaScience
Volume:
11
ISSN:
2047-217X
Sponsoring Org:
National Science Foundation
More Like this
  1. The reduced cost of high-throughput sequencing and the development of gene sets with wide phylogenetic applicability have led to the rise of sequence capture methods as a plausible platform for both phylogenomics and population genomics in plants. An important consideration in large targeted sequencing projects is the per-sample cost, which can be inflated when using off-the-shelf kits or reagents not purchased in bulk. Here, we discuss methods to reduce per-sample costs in high-throughput targeted sequencing projects. We review the minimal equipment and consumable requirements for targeted sequencing while comparing several alternatives to reduce bulk costs in DNA extraction, library preparation, target enrichment, and sequencing. We consider how each of the workflow alterations may be affected by DNA quality (e.g., fresh vs. herbarium tissue), genome size, and the phylogenetic scale of the project. We provide a cost calculator for researchers considering targeted sequencing to use when designing projects, and identify challenges for future development of low-cost sequencing in non-model plant systems.
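    The per-sample arithmetic behind such a calculator can be sketched as follows; the cost categories and figures are illustrative assumptions, not values from the article's calculator.

```python
# Illustrative per-sample cost calculation for a targeted sequencing project.
# Categories and figures are assumptions for demonstration only.

def per_sample_cost(bulk_costs, n_samples, sequencing_lane_cost, samples_per_lane):
    """Divide bulk reagent costs across samples and add the per-sample sequencing share."""
    reagents = sum(bulk_costs.values()) / n_samples
    sequencing = sequencing_lane_cost / samples_per_lane
    return reagents + sequencing

bulk_costs = {                    # hypothetical bulk purchases (USD)
    "dna_extraction": 800,
    "library_prep": 2400,
    "target_enrichment": 1500,
}
cost = per_sample_cost(bulk_costs, n_samples=96,
                       sequencing_lane_cost=1800, samples_per_lane=96)
print(f"approximate cost per sample: ${cost:.2f}")
```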

     
  2. Abstract

    High-throughput DNA sequencing facilitates the analysis of large portions of the genome in nonmodel organisms, ensuring high accuracy of population genetic parameters. However, empirical studies evaluating the appropriate sample size for these kinds of studies are still scarce. In this study, we use double-digest restriction site-associated DNA sequencing (ddRADseq) to recover thousands of single nucleotide polymorphisms (SNPs) for two physically isolated populations of Amphirrhox longifolia (Violaceae), a nonmodel plant species for which no reference genome is available. We used resampling techniques to construct simulated populations with a random subset of individuals and SNPs to determine how many individuals and biallelic markers should be sampled for accurate estimates of intra- and interpopulation genetic diversity. We identified 3646 and 4900 polymorphic SNPs for the two populations of A. longifolia, respectively. Our simulations show that, overall, a sample size greater than eight individuals has little impact on estimates of genetic diversity within A. longifolia populations when 1000 or more SNPs are used. Our results also show that even at a very small sample size (i.e., two individuals), accurate estimates of FST can be obtained with a large number of SNPs (≥1500). These results highlight the potential of high-throughput genomic sequencing approaches to address questions related to evolutionary biology in nonmodel organisms. Furthermore, our findings also provide insights into the optimization of sampling strategies in the era of population genomics.
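    The resampling design can be illustrated with a minimal sketch that subsamples individuals and SNPs from a genotype matrix and tracks how a diversity estimate stabilizes; the genotype matrix below is simulated for demonstration, whereas in the study it would come from the ddRADseq SNP calls.

```python
# Sketch of the resampling idea: subsample individuals and SNPs, then watch
# how a diversity estimate (expected heterozygosity) stabilizes.

import numpy as np

rng = np.random.default_rng(42)

# Simulated biallelic genotypes (0/1/2 alternate-allele counts) for one population.
n_ind, n_snp = 40, 5000
freqs = rng.uniform(0.05, 0.95, n_snp)
genotypes = rng.binomial(2, freqs, size=(n_ind, n_snp))

def expected_heterozygosity(geno):
    """Mean 2p(1-p) across loci, with p the sample allele frequency."""
    p = geno.mean(axis=0) / 2
    return np.mean(2 * p * (1 - p))

for n in (2, 4, 8, 16, 32):
    for n_loci in (250, 1000, 4000):
        reps = []
        for _ in range(50):
            ind = rng.choice(n_ind, n, replace=False)
            loci = rng.choice(n_snp, n_loci, replace=False)
            reps.append(expected_heterozygosity(genotypes[np.ix_(ind, loci)]))
        print(f"{n:2d} individuals, {n_loci:4d} SNPs: "
              f"He = {np.mean(reps):.3f} +/- {np.std(reps):.3f}")
```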

     
  3. Abstract

    Confidence in experimental results is critical for discovery. As the scale of data generation in genomics has grown exponentially, experimental error has likely kept pace despite the best efforts of many laboratories. Technical mistakes can and do occur at nearly every stage of a genomics assay (e.g., cell line contamination, reagent swapping, tube mislabelling) and are often difficult to identify post-execution. However, the DNA sequenced in genomic experiments contains certain markers (e.g., indels) that can often be ascertained forensically from experimental datasets. We developed the Genotype validation Pipeline (GenoPipe), a suite of heuristic tools that operate together directly on raw and aligned sequencing data from individual high-throughput sequencing experiments to characterize the underlying genome of the source material. We demonstrate how GenoPipe validates and rescues erroneously annotated experiments by identifying unique markers inherent to an organism's genome (i.e., epitope insertions, gene deletions, and SNPs).
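    The underlying forensic idea, recovering identity markers directly from the reads, can be illustrated with a toy sketch that is not GenoPipe itself: scanning raw FASTQ reads for an expected epitope-tag insertion. The file name and tag sequence are placeholder assumptions.

```python
# Toy illustration of a forensic identity check on raw reads (not GenoPipe):
# count reads containing an expected epitope-tag insertion sequence.

import gzip

TAG = "GACTACAAGGACGACGATGACAAG"   # example FLAG-tag-like fragment (assumed)

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def reads_with_tag(fastq_gz, tag=TAG, max_reads=1_000_000):
    hits = total = 0
    with gzip.open(fastq_gz, "rt") as fh:
        for i, line in enumerate(fh):
            if i // 4 >= max_reads:
                break
            if i % 4 == 1:           # sequence lines only
                total += 1
                if tag in line or revcomp(tag) in line:
                    hits += 1
    return hits, total

hits, total = reads_with_tag("sample_R1.fastq.gz")   # placeholder file name
print(f"{hits}/{total} reads contain the expected tag sequence")
```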

     
  4. Abstract Motivation

    Recent advances in genomics and precision medicine have been made possible through the application of high-throughput sequencing (HTS) to large collections of human genomes. Although HTS technologies have proven their use in cataloging human genome variation, computational analysis of the data they generate is still far from perfect. The main limitation of Illumina and other popular sequencing technologies is their short read length relative to the lengths of (common) genomic repeats. Newer single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore produce longer reads, making it theoretically possible to overcome the difficulties imposed by repeat regions. Unfortunately, because of their high sequencing error rate, reads generated by these technologies are very difficult to work with and cannot be used in many of the standard downstream analysis pipelines. Note that it is not only difficult to find the correct mapping locations of such reads in a reference genome, but also to establish their correct alignment so as to differentiate sequencing errors from real genomic variants. Furthermore, especially since newer SMS instruments provide higher throughput, mapping and alignment need to be performed much faster than before while maintaining high sensitivity.
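    A back-of-the-envelope calculation makes the error-rate problem concrete; the read length, error rate, and seed length below are illustrative assumptions rather than figures from the paper.

```python
# Illustrative numbers behind the Motivation: why a high per-base error rate
# breaks exact-match seeding in conventional short-read mappers.

read_length = 10_000       # typical long read length (assumption)
error_rate = 0.12          # raw per-base error rate for noisy SMS reads (assumption)
k = 21                     # seed length a short-read mapper might use (assumption)

expected_errors = read_length * error_rate
p_clean_seed = (1 - error_rate) ** k   # chance a given k-mer contains no errors

print(f"expected errors per read: {expected_errors:.0f}")
print(f"probability a {k}-mer seed is error-free: {p_clean_seed:.3f}")
# With roughly a thousand errors per 10 kb read and well under 10% of 21-mers
# error-free, both seeding and alignment must tolerate errors.
```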

    Results

    We introduce lordFAST, a novel long-read mapper that is specifically designed to align reads generated by PacBio and potentially other SMS technologies to a reference. lordFAST not only has higher sensitivity than the available alternatives, it is also among the fastest and has a very low memory footprint.

    Availability and implementation

    lordFAST is implemented in C++ and supports multi-threading. The source code of lordFAST is available at https://github.com/vpc-ccg/lordfast.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. Abstract

    Genome editing technologies have revolutionized genetic studies in the life sciences community in recent years. The application of these technologies allows researchers to conveniently generate mutations in almost any gene of interest. This is very useful for species such as maize that have complex genomes and lack comprehensive mutant collections. With the improvement of genome editing tools and transformation methods, these technologies are also widely used to assist breeding research and implementation in maize. However, the detection and genotyping of genomic edits rely on low‐throughput, high‐cost methods, such as traditional agarose gel electrophoresis and Sanger sequencing. This article describes a method to barcode the target regions of genomic edits from many individuals by low‐cost polymerase chain reaction (PCR) amplification. It also employs next‐generation sequencing (NGS) to genotype the genome‐edited plants at high throughput and low cost. This protocol can be used for initial screening of genomic edits as well as derived population genotyping on a small or large scale, at high efficiency and low cost. © 2021 Wiley Periodicals LLC.

    Basic Protocol 1: A fast genomic DNA preparation method from genome edited plants

    Basic Protocol 2: Barcoding the amplicons of edited regions from each individual by two rounds of PCR

    Basic Protocol 3: Bioinformatics analysis
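    A minimal sketch of the demultiplexing and genotyping step covered by Basic Protocols 2 and 3 is shown below; it is not the authors' pipeline, and the barcodes, input file name, and wild-type target sequence are placeholder assumptions.

```python
# Minimal demultiplexing/genotyping sketch: assign reads to samples by an
# inline barcode, then flag reads lacking the wild-type target sequence.

import gzip
from collections import Counter

BARCODES = {                 # hypothetical 6-nt inline barcodes -> sample names
    "ACGTAC": "plant_01",
    "TGCAGT": "plant_02",
}
WT_TARGET = "GGCACTGAAGTCGATCCTAG"   # wild-type sequence spanning the cut site (assumed)

counts = Counter()
with gzip.open("merged_amplicons.fastq.gz", "rt") as fh:   # placeholder file name
    for i, line in enumerate(fh):
        if i % 4 != 1:               # keep only sequence lines
            continue
        sample = BARCODES.get(line[:6])
        if sample is None:
            counts[("unassigned", "-")] += 1
            continue
        status = "wild-type" if WT_TARGET in line else "putative edit"
        counts[(sample, status)] += 1

for (sample, status), n in sorted(counts.items()):
    print(f"{sample:12s} {status:14s} {n}")
```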

     