Abstract
Background: The helmeted honeyeater (Lichenostomus melanops cassidix) is a Critically Endangered bird endemic to Victoria, Australia. To aid its conservation, the population is the subject of genetic rescue. Understanding, monitoring, and modulating the effects of genetic rescue on the helmeted honeyeater genome requires a chromosome-length genome and a high-density linkage map.
Results: We used a combination of Illumina, Oxford Nanopore, and Hi-C sequencing technologies to assemble a chromosome-length genome of the helmeted honeyeater comprising 906 scaffolds, with a total length of 1.1 Gb and a scaffold N50 of 63.8 Mb. Annotation yielded 57,181 gene models. Using a pedigree of 257 birds and 53,111 single-nucleotide polymorphisms, we obtained high-density linkage and recombination maps for 25 autosomes and the Z chromosome. The total sex-averaged linkage map was 1,347 cM long, with the male map 6.7% longer than the female map. Recombination maps revealed sexually dimorphic recombination rates (overall higher in males), with an average recombination rate of 1.8 cM/Mb. Comparative analyses revealed high synteny between the helmeted honeyeater genome and those of 3 passerine species (e.g., 32 Hi-C scaffolds mapped to 30 zebra finch autosomes and the Z chromosome). The genome assembly and linkage map suggest that the helmeted honeyeater exhibits a fission of chromosome 1A into 2 chromosomes relative to zebra finch. PSMC analysis showed a ∼15-fold decline in effective population size, to ∼60,000, from the mid- to late Pleistocene.
Conclusions: The annotated chromosome-length genome and high-density linkage map provide rich resources for evolutionary studies and will be fundamental in guiding conservation efforts for the helmeted honeyeater.
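The scaffold N50 reported above is a standard contiguity statistic: the largest length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are illustrative assumptions, not code or data from this study.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length
    return None  # not reached for non-empty, non-negative input


# Hypothetical scaffold lengths in Mb (not the honeyeater assembly)
toy_lengths = [80, 70, 50, 30, 20, 10]
print(n50(toy_lengths))  # 70
```

For the toy input, the cumulative sum first reaches half of the 260 Mb total (130 Mb) at the second scaffold, so the N50 is 70 Mb.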
OMGS: Optical Map-based Genome Scaffolding
Due to the current limitations of sequencing technologies, de novo genome assembly is typically carried out in two stages: contig (sequence) assembly and scaffolding. Although scaffolding is computationally easier than sequence assembly, it can still be challenging because of the high repeat content of eukaryotic genomes, possible mis-joins in assembled contigs, and inaccuracies in the linkage information. Genome scaffolding tools use either paired-end, mate-pair, linked, or Hi-C reads, or genome-wide (optical, physical, or genetic) maps as linkage information. Optical maps (in particular Bionano Genomics maps) have been used extensively in many recent large-scale genome assembly projects (e.g., goat, apple, barley, maize, quinoa, and sea bass). However, the most commonly used scaffolding tools have a serious limitation: they can handle only one optical map at a time, forcing users to alternate or iterate over multiple maps. In this paper, we introduce a novel scaffolding algorithm called OMGS that, for the first time, can take advantage of multiple optical maps. OMGS solves several optimization problems to generate scaffolds with optimal contiguity and correctness. Extensive experimental results demonstrate that our tool outperforms existing methods when multiple optical maps are available and produces comparable scaffolds when only a single optical map is used. OMGS can be obtained from GIT.
- PAR ID: 10094706
- Journal Name: RECOMB 2019 - ACM Annual Conference on Research in Computational Molecular Biology
- Page Range / eLocation ID: 190-207
- Sponsoring Org: National Science Foundation
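The OMGS abstract above does not spell out its optimization formulation, so the following is only a toy Python sketch of the underlying idea that several optical maps can "vote" on contig adjacencies. It assumes each map is given as a list of hypothetical (contig, start-on-map) alignments, ignores contig orientation and overlaps, and resolves conflicts greedily by support count; none of the names come from OMGS.

```python
from collections import Counter


def adjacencies(alignments):
    """Yield (left, right) contig pairs implied by one optical map.

    alignments: list of (contig_name, start_on_map) tuples (assumed format).
    """
    ordered = sorted(alignments, key=lambda hit: hit[1])
    for (left, _), (right, _) in zip(ordered, ordered[1:]):
        yield left, right


def consensus_scaffolds(maps):
    """Greedily merge contig adjacencies from several maps into chains."""
    support = Counter()
    for alignments in maps.values():
        support.update(adjacencies(alignments))

    successor, predecessor = {}, {}

    def chain_head(contig):
        # Follow predecessor links back to the first contig of the chain
        while contig in predecessor:
            contig = predecessor[contig]
        return contig

    # Best-supported adjacencies first; ties broken by contig names
    for (left, right), _ in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
        if left in successor or right in predecessor:
            continue  # conflicts with an adjacency accepted earlier
        if chain_head(left) == right:
            continue  # would close a cycle
        successor[left] = right
        predecessor[right] = left

    contigs = {c for alignments in maps.values() for c, _ in alignments}
    scaffolds = []
    for contig in sorted(contigs):
        if contig in predecessor:
            continue  # only start walking from chain heads
        chain = [contig]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        scaffolds.append(chain)
    return scaffolds


# Hypothetical alignments of contigs a-d to three optical maps
toy_maps = {
    "map1": [("a", 0), ("b", 100), ("c", 250)],
    "map2": [("b", 10), ("c", 120), ("d", 300)],
    "map3": [("a", 5), ("c", 90)],
}
print(consensus_scaffolds(toy_maps))  # [['a', 'b', 'c', 'd']]
```

No single toy map covers all four contigs, but combined they chain a-b-c-d; the a→c adjacency from map3 is skipped because a has already been given a successor.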
More Like this
Abstract
Background: Genome assembly, which involves reconstructing a target genome, relies on scaffolding methods to organize and link partially assembled fragments. The rapid evolution of long-read sequencing technologies toward more accurate long reads, coupled with the continued use of short-read technologies, has created a unique need for hybrid assembly workflows. The construction of accurate genomic scaffolds in hybrid workflows is complicated by scale, by the diversity of sequencing inputs (e.g., short vs. long reads, contigs, or partial assemblies), and by repetitive regions within a target genome.
Results: In this paper, we present a new parallel workflow for hybrid genome scaffolding that allows pre-constructed partial assemblies to be combined with newly sequenced long reads toward an improved assembly. More specifically, the workflow, called , is aimed at generating long scaffolds of a target genome from two sets of input sequences: an already constructed partial assembly of contigs, and a set of newly sequenced long reads. Our scaffolding approach internally uses an alignment-free mapping step to build a ⟨contig, contig⟩ graph using long reads as linking information. This graph is then used to generate scaffolds. We present and evaluate a graph-theoretic “wiring” heuristic to perform this scaffolding step. To enable efficient workload management in a parallel setting, we use a batching technique that partitions the scaffolding tasks so that the more expensive alignment-based assembly step at the end can be efficiently parallelized. This step also allows any standalone assembler to be used for generating the final scaffolds.
Conclusions: Our experiments with on a variety of input genomes, and comparison against two state-of-the-art hybrid scaffolders, demonstrate that is able to generate longer and more accurate scaffolds substantially faster.
In almost all cases, the scaffolds produced by are at least an order of magnitude longer (in some cases two orders) than the scaffolds produced by state-of-the-art tools. runs significantly faster too, reducing time-to-solution from hours to minutes for most input cases. We also performed a coverage experiment by varying the sequencing coverage depth for long reads, which demonstrated the potential of to generate significantly longer scaffolds in low-coverage settings (1×–10×).
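As a rough illustration of building a ⟨contig, contig⟩ graph from long reads without base-level alignment, the Python sketch below indexes exact contig k-mers and links contigs that appear consecutively along the same read. It is an assumption-laden toy, not the workflow's mapper: it uses exact k-mer matching on the forward strand only, the names `build_contig_graph`, `k`, and `min_hits` are hypothetical, and the paper's “wiring” heuristic and batching steps are not reproduced.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield (offset, k-mer) pairs for every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_contig_graph(contigs, reads, k=5, min_hits=2):
    """Build a weighted, undirected <contig, contig> graph from long reads.

    contigs: dict name -> sequence; reads: list of sequences.
    Returns a Counter mapping (contig_a, contig_b) -> number of linking reads.
    Assumption: exact, forward-strand k-mer matches only (toy setting).
    """
    # Index: k-mer -> names of contigs containing it
    index = defaultdict(set)
    for name, seq in contigs.items():
        for _, kmer in kmers(seq, k):
            index[kmer].add(name)

    graph = Counter()
    for read in reads:
        hits = Counter()   # contig -> number of shared k-mers
        first_pos = {}     # contig -> first offset of a shared k-mer on the read
        for pos, kmer in kmers(read, k):
            for name in index.get(kmer, ()):
                hits[name] += 1
                first_pos.setdefault(name, pos)
        # Keep well-supported contigs, ordered by where they occur on the read
        placed = sorted((first_pos[c], c) for c, n in hits.items() if n >= min_hits)
        for (_, a), (_, b) in zip(placed, placed[1:]):
            graph[tuple(sorted((a, b)))] += 1
    return graph


# Hypothetical contigs and a read spanning the end of c1 and the start of c2
toy_contigs = {"c1": "ACGTACGTTT", "c2": "GGGCCCAAAT"}
toy_read = "TACGTTTGGGCCCA"
print(dict(build_contig_graph(toy_contigs, [toy_read, toy_read])))  # {('c1', 'c2'): 2}
```

Each copy of the toy read shares three k-mers with c1 and three with c2, in that order, so the two reads together give one c1-c2 edge of weight 2. A scaffolder would then extract paths from such a graph.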
de los Campos, G (Ed.)
Abstract
De novo genome assembly is essential for genomic research. High-quality genomes assembled into phased pseudomolecules are challenging to produce and often contain assembly errors because of repeats, heterozygosity, or the chosen assembly strategy. Although algorithms that produce partially phased assemblies exist, haploid draft assemblies, which may lack biological information, remain favored because they are easier to generate and use. We developed HaploSync, a suite of tools that produces fully phased, chromosome-scale diploid genome assemblies and performs extensive quality control to limit assembly artifacts. HaploSync scaffolds sequences from a draft diploid assembly into phased pseudomolecules guided by a genetic map and/or the genome of a closely related species. HaploSync generates a report that visualizes the relationships between current and legacy sequences for both haplotypes and displays their gene and marker content. This quality control helps the user identify misassemblies and guides HaploSync’s correction of scaffolding errors. Finally, HaploSync fills assembly gaps with unplaced sequences and resolves collapsed homozygous regions. In a series of case studies spanning the plant, fungal, and animal kingdoms, we demonstrate that HaploSync efficiently increases the assembly contiguity of phased chromosomes, improves completeness by filling gaps, corrects scaffolding, and correctly phases highly heterozygous, complex regions.
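HaploSync's own procedure is not detailed in the abstract; the Python sketch below only illustrates the general idea of genetic-map-guided scaffolding. It assumes markers arrive as hypothetical (contig, chromosome, cM, bp-on-contig) tuples, assigns each contig to the chromosome most of its markers fall on, orders contigs by median map position, and orients each contig by the sign of the covariance between marker cM and contig coordinate. Phasing, gap filling, and conflict reporting are out of scope.

```python
from collections import Counter, defaultdict
from statistics import median


def place_contigs(markers):
    """Assign, order, and orient contigs using genetic-map markers.

    markers: iterable of (contig, chromosome, cM, bp_on_contig) tuples
    (assumed input format). Returns dict chromosome -> [(contig, orientation)].
    """
    by_contig = defaultdict(list)
    for contig, chrom, cm, bp in markers:
        by_contig[contig].append((chrom, cm, bp))

    layout = defaultdict(list)
    for contig, hits in by_contig.items():
        # Majority vote on chromosome; minority markers are simply ignored here
        chrom, _ = Counter(h[0] for h in hits).most_common(1)[0]
        kept = [(cm, bp) for c, cm, bp in hits if c == chrom]
        centre = median(cm for cm, _ in kept)
        # Orientation: "-" if cM decreases as the contig coordinate increases
        orientation = "+"
        if len(kept) >= 2:
            mean_cm = sum(cm for cm, _ in kept) / len(kept)
            mean_bp = sum(bp for _, bp in kept) / len(kept)
            cov = sum((cm - mean_cm) * (bp - mean_bp) for cm, bp in kept)
            orientation = "-" if cov < 0 else "+"
        layout[chrom].append((centre, contig, orientation))

    # Sort each chromosome's contigs by median map position
    return {chrom: [(contig, ori) for _, contig, ori in sorted(items)]
            for chrom, items in layout.items()}


# Hypothetical markers on three contigs
toy_markers = [
    ("ctgA", "chr1", 10.0, 1000),
    ("ctgA", "chr1", 12.0, 5000),
    ("ctgB", "chr1", 30.0, 9000),
    ("ctgB", "chr1", 25.0, 20000),
    ("ctgC", "chr2", 5.0, 100),
]
print(place_contigs(toy_markers))
# {'chr1': [('ctgA', '+'), ('ctgB', '-')], 'chr2': [('ctgC', '+')]}
```

In the toy data, ctgB's markers decrease in cM as the contig coordinate increases, so it is placed after ctgA in reverse orientation.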
Abstract
The Chinese hamster genome serves as a reference genome for the study of Chinese hamster ovary (CHO) cells, the preferred host system for biopharmaceutical production. Recent re‐sequencing of the Chinese hamster genome resulted in the RefSeq PICR meta‐assembly, a set of highly accurate scaffolds that filled over 95% of the gaps in previous assembly versions. However, these scaffolds were not chromosome‐scale because no long‐range scaffolding information was available during the meta‐assembly process. Here, long‐range scaffolding of the PICR Chinese hamster genome assembly was performed using high‐throughput chromosome conformation capture (Hi‐C). This process resulted in a new “PICRH” genome, in which 97% of the genome is contained in 11 mega‐scaffolds corresponding to the Chinese hamster chromosomes (2n = 22), and the total number of scaffolds is reduced nearly three‐fold, from 1,830 in PICR to 647 in PICRH. Continuity was improved while preserving accuracy, yielding quality scores higher than those of recent builds of mouse chromosomes and comparable to those of human chromosomes. The PICRH genome assembly will be an indispensable tool for designing advanced genetic engineering strategies in CHO cells and for enabling systematic examination of genomic and epigenomic instability through comparative analysis of CHO cell lines on a common set of chromosomal coordinates.
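Hi-C scaffolding generally joins sequences that share unusually many contacts, after correcting for the fact that longer scaffolds accumulate more contacts simply by being longer. The Python sketch below shows one simple version of this idea (contacts per Mb², then mutual best partners). It is an assumed, generic heuristic for illustration, not the pipeline used to build PICRH, and the input format and function name are hypothetical.

```python
def mutual_best_pairs(lengths, contacts):
    """Find scaffold pairs that are each other's strongest Hi-C partner.

    lengths: dict scaffold -> length in bp.
    contacts: dict (scaffold_a, scaffold_b) -> Hi-C read pairs linking them.
    Scores are contacts per Mb^2 (count / (len_a * len_b)), a simple
    correction for long scaffolds collecting more contacts.
    """
    score = {}
    for (a, b), n in contacts.items():
        if a == b:
            continue  # intra-scaffold contacts say nothing about joins
        key = tuple(sorted((a, b)))
        norm = n / ((lengths[a] / 1e6) * (lengths[b] / 1e6))
        score[key] = score.get(key, 0.0) + norm

    # Strongest partner of each scaffold (first one wins on ties)
    best = {}
    for (a, b), s in score.items():
        for x, y in ((a, b), (b, a)):
            if x not in best or s > best[x][1]:
                best[x] = (y, s)

    # Keep only reciprocal choices, each pair reported once
    return sorted((x, y) for x, (y, _) in best.items()
                  if best[y][0] == x and x < y)


# Hypothetical scaffold lengths (bp) and inter-scaffold contact counts
toy_lengths = {"s1": 2_000_000, "s2": 1_000_000, "s3": 4_000_000}
toy_contacts = {("s1", "s2"): 400, ("s1", "s3"): 800, ("s2", "s3"): 40}
print(mutual_best_pairs(toy_lengths, toy_contacts))  # [('s1', 's2')]
```

In the toy input, s1 and s3 share the most raw contacts (800), but after length normalization the s1-s2 pair scores 200 per Mb² versus 100 for s1-s3, so only s1 and s2 are proposed as neighbours.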
Sharakhov, Igor V. (Ed.)
Rubus idaeus L. (red raspberry) is a perennial woody plant species of the Rosaceae family that is widely cultivated in the temperate regions of the world and is thus an economically important soft fruit species. It is prized for its flavour and aroma, as well as its high content of healthful compounds such as vitamins and antioxidants. Breeding programs for red raspberry exist globally, but variety development is a long and challenging process. Genomic and molecular tools for red raspberry are valuable resources for breeding. Here, a chromosome-length genome sequence assembly and related gene predictions for the red raspberry cultivar ‘Anitra’ are presented, generated from PacBio long-read sequencing and scaffolded using Hi-C sequence data. The assembled genome sequence totalled 291.7 Mbp, with 247.5 Mbp (84.8%) incorporated into seven scaffolds with an average length of 35.4 Mbp. A total of 39,448 protein-coding genes were predicted, 75% of which were functionally annotated. The seven chromosome scaffolds were anchored to a previously published genetic linkage map with a high degree of synteny, and comparisons to genomes of closely related species within the Rosoideae revealed chromosome-scale rearrangements that have occurred over relatively short evolutionary periods. A chromosome-level genomic sequence of R. idaeus will be a valuable resource for understanding genome structure and function in red raspberry and an important resource for researchers and plant breeders.
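One common way to quantify collinearity between a chromosome scaffold and a genetic linkage map is Spearman's rank correlation between the markers' physical positions and their map positions (cM); values near +1 or -1 indicate conserved marker order. The Python sketch below is an illustrative assumption, not necessarily the analysis used for ‘Anitra’, and the marker positions in the example are made up. Tied cM values, which are common when markers co-segregate, receive their average rank.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks.

    Assumes len(x) == len(y) >= 2 and neither input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical markers on one scaffold: physical position (Mb) and map position (cM)
physical_mb = [1.2, 5.0, 9.8, 15.1, 22.4, 30.0]
map_cm = [0.0, 3.1, 2.5, 10.4, 18.0, 25.3]
print(round(spearman(physical_mb, map_cm), 3))  # 0.943
```

With one local inversion in the toy markers (3.1 cM placed before 2.5 cM), ρ = 1 - 6·2/(6·35) ≈ 0.943.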

