Title: Taro Genome Assembly and Linkage Map Reveal QTLs for Resistance to Taro Leaf Blight
Abstract Taro (Colocasia esculenta) is a food staple widely cultivated in the humid tropics of Asia, Africa, the Pacific, and the Caribbean. One of the greatest threats to taro production is Taro Leaf Blight (TLB), caused by the oomycete pathogen Phytophthora colocasiae. Here we describe a de novo taro genome assembly and use it to analyze sequence data from a Taro Leaf Blight resistant mapping population. The genome was assembled from linked-read sequences (10x Genomics; ∼60x coverage), then gap-filled and scaffolded with contigs assembled from Oxford Nanopore Technologies long reads and with linkage map results. The haploid assembly was 2.45 Gb total, with a maximum contig length of 38 Mb and scaffold N50 of 317,420 bp. A comparison of family-level (Araceae) genome features reveals the repeat content of taro to be 82%, >3.5x greater than the 23% found in great duckweed (Spirodela polyrhiza). Both genomes recovered a similar percentage of Benchmarking Universal Single-copy Orthologs, 80% and 84%, based on a 3,236-gene database for monocot plants. More nucleotide-binding leucine-rich repeat disease resistance genes were present in the taro genome than in the duckweed genome, ∼391 vs. ∼70 (∼182 and ∼46 complete). The mapping population data revealed 16 major linkage groups with 520 markers, and 10 quantitative trait loci (QTL) significantly associated with Taro Leaf Blight disease resistance. The genome sequence of taro enhances our understanding of resistance to TLB and provides markers that may accelerate breeding programs. This genome project may provide a template for developing genomic resources in other understudied plant species.
Journal Name:
G3 Genes|Genomes|Genetics
Page Range / eLocation ID:
2763 to 2775
Medium: X
Sponsoring Org:
National Science Foundation
More Like this

  1. Abstract

    The Pacific crabapple (Malus fusca) is a wild relative of the commercial apple (Malus × domestica). With a range extending from Alaska to Northern California, M. fusca is extremely hardy and disease resistant. The species represents an untapped genetic resource for the development of new apple cultivars with enhanced stress resistance. However, gene discovery and utilization of M. fusca have been hampered by the lack of genomic resources. Here, we present a high-quality, haplotype-resolved, chromosome-scale genome assembly and annotation for M. fusca. The genome was assembled using high-fidelity long reads and scaffolded using genetic maps and high-throughput chromatin conformation capture sequencing, resulting in one of the most contiguous apple genomes to date. We annotated the genome using public transcriptomic data from the same species taken from diverse plant structures and developmental stages. Using this assembly, we explored haplotypic structural variation within the genome of M. fusca, identifying thousands of large variants. We further showed high sequence co-linearity with other domesticated and wild Malus species. Finally, we resolve a known quantitative trait locus associated with resistance to fire blight (Erwinia amylovora). Insights gained from the assembly of a reference-quality genome of this hardy wild apple relative will be invaluable as a tool to facilitate DNA-informed introgression breeding.

  2. Abstract

    Two mapping populations were developed from crosses of the Asian indica rice (Oryza sativa L.) cultivar ‘Dee Geo Woo Gen’ (DGWG; PI 699210 Parent, PI 699212 Parent) and two weedy rice ecotypes, an early-flowering straw hull (SH) biotype AR-2000-1135-01 (PI 699209 Parent) collected in Arkansas and a late-flowering black hull (BHA) biotype MS-1996-9 (PI 699211 Parent) collected in Mississippi. The weed and crop-based rice recombinant inbred line (RIL) mapping populations have been used to identify genomic regions associated with weedy traits as well as resistance to sheath blight and rice blast diseases. The mapping populations consist of 185 (DGWG/SH; Reg. no. MP-9, NSL 541035 MAP) and 234 (BHA/DGWG; Reg. no. MP-10, NSL 541036 MAP) F8 RILs, of which 175 (DGWG/SH) and 224 (BHA/DGWG) were used to construct two linkage maps using single nucleotide polymorphic markers to identify weedy traits, sheath blight, and blast resistance loci. These mapping populations and related datasets represent a valuable resource for basic rice evolutionary genomic research and applied marker-assisted breeding efforts in disease resistance.

  3. Gralnick, Jeffrey A. (Ed.)
    ABSTRACT Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part of the metagenomic workflow, as subsequent ecological and metabolic inferences depend on their accuracy, quality, and completeness. In contrast to microbial communities in other ecosystems, there has been no systematic assessment of genome-centric metagenomic workflows for drinking water microbiomes. In this study, we assessed the performance of a combination of assembly and binning strategies for time series drinking water metagenomes that were collected over 6 months. The goal of this study was to identify the combination of assembly and binning approaches that results in high-quality and -quantity metagenome-assembled genomes (MAGs), representing most of the sequenced metagenome. Our findings suggest that the metaSPAdes coassembly strategies had the best performance, as they resulted in larger and less fragmented assemblies, with at least 85% of the sequence data mapping to contigs greater than 1 kbp. Furthermore, a combination of metaSPAdes coassembly strategies and MetaBAT2 produced the highest number of medium-quality MAGs while capturing at least 70% of the metagenomes based on read recruitment. Utilizing different assembly/binning approaches also assists in the reconstruction of unique MAGs from closely related species that would have otherwise collapsed into a single MAG using a single workflow. Overall, our study suggests that leveraging multiple binning approaches with different metaSPAdes coassembly strategies may be required to maximize the recovery of good-quality MAGs.
    IMPORTANCE Drinking water contains phylogenetically diverse groups of bacteria, archaea, and eukarya that affect the esthetic quality of water, water infrastructure, and public health. Taxonomic, metabolic, and ecological inferences of the drinking water microbiome depend on the accuracy, quality, and completeness of genomes that are reconstructed through the application of genome-resolved metagenomics. Using time series metagenomic data, we present reproducible genome-centric metagenomic workflows that result in high-quality and -quantity genomes, which more accurately represent the sequenced drinking water microbiome. These genome-centric metagenomic workflows will allow for improved taxonomic and functional potential analysis that offers enhanced insights into the stability and dynamics of drinking water microbial communities.
  4. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments.
    RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution.
    RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations.
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
    CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans.
    Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
  5. Abstract

    Eastern black walnut (Juglans nigra L.), one of the most valuable timber and veneer trees in North America, provides nut shells with unique industrial uses and nut kernels with distinctive culinary attributes. A mature F1 full-sib progeny orchard of 248 individuals from the cross of two eastern black walnut cultivars provides a long-term resource for discovering genetic mechanisms controlling life history, quality traits, and stress resistance. The genetic linkage map, constructed with 356 single nucleotide polymorphism (SNP) markers and 62 expressed sequence tag simple sequence repeats (EST-SSRs), is 1645.7 cM in length, distributed across the expected 16 linkage groups. In this first application of QTL mapping in J. nigra, we report QTL for budbreak, peak pistillate bloom, peak staminate bloom, and heterodichogamy. A dominant major QTL for heterodichogamy is reported, the sequence for which is syntenic with the heterodichogamy QTL on chromosome 11 of Persian walnut (J. regia L.). The mapping population parents are both protogynous, and segregation suggests a Mendelian component, with a 3:1-like inheritance pattern from heterozygous parents. Mapping the sequenced EST-SSR markers to the J. regia “Chandler” V2.0 genome sequence revealed evidence for collinearity and structural changes on two of the sixteen chromosomes. The inclusion of sequenced EST-SSR markers enables the direct comparison of this and subsequent J. nigra maps and other Juglandaceae genetic maps. This investigation initiates long-term QTL detection studies for quality and stress resistance traits in black walnut.
