Abstract Suncus etruscusis one of the world’s smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew’s small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with the 2.472 Gbp primary pseudohaplotype and 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control.
more »
« less
A haplotype-resolved, chromosome-scale genome for Malus domestica Borkh. ‘WA 38’
Abstract Genome sequencing for agriculturally important Rosaceous crops has made rapid progress both in completeness and annotation quality. Whole genome sequence and annotation give breeders, researchers, and growers information about cultivar-specific traits such as fruit quality and disease resistance, and inform strategies to enhance postharvest storage. Here we present a haplotype-phased, chromosomal-level genome of Malus domestica, ‘WA 38’, a new apple cultivar released to market in 2017 as Cosmic Crisp®. Using both short and long-read sequencing data with a k-mer-based approach, chromosomes originating from each parent were assembled and segregated. This is the first pome fruit genome fully phased into parental haplotypes in which chromosomes from each parent are identified and separated into their unique, respective haplomes. The two haplome assemblies, ‘Honeycrisp’ originated HapA and ‘Enterprise’ originated HapB, are about 650 Megabases each, and both have a BUSCO score of 98.7% complete. A total of 53,028 and 54,235 genes were annotated from HapA and HapB, respectively. Additionally, we provide genome-scale comparisons to ‘Gala’, ‘Honeycrisp’, and other relevant cultivars highlighting major differences in genome structure and gene family circumscription. This assembly and annotation was done in collaboration with the American Campus Tree Genomes project that includes ‘WA 38’ (Washington State University), ‘d’Anjou’ pear (Auburn University), and many more. To ensure transparency, reproducibility, and applicability for any genome project, our genome assembly and annotation workflow is recorded in detail and shared under a public GitLab repository. All software is containerized, offering a simple implementation of the workflow.
more »
« less
- Award ID(s):
- 2239530
- PAR ID:
- 10569397
- Editor(s):
- McIntyre, L
- Publisher / Repository:
- Oxford
- Date Published:
- Journal Name:
- G3: Genes, Genomes, Genetics
- ISSN:
- 2160-1836
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
CitationSnead, A.A., Meng, F., Largotta, N. et al. Diploid chromosome-level genome assembly and annotation for Lycorma delicatula. Sci Data 12, 579 (2025). https://doi.org/10.1038/s41597-025-04854-8AbstractThe spotted lanternfly (Lycorma delicatula) is a planthopper species (Hemiptera: Fulgoridae) native to China but invasive in South Korea, Japan, and the United States where it is a significant threat to agriculture. Hence, genomic resources are critical to both management and understand the genomic characteristics of successful invaders. Here, we report a haplotype-phased genome assembly and annotation using PacBio long-read sequencing, Hi-C technology, and RNA-seq data. The 2.2 Gbp genome comprises 13 chromosomes, and our whole genome sequencing of eighty-two adults indicated chromosome four as the sex chromosome and anXO sex-determination system.We identified over 12,000 protein coding genes and performed functional annotation, facilitating identification of several candidate genes which may hold importance for spotted lanternfly control. Both the assemblies and annotations were highly complete with over 96% of BUSCO genes complete regardless of the database employed (i.e., Eukaryota, Arthropoda, Insecta). This reference-quality genome will serve as an important resource for both development and optimization of management practices for the spotted lanternfly and invasive genomics as a whole.Description of the data and file structureThis dataset contains the haplotype-phased chromosome-level genome assembly of the spotted lanternfly (Lycorma delicatula) described and published in Snead & Meng et al. (in review). The genome combined long-read data and HiC data (SRA31402152-SRA31402153) to assembly and scaffold each haplotype. The annotation uses RNAseq data from 12 adults (SRA31411873-SRA31411894) to structurally annotate both haplotypes. Finally, whole-genome sequencing of 82 adult spotted lanternfly (bioproject PRJNA1136004) described in the metadata csv provided was used to identify punitive sex chromosomes. The dataset also include GO results for each chromosome not explicitly described in the results of the manuscript.Files and variablesFile: SLF_Hap1.fastaDescription: A fasta file of the assembled genome for the cleaned 13 chromosome haplotype 1 assembly.File: SLF_Hap2.fastaDescription: A fasta file of the assembled genome for the cleaned 13 chromosome haplotype 2 assembly.File: SLF_Hap1_Repeats.gffDescription: A gff file of the repeats annotated in the cleaned 13 chromosome haplotype 1 assembly.File: SLF_Hap2_Repeats.gffDescription: A gff file of the repeats annotated in the cleaned 13 chromosome haplotype 2 assembly.File: SLF_Hap1.gffDescription: A structural annotation of the 13 chromosome haplotype 1 assembly with functional annotations.File: SLF_Hap2.gffDescription: A structural annotation of the 13 chromosome haplotype 2 assembly with functional annotations.File: GO_plot_chr_1.pngDescription: An image of the top 20 GO term results for chromosome 1.File: GO_plot_chr_2.pngDescription: An image of the top 20 GO term results for chromosome 2.File: GO_plot_chr_3.pngDescription: An image of the top 20 GO term results for chromosome 3.File: GO_plot_chr_8.pngDescription: An image of the top 20 GO term results for chromosome 8.File: GO_plot_chr_5.pngDescription: An image of the top 20 GO term results for chromosome 5.File: GO_plot_chr_4.pngDescription: An image of the top 20 GO term results for chromosome 4.File: GO_plot_chr_6.pngDescription: An image of the top 20 GO term results for chromosome 6.File: GO_plot_chr_7.pngDescription: An image of the top 20 GO term results for chromosome 7.File: GO_plot_chr_11.pngDescription: An image of the top 20 GO term results for chromosome 11.File: GO_plot_chr_9.pngDescription: An image of the top 20 GO term results for chromosome 9.File: GO_plot_chr_10.pngDescription: An image of the top 20 GO term results for chromosome 10.File: GO_plot_chr_12.pngDescription: An image of the top 20 GO term results for chromosome 12.File: GO_plot_chr_13.pngDescription: An image of the top 20 GO term results for chromosome 13.File: SLF_Samples_SRA.csvDescription: A csv with the sequencing information, SRA numbers, and sexes of the adults used in to identify the putative sex chromosome.File: SLF_RNAseq_Metadata.csvDescription: A csv with the sequencing information, SRA numbers, and other metadata for the RNAseq used to annotation the genomes.Variablesaccession: The SRA accession numberstudy: The studyobject_status: If the NCBI submission was new or not.bioproject_accession: The bioproject accession numberbiosample_accession: The Biosample accession numberlibrary_ID: The ID used to identify that genomic library.title: The title of the study (the bioproject)library_strategy: Specific sequencing technique used to prepare the library.library_source: The biological material used to create the sequencing library.library_selection: The library preparation method.library_layout: The arrangement of reads within the sequencing library.platform: The sequencing platform.instrument_model: The model of the sequences.design_description: Description of the study design.filetype: Type of filefilename: First filefilename2: Second filesex: The biological sex of the adult.Code/softwareThe initial haplotype-phased scaffolded genome was assembled by Dovetail Genomics (Cantata Bio) with standard software outlined in the methods with default settings. Scripts for the remaining work including annotation, gene ontology enrichment, and other analyses are located in the Github repository (https://github.com/anthonysnead/SLF-Genome-Assembly(opens in new window)).Access informationOther publicly accessible locations of the data:The raw sequencing data and the annotated haplotype-phased genome assembly of Lycorma delicatula have been deposited at the National Center for Biotechnology Information (NCBI). The Hi-C and HiFi data can be found under SRA31402152 and SRA31402153. The RNA-seq data can be found under SRA31411873-SRA31411894, while the DNA-seq data can be found under bioproject PRJNA1136004.more » « less
-
de los Campos, G (Ed.)Abstract De novo genome assembly is essential for genomic research. High-quality genomes assembled into phased pseudomolecules are challenging to produce and often contain assembly errors because of repeats, heterozygosity, or the chosen assembly strategy. Although algorithms that produce partially phased assemblies exist, haploid draft assemblies that may lack biological information remain favored because they are easier to generate and use. We developed HaploSync, a suite of tools that produces fully phased, chromosome-scale diploid genome assemblies, and performs extensive quality control to limit assembly artifacts. HaploSync scaffolds sequences from a draft diploid assembly into phased pseudomolecules guided by a genetic map and/or the genome of a closely related species. HaploSync generates a report that visualizes the relationships between current and legacy sequences, for both haplotypes, and displays their gene and marker content. This quality control helps the user identify misassemblies and guides Haplosync’s correction of scaffolding errors. Finally, HaploSync fills assembly gaps with unplaced sequences and resolves collapsed homozygous regions. In a series of plant, fungal, and animal kingdom case studies, we demonstrate that HaploSync efficiently increases the assembly contiguity of phased chromosomes, improves completeness by filling gaps, corrects scaffolding, and correctly phases highly heterozygous, complex regions.more » « less
-
Abstract Carya glabra(2n= 4x= 64), also known as pignut hickory, is a widely distributed species in the walnut family (Juglandaceae). Native to the central and eastern United States and southeastern Canada,C. glabraplays an important ecological role as a common upland forest species; it is closely related to several economically valuable nut trees, includingC. illinoinensis(pecan). A deeper understanding of the genetics ofC. glabrais essential for studying its evolutionary history and biology, with potential implications for agricultural improvement of pecan. Here, we present the first nuclear genome assembly and annotation ofC. glabra. The assembly is chromosome-level and phased, representing the first assembled polyploid genome in the genusCarya. A total of 64 pseudochromosomes were assembled and phased into four haplotypes. The haplotype A assembly spans 600.4 Mb, comprises 55.0% repetitive sequences, and contains 30,947 protein-coding genes, with a BUSCO completeness score of 97.7%. Functional annotation assigned 94.3% of haplotype A genes to gene families, and 79.7% and 86.3% of genes were annotated with Gene Ontology terms and protein domains, respectively; 635 putative plant disease resistance genes were found in haplotype A. The other three haplotypes exhibited similarly high-quality annotation metrics. Our genomic analyses also suggest thatC. glabrais an autotetraploid. Comparative genomic analyses revealed high collinearity among the four haplotypes ofC. glabraand the published genomes of three otherCaryaspecies, although structural variation among the genomes of these species was identified. In addition, we provide an improved chloroplast genome assembly and the first mitochondrial genome forC. glabra. Importantly, most members of the research team are undergraduate students; the sequenced individual is located in McCarty Woods, a Conservation Area on the University of Florida campus. This work highlights the value of genome assembly efforts as powerful tools for teaching genomics and supporting conservation initiatives. This first high-quality reference genome forC. glabraprovides a valuable resource for studyingCarya, a genus of significant ecological and economic importance. Article summaryCarya glabra(pignut hickory) is a common upland forest species in North America. This species is a member of the walnut family (Juglandaceae), which includes many economically important nut trees. Here, we present the first nuclear genome assembly and annotation ofC. glabra. The assembly is chromosome-level and phased. The haplotype A assembly contains 30,947 protein-coding genes, with a BUSCO completeness score of 97.7%. Our genomic analyses suggest thatC. glabrais an autopolyploid. We also provide chloroplast and mitochondrial genome assemblies. This nuclear genome provides a valuable resource for studyingCarya, a genus of significant ecological and economic importance.more » « less
-
Abstract BackgroundGenetic and epigenetic perturbation of cis-regulatory sequences can shift patterns of gene expression and result in novel phenotypes. Phased genome assemblies now enable the local dissection of linkages between cis-regulatory sequences, including their epigenetic state, and allele-specific gene expression to further characterize gene regulation and resulting phenotypes in heterozygous genomes. ResultsWe assembled a locally phased genome for a mandarin hybrid named ‘Fairchild’ to explore the molecular signatures of allele-specific gene expression. With local genome phasing, genes with allele-specific expression were paired with haplotype-specific chromatin states, including levels of chromatin accessibility, histone modifications, and DNA methylation. We found that 30% of variation in allele-specific expression could be attributed to haplotype associated factors, with allelic levels of chromatin accessibility and three histone modifications in gene bodies having the most influence. Structural variants in promoter regions were also associated with allele-specific expression, including specific enrichments of hAT and MULE-MuDR DNA transposon sequences. Integration of haplotype-resolved genetic and epigenetic landscapes with high-throughput phenotypic analysis of fruit traits in a panel of 154 accessions with mandarin and pummelo ancestry revealed that trait-associated variants were enriched in regions of open chromatin. Mining of trait-associated variants uncovered a Gypsy retrotransposon insertion in a gene that regulates potassium transport and may contribute to the reduction in fruit size that is observed in mandarins. ConclusionsUsing a locally phased assembly of a heterozygous cultivar of citrus, we dissected the interplay between genetic variants and molecular phenotypes to reveal cis-regulatory sequences with potential functional effects on phenotypes relevant for genetic improvement.more » « less
An official website of the United States government

