Citation
Snead, A.A., Meng, F., Largotta, N. et al. Diploid chromosome-level genome assembly and annotation for Lycorma delicatula. Sci Data 12, 579 (2025). https://doi.org/10.1038/s41597-025-04854-8

Abstract
The spotted lanternfly (Lycorma delicatula) is a planthopper species (Hemiptera: Fulgoridae) native to China but invasive in South Korea, Japan, and the United States, where it is a significant threat to agriculture. Hence, genomic resources are critical both for management and for understanding the genomic characteristics of successful invaders. Here, we report a haplotype-phased genome assembly and annotation using PacBio long-read sequencing, Hi-C technology, and RNA-seq data. The 2.2 Gbp genome comprises 13 chromosomes, and our whole-genome sequencing of eighty-two adults indicated chromosome four as the sex chromosome and an XO sex-determination system. We identified over 12,000 protein-coding genes and performed functional annotation, facilitating identification of several candidate genes which may hold importance for spotted lanternfly control. Both the assemblies and annotations were highly complete, with over 96% of BUSCO genes complete regardless of the database employed (i.e., Eukaryota, Arthropoda, Insecta). This reference-quality genome will serve as an important resource for both development and optimization of management practices for the spotted lanternfly and invasive genomics as a whole.

Description of the data and file structure
This dataset contains the haplotype-phased chromosome-level genome assembly of the spotted lanternfly (Lycorma delicatula) described and published in Snead & Meng et al. (in review). The assembly combined long-read and Hi-C data (SRA31402152-SRA31402153) to assemble and scaffold each haplotype. The annotation uses RNA-seq data from 12 adults (SRA31411873-SRA31411894) to structurally annotate both haplotypes. Finally, whole-genome sequencing of 82 adult spotted lanternflies (bioproject PRJNA1136004), described in the metadata csv provided, was used to identify putative sex chromosomes. The dataset also includes GO results for each chromosome, which are not explicitly described in the results of the manuscript.

Files and variables
File: SLF_Hap1.fasta
Description: A fasta file of the assembled genome for the cleaned 13 chromosome haplotype 1 assembly.
File: SLF_Hap2.fasta
Description: A fasta file of the assembled genome for the cleaned 13 chromosome haplotype 2 assembly.
File: SLF_Hap1_Repeats.gff
Description: A gff file of the repeats annotated in the cleaned 13 chromosome haplotype 1 assembly.
File: SLF_Hap2_Repeats.gff
Description: A gff file of the repeats annotated in the cleaned 13 chromosome haplotype 2 assembly.
File: SLF_Hap1.gff
Description: A structural annotation of the 13 chromosome haplotype 1 assembly with functional annotations.
File: SLF_Hap2.gff
Description: A structural annotation of the 13 chromosome haplotype 2 assembly with functional annotations.
File: GO_plot_chr_1.png
Description: An image of the top 20 GO term results for chromosome 1.
File: GO_plot_chr_2.png
Description: An image of the top 20 GO term results for chromosome 2.
File: GO_plot_chr_3.png
Description: An image of the top 20 GO term results for chromosome 3.
File: GO_plot_chr_8.png
Description: An image of the top 20 GO term results for chromosome 8.
File: GO_plot_chr_5.png
Description: An image of the top 20 GO term results for chromosome 5.
File: GO_plot_chr_4.png
Description: An image of the top 20 GO term results for chromosome 4.
File: GO_plot_chr_6.png
Description: An image of the top 20 GO term results for chromosome 6.
File: GO_plot_chr_7.png
Description: An image of the top 20 GO term results for chromosome 7.
File: GO_plot_chr_11.png
Description: An image of the top 20 GO term results for chromosome 11.
File: GO_plot_chr_9.png
Description: An image of the top 20 GO term results for chromosome 9.
File: GO_plot_chr_10.png
Description: An image of the top 20 GO term results for chromosome 10.
File: GO_plot_chr_12.png
Description: An image of the top 20 GO term results for chromosome 12.
File: GO_plot_chr_13.png
Description: An image of the top 20 GO term results for chromosome 13.
File: SLF_Samples_SRA.csv
Description: A csv with the sequencing information, SRA numbers, and sexes of the adults used to identify the putative sex chromosome.
File: SLF_RNAseq_Metadata.csv
Description: A csv with the sequencing information, SRA numbers, and other metadata for the RNA-seq data used to annotate the genomes.

Variables
accession: The SRA accession number.
study: The study.
object_status: Whether the NCBI submission was new or not.
bioproject_accession: The bioproject accession number.
biosample_accession: The Biosample accession number.
library_ID: The ID used to identify that genomic library.
title: The title of the study (the bioproject).
library_strategy: Specific sequencing technique used to prepare the library.
library_source: The biological material used to create the sequencing library.
library_selection: The library preparation method.
library_layout: The arrangement of reads within the sequencing library.
platform: The sequencing platform.
instrument_model: The model of the sequencer.
design_description: Description of the study design.
filetype: Type of file.
filename: First file.
filename2: Second file.
sex: The biological sex of the adult.

Code/software
The initial haplotype-phased scaffolded genome was assembled by Dovetail Genomics (Cantata Bio) with the standard software outlined in the methods, run with default settings. Scripts for the remaining work, including annotation, gene ontology enrichment, and other analyses, are located in the GitHub repository (https://github.com/anthonysnead/SLF-Genome-Assembly).

Access information
Other publicly accessible locations of the data: The raw sequencing data and the annotated haplotype-phased genome assembly of Lycorma delicatula have been deposited at the National Center for Biotechnology Information (NCBI). The Hi-C and HiFi data can be found under SRA31402152 and SRA31402153. The RNA-seq data can be found under SRA31411873-SRA31411894, while the DNA-seq data can be found under bioproject PRJNA1136004.
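The sex-chromosome inference mentioned in the abstract rests on a depth-of-coverage argument: under an XO system, males carry one X copy and females two, so male read depth on the X, normalized to each sample's typical depth, should sit near half the female value. A minimal sketch of that comparison follows. It assumes per-chromosome mean depths have already been computed for each sample; the function name, the input layout, the 0.4-0.6 window, and the toy numbers are all illustrative assumptions, not the authors' pipeline.

```python
from statistics import median


def flag_x_candidates(depth, sex, low=0.4, high=0.6):
    """Return chromosomes whose male/female normalized depth ratio lies in [low, high].

    depth: {sample: {chrom: mean_depth}}  (hypothetical input layout)
    sex:   {sample: "male" or "female"}
    """
    chroms = sorted(next(iter(depth.values())))
    # Normalize each sample by its own median per-chromosome depth so that
    # differences in sequencing effort between samples cancel out.
    norm = {
        s: {c: d[c] / median(d.values()) for c in chroms}
        for s, d in depth.items()
    }
    flagged = {}
    for c in chroms:
        male = [norm[s][c] for s in norm if sex[s] == "male"]
        female = [norm[s][c] for s in norm if sex[s] == "female"]
        ratio = median(male) / median(female)
        if low <= ratio <= high:
            flagged[c] = round(ratio, 2)
    return flagged


# Toy data only: invented depths for four samples and four chromosomes.
toy_depth = {
    "F1": {"chr1": 30.0, "chr2": 31.0, "chr3": 29.0, "chr4": 30.0},
    "F2": {"chr1": 20.0, "chr2": 20.0, "chr3": 21.0, "chr4": 20.0},
    "M1": {"chr1": 28.0, "chr2": 28.0, "chr3": 29.0, "chr4": 14.0},
    "M2": {"chr1": 40.0, "chr2": 41.0, "chr3": 40.0, "chr4": 20.0},
}
toy_sex = {"F1": "female", "F2": "female", "M1": "male", "M2": "male"}

print(flag_x_candidates(toy_depth, toy_sex))  # {'chr4': 0.5}
```

With real data, the sample sexes would come from the sex column of SLF_Samples_SRA.csv; how depth is computed and which threshold is appropriate are choices this sketch leaves open.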
Two independent origins of XY sex chromosomes in Asparagus
File    Contents
Ahor_pb32m_HAP1_v1.0.chrY_regions.bed.gz    Y chromosome nonrecombining region coordinates
Ahor_pb32m_HAP1_v1.0.EDTA.TEanno.gff3.gz    All TE annotations by EDTA
Ahor_pb32m_HAP1_v1.0.EDTA.TEintact.fa.gz    Intact TE sequences by EDTA
Ahor_pb32m_HAP1_v1.0.EDTA.TEintact.gff3.gz    Intact TE annotations by EDTA
Ahor_pb32m_HAP1_v1.0.entap_results.tsv    Reciprocal functional gene annotations by EnTAP
Ahor_pb32m_HAP1_v1.0.fa.gz    Haplotype assembly fasta file
Ahor_pb32m_HAP1_v1.0.ISOFORMS.bed.gz    Filtered gene annotations - all isoforms
Ahor_pb32m_HAP1_v1.0.ISOFORMS.CDS.fa.gz    Filtered gene annotations (protein coding nucleotides) - all isoforms
Ahor_pb32m_HAP1_v1.0.ISOFORMS.gff3.gz    Filtered gene annotations - all isoforms
Ahor_pb32m_HAP1_v1.0.ISOFORMS.peptides.fa.gz    Filtered gene annotations (protein sequences) - all isoforms
Ahor_pb32m_HAP1_v1.0.PRIMARY.bed.gz    Filtered gene annotations - longest isoforms only
Ahor_pb32m_HAP1_v1.0.PRIMARY.CDS.fa.gz    Filtered gene annotations (protein coding nucleotides) - longest isoforms only
Ahor_pb32m_HAP1_v1.0.PRIMARY.gff3.gz    Filtered gene annotations - longest isoforms only
Ahor_pb32m_HAP1_v1.0.PRIMARY.peptides.fa.gz    Filtered gene annotations (protein sequences) - longest isoforms only
Ahor_pb32m_HAP1_v1.0.RM.TE.gff.gz    All TE annotations by RepeatMasker
Ahor_pb32m_HAP2_v1.0.chrX_regions.bed.gz    X chromosome nonrecombining region coordinates
Ahor_pb32m_HAP2_v1.0.EDTA.TEanno.gff3.gz    All TE annotations by EDTA
Ahor_pb32m_HAP2_v1.0.EDTA.TEintact.fa.gz    Intact TE sequences by EDTA
Ahor_pb32m_HAP2_v1.0.EDTA.TEintact.gff3.gz    Intact TE annotations by EDTA
Ahor_pb32m_HAP2_v1.0.entap_results.tsv    Reciprocal functional gene annotations by EnTAP
Ahor_pb32m_HAP2_v1.0.fa.gz    Haplotype assembly fasta file
Ahor_pb32m_HAP2_v1.0.ISOFORMS.bed.gz    Filtered gene annotations - all isoforms
Ahor_pb32m_HAP2_v1.0.ISOFORMS.CDS.fa.gz    Filtered gene annotations (protein coding nucleotides) - all isoforms
Ahor_pb32m_HAP2_v1.0.ISOFORMS.gff3.gz    Filtered gene annotations - all isoforms
Ahor_pb32m_HAP2_v1.0.ISOFORMS.peptides.fa.gz    Filtered gene annotations (protein sequences) - all isoforms
Ahor_pb32m_HAP2_v1.0.PRIMARY.bed.gz    Filtered gene annotations - longest isoforms only
Ahor_pb32m_HAP2_v1.0.PRIMARY.CDS.fa.gz    Filtered gene annotations (protein coding nucleotides) - longest isoforms only
Ahor_pb32m_HAP2_v1.0.PRIMARY.gff3.gz    Filtered gene annotations - longest isoforms only
Ahor_pb32m_HAP2_v1.0.PRIMARY.peptides.fa.gz    Filtered gene annotations (protein sequences) - longest isoforms only
Ahor_pb32m_HAP2_v1.0.RM.TE.gff.gz    All TE annotations by RepeatMasker
Aoff_pb81m_HAP1_v1.0.chrX_regions.bed.gz    X chromosome nonrecombining region coordinates
Aoff_pb81m_HAP1_v1.0.EDTA.TEanno.gff3.gz    All TE annotations by EDTA
Aoff_pb81m_HAP1_v1.0.EDTA.TEintact.fa.gz    Intact TE sequences by EDTA
Aoff_pb81m_HAP1_v1.0.EDTA.TEintact.gff3.gz    Intact TE annotations by EDTA
Aoff_pb81m_HAP1_v1.0.entap_results.tsv    Reciprocal functional gene annotations by EnTAP
Aoff_pb81m_HAP1_v1.0.fa.gz    Haplotype assembly fasta file
Aoff_pb81m_HAP1_v1.0.ISOFORMS.bed.gz    Filtered gene annotations - all isoforms
Aoff_pb81m_HAP1_v1.0.ISOFORMS.CDS.fa.gz    Filtered gene annotations (protein coding nucleotides) - all isoforms
Aoff_pb81m_HAP1_v1.0.ISOFORMS.gff3.gz    Filtered gene annotations - all isoforms
Aoff_pb81m_HAP1_v1.0.ISOFORMS.peptides.fa.gz    Filtered gene annotations (protein sequences) - all isoforms
Aoff_pb81m_HAP1_v1.0.PRIMARY.bed.gz    Filtered gene annotations - longest isoforms only
Aoff_pb81m_HAP1_v1.0.PRIMARY.CDS.fa.gz    Filtered gene annotations (protein coding nucleotides) - longest isoforms only
Aoff_pb81m_HAP1_v1.0.PRIMARY.gff3.gz    Filtered gene annotations - longest isoforms only
Aoff_pb81m_HAP1_v1.0.PRIMARY.peptides.fa.gz    Filtered gene annotations (protein sequences) - longest isoforms only
Aoff_pb81m_HAP1_v1.0.RM.TE.gff.gz    All TE annotations by RepeatMasker
Aoff_pb81m_HAP2_v1.0.chrY_regions.bed.gz    Y chromosome nonrecombining region coordinates
Aoff_pb81m_HAP2_v1.0.EDTA.TEanno.gff3.gz    All TE annotations by EDTA
Aoff_pb81m_HAP2_v1.0.EDTA.TEintact.fa.gz    Intact TE sequences by EDTA
Aoff_pb81m_HAP2_v1.0.EDTA.TEintact.gff3.gz    Intact TE annotations by EDTA
Aoff_pb81m_HAP2_v1.0.entap_results.tsv    Reciprocal functional gene annotations by EnTAP
Aoff_pb81m_HAP2_v1.0.fa.gz    Haplotype assembly fasta file
Aoff_pb81m_HAP2_v1.0.ISOFORMS.bed.gz    Filtered gene annotations - all isoforms
Aoff_pb81m_HAP2_v1.0.ISOFORMS.CDS.fa.gz    Filtered gene annotations (protein coding nucleotides) - all isoforms
Aoff_pb81m_HAP2_v1.0.ISOFORMS.gff3.gz    Filtered gene annotations - all isoforms
Aoff_pb81m_HAP2_v1.0.ISOFORMS.peptides.fa.gz    Filtered gene annotations (protein sequences) - all isoforms
Aoff_pb81m_HAP2_v1.0.PRIMARY.bed.gz    Filtered gene annotations - longest isoforms only
Aoff_pb81m_HAP2_v1.0.PRIMARY.CDS.fa.gz    Filtered gene annotations (protein coding nucleotides) - longest isoforms only
Aoff_pb81m_HAP2_v1.0.PRIMARY.gff3.gz    Filtered gene annotations - longest isoforms only
Aoff_pb81m_HAP2_v1.0.PRIMARY.peptides.fa.gz    Filtered gene annotations (protein sequences) - longest isoforms only
Aoff_pb81m_HAP2_v1.0.RM.TE.gff.gz    All TE annotations by RepeatMasker
Ahorridus_MSY_CODEML.zip    CODEML M2a output for 11 MSY genes with evidence of positive selection in Asparagus horridus
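The chrX_regions and chrY_regions files hold interval coordinates for the nonrecombining regions. As a hedged illustration of how such a file might be summarized, the sketch below totals interval lengths per sequence. It assumes standard BED conventions (tab-separated fields, 0-based half-open start and end in columns 2 and 3); the exact column layout of these particular files is not documented in the listing, and the function names and demo coordinates are invented for illustration.

```python
import gzip
from collections import defaultdict


def bed_span_by_seq(lines):
    """Sum (end - start) per sequence name over an iterable of BED lines."""
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        seq, start, end = line.split("\t")[:3]
        # BED intervals are 0-based and half-open, so length is end - start.
        totals[seq] += int(end) - int(start)
    return dict(totals)


def bed_span_from_gz(path):
    """Same summary for a gzip-compressed BED file on disk."""
    with gzip.open(path, "rt") as handle:
        return bed_span_by_seq(handle)


# Invented demo intervals, not values from the dataset.
demo = ["chrY\t100\t250\tregionA", "chrY\t1000\t1500\tregionB"]
print(bed_span_by_seq(demo))  # {'chrY': 650}
```

Overlapping intervals are counted twice by this sketch; if a file could contain overlaps, the intervals would need to be merged before summing.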
- PAR ID: 10661269
- Publisher / Repository: Zenodo
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract: Carya glabra (2n = 4x = 64), also known as pignut hickory, is a widely distributed species in the walnut family (Juglandaceae). Native to the central and eastern United States and southeastern Canada, C. glabra plays an important ecological role as a common upland forest species; it is closely related to several economically valuable nut trees, including C. illinoinensis (pecan). A deeper understanding of the genetics of C. glabra is essential for studying its evolutionary history and biology, with potential implications for agricultural improvement of pecan. Here, we present the first nuclear genome assembly and annotation of C. glabra. The assembly is chromosome-level and phased, representing the first assembled polyploid genome in the genus Carya. A total of 64 pseudochromosomes were assembled and phased into four haplotypes. The haplotype A assembly spans 600.4 Mb, comprises 55.0% repetitive sequences, and contains 30,947 protein-coding genes, with a BUSCO completeness score of 97.7%. Functional annotation assigned 94.3% of haplotype A genes to gene families, and 79.7% and 86.3% of genes were annotated with Gene Ontology terms and protein domains, respectively; 635 putative plant disease resistance genes were found in haplotype A. The other three haplotypes exhibited similarly high-quality annotation metrics. Our genomic analyses also suggest that C. glabra is an autotetraploid. Comparative genomic analyses revealed high collinearity among the four haplotypes of C. glabra and the published genomes of three other Carya species, although structural variation among the genomes of these species was identified. In addition, we provide an improved chloroplast genome assembly and the first mitochondrial genome for C. glabra. Importantly, most members of the research team are undergraduate students; the sequenced individual is located in McCarty Woods, a Conservation Area on the University of Florida campus. This work highlights the value of genome assembly efforts as powerful tools for teaching genomics and supporting conservation initiatives. This first high-quality reference genome for C. glabra provides a valuable resource for studying Carya, a genus of significant ecological and economic importance.
  Article summary: Carya glabra (pignut hickory) is a common upland forest species in North America. This species is a member of the walnut family (Juglandaceae), which includes many economically important nut trees. Here, we present the first nuclear genome assembly and annotation of C. glabra. The assembly is chromosome-level and phased. The haplotype A assembly contains 30,947 protein-coding genes, with a BUSCO completeness score of 97.7%. Our genomic analyses suggest that C. glabra is an autopolyploid. We also provide chloroplast and mitochondrial genome assemblies. This nuclear genome provides a valuable resource for studying Carya, a genus of significant ecological and economic importance.
- The Drosophila kikkawai feature with NCBI Gene ID 108084518 was determined to be an ortholog of Drosophila melanogaster Sox102F, a member of the FlyBase High Mobility Group Box Transcription Factors gene group (FBgg0000748). Five isoforms were constructed using the GEP F element annotation protocol, the longest being novel isoform Sox102F-PNE (identified using the XM_017180752 RefSeq prediction and RNA-seq data). Among the isoforms found in both D. melanogaster and D. kikkawai, Sox102F-PB is the longest and exhibits a 1.18x coding span expansion due to transposable element insertion into an intron. All D. kikkawai protein isoforms contain the conserved domain HMG_box_dom (IPR009071).
- Pyhäjärvi, T (Ed.) Abstract: Blackberries (Rubus spp.) are the fourth most economically important berry crop worldwide. Genome assemblies and annotations have been developed for Rubus species in subgenus Idaeobatus, including black raspberry (R. occidentalis), red raspberry (R. idaeus), and R. chingii, but very few genomic resources exist for blackberries and their relatives in subgenus Rubus. Here we present a chromosome-length assembly and annotation of the diploid blackberry germplasm accession “Hillquist” (R. argutus). “Hillquist” is the only known source of primocane-fruiting (annual-fruiting) in tetraploid fresh-market blackberry breeding programs and is represented in the pedigree of many important cultivars worldwide. The “Hillquist” assembly, generated using Pacific Biosciences long reads scaffolded with high-throughput chromosome conformation capture sequencing, consisted of 298 Mb, of which 270 Mb (90%) was placed on 7 chromosome-length scaffolds with an average length of 38.6 Mb. Approximately 52.8% of the genome was composed of repetitive elements. The genome sequence was highly collinear with a novel maternal haplotype-resolved linkage map of the tetraploid blackberry selection A-2551TN and genome assemblies of R. chingii and red raspberry. A total of 38,503 protein-coding genes were predicted, of which 72% were functionally annotated. Eighteen flowering gene homologs within a previously mapped locus aligning to an 11.2 Mb region on chromosome Ra02 were identified as potential candidate genes for primocane-fruiting. The utility of the “Hillquist” genome has been demonstrated here by the development of the first genotyping-by-sequencing-based linkage map of tetraploid blackberry and the identification of possible candidate genes for primocane-fruiting. This chromosome-length assembly will facilitate future studies in Rubus biology, genetics, and genomics and strengthen applied breeding programs.
- Teleosts are important models to study sex chromosomes and sex-determining (SD) genes because they present a variety of sex determination systems. Here, we used Nanopore and Hi-C technologies to generate a high-contiguity chromosome-level genome assembly of a YY southern catfish (Silurus meridionalis). The assembly is 750.0 Mb long, with contig N50 of 15.96 Mb and scaffold N50 of 27.22 Mb. We also sequenced and assembled an XY male genome with a size of 727.2 Mb and contig N50 of 13.69 Mb. We identified a candidate SD gene through comparisons to our previous assembly of an XX individual. By resequencing male and female pools, we characterized a 2.38 Mb sex-determining region (SDR) on Chr24. Analysis of read coverage and comparison of the X and Y chromosome sequences showed a Y-specific insertion (approx. 500 kb) in the SDR which contained a male-specific duplicate of amhr2 (named amhr2y). amhr2y and amhr2 shared high nucleotide identity (81.0%) in the coding region but extremely low identity in the promoter and intron regions. The exclusive expression in the male gonadal primordium and loss-of-function inducing male-to-female sex reversal confirmed the role of amhr2y in male sex determination. Our study provides a new example of amhr2 as the SD gene in fish and sheds light on the convergent evolution of the duplication of AMH/AMHR2 pathway members underlying the evolution of sex determination in different fish lineages.
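The contig and scaffold N50 values quoted above are contiguity statistics: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of the calculation, using made-up lengths rather than values from any assembly listed here (the function name is an illustrative choice):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")


# Invented contig lengths (arbitrary units): total 290, half 145.
# 80 + 70 = 150 >= 145, so the N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

Because N50 depends only on the length distribution, it says nothing about whether the sequences are correctly assembled, which is why reports like the ones above pair it with completeness measures such as BUSCO.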
