Title: Haplotype Phased Chromosome-level Lycorma delicatula Genome and Annotation
Citation
Snead, A.A., Meng, F., Largotta, N. et al. Diploid chromosome-level genome assembly and annotation for Lycorma delicatula. Sci Data 12, 579 (2025). https://doi.org/10.1038/s41597-025-04854-8

Abstract
The spotted lanternfly (Lycorma delicatula) is a planthopper species (Hemiptera: Fulgoridae) native to China but invasive in South Korea, Japan, and the United States, where it is a significant threat to agriculture. Hence, genomic resources are critical both to management and to understanding the genomic characteristics of successful invaders. Here, we report a haplotype-phased genome assembly and annotation using PacBio long-read sequencing, Hi-C technology, and RNA-seq data. The 2.2 Gbp genome comprises 13 chromosomes, and our whole genome sequencing of eighty-two adults indicated chromosome four as the sex chromosome and an XO sex-determination system. We identified over 12,000 protein coding genes and performed functional annotation, facilitating identification of several candidate genes which may hold importance for spotted lanternfly control. Both the assemblies and annotations were highly complete, with over 96% of BUSCO genes complete regardless of the database employed (i.e., Eukaryota, Arthropoda, Insecta). This reference-quality genome will serve as an important resource for both development and optimization of management practices for the spotted lanternfly and invasive genomics as a whole.

Description of the data and file structure
This dataset contains the haplotype-phased chromosome-level genome assembly of the spotted lanternfly (Lycorma delicatula) described and published in Snead & Meng et al. (in review). The genome assembly combined long-read data and Hi-C data (SRA31402152-SRA31402153) to assemble and scaffold each haplotype. The annotation uses RNA-seq data from 12 adults (SRA31411873-SRA31411894) to structurally annotate both haplotypes.
Finally, whole-genome sequencing of 82 adult spotted lanternflies (BioProject PRJNA1136004), described in the provided metadata CSV, was used to identify putative sex chromosomes. The dataset also includes GO results for each chromosome that are not explicitly described in the results of the manuscript.

Files and variables
File: SLF_Hap1.fasta
Description: A FASTA file of the assembled genome for the cleaned 13 chromosome haplotype 1 assembly.
File: SLF_Hap2.fasta
Description: A FASTA file of the assembled genome for the cleaned 13 chromosome haplotype 2 assembly.
File: SLF_Hap1_Repeats.gff
Description: A GFF file of the repeats annotated in the cleaned 13 chromosome haplotype 1 assembly.
File: SLF_Hap2_Repeats.gff
Description: A GFF file of the repeats annotated in the cleaned 13 chromosome haplotype 2 assembly.
File: SLF_Hap1.gff
Description: A structural annotation of the 13 chromosome haplotype 1 assembly with functional annotations.
File: SLF_Hap2.gff
Description: A structural annotation of the 13 chromosome haplotype 2 assembly with functional annotations.
File: GO_plot_chr_1.png
Description: An image of the top 20 GO term results for chromosome 1.
File: GO_plot_chr_2.png
Description: An image of the top 20 GO term results for chromosome 2.
File: GO_plot_chr_3.png
Description: An image of the top 20 GO term results for chromosome 3.
File: GO_plot_chr_8.png
Description: An image of the top 20 GO term results for chromosome 8.
File: GO_plot_chr_5.png
Description: An image of the top 20 GO term results for chromosome 5.
File: GO_plot_chr_4.png
Description: An image of the top 20 GO term results for chromosome 4.
File: GO_plot_chr_6.png
Description: An image of the top 20 GO term results for chromosome 6.
File: GO_plot_chr_7.png
Description: An image of the top 20 GO term results for chromosome 7.
File: GO_plot_chr_11.png
Description: An image of the top 20 GO term results for chromosome 11.
File: GO_plot_chr_9.png
Description: An image of the top 20 GO term results for chromosome 9.
File: GO_plot_chr_10.png
Description: An image of the top 20 GO term results for chromosome 10.
File: GO_plot_chr_12.png
Description: An image of the top 20 GO term results for chromosome 12.
File: GO_plot_chr_13.png
Description: An image of the top 20 GO term results for chromosome 13.
File: SLF_Samples_SRA.csv
Description: A CSV with the sequencing information, SRA numbers, and sexes of the adults used to identify the putative sex chromosome.
File: SLF_RNAseq_Metadata.csv
Description: A CSV with the sequencing information, SRA numbers, and other metadata for the RNA-seq data used to annotate the genomes.

Variables
accession: The SRA accession number.
study: The study.
object_status: Whether the NCBI submission was new or not.
bioproject_accession: The BioProject accession number.
biosample_accession: The BioSample accession number.
library_ID: The ID used to identify that genomic library.
title: The title of the study (the BioProject).
library_strategy: Specific sequencing technique used to prepare the library.
library_source: The biological material used to create the sequencing library.
library_selection: The library preparation method.
library_layout: The arrangement of reads within the sequencing library.
platform: The sequencing platform.
instrument_model: The model of the sequencer.
design_description: Description of the study design.
filetype: Type of file.
filename: First file.
filename2: Second file.
sex: The biological sex of the adult.

Code/software
The initial haplotype-phased scaffolded genome was assembled by Dovetail Genomics (Cantata Bio) using the standard software outlined in the methods, run with default settings.
Scripts for the remaining work, including annotation, gene ontology enrichment, and other analyses, are located in the GitHub repository (https://github.com/anthonysnead/SLF-Genome-Assembly).

Access information
Other publicly accessible locations of the data:
The raw sequencing data and the annotated haplotype-phased genome assembly of Lycorma delicatula have been deposited at the National Center for Biotechnology Information (NCBI). The Hi-C and HiFi data can be found under SRA31402152 and SRA31402153. The RNA-seq data can be found under SRA31411873-SRA31411894, while the DNA-seq data can be found under BioProject PRJNA1136004.
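For readers who download the files, a quick sanity check might look like the following minimal Python sketch. It is not part of the authors' pipeline. It tallies a sample-metadata CSV by its `sex` column and counts the records in a FASTA file. The function names, the in-memory example rows, the accession strings, and the way sex is coded are all hypothetical stand-ins; only the column names `accession`, `library_layout`, and `sex` come from the variable list in this record.

```python
import csv
import io
from collections import Counter


def count_by_sex(csv_file):
    """Tally rows of a sample-metadata CSV by the value in its `sex` column."""
    reader = csv.DictReader(csv_file)
    # Normalize case and whitespace so "Female" and "female" are counted together.
    return Counter(row["sex"].strip().lower() for row in reader)


def count_fasta_records(fasta_file):
    """Count sequence records in a FASTA file (one per header line starting with '>')."""
    return sum(1 for line in fasta_file if line.startswith(">"))


# Hypothetical stand-in rows: the real SLF_Samples_SRA.csv has more columns,
# and its actual values (including how sex is coded) are not shown in this record.
example_csv = io.StringIO(
    "accession,library_layout,sex\n"
    "EXAMPLE_1,paired,female\n"
    "EXAMPLE_2,paired,male\n"
    "EXAMPLE_3,paired,Female\n"
)
sex_counts = count_by_sex(example_csv)
print(dict(sex_counts))

# Hypothetical two-record FASTA used only to exercise the counter.
example_fasta = io.StringIO(">chr1\nACGT\nACGT\n>chr2\nGGCC\n")
n_records = count_fasta_records(example_fasta)
print(n_records)
```

With the real files, the same functions could be given handles from `open("SLF_Samples_SRA.csv", newline="")` and `open("SLF_Hap1.fasta")`. If each chromosome is stored as a single FASTA record (an assumption, not something this record states), the count for each haplotype file would be expected to match the 13 chromosomes described above.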
Award ID(s):
2312129
PAR ID:
10599305
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
figshare
Date Published:
Subject(s) / Keyword(s):
Genomics Genomics and transcriptomics
Format(s):
Medium: X Size: 5851680570 Bytes
Size(s):
5851680570 Bytes
Right(s):
Creative Commons Attribution 4.0 International
Sponsoring Org:
National Science Foundation
More Like this
  1. Synopsis Urbanization promotes the formation of heat islands. For ectothermic animals in cities, the urban heat island effect can increase developmental rate and result in smaller adult body size (i.e., the temperature-size rule). A smaller adult body size could be consequential for invasive urban ectotherms due to potential effects of body size on thermal tolerance, dispersal distance, and fecundity. Here, we explored the effect of urbanization on body size in the spotted lanternfly (Lycorma delicatula), an invasive planthopper (Hemiptera: Fulgoridae) that is rapidly spreading across urban and non-urban settings in the United States. We then evaluated the consequences of spotted lanternfly body size for heat tolerance, a trait with importance for ectotherm survival in urban heat islands. Contrary to our expectations, we found that both male (P = 0.011) and female (P < 0.001) spotted lanternflies were larger in more urbanized areas and that females displayed a positive effect of body size on resistance to hot temperatures (P = 0.018). These results reject plasticity in developmental rate due to the urban heat island effect as an explanation for spotted lanternfly body size and instead lend necessary (but insufficient) support to an adaptive explanation stemming from advantages of larger body size in cities. This study demonstrates a positive effect of urbanization on spotted lanternfly body size, with potential implications for dispersal distance, fecundity, and thermal tolerance in urban areas. 
  2. Abstract Non‐native plant pests and pathogens threaten biodiversity, ecosystem function, food security, and economic livelihoods. As new invasive populations establish, often as an unintended consequence of international trade, they can become additional sources of introductions, accelerating global spread through bridgehead effects. While the study of non‐native pest spread has used computational models to provide insights into drivers and dynamics of biological invasions and inform management, efforts have focused on local or regional scales and are challenged by complex transmission networks arising from bridgehead population establishment. This paper presents a flexible spatiotemporal stochastic network model called PoPS (Pest or Pathogen Spread) Global that couples international trade networks with core drivers of biological invasions—climate suitability, host availability, and propagule pressure—quantified through open, globally available databases to forecast the spread of non‐native plant pests. The modular design of the framework makes it adaptable for various pests capable of dispersing via human‐mediated pathways, supports proactive responses to emerging pests when limited data are available, and enables forecasts at different spatial and temporal resolutions. We demonstrate the framework using a case study of the invasive planthopper spotted lanternfly (Lycorma delicatula). The model was calibrated with historical, known spotted lanternfly introductions to identify potential bridgehead populations that may contribute to global spread. This global view of phytosanitary pandemics provides crucial information for anticipating biological invasions, quantifying transport pathways risk levels, and allocating resources to safeguard plant health, agriculture, and natural resources. 
  3. Abstract Models that are both spatially and temporally dynamic are needed to forecast where and when non-native pests and pathogens are likely to spread, to provide advance information for natural resource managers. The potential US range of the invasive spotted lanternfly (SLF, Lycorma delicatula) has been modeled, but until now, when it could reach the West Coast’s multi-billion-dollar fruit industry has been unknown. We used process-based modeling to forecast the spread of SLF assuming no treatments to control populations occur. We found that SLF has a low probability of first reaching the grape-producing counties of California by 2027 and a high probability by 2033. Our study demonstrates the importance of spatio-temporal modeling for predicting the spread of invasive species to serve as an early alert for growers and other decision makers to prepare for impending risks of SLF invasion. It also provides a baseline for comparing future control options. 
  4. Abstract Suncus etruscus is one of the world’s smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew’s small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with the 2.472 Gbp primary pseudohaplotype and 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control. 
  5. Whiteman, N (Ed.)
    Abstract The genome sequence of the diploid and highly homozygous Vitis vinifera genotype PN40024 serves as the reference for many grapevine studies. Despite several improvements to the PN40024 genome assembly, its current version PN12X.v2 is quite fragmented and only represents the haploid state of the genome with mixed haplotypes. In fact, being nearly homozygous, this genome contains several heterozygous regions that are yet to be resolved. Taking the opportunity of improvements that long-read sequencing technologies offer to fully discriminate haplotype sequences, an improved version of the reference, called PN40024.v4, was generated. Through incorporating long genomic sequencing reads to the assembly, the continuity of the 12X.v2 scaffolds was highly increased with a total number decreasing from 2,059 to 640 and a reduction in N bases of 88%. Additionally, the full alternative haplotype sequence was built for the first time, the chromosome anchoring was improved and the number of unplaced scaffolds was reduced by half. To obtain a high-quality gene annotation that outperforms previous versions, a liftover approach was complemented with an optimized annotation workflow for Vitis. Integration of the gene reference catalogue and its manual curation have also assisted in improving the annotation, while defining the most reliable estimation of 35,230 genes to date. Finally, we demonstrated that PN40024 resulted from 9 selfings of cv. “Helfensteiner” (cross of cv. “Pinot noir” and “Schiava grossa”) instead of a single “Pinot noir”. These advances will help maintain the PN40024 genome as a gold-standard reference, also contributing toward the eventual elaboration of the grapevine pangenome. 