Search for: All records

Award ID contains: 1943371

Annotation of protein-coding genes in 49 diatom genomes from the Bacillariophyta clade

https://doi.org/10.1038/s41597-025-05306-z

Nenasheva, Natalia; Pitzschel, Clara; Webster, Cynthia N; Hart, Alexander J; Wegrzyn, Jill L; Bengtsson, Mia M; Hoff, Katharina J (June 2025, Scientific Data)

Abstract Diatoms, a major group of microalgae, play a critical role in global carbon cycling and primary production. Despite their ecological significance, comprehensive genomic resources for diatoms are limited. To address this, we have annotated previously unannotated genome assemblies of 49 diatom species. Genome assemblies were obtained from NCBI Datasets and processed for repeat elements using RepeatModeler2 and RepeatMasker. For gene prediction, BRAKER2 was employed in the absence of transcriptomic data, while BRAKER3 was utilised when transcriptome short read data were available from the Sequence Read Archive. The quality of genome assemblies and predicted protein sets was evaluated using BUSCO, ensuring high-quality genomic resources. Functional annotation was performed using EnTAP, providing insights into the biological roles of the predicted proteins. Our study enhances the genomic toolkit available for diatoms, facilitating future research in diatom biology, ecology, and evolution.
A haplotype-resolved reference genome for Eucalyptus grandis

https://doi.org/10.1093/g3journal/jkaf112

Lötter, Anneri; Bruna, Tomas; Duong, Tuan A; Barry, Kerrie; Lipzen, Anna; Daum, Chris; Yoshinaga, Yuko; Grimwood, Jane; Jenkins, Jerry W; Talag, Jayson; et al (May 2025, G3: Genes, Genomes, Genetics)
Ingvarsson, P (Ed.)
Abstract Eucalyptus grandis is a hardwood tree used worldwide as pure species or hybrid partner to breed fast-growing plantation forestry crops that serve as feedstocks of timber and lignocellulosic biomass for pulp, paper, biomaterials, and biorefinery products. The current v2.0 genome reference for the species served as the first reference for the genus and has helped drive the development of molecular breeding tools for eucalypts. Using PacBio HiFi long reads and Omni-C proximity ligation sequencing, we produced an improved, haplotype-phased assembly (v4.0) for TAG0014, an early-generation selection of E. grandis. The 2 haplotypes are 571 Mbp (HAP1) and 552 Mbp (HAP2) in size and consist of 37 and 46 contigs scaffolded onto 11 chromosomes (contig N50 of 28.9 and 16.7 Mbp), respectively. These haplotype assemblies are 70–90 Mbp smaller than the diploid v2.0 assembly but capture all except one of the 22 telomeres, suggesting that substantial redundant sequence was included in the previous assembly. A total of 35,929 (HAP1) and 35,583 (HAP2) gene models were annotated, of which 438 and 472 contain long introns (>10 kbp) in gene models previously (v2.0) identified as multiple smaller genes. These and other improvements have increased gene annotation completeness levels from 93.8 to 99.4% in the v4.0 assembly. We found that 6,493 and 6,346 genes are within tandem duplicate arrays (HAP1 and HAP2, respectively, 18.4 and 17.8% of the total) and >43.8% of the haplotype assemblies consists of repeat elements. Analysis of synteny between the haplotypes and the E. grandis v2.0 reference genome revealed extensive regions of collinearity, but also some major rearrangements, and provided a preview of population and pangenome variation in the species.
Free, publicly-accessible full text available May 30, 2026
Genome Assembly of a Living Fossil, the Atlantic Horseshoe Crab Limulus polyphemus, Reveals Lineage-Specific Whole-Genome Duplications, Transposable Element-Based Centromeres, and a ZW Sex Chromosome System

https://doi.org/10.1093/molbev/msaf021

Castellano, Kate R; Neitzey, Michelle L; Starovoitov, Andrew; Barrett, Gabriel A; Reid, Noah M; Vuruputoor, Vidya S; Webster, Cynthia N; Storer, Jessica M; Pauloski, Nicole R; Ameral, Natalie J; et al (February 2025, Molecular Biology and Evolution)
Wilson, Melissa (Ed.)
Abstract Horseshoe crabs, considered living fossils with a stable morphotype spanning ∼445 million years, are evolutionarily, ecologically, and biomedically important species experiencing rapid population decline. Of the four extant species of horseshoe crabs, the Atlantic horseshoe crab, Limulus polyphemus, has become an essential component of the modern medicine toolkit. Here, we present the first chromosome-level genome assembly, and the most contiguous and complete assembly to date, for L. polyphemus using nanopore long-read sequencing and chromatin conformation analysis. We find support for three horseshoe crab-specific whole-genome duplications, but none shared with Arachnopulmonata (spiders and scorpions). Moreover, we discovered tandem duplicates of endotoxin detection pathway components Factors C and G, identify candidate centromeres consisting of Gypsy retroelements, and classify the ZW sex chromosome system for this species and a sister taxon, Carcinoscorpius rotundicauda. Finally, we revealed this species has been experiencing a steep population decline over the last 5 million years, highlighting the need for international conservation interventions and fisheries-based management for this critical species.
Free, publicly-accessible full text available February 1, 2026
Welcome to the big leaves: Best practices for improving genome annotation in non‐model plant genomes

https://doi.org/10.1002/aps3.11533

Vuruputoor, Vidya S.; Monyak, Daniel; Fetter, Karl C.; Webster, Cynthia; Bhattarai, Akriti; Shrestha, Bikash; Zaman, Sumaira; Bennett, Jeremy; McEvoy, Susan L.; Caballero, Madison; et al (July 2023, Applications in Plant Sciences)

Abstract Premise: Robust standards to evaluate quality and completeness are lacking in eukaryotic structural genome annotation, as genome annotation software is developed using model organisms and typically lacks benchmarking to comprehensively evaluate the quality and accuracy of the final predictions. The annotation of plant genomes is particularly challenging due to their large sizes, abundant transposable elements, and variable ploidies. This study investigates the impact of genome quality, complexity, sequence read input, and method on protein‐coding gene predictions. Methods: The impact of repeat masking, long‐read and short‐read inputs, and de novo and genome‐guided protein evidence was examined in the context of the popular BRAKER and MAKER workflows for five plant genomes. The annotations were benchmarked for structural traits and sequence similarity. Results: Benchmarks that reflect gene structures, reciprocal similarity search alignments, and mono‐exonic/multi‐exonic gene counts provide a more complete view of annotation accuracy. Transcripts derived from RNA‐read alignments alone are not sufficient for genome annotation. Gene prediction workflows that combine evidence‐based and ab initio approaches are recommended, and a combination of short and long reads can improve genome annotation. Adding protein evidence from de novo assemblies, genome‐guided transcriptome assemblies, or full‐length proteins from OrthoDB generates more putative false positives as implemented in the current workflows. Post‐processing with functional and structural filters is highly recommended. Discussion: While the annotation of non‐model plant genomes remains complex, this study provides recommendations for inputs and methodological approaches. We discuss a set of best practices to generate an optimal plant genome annotation and present a more robust set of metrics to evaluate the resulting predictions.
Full Text Available
Unveiling the Genetic Blueprint of a Desert Scorpion: A Chromosome-level Genome of Hadrurus arizonensis Provides the First Reference for Parvorder Iurida

https://doi.org/10.1093/gbe/evae097

Bryant, Meridia Jane; Coello, Asher M; Glendening, A M; Hilliman, Samuel A; Jara, Carolina Fernanda; Pring, Samuel S; Rodríguez Rivera, Aviel; Santiago Membreño, Jennifer; Nigro, Lisa; Pauloski, Nicole; et al (May 2024, Genome Biology and Evolution)
Fraser, Bonnie (Ed.)
Abstract Over 400 million years old, scorpions represent an ancient group of arachnids and one of the first animals to adapt to life on land. Presently, the lack of available genomes within scorpions hinders research on their evolution. This study leverages ultralong nanopore sequencing and Pore-C to generate the first chromosome-level assembly and annotation for the desert hairy scorpion, Hadrurus arizonensis. The assembled genome is 2.23 Gb in size with an N50 of 280 Mb. Pore-C scaffolding reoriented 99.6% of bases into nine chromosomes and BUSCO identified 998 (98.6%) complete arthropod single copy orthologs. Repetitive elements represent 54.69% of the assembled bases, including 872,874 (29.39%) LINE elements. A total of 18,996 protein-coding genes and 75,256 transcripts were predicted, and extracted protein sequences yielded a BUSCO score of 97.2%. This is the first genome assembled and annotated within the family Hadruridae, representing a crucial resource for closing gaps in genomic knowledge of scorpions, resolving arachnid phylogeny, and advancing studies in comparative and functional genomics.
Full Text Available
Crossroads of assembling a moss genome: navigating contaminants and horizontal gene transfer in the moss Physcomitrellopsis africana

https://doi.org/10.1093/g3journal/jkae104

Vuruputoor, Vidya S; Starovoitov, Andrew; Cai, Yuqing; Liu, Yang; Rahmatpour, Nasim; Hedderson, Terry A; Wilding, Nicholas; Wegrzyn, Jill L; Goffinet, Bernard (May 2024, G3: Genes, Genomes, Genetics)
Ingvarsson, P (Ed.)

Abstract The first chromosome-scale reference genome of the rare narrow-endemic African moss Physcomitrellopsis africana (P. africana) is presented here. Assembled from 73 × Oxford Nanopore Technologies (ONT) long reads and 163 × Beijing Genomics Institute (BGI)-seq short reads, the 414 Mb reference comprises 26 chromosomes and 22,925 protein-coding genes [Benchmarking Universal Single-Copy Ortholog (BUSCO) scores: C:94.8% (D:13.9%)]. This genome holds 2 genes that withstood rigorous filtration of microbial contaminants, have no homolog in other land plants, and are thus interpreted as resulting from 2 unique horizontal gene transfers (HGTs) from microbes. Further, P. africana shares 176 of the 273 published HGT candidates identified in Physcomitrium patens (P. patens), but lacks 98 of these, highlighting that perhaps as many as 91 genes were acquired in P. patens in the last 40 million years following its divergence from its common ancestor with P. africana. These observations suggest rather continuous gene gains via HGT followed by potential losses during the diversification of the Funariaceae. Our findings showcase both dynamic flux in plant HGTs over evolutionarily “short” timescales, alongside enduring impacts of successful integrations, like those still functionally maintained in extant P. africana. Furthermore, this study describes the informatic processes employed to distinguish contaminants from candidate HGT events.
Haplogenome assembly reveals structural variation in Eucalyptus interspecific hybrids

https://doi.org/10.1093/gigascience/giad064

Lötter, Anneri; Duong, Tuan A; Candotti, Julia; Mizrachi, Eshchar; Wegrzyn, Jill L; Myburg, Alexander A (December 2022, GigaScience)

Abstract Background: De novo phased (haplo)genome assembly using long-read DNA sequencing data has improved the detection and characterization of structural variants (SVs) in plant and animal genomes. Able to span across haplotypes, long reads allow phased, haplogenome assembly in highly outbred organisms such as forest trees. Eucalyptus tree species and interspecific hybrids are the most widely planted hardwood trees with F1 hybrids of Eucalyptus grandis and E. urophylla forming the bulk of fast-growing pulpwood plantations in subtropical regions. The extent of structural variation and its effect on interspecific hybridization is unknown in these trees. As a first step towards elucidating the extent of structural variation between the genomes of E. grandis and E. urophylla, we sequenced and assembled the haplogenomes contained in an F1 hybrid of the two species. Findings: Using Nanopore sequencing and a trio-binning approach, we assembled the separate haplogenomes (566.7 Mb and 544.5 Mb) to 98.0% BUSCO completion. High-density SNP genetic linkage maps of both parents allowed scaffolding of 88.0% of the haplogenome contigs into 11 pseudo-chromosomes (scaffold N50 of 43.8 Mb and 42.5 Mb for the E. grandis and E. urophylla haplogenomes, respectively). We identify 48,729 SVs between the two haplogenomes providing the first detailed insight into genome structural rearrangement in these species. The two haplogenomes have similar gene content, 35,572 and 33,915 functionally annotated genes, of which 34.7% are contained in genome rearrangements. Conclusions: Knowledge of SV and haplotype diversity in the two species will form the basis for understanding the genetic basis of hybrid superiority in these trees.
Full Text Available
A genome sequence for the threatened whitebark pine

https://doi.org/10.1093/g3journal/jkae061

Neale, David B.; Zimin, Aleksey V.; Meltzer, Amy; Bhattarai, Akriti; Amee, Maurice; Figueroa Corona, Laura; Allen, Brian J.; Puiu, Daniela; Wright, Jessica; De La Torre, Amanda R.; et al (March 2024, G3: Genes, Genomes, Genetics)

Abstract Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in the Western contiguous United States and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality. Genomic technologies can contribute to a faster, more cost-effective approach to the traditional practices of identifying disease-resistant, climate-adapted seed sources for restoration. With deep-coverage Illumina short reads of haploid megagametophyte tissue and Oxford Nanopore long reads of diploid needle tissue, followed by a hybrid, multistep assembly approach, we produced a final assembly containing 27.6 Gb of sequence in 92,740 contigs (N50 537,007 bp) and 34,716 scaffolds (N50 2.0 Gb). Approximately 87.2% (24.0 Gb) of total sequence was placed on the 12 WBP chromosomes. Annotation yielded 25,362 protein-coding genes, and over 77% of the genome was characterized as repeats. WBP has demonstrated the greatest variation in resistance to WPBR among the North American white pines. Candidate genes for quantitative resistance include disease resistance genes known as nucleotide-binding leucine-rich repeat receptors (NLRs). A combination of protein domain alignments and direct genome scanning was employed to fully describe the 3 subclasses of NLRs. Our high-quality reference sequence and annotation provide a marked improvement in NLR identification compared to previous assessments that leveraged de novo-assembled transcriptomes.
Chromosome-Level Genome Assembly and Annotation of a Periodical Cicada Species: Magicicada septendecula

https://doi.org/10.1093/gbe/evae001

Bush, Jonas; Webster, Cynthia; Wegrzyn, Jill; Simon, Chris; Wilcox, Edward; Khan, Ruqayya; Weisz, David; Dudchenko, Olga; Aiden, Erez Lieberman; Frandsen, Paul; et al (January 2024, Genome Biology and Evolution)

Abstract We present a high-quality assembly and annotation of the periodical cicada species, Magicicada septendecula (Hemiptera: Auchenorrhyncha: Cicadidae). Periodical cicadas have a significant ecological impact, serving as a food source for many mammals, reptiles, and birds. Magicicada are well known for their massive emergences of 1 to 3 species that appear in different locations in the eastern United States nearly every year. These year classes (“broods”) emerge dependably every 13 or 17 yr in a given location. Recently, it has become clear that 4-yr early or late emergences of a sizeable portion of a population are an important part of the history of brood formation; however, the biological mechanisms by which they track the passage of time remain a mystery. Using PacBio HiFi reads in conjunction with Hi-C proximity ligation data, we have assembled and annotated the first whole genome for a periodical cicada, an important resource for future phylogenetic and comparative genomic analysis. This also represents the first quality genome assembly and annotation for the Hemipteran superfamily Cicadoidea. With a scaffold N50 of 518.9 Mb and a complete BUSCO score of 96.7%, we are confident that this assembly will serve as a vital resource toward uncovering the genomic basis of periodical cicadas’ long, synchronized life cycles and will provide a robust framework for further investigations into these insects.
Profiling genome‐wide methylation in two maples: Fine‐scale approaches to detection with nanopore technology

https://doi.org/10.1111/eva.13669

McEvoy, Susan L.; Grady, Patrick G. S.; Pauloski, Nicole; O'Neill, Rachel J.; Wegrzyn, Jill L. (April 2024, Evolutionary Applications)

Abstract DNA methylation is critical to the regulation of transposable elements and gene expression and can play an important role in the adaptation of stress response mechanisms in plants. Traditional methods of methylation quantification rely on bisulfite conversion that can compromise accuracy. Recent advances in long‐read sequencing technologies allow for methylation detection in real time. The associated algorithms that interpret these modifications have evolved from strictly statistical approaches to Hidden Markov Models and, recently, deep learning approaches. Much of the existing software focuses on methylation in the CG context, but methylation in other contexts is important to quantify, as it is extensively leveraged in plants. Here, we present methylation profiles for two maple species across the full range of 5mC sequence contexts using Oxford Nanopore Technologies (ONT) long‐reads. Hybrid and reference‐guided assemblies were generated for two new Acer accessions: Acer negundo (box elder; 65x ONT and 111X Illumina) and Acer saccharum (sugar maple; 93x ONT and 148X Illumina). The ONT reads generated for these assemblies were re‐basecalled, and methylation detection was conducted in a custom pipeline with the published Acer references (PacBio assemblies) and hybrid assemblies reported herein to generate four epigenomes. Examination of the transposable element landscape revealed the dominance of LTR Copia elements and patterns of methylation associated with different classes of TEs. Methylation distributions were examined at high resolution across gene and repeat density and described within the broader angiosperm context, and more narrowly in the context of gene family dynamics and candidate nutrient stress genes.