Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of non-gap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2,000 genes that were previously unplaced. We also discovered more than 5,700 additional gene copies, facilitating the accurate annotation of functional gene duplications, including at the Ppd-B1 photoperiod response locus.
                    
                            
                            Chromosome level genome assembly of the Etruscan shrew Suncus etruscus
                        
                    
    
            Abstract Suncus etruscus is one of the world’s smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew’s small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with a 2.472 Gbp primary pseudohaplotype and a 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including the X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control.
        
    
    
- PAR ID: 10489965
- Publisher / Repository: Nature Publishing Group
- Journal Name: Scientific Data
- Volume: 11
- Issue: 1
- ISSN: 2052-4463
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Murphy, William (Ed.) Abstract The stone marten (Martes foina) is an important species for cytogenetic studies in the order Carnivora. ZooFISH probes created from its chromosomes provided a strong and clean signal in chromosome painting experiments and were valuable for studying the evolution of carnivoran genome architecture. That research revealed that the stone marten chromosome set is similar to the presumed ancestral karyotype of the Carnivora, which adds further value to the species. Using linked-read and Hi-C sequencing, we generated a chromosome-length genome assembly of a male stone marten (Gansu province, China) from a primary cell line. The stone marten assembly had a length of 2.42 Gbp, a scaffold N50 of 144 Mbp, and a 96.2% BUSCO completeness score. We identified 19 chromosomal scaffolds (2n = 38) and assigned them chromosome IDs based on chromosome painting data. Annotation identified 20,087 protein-coding gene models, of which 18,283 were assigned common names. Comparison of the stone marten assembly with the cat, dog, and human genomes revealed several small syntenic blocks absent from the published painting maps. Finally, we assessed heterozygosity and its distribution over the chromosomes. The low heterozygosity level detected (0.4 hetSNPs/kbp) and the presence of long runs of homozygosity call for further research and a new evaluation of the conservation status of the stone marten in China. Combined with available carnivoran genomes in large-scale synteny analysis, the stone marten genome will highlight new features and events in carnivoran evolution that are hidden from cytogenetic approaches.
- 
            We sequenced the genome of the North American groundhog, Marmota monax, also known as the woodchuck. Our sequencing strategy included a combination of short, high-quality Illumina reads plus long reads generated by both Pacific Biosciences and Oxford Nanopore instruments. Assembly of the combined data produced a genome of 2.74 Gbp in total length, with an N50 contig size of 1,094,236 bp. To annotate the genome, we mapped the genes from another M. monax genome and from the closely related Alpine marmot, Marmota marmota, onto our assembly, resulting in 20,559 annotated protein-coding genes and 28,135 transcripts. The genome assembly and annotation are available in GenBank under BioProject PRJNA587092.
- 
            Abstract The Javan gibbon, Hylobates moloch, is an endangered gibbon species restricted to the forest remnants of western and central Java, Indonesia, and one of the rarest of the Hylobatidae family. Hylobatids consist of 4 genera (Hoolock, Hylobates, Symphalangus, and Nomascus) that are characterized by different numbers of chromosomes, ranging from 38 to 52. The underlying cause of this karyotype plasticity is not entirely understood, at least in part due to the limited availability of genomic data. Here we present the first scaffold-level assembly for H. moloch using a combination of whole-genome Illumina short reads, 10X Chromium linked reads, PacBio and Oxford Nanopore long reads, and proximity-ligation data. This Hylobates genome represents a valuable new resource for comparative genomics studies in primates.
- 
            Abstract Raphidioptera (snakeflies) are the holometabolan order with the lowest species diversity but play a pivotal role in understanding the origin of complete metamorphosis. Here, we provide an annotated, chromosome-level reference genome assembly for an Asian endemic snakefly, Mongoloraphidia duomilia (Yang, 1998), of the family Raphidiidae, assembled using PacBio HiFi and Hi-C data from female specimens. The resulting assembly is 653.56 Mb, of which 97.90% is anchored into 13 chromosomes. The scaffold N50 is 53.50 Mb, and BUSCO completeness is 97.80%. Repetitive elements comprise 64.31% of the genome (366.04 Mb). We identified 599 noncoding RNAs and predicted 11,141 protein-coding genes in the genome (97.70% BUSCO completeness). The new snakefly genome will facilitate comparison of genome architecture across Neuropterida and Holometabola and shed light on the ecological and evolutionary transitions between Neuropterida and Coleopterida.
