Title: The USDA-ARS Ag100Pest Initiative: High-Quality Genome Assemblies for Agricultural Pest Arthropod Research
The phylum Arthropoda includes species crucial for ecosystem stability, soil health, crop production, and others that present obstacles to crop and animal agriculture. The United States Department of Agriculture’s Agricultural Research Service initiated the Ag100Pest Initiative to generate reference genome assemblies of arthropods that are (or may become) pests to agricultural production and global food security. We describe the project goals, process, status, and future. The first three years of the project were focused on species selection, specimen collection, and the construction of lab and bioinformatics pipelines for the efficient production of assemblies at scale. Contig-level assemblies of 47 species are presented, all of which were generated from single specimens. Lessons learned and optimizations leading to the current pipeline are discussed. The project name implies a target of 100 species, but the efficiencies gained during the project have supported an expansion of the original goal and a total of 158 species are currently in the pipeline. We anticipate that the processes described in the paper will help other arthropod research groups or other consortia considering genome assembly at scale.
Award ID(s):
2021795
NSF-PAR ID:
10299708
Journal Name:
Insects
Volume:
12
Issue:
7
ISSN:
2075-4450
Page Range / eLocation ID:
626
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Cultivated pear consists of several Pyrus species with Pyrus communis (European pear) representing a large fraction of worldwide production. As a relatively recently domesticated crop and perennial tree, pear can benefit from genome-assisted breeding. Additionally, comparative genomics within Rosaceae promises greater understanding of evolution within this economically important family. Here, we generate a fully phased chromosome-scale genome assembly of P. communis ‘d’Anjou.’ Using PacBio HiFi and Dovetail Omni-C reads, the genome is resolved into the expected 17 chromosomes, with each haplotype totaling nearly 540 Megabases and a contig N50 of nearly 14 Mb. Both haplotypes are highly syntenic to each other and to the Malus domestica ‘Honeycrisp’ apple genome. Nearly 45,000 genes were annotated in each haplotype, over 90% of which have direct RNA-seq expression evidence. We detect signatures of the known whole-genome duplication shared between apple and pear, and we estimate 57% of d’Anjou genes are retained in duplicate derived from this event. This genome highlights the value of generating phased diploid assemblies for recovering the full allelic complement in highly heterozygous crop species.

     
  2. Green plants play a fundamental role in ecosystems, human health, and agriculture. As de novo genomes are being generated for all known eukaryotic species as advocated by the Earth BioGenome Project, increasing genomic information on green land plants is essential. However, setting standards for the generation and storage of the complex set of genomes that characterize the green lineage of life is a major challenge for plant scientists. Such standards will need to accommodate the immense variation in green plant genome size, transposable element content, and structural complexity while enabling research into the molecular and evolutionary processes that have resulted in this enormous genomic variation. Here we provide an overview and assessment of the current state of knowledge of green plant genomes. To date fewer than 300 complete chromosome-scale genome assemblies representing fewer than 900 species have been generated across the estimated 450,000 to 500,000 species in the green plant clade. These genomes range in size from 12 Mb to 27.6 Gb and are biased toward agricultural crops with large branches of the green tree of life untouched by genomic-scale sequencing. Locating suitable tissue samples of most species of plants, especially those taxa from extreme environments, remains one of the biggest hurdles to increasing our genomic inventory. Furthermore, the annotation of plant genomes is at present undergoing intensive improvement. It is our hope that this fresh overview will help in the development of genomic quality standards for a cohesive and meaningful synthesis of green plant genomes as we scale up for the future. 
  3. Phenotypes are used for a multitude of purposes such as defining species, reconstructing phylogenies, diagnosing diseases or improving crop and animal productivity, but most of this phenotypic data is published in free-text narratives that are not computable. This means that the complex relationship between the genome, the environment and phenotypes is largely inaccessible to analysis and important questions related to the evolution of organisms, their diseases or their response to climate change cannot be fully addressed. It takes great effort to manually convert free-text narratives to a computable format before they can be used in large-scale analyses. We argue that this manual curation approach is not a sustainable solution to produce computable phenotypic data for three reasons: 1) it does not scale to all of biodiversity; 2) it does not stop the publication of free-text phenotypes that will continue to need manual curation in the future and, most importantly, 3) it does not solve the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other). Our empirical studies have shown that inter-curator variation is as high as 40% even within a single project. With this level of variation, it is difficult to imagine that data integrated from multiple curation projects can be of high quality. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardised vocabularies (ontologies). We argue that the authors describing phenotypes are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project’s semantics and ontology. This will speed up ontology development and improve the semantic clarity of phenotype descriptions from the moment of publication. A proof-of-concept project on this idea was funded by NSF ABI in July 2017. We seek readers’ input on or critique of the proposed approaches to help achieve community-based computable phenotype data production in the near future. Results from this project will be accessible through https://biosemantics.github.io/author-driven-production.
  4. Koepfli, Klaus-Peter (Ed.)
    Abstract

    Genomics research has relied principally on the establishment and curation of a reference genome for the species. However, it is increasingly recognized that a single reference genome cannot fully describe the extent of genetic variation within many widely distributed species. Pangenome representations are based on high-quality genome assemblies of multiple individuals and intended to represent the broadest possible diversity within a species. A Bovine Pangenome Consortium (BPC) has recently been established to begin assembling genomes from more than 600 recognized breeds of cattle, together with other related species to provide information on ancestral alleles and haplotypes. Previously reported de novo genome assemblies for Angus, Brahman, Hereford, and Highland breeds of cattle are part of the initial BPC effort. The present report describes a complete single haplotype assembly at chromosome-scale for a fullblood Simmental cow from an F1 bison–cattle hybrid fetus by trio binning. Simmental cattle, also known as Fleckvieh due to their red and white spots, originated in central Europe in the 1830s as a triple-purpose breed selected for draught, meat, and dairy production. There are over 50 million Simmental cattle in the world, known today for their fast growth and beef yields. This assembly (ARS_Simm1.0) is similar in length to the other bovine assemblies at 2.86 Gb, with a scaffold N50 of 102 Mb (max scaffold 156.8 Mb) and meets or exceeds the continuity of the best Bos taurus reference assemblies to date.
  5. Sethuraman, Arun (Ed.)
    Abstract

    Damselflies and dragonflies (Order: Odonata) play important roles in both aquatic and terrestrial food webs and can serve as sentinels of ecosystem health and predictors of population trends in other taxa. The habitat requirements and limited dispersal of lotic damselflies make them especially sensitive to habitat loss and fragmentation. As such, landscape genomic studies of these taxa can help focus conservation efforts on watersheds with high levels of genetic diversity, local adaptation, and even cryptic endemism. Here, as part of the California Conservation Genomics Project (CCGP), we report the first reference genome for the American rubyspot damselfly, Hetaerina americana, a species associated with springs, streams and rivers throughout California. Following the CCGP assembly pipeline, we produced two de novo genome assemblies. The primary assembly includes 1,630,044,487 base pairs, with a contig N50 of 5.4 Mb, a scaffold N50 of 86.2 Mb, and a BUSCO completeness score of 97.6%. This is the seventh Odonata genome to be made publicly available and the first for the subfamily Hetaerininae. This reference genome fills an important phylogenetic gap in our understanding of Odonata genome evolution, and provides a genomic resource for a host of interesting ecological, evolutionary, and conservation questions for which the rubyspot damselfly genus Hetaerina is an important model system.

     