Title: Roadmap for naming uncultivated Archaea and Bacteria
Abstract

The assembly of single-amplified genomes (SAGs) and metagenome-assembled genomes (MAGs) has led to a surge in genome-based discoveries of members affiliated with Archaea and Bacteria, bringing with it a need to develop guidelines for nomenclature of uncultivated microorganisms. The International Code of Nomenclature of Prokaryotes (ICNP) only recognizes cultures as ‘type material’, thereby preventing the naming of uncultivated organisms. In this Consensus Statement, we propose two potential paths to solve this nomenclatural conundrum. One option is the adoption of previously proposed modifications to the ICNP to recognize DNA sequences as acceptable type material; the other option creates a nomenclatural code for uncultivated Archaea and Bacteria that could eventually be merged with the ICNP. Regardless of the path taken, we believe that action is needed now within the scientific community to develop consistent rules for nomenclature of uncultivated taxa in order to provide clarity and stability, and to effectively communicate microbial diversity.

 
Award ID(s):
1950770, 1831599, 1841658
NSF-PAR ID:
10159703
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Nature Microbiology
Volume:
5
Issue:
8
ISSN:
2058-5276
Page Range / eLocation ID:
pp. 987–994
Sponsoring Org:
National Science Foundation
More like this
  1. The Uncultivated Bacteria and Archaea dataset is a foundational collection of 7,903 genomes from uncultivated microorganisms. It highlights how readily microbial diversity can be recovered using current tools and existing metagenomic datasets to help piece together the tree of life.
  2. Abstract
    The reconstruction of bacterial and archaeal genomes from shotgun metagenomes has enabled insights into the ecology and evolution of environmental and host-associated microbiomes. Here we applied this approach to >10,000 metagenomes collected from diverse habitats covering all of Earth’s continents and oceans, including metagenomes from human and animal hosts, engineered environments, and natural and agricultural soils, to capture extant microbial, metabolic and functional potential. This comprehensive catalog includes 52,515 metagenome-assembled genomes representing 12,556 novel candidate species-level operational taxonomic units spanning 135 phyla. The catalog expands the known phylogenetic diversity of bacteria and archaea by 44% and is broadly available for streamlined comparative analyses, interactive exploration, metabolic modeling and bulk download. We demonstrate the utility of this collection for understanding secondary-metabolite biosynthetic potential and for resolving thousands of new host linkages to uncultivated viruses. This resource underscores the value of genome-centric approaches for revealing genomic properties of uncultivated microorganisms that affect ecosystem processes.
  3. Spring, Stefan (Ed.)

    It has been proposed that the superphylum of Asgard Archaea may represent a historical link between the Archaea and Eukarya. Following the discovery of the Archaea, it was soon appreciated that archaeal ribosomes were more similar to those of Eukarya than to those of Bacteria. Together with other eukaryote-like features, this has led to the suggestion that the Asgard Archaea may be directly linked to eukaryotes. However, the genomes of Bacteria and non-Asgard Archaea generally organize ribosome-related genes into clusters that likely function as operons, whereas eukaryotes typically do not employ an operon strategy. To gain further insight into the conservation of these genes, the genome order of conserved ribosomal protein (r-protein) coding genes was determined in 17 Asgard genomes (thirteen complete genomes and four genomes with fewer than 20 contigs) and compared with those found previously in non-Asgard archaeal and bacterial genomes. A universal core of two clusters of 14 and 4 co-occurring r-protein genes, respectively, was identified in both the Asgard and non-Asgard Archaea. The equivalent genes in the E. coli version of the cluster are found in the S10 and spc operons. The large cluster of 14 r-protein genes (uS19-uL22-uS3-uL29-uS17 from the S10 operon and uL14-uL24-uL5-uS14-uS8-uL6-uL18-uS5-uL30-uL15 from the spc operon) occurs as a complete set in thirteen Asgard genomes (five Lokiarchaeotes, three Heimdallarchaeotes, one Odinarchaeote, and four Thorarchaeotes). Four less conserved clusters with partial bacterial equivalents were found in the Asgard genomes: the L30e cluster (str operon in Bacteria), the L18e cluster (alpha operon in Bacteria), the S24e-S27ae-rpoE1 cluster, and the L31e, L12..L1 cluster. Finally, a new cluster, referred to as L7ae, was identified. In many cases, r-protein gene clusters/operons are less conserved in their organization in the Asgard group than in other Archaea.
    If this is generally true for nonribosomal gene clusters, the results may have implications for the history of genome organization. In particular, there may have been an early transition to or from the operon approach to genome organization. Other nonribosomal cellular features may support different relationships. For this reason, it may be important to consider ribosome features separately.

     
  4. Abstract

    Signal peptides help newly synthesized proteins reach the cell membrane or be secreted. As part of a biological process key to immune response and surveillance in humans, and associated with diseases such as Alzheimer's disease, remnant signal peptides and other transmembrane segments are proteolyzed by the intramembrane aspartyl protease (IAP) enzyme family. Here, we identified IAP orthologs throughout the tree of life. In addition to eukaryotes, IAPs are encoded in metabolically diverse archaea from a wide range of environments. We found three distinct clades of archaeal IAPs: (a) Euryarchaeota (e.g., halophilic Halobacteriales, methanogenic Methanosarcinales and Methanomicrobiales, marine Poseidoniales, acidophilic Thermoplasmatales, and hyperthermophilic Archaeoglobus spp.), (b) DPANN, and (c) Bathyarchaeota, Crenarchaeota, and Asgard. IAPs were also present in bacterial genomes from uncultivated members of the Candidate Phyla Radiation, perhaps due to horizontal gene transfer from DPANN archaeal lineages. Sequence analysis of the catalytic motif YD…GXGD (where X is any amino acid) in IAPs from archaea and bacteria reveals WD in Lokiarchaeota and many residue types in the X position. Gene neighborhood analysis in halophilic archaea shows IAP genes near corrinoid transporters (btuCDF genes). In marine Euryarchaeota, a putative BtuF-like domain is encoded at the N-terminus of the IAP gene, suggesting a role for these IAPs in scavenging metal-ion cofactors or other nutrients. Interestingly, eukaryotic IAP family members appear to have evolved either from Euryarchaeota or from Asgard archaea. Taken together, our phylogenetic and bioinformatic analyses should prompt experiments to probe the biological roles of IAPs in prokaryotic secretomes.

     
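The IAP abstract's motif analysis (YD…GXGD, with X any amino acid, and WD seen in Lokiarchaeota) can be approximated with regular expressions. The sketch below assumes ungapped one-letter protein sequences and reports the first YD or WD dipeptide that has a GXGD motif downstream, together with the residue at X. `scan_iap_motifs` is a hypothetical helper, not the authors' pipeline; a bare pattern match cannot confirm that the hits sit at the catalytic positions, which in practice requires alignment to characterized IAPs.

```python
import re
from typing import Optional, Tuple

# YD-like dipeptide: YD, or the WD variant reported in Lokiarchaeota.
YD_LIKE = re.compile(r"[YW]D")
# GXGD with X = any residue; the lookahead lets overlapping hits be reported,
# and group 1 captures the residue at X.
GXGD = re.compile(r"(?=G([A-Z])GD)")


def scan_iap_motifs(seq: str) -> Optional[Tuple[int, str, int, str]]:
    """Find the first YD/WD dipeptide that has a GXGD motif downstream.

    Returns (yd_start, yd_residues, gxgd_start, x_residue) using 0-based
    positions, or None when no such ordered pair exists.
    """
    seq = seq.upper()
    gxgd_hits = [(m.start(), m.group(1)) for m in GXGD.finditer(seq)]
    for yd in YD_LIKE.finditer(seq):
        for gxgd_start, x_residue in gxgd_hits:
            if gxgd_start > yd.start():
                return yd.start(), yd.group(), gxgd_start, x_residue
    return None
```

Collecting `x_residue` across many sequences would give a simple tally of the residue types found at the X position, which is the kind of variation the abstract describes.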
  5. We are now over four decades into digitally managing the names of Earth's species. As the number of federating efforts (i.e., software that brings together previously disparate projects under a common infrastructure, for example TaxonWorks) and aggregating efforts (e.g., International Plant Name Index, Catalog of Life (CoL)) increases, there remains an unmet need both to migrate old data forward and to produce new, precise and comprehensive nomenclatural catalogs. Given this context, we provide an overview of how TaxonWorks seeks to contribute to this effort and where it might evolve in the future. In TaxonWorks, when we talk about governed names and relationships, we mean it in the sense of existing international codes of nomenclature (e.g., the International Code of Zoological Nomenclature (ICZN)). More technically, nomenclature is defined as a set of objective assertions that describe the relationships between the names given to biological taxa and the rules that determine how those names are governed. It is critical to note that this is not the same thing as the relationship between a name and a biological entity; instead, nomenclature in TaxonWorks represents the details of the (governed) relationships between names. Rather than thinking of nomenclature as changing (a verb commonly used to express frustration with biological nomenclature), it is useful to think of nomenclature as a set of data points that grows over time. For example, when synonymy happens, we do not erase the past but rather record a new context for the name(s) in question. The biological concept changes, but the nomenclature (names) simply keeps adding up. Behind the scenes, nomenclature in TaxonWorks is represented by a set of nodes and edges, i.e., a mathematical graph, or network (e.g., Fig. 1).
Most names (i.e., nodes in the network) are what TaxonWorks calls "protonyms," monomial epithets that are used to construct, for example, binomial names (not to be confused with "protonym" sensu the ICZN). Protonyms are linked to other protonyms via relationships defined in NOMEN, an ontology that encodes governed rules of nomenclature. Within the system, all data (nodes and edges) can be cited, i.e., linked to a source and therefore anchored in time and tied to authorship, and annotated with a variety of annotation types (e.g., notes, confidence levels, tags). The actual building of the graphs is greatly simplified by multiple user interfaces that allow scientists to review (e.g., Fig. 2), create, filter, and add to (again, not "change") the nomenclatural history. As in any complex knowledge-representation model, there are outlying scenarios, or edge cases, that make certain human tasks more complex than others. TaxonWorks is no exception; it has limitations in what can be represented and how. While many complex representations are hidden behind simplified user interfaces, some, for example the handling of ICZN family-group names, batch-loading of invalid relationships, and comparative syncing against external resources, need more work to simplify the processes presently required to meet catalogers' needs. The depth at which TaxonWorks can capture nomenclature is only valuable if it can be used by others. This is facilitated by the application programming interface (API) serving its data (https://api.taxonworks.org), by serving text files, and by exports to standards like the emerging Catalog of Life Data Package. With reference to real-world problems, we illustrate different ways in which the API can be used, for example, integrated into spreadsheets, called from command-line scripts, and used to generate public-facing websites.
Behind all this effort are an increasing number of people recording help videos, developing documentation, and troubleshooting software and technical issues. Major contributions have come from developers at many skill levels, from high school students to senior software engineers, illustrating that TaxonWorks leads in enabling both technical and domain-based contributions. The health and growth of this community is a key factor in TaxonWorks' potential long-term impact on the effort to unify the names of Earth's species.
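The graph model described in the TaxonWorks abstract (protonyms as nodes, cited relationships as edges, and a nomenclature that only grows) can be illustrated with a small in-memory structure. This Python sketch is hypothetical and is not TaxonWorks' data model or API: the class names, the string-keyed nodes (which ignore homonymy), and predicate labels such as "synonym_of" are placeholders, whereas TaxonWorks draws its relationship types from the NOMEN ontology.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Protonym:
    """A node: a single (monomial) name plus its rank. Hypothetical model."""
    name: str
    rank: str  # e.g. "genus" or "species"


@dataclass(frozen=True)
class Relationship:
    """An edge between two protonyms, optionally cited to a source."""
    subject: str
    predicate: str  # hypothetical label; a real system would use NOMEN terms
    obj: str
    source: Optional[str] = None  # citation anchoring the assertion in time


@dataclass
class NomenclatureGraph:
    nodes: Dict[str, Protonym] = field(default_factory=dict)
    edges: List[Relationship] = field(default_factory=list)

    def add_protonym(self, protonym: Protonym) -> None:
        # Simplification: names are unique keys, so homonyms are not modelled.
        self.nodes.setdefault(protonym.name, protonym)

    def add_relationship(self, rel: Relationship) -> None:
        # Append-only: a new assertion (e.g. a synonymy) adds an edge;
        # earlier edges are never edited or deleted.
        for name in (rel.subject, rel.obj):
            if name not in self.nodes:
                raise KeyError(f"unknown protonym: {name}")
        self.edges.append(rel)

    def history(self, name: str) -> List[Relationship]:
        """All recorded assertions that involve `name`, in insertion order."""
        return [r for r in self.edges if name in (r.subject, r.obj)]

    def combination(self, genus: str, species: str) -> str:
        """Build a binomial from a genus protonym and a species protonym."""
        if self.nodes[genus].rank != "genus" or self.nodes[species].rank != "species":
            raise ValueError("expected a genus and a species protonym")
        return f"{genus} {species}"
```

Because `add_relationship` only appends, recording a synonymy adds a new context without erasing earlier assertions, and `history` returns every assertion ever made about a name in the order it was recorded.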