Title: Sequence-based classification and identification of Fungi
Fungal taxonomy and ecology have been revolutionized by the application of molecular methods, and both have increasing connections to genomics and functional biology. However, data streams from traditional specimen- and culture-based systematics are not yet fully integrated with those from metagenomic and metatranscriptomic studies, which limits understanding of the taxonomic diversity and metabolic properties of fungal communities. This article reviews current resources, needs, and opportunities for sequence-based classification and identification (SBCI) in fungi as well as related efforts in prokaryotes. To realize the full potential of fungal SBCI it will be necessary to make advances in multiple areas. Improvements in sequencing methods, including long-read and single-cell technologies, will empower fungal molecular ecologists to look beyond ITS and current shotgun metagenomics approaches. Data quality and accessibility will be enhanced by attention to data and metadata standards and rigorous enforcement of policies for deposition of data and workflows. Taxonomic communities will need to develop best practices for molecular characterization in their focal clades, while also contributing to globally useful datasets including ITS. Changes to nomenclatural rules are needed to enable valid publication of sequence-based taxon descriptions. Finally, cultural shifts are necessary to promote adoption of SBCI and to accord professional credit to individuals who contribute to community resources.
Award ID(s):
1557417
NSF-PAR ID:
10064154
Author(s) / Creator(s):
Date Published:
Journal Name:
Mycologia
ISSN:
1557-2536
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Fungus‐farming ants (Hymenoptera: Formicidae) have become model systems for exploring questions regarding the evolution of symbiosis. However, robust phylogenetic studies of both the ant agriculturalists and their fungal cultivars are necessary for addressing whether or not observed ant–fungus associations are the result of coevolution and, if so, whether that coevolution has been strict or diffuse. Here we focus on the evolutionary relationships of the species within the ant genus Myrmicocrypta and of their fungal cultivars. The fungus‐farming ant genus Myrmicocrypta was created by Fr. Smith in 1860 based on a single alate queen. Since then, 31 species and subspecies have been described. Until now, the genus has not received any taxonomic treatment and the relationships of the species within the genus have not been tested. Our molecular analyses, using ∼40 putative species and six protein‐coding (nuclear and mitochondrial) gene fragments, recover Myrmicocrypta as monophyletic and as the sister group of the genus Mycocepurus Forel. The species M. tuberculata Weber is recovered as the sister to the rest of Myrmicocrypta. The time‐calibrated phylogeny recovers the age of stem group Myrmicocrypta plus its sister group as 45 Ma, whereas the inferred age for the crown group Myrmicocrypta is recovered as 27 Ma. Ancestral character‐state analyses suggest that the ancestor of Myrmicocrypta had scale‐like or squamate hairs and that, although such hairs were once considered diagnostic for the genus, the alternative state of erect simple hairs has evolved at least seven independent times. Ancestral‐state analyses of observed fungal cultivar associations suggest that the most recent common ancestor of Myrmicocrypta cultivated clade 2 fungal species and that switches to clade 1 fungi have occurred at least five times. It is our hope that these results will encourage additional species‐level phylogenies of fungus‐farming ants and their fungal cultivars, which are necessary for understanding the evolutionary processes that gave rise to agriculture in ants and that produced the current diversity of mutualistic ant–fungus interactions.
  2. We introduce Operational Genomic Unit (OGU), a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent of taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary analytical protocols for community ecology, differential abundance and supervised learning while supporting phylogenetic methods, such as UniFrac and phylofactorization, that are seldom applied to shotgun metagenomics despite being prevalent in 16S rRNA gene amplicon studies. As demonstrated in one synthetic and two real-world case studies, the OGU method produces biologically meaningful patterns from microbiome datasets. Such patterns further remain detectable at very low metagenomic sequencing depths. Compared with taxonomic unit-based analyses implemented in currently adopted metagenomics tools, and the analysis of 16S rRNA gene amplicon sequence variants, this method shows superiority in informing biologically relevant insights, including stronger correlation with body environment and host sex on the Human Microbiome Project dataset, and more accurate prediction of human age by the gut microbiomes in the Finnish population. We provide Woltka, a bioinformatics tool to implement this method, with full integration with the QIIME 2 package and the Qiita web platform, to facilitate OGU adoption in future metagenomics studies. Importance Shotgun metagenomics is a powerful, yet computationally challenging, technique compared to 16S rRNA gene amplicon sequencing for decoding the composition and structure of microbial communities. However, current analyses of metagenomic data are primarily based on taxonomic classification, which is limited in feature resolution compared to 16S rRNA amplicon sequence variant analysis. To solve these challenges, we introduce Operational Genomic Units (OGUs), which are the individual reference genomes derived from sequence alignment results, without further assigning them taxonomy. The OGU method advances current read-based metagenomics in two dimensions: (i) providing maximal resolution of community composition while (ii) permitting use of phylogeny-aware tools. Our analysis of real-world datasets shows several advantages over currently adopted metagenomic analysis methods and the finest-grained 16S rRNA analysis methods in predicting biological traits. We thus propose the adoption of OGU as standard practice in metagenomic studies.
  3. Dietary DNA metabarcoding enables researchers to identify and characterize trophic interactions with a high degree of taxonomic precision. It is also sensitive to sources of bias and contamination in the field and lab. One of the earliest and most common strategies for dealing with such sensitivities has been to filter resulting sequence data to remove low-abundance sequences before conducting ecological analyses based on the presence or absence of food taxa. Although this step is now often perceived to be both necessary and sufficient for cleaning up datasets, evidence to support this perception is lacking and more attention needs to be paid to the related risk of introducing other undesirable errors. Using computer simulations, we demonstrate that common strategies to remove low-abundance sequences can erroneously eliminate true dietary sequences in ways that impact downstream dietary inferences. Using real data from well-studied wildlife populations in Yellowstone National Park, we further show how these strategies can markedly alter the composition of individual dietary profiles in ways that scale up to obscure ecological interpretations about dietary generalism, specialism, and niche partitioning. Although the practice of removing low-abundance sequences may continue to be a useful strategy to address a subset of research questions that focus on a subset of relatively abundant food resources, its continued widespread use risks generating misleading perceptions about the structure of trophic networks. Researchers working with dietary DNA metabarcoding data, or similar data such as environmental DNA, microbiomes, or pathobiomes, should be aware of potential drawbacks and consider alternative bioinformatic, experimental, and statistical solutions. We used fecal DNA metabarcoding to characterize the diets of bison and bighorn sheep in winter and summer. Our analyses are based on 35 samples (median per species per season = 10) analyzed using the P6 loop of the chloroplast trnL(UAA) intron together with publicly available plant reference data. Illumina sequence read data are available at NCBI (BioProject: PRJNA780500). Obicut was used to trim reads with a minimum quality threshold of 30, and primers were removed from forward and reverse reads using cutadapt. All further sequence identifications were performed using obitools; forward and reverse sequences were aligned using the illuminapairedend command with a minimum alignment score of 40, and only joined sequences were retained. We used the obiuniq command to group identical sequences and tally them within samples, enabling us to quantify the relative read abundance (RRA) of each sequence. Sequences that occurred ≤2 times overall or that were ≤8 bp were discarded. Sequences were considered to be likely PCR artifacts if they were highly similar to another sequence (1 bp difference) and had a much lower abundance (0.05%) in the majority of samples in which they occurred; we discarded these sequences using the obiclean command. Overall, we characterized 357 plant sequences, and a subset of 355 sequences was retained in the dataset after rarefying samples to equal sequencing depth. We then applied relative read abundance thresholds from 0% to 5% to the fecal samples. We compared differences in the inferred dietary richness within and between species based on individual samples, based on average richness across samples, and based on the total richness of each population after accounting for differences in sample size. The readme file contains an explanation of each of the variables in the dataset. Information on the methodology can be found in the associated manuscript referenced above.
  4. Abstract

    Effective research, education, and outreach efforts by the Arabidopsis thaliana community, as well as other scientific communities that depend on Arabidopsis resources, depend vitally on easily available and publicly‐shared resources. These resources include reference genome sequence data and an ever‐increasing number of diverse data sets and data types. TAIR (The Arabidopsis Information Resource) and Araport (originally named the Arabidopsis Information Portal) are community informatics resources that provide tools, data, and applications to the more than 30,000 researchers worldwide that use in their work either Arabidopsis as a primary system of study or data derived from Arabidopsis. Four years after Araport's establishment, the IAIC held another workshop to evaluate the current status of Arabidopsis Informatics and chart a course for future research and development. The workshop focused on several challenges, including the need for reliable and current annotation, community‐defined common standards for data and metadata, and accessible and user‐friendly repositories/tools/methods for data integration and visualization. Solutions envisioned included (a) a centralized annotation authority to coalesce annotation from new groups, establish a consistent naming scheme, distribute this format regularly and frequently, and encourage and enforce its adoption. (b) Standards for data and metadata formats, which are essential, but challenging when comparing across diverse genotypes and in areas with less‐established standards (e.g., phenomics, metabolomics). Community‐established guidelines need to be developed. (c) A searchable, central repository for analysis and visualization tools. Improved versioning and user access would make tools more accessible. Workshop participants proposed a “one‐stop shop” website, an Arabidopsis “Super‐Portal” to link tools, data resources, programmatic standards, and best practice descriptions for each data type. This must have community buy‐in and participation in its establishment and development to encourage adoption.
  5. David, Lawrence A. (Ed.)
    ABSTRACT Shotgun metagenomic sequencing has transformed our understanding of microbial community ecology. However, preparing metagenomic libraries for high-throughput DNA sequencing remains a costly, labor-intensive, and time-consuming procedure, which in turn limits the utility of metagenomes. Several library preparation procedures have recently been developed to offset these costs, but it is unclear how these newer procedures compare to current standards in the field. In particular, it is not clear if all such procedures perform equally well across different types of microbial communities or if features of the biological samples being processed (e.g., DNA amount) impact the accuracy of the approach. To address these questions, we assessed how five different shotgun DNA sequence library preparation methods, including the commonly used Nextera Flex kit, perform when applied to metagenomic DNA. We measured each method’s ability to produce metagenomic data that accurately represent the underlying taxonomic and genetic diversity of the community. We performed these analyses across a range of microbial community types (e.g., soil, coral associated, and mouse gut associated) and input DNA amounts. We find that the type of community and amount of input DNA influence each method’s performance, indicating that careful consideration may be needed when selecting between methods, especially for low-complexity communities. However, the cost-effective preparation methods that we assessed are generally comparable to the current gold-standard Nextera DNA Flex kit for high-complexity communities. Overall, the results from this analysis will help expand and even facilitate access to metagenomic approaches in future studies. IMPORTANCE Metagenomic library preparation methods and sequencing technologies continue to advance rapidly, allowing researchers to characterize microbial communities in previously underexplored environmental samples and systems. 
However, widely accepted standardized library preparation methods can be cost-prohibitive. Newly available approaches may be less expensive, but their efficacy in comparison to standardized methods remains unknown. In this study, we compared five different metagenomic library preparation methods. We evaluated each method across a range of microbial communities varying in complexity and quantity of input DNA. Our findings demonstrate the importance of considering sample properties, including community type, composition, and DNA amount, when choosing the most appropriate metagenomic library preparation method. 
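The relative read abundance (RRA) thresholding described in item 3 above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used obitools. The function names, the example read counts, and the rule that a sequence is kept when its RRA is at or above the threshold are all assumptions made for illustration.

```python
from typing import Dict


def relative_read_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-sequence read counts in one sample to proportions."""
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no meaningful proportions.
        return {seq: 0.0 for seq in counts}
    return {seq: n / total for seq, n in counts.items()}


def richness_after_threshold(counts: Dict[str, int], threshold: float) -> int:
    """Count sequences that are present and whose RRA is >= threshold.

    The >= comparison is an assumption; a study may use a strict cutoff.
    """
    rra = relative_read_abundance(counts)
    return sum(1 for value in rra.values() if value > 0 and value >= threshold)


# Hypothetical sample: read counts for four plant sequences (1000 reads total).
sample = {"seq_a": 900, "seq_b": 80, "seq_c": 15, "seq_d": 5}

# Thresholds expressed as proportions (0%, 1%, 5%).
thresholds = [0.0, 0.01, 0.05]
richness = {t: richness_after_threshold(sample, t) for t in thresholds}

for t, r in richness.items():
    print(f"threshold {t:.2%}: inferred richness = {r}")
```

In this made-up sample the inferred dietary richness falls from four sequences to two as the threshold rises from 0% to 5%, which is the kind of sensitivity the abstract warns about.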