Abstract MotivationLearning associations of traits with the microbial composition of a set of samples is a fundamental goal in microbiome studies. Recently, machine learning methods have been explored for this goal, with some promise. However, in comparison to other fields, microbiome data are high-dimensional and not abundant; leading to a high-dimensional low-sample-size under-determined system. Moreover, microbiome data are often unbalanced and biased. Given such training data, machine learning methods often fail to perform a classification task with sufficient accuracy. Lack of signal is especially problematic when classes are represented in an unbalanced way in the training data; with some classes under-represented. The presence of inter-correlations among subsets of observations further compounds these issues. As a result, machine learning methods have had only limited success in predicting many traits from microbiome. Data augmentation consists of building synthetic samples and adding them to the training data and is a technique that has proved helpful for many machine learning tasks. ResultsIn this paper, we propose a new data augmentation technique for classifying phenotypes based on the microbiome. Our algorithm, called TADA, uses available data and a statistical generative model to create new samples augmenting existing ones, addressing issues of low-sample-size. In generating new samples, TADA takes into account phylogenetic relationships between microbial species. On two real datasets, we show that adding these synthetic samples to the training set improves the accuracy of downstream classification, especially when the training data have an unbalanced representation of classes. Availability and implementationTADA is available at https://github.com/tada-alg/TADA. Supplementary informationSupplementary data are available at Bioinformatics online. 
                        more » 
                        « less   
                    
                            
                            Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative’s Workshop and Follow-On Activities
                        
                    
    
            Microbiome samples are inherently defined by the environment in which they are found. Therefore, data that provide context and enable interpretation of measurements produced from biological samples, often referred to as metadata, are critical. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 1714276
- PAR ID:
- 10490403
- Author(s) / Creator(s):
- ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more »
- Editor(s):
- Bucci, Vanni
- Publisher / Repository:
- American Society for Microbiology
- Date Published:
- Journal Name:
- mSystems
- Volume:
- 6
- Issue:
- 1
- ISSN:
- 2379-5077
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Abstract BackgroundThe Spacecraft Assembly Facility (SAF) at the NASA’s Jet Propulsion Laboratory is the primary cleanroom facility used in the construction of some of the planetary protection (PP)-sensitive missions developed by NASA, including the Mars 2020 Perseverance Rover that launched in July 2020. SAF floor samples (n=98) were collected, over a 6-month period in 2016 prior to the construction of the Mars rover subsystems, to better understand the temporal and spatial distribution of bacterial populations (total, viable, cultivable, and spore) in this unique cleanroom. ResultsCleanroom samples were examined for total (living and dead) and viable (living only) microbial populations using molecular approaches and cultured isolates employing the traditional NASA standard spore assay (NSA), which predominantly isolated spores. The 130 NSA isolates were represented by 16 bacterial genera, of which 97% were identified as spore-formers via Sanger sequencing. The most spatially abundant isolate wasBacillus subtilis, and the most temporally abundant spore-former wasVirgibacillus panthothenticus. The 16S rRNA gene-targeted amplicon sequencing detected 51 additional genera not found in the NSA method. The amplicon sequencing of the samples treated with propidium monoazide (PMA), which would differentiate between viable and dead organisms, revealed a total of 54 genera: 46 viable non-spore forming genera and 8 viable spore forming genera in these samples. The microbial diversity generated by the amplicon sequencing corresponded to ~86% non-spore-formers and ~14% spore-formers. The most common spatially distributed genera wereSphinigobium,Geobacillus, andBacilluswhereas temporally distributed common genera wereAcinetobacter,Geobacilllus, andBacillus. Single-cell genomics detected 6 genera in the sample analyzed, with the most prominent beingAcinetobacter. ConclusionThis study clearly established that detecting spores via NSA does not provide a complete assessment for the cleanliness of spacecraft-associated environments since it failed to detect several PP-relevant genera that were only recovered via molecular methods. This highlights the importance of a methodological paradigm shift to appropriately monitor bioburden in cleanrooms for not only the aeronautical industry but also for pharmaceutical, medical industries, etc., and the need to employ molecular sequencing to complement traditional culture-based assays.more » « less
- 
            Abstract Microbes active in extreme cold are not as well explored as those of other extreme environments. Studies have revealed a substantial microbial diversity and identified cold‐specific microbiome molecular functions. We analyzed the metagenomes and metatranscriptomes of 20 snow samples collected in early and late spring in Svalbard, Norway using mi‐faser, our read‐based computational microbiome function annotation tool. Our results reveal a more diverse microbiome functional capacity and activity in the early‐ vs. late‐spring samples. We also find that functional dissimilarity between the same‐sample metagenomes and metatranscriptomes is significantly higher in early than late spring samples. These findings suggest that early spring samples may contain a larger fraction of DNA of dormant (or dead) organisms, while late spring samples reflect a new, metabolically active community. We further show that the abundance of sequencing reads mapping to the fatty acid synthesis‐related microbial pathways in late spring metagenomes and metatranscriptomes is significantly correlated with the organic acid levels measured in these samples. Similarly, the organic acid levels correlate with the pathway read abundances of geraniol degradation and inversely correlate with those of styrene degradation, suggesting a possible nutrient change. Our study thus highlights the activity of microbial degradation pathways of complex organic compounds previously unreported at low temperatures.more » « less
- 
            Microbiome studies have recently transitioned from experimental designs with a few hundred samples to designs spanning tens of thousands of samples. Modern studies such as the Earth Microbiome Project (EMP) afford the statistics crucial for untangling the many factors that influence microbial community composition. Analyzing those data used to require access to a compute cluster, making it both expensive and inconvenient. We show that recent improvements in both hardware and software now allow to compute key bioinformatics tasks on EMP-sized data in minutes using a gaming-class laptop, enabling much faster and broader microbiome science insights.more » « less
- 
            To fully exploit the potential of plant microbiome alterations to improve plant health, reliable methods must be used to prepare and characterize microbiome samples. The power of culture-independent studies is that they allow the charac- terization of novel microbial community members, but only microbial members consistently represented between different research groups are likely to become broadly applicable treatments. The identification of plant microbiome mem- bers can be affected by several experimental stages, including design, sample preparation, nucleic acid extraction, sequencing, and analysis. The protocols described here therefore aim to highlight crucial steps that experimenters should consider before beginning a plant microbiome study.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    