Title: Breaking Barriers with Bread: Using the Sourdough Starter Microbiome to Teach High-Throughput Sequencing Techniques
Abstract: Widespread use of high-throughput sequencing (HTS) in the life sciences has produced a demand for undergraduate and graduate institutions to offer classes exposing students to all aspects of HTS (sample acquisition, laboratory work, sequencing technologies, bioinformatics, and statistical analyses). Despite the increase in demand, many challenges exist for these types of classes. We advocate for the use of the sourdough starter microbiome for implementing meta-amplicon sequencing. The relatively small community, dominated by a few taxa, enables potential contaminants to be easily identified, while between-sample differences can be quickly assessed statistically. Finally, bioinformatic pipelines and statistical analyses can be carried out on personal student laptops or in a teaching computer lab. In two semesters adopting this system, 12 of 14 students were able to effectively capture the sourdough starter microbiome, using the instructor’s paired sample as reference.
Award ID(s):
1737237
PAR ID:
10382869
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Westenberg, Dave J.
Date Published:
Journal Name:
Journal of Microbiology & Biology Education
Volume:
23
Issue:
2
ISSN:
1935-7877
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Garrido, Daniel (Ed.)
    While research on the sourdough microbiome has primarily focused on lactic acid bacteria (LAB) and yeast, recent studies have found that acetic acid bacteria (AAB) are also common members. However, the ecology, genomic diversity, and functional contributions of AAB in sourdough remain unknown. To address this gap, we sequenced 29 AAB genomes, including three that represent putatively novel species, from a collection of over 500 sourdough starters surveyed globally from community scientists. We found variations in metabolic traits related to carbohydrate utilization, nitrogen metabolism, and alcohol production, as well as in genes related to mobile elements and defense mechanisms. Sourdough AAB genomes did not cluster when compared to AAB isolated from other environments, although a subset of gene functions was enriched in sourdough isolates. The lack of a sourdough-specific genomic cluster may reflect the nomadic lifestyle of AAB. To assess the consequences of AAB on the emergent function of sourdough starter microbiomes, we constructed synthetic starter microbiomes, varying only the AAB strain included. All AAB strains increased the acidification of synthetic sourdough starters relative to yeast and LAB by 18.5% on average. Different strains of AAB had distinct effects on the profile of synthetic starter volatiles. Taken together, our results begin to define the ways in which AAB shape emergent properties of sourdough and suggest that differences in gene content resulting from intraspecies diversification can have community-wide consequences on emergent function. 
  2. Sourdough bread is an ancient fermented food that has sustained humans around the world for thousands of years. It is made from a sourdough ‘starter culture’ which is maintained, portioned, and shared among bread bakers around the world. The starter culture contains a community of microbes made up of yeasts and bacteria, which ferment the carbohydrates in flour and produce the carbon dioxide gas that makes the bread dough rise before baking. The different acids and enzymes produced by the microbial culture affect the bread’s flavor, texture and shelf life. However, for such a dependable staple, sourdough bread cultures and the mixture of microbes they contain have scarcely been characterized. Previous studies have looked at the composition of starter cultures from regions within Europe. But there has never been a comprehensive study of how the microbial diversity of sourdough starters varies across and between continents. To investigate this, Landis, Oliverio et al. used genetic sequencing to characterize the microbial communities of sourdough starters from the homes of 500 bread bakers in North America, Europe and Australasia. Bread makers often think their bread’s unique qualities are due to the local environment of where the sourdough starter was made. However, Landis, Oliverio et al. found that geographical location did not correlate with the diversity of the starter cultures studied. The data revealed that a group of microbes called acetic acid bacteria, which had been overlooked in past research, were relatively common in starter cultures. Moreover, starters with a greater abundance of this group of bacteria produced bread with a strong vinegar aroma and caused dough to rise at a slower rate. This research demonstrates which species of bacteria and yeast are most commonly found in sourdough starters, and suggests geographical location has little influence on the microbial diversity of these cultures. Instead, the diversity of microbes likely depends more on how the starter culture was made and how it is maintained over time.
  3. High-throughput sequencing (HTS) technologies have been instrumental in investigating biological questions at the bulk and single-cell levels. Comparative analysis of two HTS data sets often relies on testing the statistical significance for the difference of two negative binomial distributions (DOTNB). Although negative binomial distributions are well studied, the theoretical results for DOTNB remain largely unexplored. Here, we derive basic analytical results for DOTNB and examine its asymptotic properties. As a state-of-the-art application of DOTNB, we introduce DEGage, a computational method for detecting differentially expressed genes (DEGs) in scRNA-seq data. DEGage calculates the mean of the sample-wise differences of gene expression levels as the test statistic and determines significant differential expression by computing the P-value with DOTNB. Extensive validation using simulated and real scRNA-seq data sets demonstrates that DEGage outperforms five popular DEG analysis tools: DEGseq2, DEsingle, edgeR, Monocle3, and scDD. DEGage is robust against high dropout levels and exhibits superior sensitivity when applied to balanced and imbalanced data sets, even with small sample sizes. We utilize DEGage to analyze prostate cancer scRNA-seq data sets and identify marker genes for 17 cell types. Furthermore, we apply DEGage to scRNA-seq data sets of mouse neurons with and without fear memory and reveal eight potential memory-related genes overlooked in previous analyses. The theoretical results and supporting software for DOTNB can be widely applied to comparative analyses of dispersed count data in HTS and broad research questions. 
  4. Estimating multiple sequence alignments (MSAs) and inferring phylogenies are essential for many aspects of comparative biology. Yet, many bioinformatics tools for such analyses have focused on specific clades, with greatest attention paid to plants, animals, and fungi. The rapid increase in high-throughput sequencing (HTS) data from diverse lineages now provides opportunities to estimate evolutionary relationships and gene family evolution across the eukaryotic tree of life. At the same time, these types of data are known to be error-prone (e.g., substitutions, contamination). To address these opportunities and challenges, we have refined a phylogenomic pipeline, now named PhyloToL, to allow easy incorporation of data from HTS studies, to automate production of both MSAs and gene trees, and to identify and remove contaminants. PhyloToL is designed for phylogenomic analyses of diverse lineages across the tree of life (i.e., at scales of >100 My). We demonstrate the power of PhyloToL by assessing stop codon usage in Ciliophora, identifying contamination in a taxon- and gene-rich database and exploring the evolutionary history of chromosomes in the kinetoplastid parasite Trypanosoma brucei, the causative agent of African sleeping sickness. Benchmarking PhyloToL’s homology assessment against that of OrthoMCL and a published paper on superfamilies of bacterial and eukaryotic organellar outer membrane pore-forming proteins demonstrates the power of our approach for determining gene family membership and inferring gene trees. PhyloToL is highly flexible and allows users to easily explore HTS data, test hypotheses about phylogeny and gene family evolution and combine outputs with third-party tools (e.g., PhyloChromoMap, iGTP).
  5. High-throughput sequencing (HTS) is a modern DNA sequencing technology used to rapidly read thousands of genomic fragments from the microorganisms in a given sample. The large amount of data produced by this process makes deep learning, whose performance often scales with dataset size, a suitable fit for processing HTS samples. While deep learning models have utilized sets of DNA sequences to make informed predictions, to our knowledge, there are no models in the current literature capable of generating synthetic HTS samples, a tool which could enable experimenters to predict HTS samples given some environmental parameters. Furthermore, the unordered nature of HTS samples poses a challenge to nearly all deep learning architectures because they have an inherent dependence on input order. To address this gap in the literature, we introduce DNA Generative Adversarial Set Transformer (DNAGAST), the first model capable of generating synthetic HTS samples. We qualitatively and quantitatively demonstrate DNAGAST’s ability to produce realistic synthetic samples and explore various methods to mitigate mode-collapse. Additionally, we propose novel quantitative diversity metrics to measure the effects of mode-collapse for unstructured set-based data.