

Creators/Authors contains: "Fuhrman, Jed A."


  1. Abstract

    Sequence classification facilitates a fundamental understanding of the structure of microbial communities. Binary metagenomic sequence classifiers are insufficient because environmental metagenomes are typically derived from multiple sequence sources. Here we introduce a deep-learning-based sequence classifier, DeepMicroClass, that classifies metagenomic contigs into five sequence classes, i.e. viruses infecting prokaryotic or eukaryotic hosts, eukaryotic or prokaryotic chromosomes, and prokaryotic plasmids. DeepMicroClass achieved high performance for all sequence classes at various tested sequence lengths ranging from 500 bp to 100 kbp. By benchmarking on a synthetic dataset with variable sequence class composition, we showed that DeepMicroClass obtained better performance for eukaryotic, plasmid and viral contig classification than other state-of-the-art predictors. DeepMicroClass achieved comparable performance on viral sequence classification with geNomad and VirSorter2 when benchmarked on the CAMI II marine dataset. Using a coastal daily time-series metagenomic dataset as a case study, we showed that microbial eukaryotes and prokaryotic viruses are integral to microbial communities. By analyzing monthly metagenomes collected at HOT and BATS, we found relatively higher viral read proportions in the subsurface layer in late summer, consistent with the seasonal viral infection patterns prevalent in these areas. We expect DeepMicroClass will promote metagenomic studies of under-appreciated sequence types.

  2. Cyanophages exert important top-down controls on their cyanobacteria hosts; however, concurrent analysis of both phage and host populations is needed to better assess phage–host interaction models. We analyzed picocyanobacteria Prochlorococcus and Synechococcus and T4-like cyanophage communities in Pacific Ocean surface waters using five years of monthly viral and cellular fraction metagenomes. Cyanophage communities contained thousands of mostly low-abundance (<2% relative abundance) species with varying temporal dynamics, categorized as seasonally recurring or non-seasonal and occurring persistently, occasionally, or sporadically (detected in ≥85%, 15–85%, or <15% of samples, respectively). Viromes contained mostly seasonal and persistent phages (~40% each), while cellular fraction metagenomes had mostly sporadic species (~50%), reflecting that these sample sets capture different steps of the infection cycle: virions from prior infections or phages within currently infected cells, respectively. Two groups of seasonal phages correlated to Synechococcus or Prochlorococcus were abundant in spring/summer or fall/winter, respectively. Cyanophages likely have a strong influence on the host community structure, as their communities explained up to 32% of host community variation. These results indicate that both seasonally recurrent and apparently stochastic processes, likely determined by host availability and different host-range strategies among phages, are critical to phage–host interactions and dynamics, consistent with both the Kill-the-Winner and the Bank models.
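The persistence categories quoted in this abstract (detected in ≥85%, 15–85%, or <15% of samples) can be written as a small rule. The following is only an illustrative sketch: the function name `persistence_category` and the handling of the exact 15% and 85% boundaries are assumptions, not taken from the study.

```python
def persistence_category(detected_samples: int, total_samples: int) -> str:
    """Label a phage by the share of samples in which it was detected.

    Thresholds follow the abstract: >=85% persistent, 15-85% occasional,
    <15% sporadic. Treating exactly 15% as "occasional" is an assumption.
    """
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    if not 0 <= detected_samples <= total_samples:
        raise ValueError("detected_samples must be between 0 and total_samples")
    # Integer arithmetic avoids floating-point surprises at the boundaries.
    if detected_samples * 100 >= 85 * total_samples:
        return "persistent"
    if detected_samples * 100 >= 15 * total_samples:
        return "occasional"
    return "sporadic"


if __name__ == "__main__":
    # Hypothetical detection counts out of 60 monthly samples.
    for count in (58, 30, 4):
        print(count, persistence_category(count, 60))
```

With 60 monthly samples (five years), a phage would need to be detected in at least 51 samples to count as persistent under this rule.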
  3. Abstract

    Free-living and particle-associated marine prokaryotes have physiological, genomic, and phylogenetic differences, yet factors influencing their temporal dynamics remain poorly constrained. In this study, we quantify the entire microbial community composition monthly over several years, including viruses, prokaryotes, phytoplankton, and total protists, from the San-Pedro Ocean Time-series using ribosomal RNA sequencing and viral metagenomics. Canonical analyses show that in addition to physicochemical factors, the double-stranded DNA viral community is the strongest factor predicting free-living prokaryotes, explaining 28% of variability, whereas the phytoplankton (via chloroplast 16S rRNA) community is strongest with particle-associated prokaryotes, explaining 31% of variability. Unexpectedly, the protist community explains little variability. Our findings suggest that biotic interactions are significant determinants of the temporal dynamics of prokaryotes, and the relative importance of specific interactions varies depending on lifestyles. Also, warming influenced the prokaryotic community, which largely remained oligotrophic summer-like throughout 2014–15, with cyanobacterial populations shifting from cold-water ecotypes to warm-water ecotypes.

  4. Abstract

    The introduction of high-throughput chromosome conformation capture (Hi-C) into metagenomics enables reconstructing high-quality metagenome-assembled genomes (MAGs) from microbial communities. Despite recent advances in recovering eukaryotic, bacterial, and archaeal genomes using Hi-C contact maps, few Hi-C-based methods are designed to retrieve viral genomes. Here we introduce ViralCC, a publicly available tool to recover complete viral genomes and detect virus-host pairs using Hi-C data. Compared to other Hi-C-based methods, ViralCC leverages the virus-host proximity structure as a complementary information source for the Hi-C interactions. Using mock and real metagenomic Hi-C datasets from several different microbial ecosystems, including human gut, cow fecal, and wastewater samples, we demonstrate that ViralCC outperforms existing Hi-C-based binning methods as well as state-of-the-art tools specifically dedicated to metagenomic viral binning. ViralCC can also reveal the taxonomic structure of viruses and virus-host pairs in microbial communities. When applied to a real wastewater metagenomic Hi-C dataset, ViralCC constructs a phage-host network, which is further validated using CRISPR spacer analyses. ViralCC is an open-source pipeline available at

  5. Bacteria are single-celled organisms that live out their lives at a microscopic scale. We can find bacteria everywhere we look for them, including inside of our own bodies. Bacteria are incredibly diverse and come in many shapes and sizes. They also vary widely in how they live and grow. Some bacteria grow very quickly and others grow slowly. We wanted to measure the growth of many different types of bacteria in the environment. Unfortunately, some species of bacteria are very difficult to grow in the laboratory. To get around this, we designed a method to predict how fast a type of bacteria can grow, just from its DNA. This way, if we have the DNA of a bacterial species, we can measure its growth even if we cannot get it to grow in our laboratory. 
  6. Abstract

    Motivation

    Phage–host associations play important roles in microbial communities. But in natural communities, where phages are discovered and characterized metagenomically rather than through culture-based lab studies, their hosts are generally not known. Several programs have been developed for predicting which phage infects which host based on various sequence similarity measures or machine learning approaches. These are often based on whole viral and host genomes, but in metagenomics-based studies, we rarely have whole genomes and must instead rely on contigs that are sometimes only hundreds of bp long. Therefore, we need programs that predict hosts of phage contigs on the basis of these short contigs. Although most existing programs can be applied to metagenomic datasets for these predictions, their accuracies are generally low. Here, we develop ContigNet, a convolutional neural network-based model capable of predicting phage–host matches based on relatively short contigs, and compare it to previously published VirHostMatcher (VHM) and WIsH.

    Results

    On the validation set, ContigNet achieves 72–85% area under the receiver operating characteristic curve (AUROC) scores, compared to a maximum of 68% by VHM or WIsH, for contigs of lengths between 200 bp and 50 kbp. We also apply the model to the Metagenomic Gut Virus (MGV) catalogue, a dataset containing a wide range of draft genomes from metagenomic samples, and achieve 60–70% AUROC scores, compared to 52% for VHM and WIsH. Surprisingly, ContigNet can also be used to predict plasmid-host contig associations with high accuracy, indicating a similar genetic exchange between mobile genetic elements and their hosts.
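
For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen true phage–host pair receives a higher score than a randomly chosen non-pair, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auroc` and the toy scores are illustrative assumptions and are unrelated to ContigNet's code or data.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a true (positive) pair, 0 for a negative pair.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical scores for four candidate phage-host pairs.
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A score of 0.5 corresponds to random ranking, so the 52% reported for VHM and WIsH on the MGV catalogue is close to chance. The quadratic loop is fine for small examples; a rank-based formulation would be preferable for large datasets.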

    Availability and implementation

    The source code of ContigNet and related datasets can be downloaded from

  7. Abstract

    Community dynamics are central in microbial ecology, yet we lack studies comparing diversity patterns among marine protists and prokaryotes over depth and multiple years. Here, we characterized microbes at the San-Pedro Ocean Time series (2005–2018), using SSU rRNA gene sequencing from two size fractions (0.2–1 and 1–80 μm), with a universal primer set that amplifies from both prokaryotes and eukaryotes, allowing direct comparisons of diversity patterns in a single set of analyses. The 16S + 18S rRNA gene composition in the small size fraction was mostly prokaryotic (>92%) as expected, but the large size fraction unexpectedly contained 46–93% prokaryotic 16S rRNA genes. Prokaryotes and protists showed opposite vertical diversity patterns; prokaryotic diversity peaked at mid-depth, protistan diversity at the surface. Temporal beta-diversity patterns indicated prokaryote communities were much more stable than protist communities. Although the prokaryotic communities changed monthly, the average community stayed remarkably steady over 14 years, showing high resilience. Additionally, particle-associated prokaryotes were more diverse than smaller free-living ones, especially at deeper depths, with an unexpected contribution from abundant and diverse SAR11 clade II. Eukaryotic diversity was strongly correlated with the diversity of particle-associated prokaryotes but not free-living ones, reflecting that physical associations result in the strongest interactions, including symbioses, parasitism, and decomposer relationships.

  8. Rappe, Michael S. (Ed.)
    ABSTRACT Bacterial biodegradation is a significant contributor to remineralization of polycyclic aromatic hydrocarbons (PAHs), toxic and recalcitrant components of crude oil as well as by-products of partial combustion chronically introduced into seawater via atmospheric deposition. The Deepwater Horizon oil spill demonstrated the speed at which a seed PAH-degrading community maintained by chronic inputs responds to acute pollution. We investigated the diversity and functional potential of a similar seed community in the chronically polluted Port of Los Angeles (POLA), using stable isotope probing with naphthalene, deep-sequenced metagenomes, and carbon incorporation rate measurements at the port and in two sites in the San Pedro Channel. We demonstrate the ability of the community of degraders at the POLA to incorporate carbon from naphthalene, leading to a quick shift in microbial community composition to be dominated by the normally rare Colwellia and Cycloclasticus. We show that metagenome-assembled genomes (MAGs) belonged to these naphthalene degraders by matching their 16S-rRNA gene with experimental stable isotope probing data. Surprisingly, we did not find a full PAH degradation pathway in those genomes, even when combining genes from the entire microbial community, leading us to hypothesize that promiscuous dehydrogenases replace canonical naphthalene degradation enzymes in this site. We compared metabolic pathways identified in 29 genomes whose abundance increased in the presence of naphthalene to generate genomic-based recommendations for future optimization of PAH bioremediation at the POLA, e.g., ammonium as opposed to urea, heme or hemoproteins as an iron source, and polar amino acids.
    IMPORTANCE Oil spills in the marine environment have a devastating effect on marine life and biogeochemical cycles through bioaccumulation of toxic hydrocarbons and oxygen depletion by hydrocarbon-degrading bacteria. Oil-degrading bacteria occur naturally in the ocean, especially where they are supported by chronic inputs of oil or other organic carbon sources, and have a significant role in degradation of oil spills. Polycyclic aromatic hydrocarbons are the most persistent and toxic component of crude oil. Therefore, the bacteria that can break those molecules down are of particular importance. We identified such bacteria at the Port of Los Angeles (POLA), one of the busiest ports worldwide, and characterized their metabolic capabilities. We propose chemical targets based on those analyses to stimulate the activity of these bacteria in case of an oil spill at the POLA.
  9. Gilbert, Jack A. (Ed.)
    ABSTRACT Small subunit rRNA (SSU rRNA) amplicon sequencing can quantitatively and comprehensively profile natural microbiomes, representing a critically important tool for studying diverse global ecosystems. However, results will only be accurate if PCR primers perfectly match the rRNA of all organisms present. To evaluate how well marine microorganisms across all 3 domains are detected by this method, we compared commonly used primers with >300 million rRNA gene sequences retrieved from globally distributed marine metagenomes. The best-performing primers compared to 16S rRNA of bacteria and archaea were 515Y/926R and 515Y/806RB, which perfectly matched over 96% of all sequences. Considering cyanobacterial and chloroplast 16S rRNA, 515Y/926R had the highest coverage (99%), making this set ideal for quantifying marine primary producers. For eukaryotic 18S rRNA sequences, 515Y/926R also performed best (88%), followed by V4R/V4RB (18S rRNA specific; 82%), demonstrating that the 515Y/926R combination performs best overall for all 3 domains. Using Atlantic and Pacific Ocean samples, we demonstrate high correspondence between 515Y/926R amplicon abundances (generated for this study) and metagenomic 16S rRNA (median R² = 0.98, n = 272), indicating amplicons can produce equally accurate community composition data compared with shotgun metagenomics. Our analysis also revealed that expected performance of all primer sets could be improved with minor modifications, pointing toward a nearly completely universal primer set that could accurately quantify biogeochemically important taxa in ecosystems ranging from the deep sea to the surface. In addition, our reproducible bioinformatic workflow can guide microbiome researchers studying different ecosystems or human health to similarly improve existing primers and generate more accurate quantitative amplicon data.
    IMPORTANCE PCR amplification and sequencing of marker genes is a low-cost technique for monitoring prokaryotic and eukaryotic microbial communities across space and time but will work optimally only if environmental organisms match PCR primer sequences exactly. In this study, we evaluated how well primers match globally distributed short-read oceanic metagenomes. Our results demonstrate that primer sets vary widely in performance, and that at least for marine systems, rRNA amplicon data from some primers lack significant biases compared to metagenomes. We also show that it is theoretically possible to create a nearly universal primer set for diverse saline environments by defining a specific mixture of a few dozen oligonucleotides, and present a software pipeline that can guide rational design of primers for any environment with available meta’omic data.
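
The notion of a primer "perfectly matching" a sequence, and the coverage percentages derived from it, can be illustrated with standard IUPAC degenerate-base codes. This is a minimal sketch, not the authors' pipeline: the function names, the toy primer, and the toy binding sites are all hypothetical, and it assumes each site has already been extracted, aligned to the primer, and oriented on the same strand.

```python
# Standard IUPAC nucleotide codes: each code maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def perfectly_matches(primer: str, site: str) -> bool:
    """True if every base of the site is allowed by the primer at that position."""
    if len(primer) != len(site):
        return False
    return all(base in IUPAC[code] for code, base in zip(primer.upper(), site.upper()))


def coverage(primer: str, sites: list[str]) -> float:
    """Fraction of binding sites that the primer matches with zero mismatches."""
    if not sites:
        raise ValueError("sites must not be empty")
    return sum(perfectly_matches(primer, s) for s in sites) / len(sites)


if __name__ == "__main__":
    toy_primer = "GTGYCAGCM"  # hypothetical degenerate primer (Y = C/T, M = A/C)
    toy_sites = ["GTGCCAGCA", "GTGTCAGCC", "GTGACAGCA", "GTGCCAGCG"]
    print(coverage(toy_primer, toy_sites))
```

For a reverse primer, the site would first need to be reverse-complemented; that step is omitted here to keep the sketch short.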
  10. Maximal growth rate is a basic parameter of microbial lifestyle that varies over several orders of magnitude, with doubling times ranging from a matter of minutes to multiple days. Growth rates are typically measured using laboratory culture experiments. Yet, we lack sufficient understanding of the physiology of most microbes to design appropriate culture conditions for them, severely limiting our ability to assess the global diversity of microbial growth rates. Genomic estimators of maximal growth rate provide a practical solution to survey the distribution of microbial growth potential, regardless of cultivation status. We developed an improved maximal growth rate estimator and predicted maximal growth rates from over 200,000 genomes, metagenome-assembled genomes, and single-cell amplified genomes to survey growth potential across the range of prokaryotic diversity; extensions allow estimates from 16S rRNA sequences alone as well as weighted community estimates from metagenomes. We compared the growth rates of cultivated and uncultivated organisms to illustrate how culture collections are strongly biased toward organisms capable of rapid growth. Finally, we found that organisms naturally group into two growth classes and observed a bias in growth predictions for extremely slow-growing organisms. These observations ultimately led us to suggest evolutionary definitions of oligotrophy and copiotrophy based on the selective regime an organism occupies. We found that these growth classes are associated with distinct selective regimes and genomic functional potentials.