

Title: Advancing Source Tracking: Systematic Review and Source-Specific Genome Database Curation of Fecally Shed Prokaryotes
Advancements within fecal source tracking (FST) studies are complicated by a lack of knowledge regarding the genetic content and distribution of fecally shed microbial populations. To address this gap, we performed a systematic literature review and curated a large collection of genomes (n = 26,018) representing fecally shed prokaryotic species across broad and narrow source categories commonly implicated in FST studies of recreational waters (i.e., cats, dogs, cows, seagulls, chickens, pigs, birds, ruminants, human feces, and wastewater). The total number of prokaryotic genomes recovered from materials meeting our initial inclusion criteria varied substantially across fecal sources, from none in seagulls to 9,085 in pigs. We extensively examined the genome sequences recovered from these metagenomic and isolation-based studies using comparative genomic approaches to characterize trends across source categories, and we produced a finalized genome database for each source category, which is available online (n = 12,730). On average, 81% of the genomes representing species-level populations occur only within a single source. Using fecal slurries to test the performance of each source database, we report read capture rates that vary with fecal source alpha diversity and database size. We expect this resource to be useful for FST-related objectives, One Health research, and sanitation efforts globally.
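The 81% figure summarizes how often a species-level population is confined to a single fecal source. A minimal Python sketch of one way such a statistic could be computed is shown below, assuming each genome has already been assigned to a species-level cluster (for example, by ANI-based clustering) and labeled with its source. The function names, the record layout, and the use of an unweighted mean across sources are illustrative assumptions, not the authors' published workflow, and the toy records are invented.

```python
from collections import defaultdict


def single_source_fraction(records):
    """Per source, the fraction of its genomes whose species-level
    cluster is observed in no other source.

    records: iterable of (genome_id, species_cluster, source) tuples.
    """
    records = list(records)
    # Collect the set of sources in which each species cluster occurs.
    sources_per_cluster = defaultdict(set)
    for _genome, cluster, source in records:
        sources_per_cluster[cluster].add(source)

    total = defaultdict(int)
    unique = defaultdict(int)
    for _genome, cluster, source in records:
        total[source] += 1
        if len(sources_per_cluster[cluster]) == 1:
            unique[source] += 1  # cluster confined to this source
    return {source: unique[source] / total[source] for source in total}


def mean_single_source_fraction(records):
    """Unweighted mean of the per-source fractions (an assumption)."""
    fractions = single_source_fraction(records)
    return sum(fractions.values()) / len(fractions)


# Invented toy records: (genome ID, species cluster, fecal source).
toy_records = [
    ("g1", "sp_A", "pig"),
    ("g2", "sp_A", "pig"),
    ("g3", "sp_B", "pig"),
    ("g4", "sp_B", "cow"),
    ("g5", "sp_C", "cow"),
    ("g6", "sp_D", "dog"),
]
print(single_source_fraction(toy_records))
print(mean_single_source_fraction(toy_records))
```

In the toy data, sp_B is shared by pig and cow, so pig scores 2/3, cow 1/2, and dog 1, for an unweighted mean of 13/18 (about 0.72). Weighting each source by its genome count would be an equally plausible alternative.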
Award ID(s): 2136146
PAR ID: 10639020
Publisher / Repository: ACS Publications
Date Published:
Journal Name: Environmental Science & Technology Letters
Volume: 11
Issue: 9
ISSN: 2328-8930
Page Range / eLocation ID: 931 to 939
Sponsoring Org: National Science Foundation
More Like this
  1. Jouline, Igor B (Ed.)
    ABSTRACT Large-scale surveys of prokaryotic communities (metagenomes), as well as isolate genomes, have revealed that their diversity is predominantly organized in sequence-discrete units that may be equated to species. Specifically, genomes of the same species commonly show genome-aggregate average nucleotide identity (ANI) >95% among themselves and ANI <90% to members of other species, while genomes showing ANI of 90%–95% are comparatively rare. However, it remains unclear whether such “discontinuities” or gaps in ANI values can also be observed within species and thus be used to advance and standardize intra-species units. By analyzing 18,123 complete isolate genomes from 330 bacterial species with at least 10 genome representatives each, as well as available long-read metagenomes, we show that another discontinuity exists between 99.2% and 99.8% (midpoint 99.5%) ANI in most of these species. The 99.5% ANI threshold is largely consistent with how sequence types have been defined in previous epidemiological studies but provides clusters with ~20% higher accuracy in terms of the evolutionary and gene-content relatedness of the grouped genomes, whereas strains should consequently be defined at higher ANI values (>99.99% is proposed). Collectively, our results should facilitate future micro-diversity studies across clinical or environmental settings because they provide a more natural definition of intra-species units of diversity. IMPORTANCE Bacterial strains and clonal complexes are two cornerstone concepts in microbiology that remain loosely defined, which confuses communication and research. Here we identify a natural gap in genome sequence comparisons among isolate genomes of all well-sequenced species that has gone unnoticed so far and could be used to define these and related concepts more accurately and precisely than current methods.
These findings advance the molecular toolbox for accurately delineating and following the important units of diversity within prokaryotic species and thus should greatly facilitate future epidemiological and micro-diversity studies across clinical and environmental settings. 
  2. Background: With the advent of metagenomics, the importance of microorganisms, and of how their interactions are relevant to ecosystem resilience, sustainability, and human health, has become evident. Cataloging and preserving biodiversity is paramount not only for the Earth’s natural systems but also for discovering solutions to challenges that we face as a growing civilization. Metagenomics pertains to the in silico study of all microorganisms within an ecological community in situ; however, many software suites recover only prokaryotes and have limited to no support for viruses and eukaryotes. Results: In this study, we introduce the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite, developed to recover genomes from all domains. To our knowledge, VEBA is the first end-to-end metagenomics suite that can directly recover, quality assess, and classify prokaryotic, eukaryotic, and viral genomes from metagenomes. VEBA implements a novel iterative binning procedure and a hybrid sample-specific/multi-sample framework that yields more genomes than any existing methodology alone. VEBA includes a consensus microeukaryotic database containing proteins from existing databases to optimize microeukaryotic gene modeling and taxonomic classification. VEBA also provides a unique clustering-based dereplication strategy that allows sample-specific genomes and genes to be directly compared across non-overlapping biological samples. Finally, VEBA is the only pipeline that automates the detection of candidate phyla radiation bacteria and implements the appropriate genome quality assessments. VEBA’s capabilities are demonstrated by reanalyzing three existing public datasets, which recovered a total of 948 MAGs (458 prokaryotic, 8 eukaryotic, and 482 viral), including several uncharacterized organisms and organisms with no public genome representatives.
Conclusions: The VEBA software suite allows for the in silico recovery of microorganisms from all domains of life by integrating cutting-edge algorithms in novel ways. VEBA fully integrates both end-to-end and task-specific metagenomic analysis in a modular architecture that minimizes dependencies and maximizes productivity. VEBA not only provides the metagenomics community with seamless end-to-end metagenomics analysis but also gives users the flexibility to perform specific analytical tasks. VEBA allows for the automation of several metagenomics steps and shows that new information can be recovered from existing datasets.
  3. Few studies have addressed viral diversity in lemurs despite their unique evolutionary history on the island of Madagascar and high risk of extinction. Further, while a large number of studies on animal viromes focus on fecal samples, understanding viral diversity across multiple sample types and seasons can reveal complex viral community structures within and across species. Groups of captive lemurs at the Duke Lemur Center (Durham, NC, USA), a conservation and research center, provide an opportunity to build foundational knowledge on lemur-associated viromes. We sampled individuals from seven lemur species, i.e., collared lemur (Eulemur collaris), crowned lemur (Eulemur coronatus), blue-eyed black lemur (Eulemur flavifrons), ring-tailed lemur (Lemur catta), Coquerel’s sifaka (Propithecus coquereli), black-and-white ruffed lemur (Varecia variegata variegata), and red ruffed lemur (Varecia rubra), across two lemur families (Lemuridae, Indriidae). Fecal, blood, and saliva samples were collected from Coquerel’s sifaka and black-and-white ruffed lemur individuals across two sampling seasons to diversify virome sampling across sample types and over time. Using viral metagenomic workflows, the complete genomes of anelloviruses (n = 4), cressdnaviruses (n = 47), caudoviruses (n = 15), inoviruses (n = 34), and microviruses (n = 537) were determined from lemur blood, feces, and saliva. Many virus genomes, especially bacteriophages, identified in this study were present across multiple lemur species. Overall, the work presented here uses a viral metagenomics approach to investigate viral communities inhabiting the blood, oral cavity, and feces of healthy captive lemurs.
  4. Novembre, J (Ed.)
    Abstract In a genetically admixed population, admixed individuals possess genealogical and genetic ancestry from multiple source groups. Under a mechanistic model of admixture, we study the number of distinct ancestors from the source populations that the admixture represents. For a member of a genetically admixed population, we combine a mechanistic admixture model with a recombination model that describes the probability that a genealogical ancestor is a genetic ancestor, and we count genetic ancestors from the source populations, that is, those genealogical ancestors from the source populations who contribute to the genome of the modern admixed individual. We compare patterns in the numbers of genealogical and genetic ancestors across the generations. To illustrate the enumeration of genetic ancestors from source populations in an admixed group, we apply the model to the African-American population, extending recent results on the numbers of African and European genealogical ancestors that contribute to the pedigree of an African-American chosen at random, so that we also evaluate the numbers of African and European genetic ancestors who contribute to random African-American genomes. The model suggests that the autosomal genome of a random African-American born in the interval 1960–1965 contains genetic contributions from a mean of 162 African ancestors (standard deviation 47, interquartile range 127–192) and 32 European ancestors (standard deviation 14, interquartile range 21–43). The enumeration of genetic ancestors can potentially be performed in other diploid species for which admixture and recombination models can be specified.
  5. Betancourt, Andrea (Ed.)
    Abstract Local adaptation can lead to elevated genetic differentiation at the targeted genetic variant and nearby sites. Selective sweeps come in different forms, and depending on the initial and final frequencies of a favored variant, very different patterns of genetic variation may be produced. If local selection favors an existing variant that had already recombined onto multiple genetic backgrounds, then the width of elevated genetic differentiation (high FST) may be too narrow to detect using a typical windowed genome scan, even if the targeted variant becomes highly differentiated. We therefore used a simulation approach to investigate the power of SNP-level FST (specifically, the maximum SNP FST value within a window, or FST_MaxSNP) to detect diverse scenarios of local adaptation, and compared it against whole-window FST and the Comparative Haplotype Identity statistic. We found that FST_MaxSNP had superior power to detect complete or mostly complete soft sweeps, but lesser power than full-window statistics to detect partial hard sweeps. Nonetheless, the power of FST_MaxSNP depended highly on sample size, and identifying confident outliers requires robust precautions and quality control. To investigate the relative enrichment of FST_MaxSNP outliers in real data, we applied the two FST statistics to a panel of Drosophila melanogaster populations. We found that FST_MaxSNP showed a genome-wide enrichment of outliers compared with demographic expectations, and although it yielded a lesser enrichment than window FST, it detected mostly unique outlier genes and functional categories. Our results suggest that FST_MaxSNP is highly complementary to typical window-based approaches for detecting local adaptation and merits inclusion in future genome scans and methodologies.
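The ANI thresholds summarized in item 1 of the list above (>95% within species, <90% between species, a within-species gap centered at 99.5%, and >99.99% proposed for strains) can be applied to pairwise ANI values roughly as sketched below. This is a minimal Python illustration under stated assumptions: the exact-boundary handling and the use of single-linkage clustering via union-find are choices made for the example, not necessarily the authors' procedure; the function names are hypothetical and the ANI values are invented.

```python
def ani_relation(ani):
    """Label a genome pair by ANI (percent) using the thresholds in item 1.
    Exact-boundary handling here is an illustrative assumption."""
    if ani > 99.99:
        return "same strain (proposed)"
    if ani >= 99.5:
        return "same intra-species unit"
    if ani > 95.0:
        return "same species"
    if ani >= 90.0:
        return "ambiguous (90-95% ANI; comparatively rare)"
    return "different species"


def cluster_at_threshold(genomes, pairwise_ani, threshold):
    """Single-linkage clusters: two genomes share a cluster if a chain of
    pairs with ANI >= threshold connects them.

    genomes: iterable of genome IDs.
    pairwise_ani: dict mapping (genome_a, genome_b) -> ANI percent.
    """
    parent = {g: g for g in genomes}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in pairwise_ani.items():
        if value >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for g in list(parent):
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Invented pairwise ANI values for four genomes.
ani = {
    ("A", "B"): 99.8,
    ("A", "C"): 97.0,
    ("B", "C"): 96.9,
    ("A", "D"): 84.0,
    ("B", "D"): 84.0,
    ("C", "D"): 85.0,
}
genomes = ["A", "B", "C", "D"]
print(ani_relation(ani[("A", "B")]))             # same intra-species unit
print(cluster_at_threshold(genomes, ani, 95.0))  # [['A', 'B', 'C'], ['D']]
print(cluster_at_threshold(genomes, ani, 99.5))  # [['A', 'B'], ['C'], ['D']]
```

Raising the threshold from 95% to 99.5% splits the species-level cluster {A, B, C} into the finer intra-species units {A, B} and {C}, which is the kind of nested grouping the 99.5% gap is meant to support.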
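Item 2 of the list above mentions a clustering-based dereplication strategy in VEBA without describing its internals. As a generic illustration only, and not VEBA's implementation, the Python sketch below shows the representative-selection step that commonly follows clustering: within each cluster of genome IDs, keep the genome with the best quality score. The score (completeness minus 5 × contamination), the function name `pick_representatives`, and the example data are all hypothetical choices for this sketch.

```python
def pick_representatives(clusters, quality, contamination_weight=5.0):
    """Return one representative genome ID per cluster.

    clusters: list of lists of genome IDs (e.g., from ANI clustering).
    quality: dict mapping genome ID -> (completeness %, contamination %).
    The score completeness - contamination_weight * contamination is a
    hypothetical choice for illustration.
    """

    def score(genome_id):
        completeness, contamination = quality[genome_id]
        return completeness - contamination_weight * contamination

    # max() keeps the first member listed when scores tie.
    return [max(members, key=score) for members in clusters]


# Invented clusters and quality estimates.
clusters = [["m1", "m2"], ["m3"]]
quality = {"m1": (95.0, 4.0), "m2": (90.0, 1.0), "m3": (60.0, 2.0)}
print(pick_representatives(clusters, quality))  # ['m2', 'm3']
```

Here the more complete genome m1 loses to m2 because its higher contamination is penalized; lowering `contamination_weight` shifts that trade-off toward completeness.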
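Item 5 of the list above contrasts FST_MaxSNP (the maximum single-SNP FST in a window) with whole-window FST. A minimal Python sketch of both statistics for two populations follows. It assumes Hudson's FST estimator, with the window value taken as the ratio of summed per-SNP numerators to summed denominators; the abstract does not state which estimator the authors used, so treat this as an illustrative choice. The function names, allele frequencies, and sample sizes are invented for the example.

```python
def hudson_fst_parts(p1, p2, n1, n2):
    """Numerator and denominator of Hudson's F_ST for one biallelic SNP.

    p1, p2: sample allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes in each population.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def window_fst_and_max_snp(snps, n1, n2):
    """Return (window F_ST, FST_MaxSNP) for one genomic window.

    snps: list of (p1, p2) allele-frequency pairs in the window.
    Window F_ST is the ratio of summed numerators to summed denominators;
    FST_MaxSNP is the largest per-SNP F_ST in the window.
    """
    nums, dens, per_snp = [], [], []
    for p1, p2 in snps:
        num, den = hudson_fst_parts(p1, p2, n1, n2)
        if den == 0:
            continue  # fixed for the same allele in both samples: skip
        nums.append(num)
        dens.append(den)
        per_snp.append(num / den)
    if not per_snp:
        return None, None  # no informative SNPs in this window
    return sum(nums) / sum(dens), max(per_snp)


# Invented window: one strongly differentiated SNP among two weak ones.
window = [(0.5, 0.5), (0.9, 0.1), (0.2, 0.3)]
win_fst, max_snp_fst = window_fst_and_max_snp(window, n1=21, n2=21)
print(round(win_fst, 3), round(max_snp_fst, 3))
```

Because the single strongly differentiated SNP dominates the maximum but is diluted in the window ratio, FST_MaxSNP comes out well above window FST here, which mirrors the narrow-signal scenario the abstract describes.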