Title: Hidden Diversity within Common Protozoan Parasites as Revealed by a Novel Genomotyping Scheme
ABSTRACT Giardia duodenalis (syn. Giardia lamblia, Giardia intestinalis) is the causative agent of giardiasis, one of the most common diarrheal infections in humans. Evolutionary relationships among G. duodenalis genotypes (or subtypes) of assemblage B, one of two genetic assemblages causing the majority of human infections, remain unclear due to poor phylogenetic resolution of current typing methods. In this study, we devised a methodology to identify new markers for a streamlined multilocus sequence typing (MLST) scheme based on comparisons of all core genes against the phylogeny of whole-genome sequences (WGS). Our analysis identified three markers with resolution comparable to that of WGS data. Using newly designed PCR primers for our novel MLST loci, we typed an additional 68 strains of assemblage B. Analyses of these strains and previously determined genome sequences showed that genomes of this assemblage can be assigned to 16 clonal complexes, each with unique gene content that is apparently tuned to differential virulence and ecology. Obtaining new genomes of Giardia spp. and other eukaryotic microbial pathogens remains challenging due to difficulties in culturing the parasites in the laboratory. Hence, the methods described here are expected to be widely applicable to other pathogens of interest and advance our understanding of their ecology and evolution.
IMPORTANCE Giardia duodenalis assemblage B is a major waterborne pathogen and the most commonly identified genotype causing human giardiasis worldwide. The lack of morphological characters for classification requires the use of molecular techniques for strain differentiation; however, the absence of scalable and affordable next-generation sequencing (NGS)-based typing methods has prevented meaningful advancements in high-resolution molecular typing for further understanding of the evolution and epidemiology of assemblage B.
Prior studies have reported high sequence diversity but low phylogenetic resolution at standard loci in assemblage B, highlighting the necessity of identifying new markers for accurate and robust molecular typing. Data from comparative analyses of available genomes in this study identified three loci that together form a novel high-resolution typing scheme with high concordance to whole-genome-based phylogenomics and which should aid in future public health endeavors related to this parasite. In addition, data from newly characterized strains suggest evidence of biogeographic and ecologic endemism.
Author(s) / Creator(s):
; ;
Björkroth, Johanna
Journal Name: Applied and Environmental Microbiology
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Wolbachia are widespread intracellular bacteria that mediate many important biological processes in arthropod species. In this study, we identified 210 conserved single-copy genes in 33 genome-sequenced Wolbachia strains in the A, B, C, D, E and F supergroups. Phylogenomic analyses with these core genes indicate that all 33 Wolbachia strains maintain the supergroup relationship, which was classified previously based on the multilocus sequence typing (MLST) genes. Using an interclade recombination screening method, 14 inter-supergroup recombination events were discovered in six genes (2.9%) among 210 single copy orthologs. This finding suggests a relatively low frequency of intergroup recombination. Interestingly, they have occurred not only between A and B supergroups (9 events), but also between A and E supergroups (5 events). Maintenance of such transfers suggests possible roles in Wolbachia infection related functions. Comparisons of strain divergence using the five genes of the MLST system show a high correlation (Pearson correlation coefficient r = 0.98) between MLST and whole genome divergences, indicating that MLST is a reliable method for identifying related strains when whole genome data are not available. The phylogenomic analysis and the identified core gene set in our study will serve as a valuable foundation for strain identification and the investigation of recombination and genome evolution in Wolbachia. 
  2. Alanio, Alexandre (Ed.)
    ABSTRACT Modern taxonomic classification is often based on phylogenetic analyses of a few molecular markers, although single-gene studies are still common. Here, we leverage genome-scale molecular phylogenetics (phylogenomics) of species and populations to reconstruct evolutionary relationships in a dense data set of 710 fungal genomes from the biomedically and technologically important genus Aspergillus. To do so, we generated a novel set of 1,362 high-quality molecular markers specific for Aspergillus and provided profile Hidden Markov Models for each, facilitating their use by others. Examining the resulting phylogeny helped resolve ongoing taxonomic controversies, identified new ones, and revealed extensive strain misidentification (7.59% of strains were previously misidentified), underscoring the importance of population-level sampling in species classification. These findings were corroborated using the current standard, taxonomically informative loci. These findings suggest that phylogenomics of species and populations can facilitate accurate taxonomic classifications and reconstructions of the Tree of Life.

    IMPORTANCE Identification of fungal species relies on the use of molecular markers. Advances in genomic technologies have made it possible to sequence the genome of any fungal strain, making it possible to use genomic data for the accurate assignment of strains to fungal species (and for the discovery of new ones). We examined the usefulness and current limitations of genomic data using a large data set of 710 publicly available genomes from multiple strains and species of the biomedically, agriculturally, and industrially important genus Aspergillus. Our evolutionary genomic analyses revealed that nearly 8% of publicly available Aspergillus genomes are misidentified. Our work highlights the usefulness of genomic data for fungal systematic biology and suggests that systematic genome sequencing of multiple strains, including reference strains (e.g., type strains), of fungal species will be required to reduce misidentification errors in public databases.
  3. Canine distemper virus (CDV) is a multi-host pathogen with variable clinical outcomes of infection across and within species. We used whole-genome sequencing (WGS) to search for viral markers correlated with clinical distemper in African lions. To identify candidate markers, we first documented single-nucleotide polymorphisms (SNPs) differentiating CDV strains associated with different clinical outcomes in lions in East Africa. We then conducted evolutionary analyses on WGS from all global CDV lineages to identify loci subject to selection. SNPs that both differentiated East African strains and were under selection were mapped to a phylogenetic tree representing global CDV diversity to assess if candidate markers correlated with documented outbreaks of clinical distemper in lions (n = 3). Of 54 SNPs differentiating East African strains, ten were under positive or episodic diversifying selection and 20 occurred in the clinical strain despite strong purifying selection at those loci. Candidate markers were in functional domains of the RNP complex (n = 19), the matrix protein (n = 4), on CDV glycoproteins (n = 5), and on the V protein (n = 1). We found mutations at two loci in common between sequences from three CDV outbreaks of clinical distemper in African lions; one in the signaling lymphocytic activation molecule receptor (SLAM)-binding region of the hemagglutinin protein and another in the catalytic center of phosphodiester bond formation on the large polymerase protein. These results suggest convergent evolution at these sites may have a functional role in clinical distemper outbreaks in African lions and uncover potential novel barriers to pathogenicity in this species. 
  4. The co-existence of rats and humans in urban environments has long been a cause for concern regarding human health because of the potential for rats to harbor and transmit disease-causing pathogens. Here, we analyze whole-genome sequence (WGS) data from 41 Escherichia coli isolates collected from rat feces from 12 locations within the city of Chicago, IL, United States to determine the potential for rats to serve as a reservoir for pathogenic E. coli and describe its population structure. We identified 25 different serotypes, none of which were isolated from strains containing significant virulence markers indicating the presence of Shiga toxin-producing and other disease-causing E. coli. Nor did the E. coli isolates harbor any particularly rare stress tolerant or antimicrobial resistance genes. We then compared the isolates against a public database of approximately 100,000 E. coli and Shigella isolates of primarily food, food facility, or clinical origin. We found that only one isolate was genetically similar to genome sequences in the database. Phylogenetic analyses showed that isolates cluster by serotype, and there was little geographic structure (e.g., isolation by distance) among isolates. However, a greater signal of isolation by distance was observed when we compared genetic and geographic distances among isolates of the same serotype. This suggests that E. coli serotypes are independent lineages and recombination between serotypes is rare.
  5. Yoshizawa, Kazunori (Ed.)
    Abstract The order Psocodea includes the two historically recognized groups Psocoptera (free-living bark lice) and Phthiraptera (parasitic lice) that were once considered separate orders. Psocodea is divided into three suborders: Trogiomorpha, Troctomorpha, and Psocomorpha, the latter being the largest within the free-living groups. Despite the increasing number of transcriptomes and whole genome sequence (WGS) data available for this group, the relationships among the six known infraorders within Psocomorpha remain unclear. Here, we evaluated the utility of a bait set designed specifically for parasitic lice belonging to suborder Troctomorpha to extract UCE loci from transcriptome and WGS data of 55 bark louse species and explored the phylogenetic relationships within Psocomorpha using these UCE loci. Taxon sampling was heavily focused on the families Lachesillidae and Elipsocidae, whose relationships have been problematic in prior phylogenetic studies. We successfully recovered a total of 2,622 UCE loci, with a 40% completeness matrix containing 2,081 UCE loci and an 80% completeness matrix containing 178 UCE loci. The average number of UCE loci recovered for the 55 species was 1,401. The WGS data sets produced a larger number of UCE loci (1,495) on average than the transcriptome data sets (972). Phylogenetic relationships reconstructed with Maximum Likelihood and coalescent-based analysis were concordant regarding the paraphyly of Lachesillidae and Elipsocidae. Branch support values were generally lower in analyses that used fewer loci, even though they had higher matrix completeness.