Title: Network-Based Prediction of Novel CRISPR-Associated Genes in Metagenomes
ABSTRACT A diversity of clustered regularly interspaced short palindromic repeat (CRISPR)-Cas systems provide adaptive immunity to bacteria and archaea through recording “memories” of past viral infections. Recently, many novel CRISPR-associated proteins have been discovered via computational studies, but those studies relied on biased and incomplete databases of assembled genomes. We avoided these biases and applied a network theory approach to search for novel CRISPR-associated genes by leveraging subtle ecological cooccurrence patterns identified from environmental metagenomes. We validated our method using existing annotations and discovered 32 novel CRISPR-associated gene families. These genes span a range of putative functions, with many potentially regulating the response to infection.

IMPORTANCE Every branch on the tree of life, including microbial life, faces the threat of viral pathogens. Over the course of billions of years of coevolution, prokaryotes have evolved a great diversity of strategies to defend against viral infections. One of these is the CRISPR adaptive immune system, which allows microbes to “remember” past infections in order to better fight them in the future. There has been much interest among molecular biologists in CRISPR immunity because this system can be repurposed as a tool for precise genome editing. Recently, a number of comparative genomics approaches have been used to detect novel CRISPR-associated genes in databases of genomes with great success, potentially leading to the development of new genome-editing tools. Here, we developed novel methods to search for these distinct classes of genes directly in environmental samples (“metagenomes”), thus capturing a more complete picture of the natural diversity of CRISPR-associated genes.
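The abstract names the general idea (a network built from cooccurrence patterns across metagenomes) but not the algorithm. As a rough illustration only, and not the authors' method, the sketch below assumes each gene family is summarized by a presence/absence vector across samples, links families whose vectors correlate strongly (phi coefficient), and reports unannotated families linked to known cas genes. The family names, the toy data, and the 0.8 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def phi(x, y):
    """Phi coefficient (Pearson correlation) of two 0/1 presence vectors."""
    n = len(x)
    both = sum(a * b for a, b in zip(x, y))  # samples containing both families
    nx, ny = sum(x), sum(y)
    denom = sqrt(nx * (n - nx) * ny * (n - ny))
    if denom == 0:  # a family present everywhere or nowhere carries no signal
        return 0.0
    return (n * both - nx * ny) / denom


def cooccurrence_network(presence, threshold=0.8):
    """Adjacency sets linking families whose phi is at least `threshold`."""
    graph = {name: set() for name in presence}
    for a, b in combinations(presence, 2):
        if phi(presence[a], presence[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidates(graph, known):
    """Families directly linked to a known family, excluding the known ones."""
    linked = set()
    for name in known:
        linked |= graph.get(name, set())
    return linked - set(known)


# Hypothetical toy data: rows are gene families, columns are six samples.
presence = {
    "cas1": [1, 1, 0, 0, 1, 0],
    "cas2": [1, 1, 0, 0, 1, 0],
    "famA": [1, 1, 0, 0, 1, 0],
    "famB": [0, 0, 1, 1, 0, 1],
    "famC": [1, 0, 1, 0, 0, 1],
}

graph = cooccurrence_network(presence)
print(sorted(candidates(graph, {"cas1", "cas2"})))
```

A real analysis would also need to control for shared phylogeny, sequencing depth, and multiple testing, none of which this toy example attempts.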
Author(s) / Creator(s):
Chia, Nicholas
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT Anti-CRISPR (Acr) loci/operons encode Acr proteins and Acr-associated (Aca) proteins. Forty-five Acr families have been experimentally characterized as inhibiting seven subtypes of CRISPR-Cas systems. We have developed a bioinformatics pipeline to identify genomic loci containing Acr homologs and/or Aca homologs by combining three computational approaches: homology, guilt-by-association, and self-targeting spacers. Homology search found thousands of Acr homologs in bacterial and viral genomes, but most are homologous to AcrIIA7 and AcrIIA9. Investigating the gene neighborhood of these Acr homologs revealed that only a small percentage (23.0% in bacteria and 8.2% in viruses) of them have neighboring Aca homologs and thus form Acr-Aca operons. Surprisingly, although a self-targeting spacer is a strong indicator of the presence of Acr genes in a genome, a large percentage of Acr-Aca loci are found in bacterial genomes without self-targeting spacers or even without complete CRISPR-Cas systems. Additionally, for Acr homologs from genomes with self-targeting spacers, homology-based Acr family assignments do not always agree with the self-targeting CRISPR-Cas subtypes. Last, by investigating Acr genomic loci coexisting with self-targeting spacers in the same genomes, five known subtypes (I-C, I-E, I-F, II-A, and II-C) and five new subtypes (I-B, III-A, III-B, IV-A, and V-U4) of Acrs were inferred. Based on these findings, we conclude that the discovery of new anti-CRISPRs should not be restricted to genomes with self-targeting spacers and loci with Acr homologs. The evolutionary arms race of CRISPR-Cas systems and anti-CRISPR systems may have driven the adaptive and rapid gain and loss of these elements in closely related genomes. IMPORTANCE As a naturally occurring adaptive immune system, CRISPR-Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated genes) systems are widely found in bacteria and archaea to defend against viruses. Since 2013, the application of various bacterial CRISPR-Cas systems has become very popular due to their development into targeted and programmable genome engineering tools with the ability to edit almost any genome. As the natural off-switch of CRISPR-Cas systems, anti-CRISPRs have a great potential to serve as regulators of CRISPR-Cas tools and enable safer and more controllable genome editing. This study will help understand the relative usefulness of the three bioinformatics approaches for new Acr discovery, as well as guide the future development of new bioinformatics tools to facilitate anti-CRISPR research. The thousands of Acr homologs and hundreds of new anti-CRISPR loci identified in this study will be a valuable data resource for genome engineers to search for new CRISPR-Cas regulators.
  2. Koomey, Michael (Ed.)
    ABSTRACT Elizabethkingia anophelis is an emerging global multidrug-resistant opportunistic pathogen. We assessed the diversity among 13 complete genomes and 23 draft genomes of E. anophelis strains derived from various environmental settings and human infections from different geographic regions around the world from the 1950s to the present. Putative integrative and conjugative elements (ICEs) were identified in 31/36 (86.1%) strains in the study. A total of 52 putative ICEs (including eight degenerated elements lacking integrases) were identified and categorized into three types based on the architecture of the conjugation module and the phylogeny of the relaxase, coupling protein, TraG, and TraJ protein sequences. The type II and III ICEs were found to integrate adjacent to tRNA genes, while type I ICEs integrate into intergenic regions or into a gene. The ICEs carry various cargo genes, including transcription regulator genes and genes conferring antibiotic resistance. The adaptive immune CRISPR-Cas system was found in nine strains, including five strains in which CRISPR-Cas machinery and ICEs coexist at different locations on the same chromosome. One ICE-derived spacer was present in the CRISPR locus in one strain. ICE distribution in the strains showed no geographic or temporal patterns. The ICEs in E. anophelis differ in architecture and sequence from CTnDOT, a well-studied ICE prevalent in Bacteroides spp. The categorization of ICEs will facilitate further investigations of the impact of ICE on virulence, genome epidemiology, and adaptive genomics of E. anophelis. IMPORTANCE Elizabethkingia anophelis is an opportunistic human pathogen, and the genetic diversity between strains from around the world becomes apparent as more genomes are sequenced. Genome comparison identified three types of putative ICEs in 31 of 36 strains. The diversity of ICEs suggests that they had different origins. One of the ICEs was discovered previously from a large E. anophelis outbreak in Wisconsin in the United States; this ICE has integrated into the mutY gene of the outbreak strain, creating a mutator phenotype. Similar to ICEs found in many bacterial species, ICEs in E. anophelis carry various cargo genes that enable recipients to resist antibiotics and adapt to various ecological niches. The adaptive immune CRISPR-Cas system is present in nine of 36 strains. An ICE-derived spacer was found in the CRISPR locus in a strain that has no ICE, suggesting a past encounter and effective defense against ICE.
  3. Abstract CRISPR–Cas is an anti-viral mechanism of prokaryotes that has been widely adopted for genome editing. To make CRISPR–Cas genome editing more controllable and safer to use, anti-CRISPR proteins have been recently exploited to prevent excessive/prolonged Cas nuclease cleavage. Anti-CRISPR (Acr) proteins are encoded by (pro)phages/(pro)viruses, and have the ability to inhibit their host's CRISPR–Cas systems. We have built an online database, AcrDB, by scanning ∼19 000 genomes of prokaryotes and viruses with AcrFinder, a recently developed Acr-Aca (Acr-associated regulator) operon prediction program. Proteins in Acr-Aca operons were further processed by two machine learning-based programs (AcRanker and PaCRISPR) to obtain numerical scores/ranks. Compared to other anti-CRISPR databases, AcrDB has the following unique features: (i) It is a genome-scale database with the largest collection of data (39 799 Acr-Aca operons containing Aca or Acr homologs); (ii) It offers a user-friendly web interface with various functions for browsing, graphically viewing, searching, and batch downloading Acr-Aca operons; (iii) It focuses on the genomic context of Acr and Aca candidates instead of individual Acr protein families; and (iv) It collects data with three independent programs, each having a unique data mining algorithm, for cross validation. AcrDB will be a valuable resource to the anti-CRISPR research community.
  4. Taylor, John W. (Ed.)
    ABSTRACT Mycoviruses are widespread and purportedly common throughout the fungal kingdom, although most are known from hosts in the two most recently diverged phyla, Ascomycota and Basidiomycota, together called Dikarya. To augment our knowledge of mycovirus prevalence and diversity in underexplored fungi, we conducted a large-scale survey of fungi in the earlier-diverging lineages, using both culture-based and transcriptome-mining approaches to search for RNA viruses. In total, 21.6% of 333 isolates were positive for RNA mycoviruses. This is a greater proportion than expected based on previous taxonomically broad mycovirus surveys and is suggestive of a strong phylogenetic component to mycoviral infection. Our newly found viral sequences are diverse, composed of double-stranded RNA, positive-sense single-stranded RNA (ssRNA), and negative-sense ssRNA genomes and include novel lineages lacking representation in the public databases. These identified viruses could be classified into 2 orders, 5 families, and 5 genera; however, half of the viruses remain taxonomically unassigned. Further, we identified a lineage of virus-like sequences in the genomes of members of Phycomycetaceae and Mortierellales that appear to be novel genes derived from integration of a viral RNA-dependent RNA polymerase gene. The two screening methods largely agreed in their detection of viruses; thus, we suggest that the culture-based assay is a cost-effective means to quickly assess whether a laboratory culture is virally infected. This study used culture collections and publicly available transcriptomes to demonstrate that mycoviruses are abundant in laboratory cultures of early-diverging fungal lineages. The function and diversity of mycoviruses found here will help guide future studies into mycovirus origins and ecological functions. IMPORTANCE Viruses are key drivers of evolution and ecosystem function and are increasingly recognized as symbionts of fungi. Fungi in early-diverging lineages are widespread, ecologically important, and comprise the majority of the phylogenetic diversity of the kingdom. Viruses infecting early-diverging lineages of fungi have been almost entirely unstudied. In this study, we screened fungi for viruses by two alternative approaches: a classic culture-based method and transcriptome mining. The results of our large-scale survey demonstrate that early-diverging lineages have higher infection rates than have been previously reported in other fungal taxa and that laboratory strains worldwide are host to infections, the implications of which are unknown. The function and diversity of mycoviruses found in these basal fungal lineages will help guide future studies into mycovirus origins and their evolutionary ramifications and ecological impacts.
  5. ABSTRACT Bacteria of the phylum Verrucomicrobia are prevalent and are particularly common in soil and freshwater environments. Their cosmopolitan distribution and reported capacity for polysaccharide degradation suggest that members of Verrucomicrobia are important contributors to carbon cycling across Earth’s ecosystems. Despite their prevalence, the Verrucomicrobia are underrepresented in isolate collections and genome databases; consequently, their ecophysiological roles may not be fully realized. Here, we expand genomic sampling of the Verrucomicrobia phylum by describing a novel genus, “Candidatus Marcellius,” belonging to the order Opitutales. “Ca. Marcellius” was recovered from a shale-derived produced fluid metagenome collected 313 days after hydraulic fracturing, the deepest environment from which a member of the Verrucomicrobia has been recovered to date. We uncover genomic attributes that may explain the capacity of this organism to inhabit a shale gas well, including the potential for utilization of organic polymers common in hydraulic fracturing fluids, nitrogen fixation, adaptation to high salinities, and adaptive immunity via CRISPR-Cas. To illuminate the phylogenetic and environmental distribution of these metabolic and adaptive traits across the Verrucomicrobia phylum, we performed a comparative genomic analysis of 31 publicly available, nearly complete Verrucomicrobia genomes. Our genomic findings extend the environmental distribution of the Verrucomicrobia 2.3 kilometers into the terrestrial subsurface. Moreover, we reveal traits widely encoded across members of the Verrucomicrobia, including the capacity to degrade hemicellulose and to adapt to physical and biological environmental perturbations, thereby contributing to the expansive habitat range reported for this phylum. IMPORTANCE The Verrucomicrobia phylum of bacteria is widespread in many different ecosystems; however, its role in microbial communities remains poorly understood. Verrucomicrobia are often low-abundance community members, yet previous research suggests they play a major role in organic carbon degradation. While Verrucomicrobia remain poorly represented in culture collections, numerous genomes have been reconstructed from metagenomic data sets in recent years. The study of genomes from across the phylum allows for an extensive assessment of their potential ecosystem roles. The significance of this work is (i) the recovery of a novel genus of Verrucomicrobia from 2.3 km in the subsurface with the ability to withstand the extreme conditions that characterize this environment, and (ii) the most extensive assessment of ecophysiological traits encoded by Verrucomicrobia genomes to date. We show that members of this phylum are specialist organic polymer degraders that can withstand a wider range of environmental conditions than previously thought.