Reverse transcriptases (RTs) are found in several prokaryotic systems, including group II introns, diversity-generating retroelements (DGRs), retrons, CRISPR-Cas systems, and abortive infection (Abi) systems. Different classes of RTs play different roles, such as template switching and mobility in group II introns, spacer acquisition in CRISPR-Cas systems, mutagenic retrohoming in DGRs, programmed cell suicide in Abi systems, and, as recently discovered, phage defense in retrons. While some classes of RTs have been studied extensively, others remain to be characterized, and computational tools for identifying and characterizing the various classes of RTs are lacking. In this study, we built a tool, myRT, for the identification and classification of prokaryotic RTs. myRT also reports the genomic neighborhood of each RT, which can provide clues to its function. We applied myRT to predict RTs in all complete and draft bacterial genomes and created a collection that can be used to explore putative RTs and their associated protein domains. Applying myRT to metagenomes showed that, compared with the reference-genome collection, gut metagenomes encode proportionally more DGR-related RTs, which outnumber retron-related RTs. myRT is available both as standalone software (https://github.com/mgtools/myRT) and through …
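The abstract does not describe how myRT collects an RT's genomic neighborhood. The sketch below only illustrates the general idea under stated assumptions:

- genes have already been predicted and annotated with domain names;
- the neighborhood is defined as a fixed number of flanking genes on the same contig.

The `Gene` record, the `flank` window, and the example annotations are hypothetical and do not come from myRT.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    contig: str
    start: int
    domains: tuple = ()  # hypothetical annotation, e.g. Pfam family names


def neighborhood(genes, rt_id, flank=5):
    """Genes within `flank` positions of the RT gene on the same contig (RT excluded)."""
    rt = next(g for g in genes if g.gene_id == rt_id)
    # Order the RT's contig by start coordinate so list positions reflect gene order.
    contig = sorted((g for g in genes if g.contig == rt.contig), key=lambda g: g.start)
    pos = contig.index(rt)
    return contig[max(0, pos - flank):pos] + contig[pos + 1:pos + 1 + flank]


def neighbor_domains(genes, rt_id, flank=5):
    """Count domain annotations among the RT's neighbors as a rough functional clue."""
    return Counter(d for g in neighborhood(genes, rt_id, flank) for d in g.domains)


# Hypothetical toy data: one RT gene flanked by Cas-annotated genes on contig c1.
genes = [
    Gene("g1", "c1", 100, ("Cas2",)),
    Gene("g2", "c1", 1200, ("Cas1",)),
    Gene("rt", "c1", 2300, ("RVT_1",)),
    Gene("g4", "c1", 3400, ()),
    Gene("g5", "c1", 4500, ("Cas2",)),
    Gene("x1", "c2", 2000, ("Cas1",)),
]
print([g.gene_id for g in neighborhood(genes, "rt", flank=1)])  # ['g2', 'g4']
```

The sketch uses a gene-count window rather than a base-pair window because it is simpler to reason about. A real tool might use either kind of window and might also take strand or intergenic distance into account.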
- NSF-PAR ID: 10364263
- Journal Name: Nucleic Acids Research
- Volume: 50
- Issue: 5
- Page Range or eLocation-ID: e29
- ISSN: 0305-1048
- Publisher: Oxford University Press
- Sponsoring Org: National Science Foundation
More Like this
Chia, Nicholas (Ed.) ABSTRACT A diversity of clustered regularly interspaced short palindromic repeat (CRISPR)-Cas systems provide adaptive immunity to bacteria and archaea by recording "memories" of past viral infections. Recently, many novel CRISPR-associated proteins have been discovered through computational studies, but those studies relied on biased and incomplete databases of assembled genomes. We avoided these biases by applying a network theory approach to search for novel CRISPR-associated genes, leveraging subtle ecological co-occurrence patterns identified in environmental metagenomes. We validated our method using existing annotations and discovered 32 novel CRISPR-associated gene families. These genes span a range of putative functions, with many potentially regulating the response to infection. IMPORTANCE Every branch on the tree of life, including microbial life, faces the threat of viral pathogens. Over billions of years of coevolution, prokaryotes have evolved a great diversity of strategies to defend against viral infections. One of these is the CRISPR adaptive immune system, which allows microbes to "remember" past infections in order to better fight them in the future. Molecular biologists have taken great interest in CRISPR immunity because this system can be repurposed as a tool for precise genome editing. Recently, a number of comparative genomics approaches …
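The abstract does not specify the network construction the authors used. As a rough illustration of guilt-by-association through co-occurrence, the sketch below links gene families whose presence across metagenomic samples overlaps strongly.

It rests on two assumptions:

- The network uses a plain Jaccard index on presence/absence sets as a simple stand-in. It is not the authors' method.
- The `min_jaccard` threshold and the family names (including the unknown `famX`) are hypothetical.

```python
from itertools import combinations


def cooccurrence_edges(presence, min_jaccard=0.5):
    """presence maps gene family -> set of sample IDs where it was detected.
    Returns (family_a, family_b, jaccard) for pairs at or above the threshold."""
    edges = []
    for a, b in combinations(sorted(presence), 2):
        union = presence[a] | presence[b]
        if not union:
            continue  # neither family was detected anywhere
        j = len(presence[a] & presence[b]) / len(union)
        if j >= min_jaccard:
            edges.append((a, b, j))
    return edges


def neighbors_of(edges, family):
    """Families linked to `family` in the co-occurrence network."""
    return sorted({b if a == family else a for a, b, _ in edges if family in (a, b)})


# Hypothetical toy data: famX tracks the known cas genes across samples.
presence = {
    "cas1": {"s1", "s2", "s3"},
    "cas2": {"s1", "s2", "s3"},
    "famX": {"s1", "s2"},
    "famY": {"s4"},
}
edges = cooccurrence_edges(presence)
print(neighbors_of(edges, "cas1"))  # ['cas2', 'famX']
```

In this toy setting, an uncharacterized family such as `famX` that attaches to known cas genes would become a candidate CRISPR-associated gene. A study on real metagenomes would also need to control for sample coverage and for chance co-occurrence.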
Abstract There is increasing interest in using the clustered regularly interspaced short palindromic repeats and CRISPR-associated protein (CRISPR-Cas) system to reveal potential virus–host dynamics. cas1, which encodes the universal and most conserved Cas protein, is an ideal marker for elucidating CRISPR-Cas ecology. To improve detection capacity and resolve the diverse CRISPR-Cas systems, we constructed eight Hidden Markov Models (HMMs) and assembled cas1 directly from metagenomes with a targeted-gene assembler, Xander. The eight HMMs were first validated by recovering all 17 cas1 subtypes from a simulated metagenome generated from 91 prokaryotic genomes across 11 phyla. We then challenged the targeted method with 48 metagenomes from a tallgrass prairie in Central Oklahoma, recovering 3394 cas1 sequences. Of these, 88 were near full length, 5 times more than in de novo assemblies from the same Oklahoma metagenomes. To validate host assignment by cas1, the targeted-assembled cas1 sequences were mapped to the de novo assembled contigs. The phylum assignments of all mapped contigs were made independently of the CRISPR-Cas genes on those contigs, and all were consistent with the host taxonomies predicted by the mapped cas1. We then investigated whether 8 years of soil warming altered cas1 prevalence within the communities. A shift in microbial abundances was observed in the year with the largest temperature differential (mean 4.16 °C above ambient). cas1 prevalence increased, and even in the phyla …
Abstract Background CRISPR-Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) systems are adaptive immune systems, common in prokaryotes, that provide sequence-specific defense against invading mobile genetic elements (MGEs). The memory of these immunological encounters is stored in CRISPR arrays, where spacer sequences record the identity and history of past invaders. Analyzing such CRISPR arrays provides insights into the dynamics of CRISPR-Cas systems and the adaptation of their host bacteria to rapidly changing environments such as the human gut.
Results In this study, we used 601 publicly available Bacteroides fragilis genome isolates from 12 healthy individuals, 6 of whom were sampled longitudinally, together with 222 available B. fragilis reference genomes, to update the understanding of B. fragilis CRISPR-Cas dynamics and their differential activities. Analysis of the longitudinal genomic data showed that some CRISPR array structures remained relatively stable over time, whereas others underwent radical spacer acquisition during certain periods. Diverse CRISPR arrays (each associated with multiple isolates) co-existed in the same individuals, with some persisting over time. Furthermore, an analysis of the host–MGE network highlighted features of CRISPR adaptation, evolution, and microdynamics, such as modules of multiple MGEs and hosts, reflecting complex interactions between B. fragilis and its invaders mediated by the CRISPR-Cas systems. Conclusions We made available all annotated CRISPR-Cas …
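The abstract contrasts CRISPR arrays that stay stable with arrays that acquire many spacers between time points. A minimal way to express that comparison is sketched below.

It makes the following assumptions:

- Spacers are compared by exact sequence identity.
- Arrays are listed leader-proximal spacer first, since new spacers are typically added at the leader end.
- The spacer labels are hypothetical placeholders.

This is not the study's pipeline.

```python
def compare_arrays(earlier, later):
    """Compare two CRISPR arrays (leader-proximal spacer first).

    Returns (gained, lost, leader_only), where leader_only is True when every
    gained spacer sits contiguously at the leader end of the later array."""
    earlier_set, later_set = set(earlier), set(later)
    gained = [s for s in later if s not in earlier_set]
    lost = [s for s in earlier if s not in later_set]
    leader_only = later[:len(gained)] == gained
    return gained, lost, leader_only


# Hypothetical arrays from one individual at two time points.
t0 = ["A", "B", "C"]
t1 = ["E", "D", "A", "B", "C"]
gained, lost, leader_only = compare_arrays(t0, t1)
print(gained, lost, leader_only)  # ['E', 'D'] [] True
```

Under this framing, a stable array has empty `gained` and `lost` lists, and radical acquisition shows up as a long `gained` list.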
ABSTRACT Anti-CRISPR (Acr) loci/operons encode Acr proteins and Acr-associated (Aca) proteins. Forty-five Acr families that inhibit seven subtypes of CRISPR-Cas systems have been experimentally characterized. We developed a bioinformatics pipeline to identify genomic loci containing Acr homologs and/or Aca homologs by combining three computational approaches: homology, guilt-by-association, and self-targeting spacers. Homology search found thousands of Acr homologs in bacterial and viral genomes, but most are homologous to AcrIIA7 and AcrIIA9. Investigating the gene neighborhood of these Acr homologs revealed that only a small percentage of them (23.0% in bacteria and 8.2% in viruses) have neighboring Aca homologs and thus form Acr-Aca operons. Surprisingly, although a self-targeting spacer is a strong indicator of the presence of Acr genes in a genome, a large percentage of Acr-Aca loci are found in bacterial genomes without self-targeting spacers or even without complete CRISPR-Cas systems. Additionally, for Acr homologs from genomes with self-targeting spacers, homology-based Acr family assignments do not always agree with the self-targeting CRISPR-Cas subtypes. Last, by investigating Acr genomic loci that coexist with self-targeting spacers in the same genomes, we inferred Acrs for five known subtypes (I-C, I-E, I-F, II-A, and II-C) and five new subtypes (I-B, III-A, III-B, IV-A, and V-U4). Based …
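A self-targeting spacer is a CRISPR spacer that matches the host's own genome outside the CRISPR array. The sketch below detects such spacers by searching both strands for exact matches and discarding hits that overlap the array.

Several simplifications apply:

- Only exact matching is used, whereas practical pipelines usually tolerate mismatches and check protospacer-adjacent motifs.
- The array coordinates are assumed to be known.
- The toy genome and spacer are hypothetical.

This is not the pipeline described in the abstract.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(hay, needle):
    """All start positions of needle in hay (overlapping matches allowed)."""
    hits, i = [], hay.find(needle)
    while i != -1:
        hits.append(i)
        i = hay.find(needle, i + 1)
    return hits


def self_targeting(spacers, genome, array_start, array_end):
    """(spacer, position) pairs for exact matches on either strand that do not
    overlap the CRISPR array occupying genome[array_start:array_end]."""
    hits = []
    for sp in spacers:
        # dict.fromkeys keeps order and avoids double-counting palindromic spacers.
        for probe in dict.fromkeys((sp, revcomp(sp))):
            for pos in find_all(genome, probe):
                if pos < array_end and pos + len(probe) > array_start:
                    continue  # this match is the spacer's own copy inside the array
                hits.append((sp, pos))
    return hits


# Hypothetical toy genome: array at [0, 11), reverse-complement target at 15.
genome = "CC" + "GATTACA" + "CC" + "TTTT" + "TGTAATC" + "GG"
print(self_targeting(["GATTACA"], genome, 0, 11))  # [('GATTACA', 15)]
```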
Abstract CRISPR–Cas is an antiviral mechanism of prokaryotes that has been widely adopted for genome editing. To make CRISPR–Cas genome editing more controllable and safer to use, anti-CRISPR proteins have recently been exploited to prevent excessive or prolonged Cas nuclease cleavage. Anti-CRISPR (Acr) proteins are encoded by (pro)phages/(pro)viruses and can inhibit their hosts' CRISPR–Cas systems. We built an online database, AcrDB (http://bcb.unl.edu/AcrDB), by scanning ∼19 000 genomes of prokaryotes and viruses with AcrFinder, a recently developed Acr-Aca (Acr-associated regulator) operon prediction program. Proteins in Acr-Aca operons were further processed by two machine learning-based programs (AcRanker and PaCRISPR) to obtain numerical scores and ranks. Compared with other anti-CRISPR databases, AcrDB has the following unique features: (i) it is a genome-scale database with the largest collection of data (39 799 Acr-Aca operons containing Aca or Acr homologs); (ii) it offers a user-friendly web interface with various functions for browsing, graphically viewing, searching, and batch downloading Acr-Aca operons; (iii) it focuses on the genomic context of Acr and Aca candidates rather than on individual Acr protein families; and (iv) it collects data with three independent programs, each with a unique data mining algorithm, for cross-validation. AcrDB will be a valuable resource to the …