skip to main content


Search for: All records

Creators/Authors contains: "Paez-Espino, David"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyze 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical, and gene neighborhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter. 
    more » « less
    Free, publicly-accessible full text available October 19, 2024
  2. Abstract

    Our current knowledge of host–virus interactions in biofilms is limited to computational predictions based on laboratory experiments with a small number of cultured bacteria. However, natural biofilms are diverse and chiefly composed of uncultured bacteria and archaea with no viral infection patterns and lifestyle predictions described to date. Herein, we predict the first DNA sequence-based host–virus interactions in a natural biofilm. Using single-cell genomics and metagenomics applied to a hot spring mat of the Cone Pool in Mono County, California, we provide insights into virus–host range, lifestyle and distribution across different mat layers. Thirty-four out of 130 single cells contained at least one viral contig (26%), which, together with the metagenome-assembled genomes, resulted in detection of 59 viruses linked to 34 host species. Analysis of single-cell amplification kinetics revealed a lack of active viral replication on the single-cell level. These findings were further supported by mapping metagenomic reads from different mat layers to the obtained host–virus pairs, which indicated a low copy number of viral genomes compared to their hosts. Lastly, the metagenomic data revealed high layer specificity of viruses, suggesting limited diffusion to other mat layers. Taken together, these observations indicate that in low mobility environments with high microbial abundance, lysogeny is the predominant viral lifestyle, in line with the previously proposed “Piggyback-the-Winner” theory.

     
    more » « less
  3. null (Ed.)
    Abstract The reconstruction of bacterial and archaeal genomes from shotgun metagenomes has enabled insights into the ecology and evolution of environmental and host-associated microbiomes. Here we applied this approach to >10,000 metagenomes collected from diverse habitats covering all of Earth’s continents and oceans, including metagenomes from human and animal hosts, engineered environments, and natural and agricultural soils, to capture extant microbial, metabolic and functional potential. This comprehensive catalog includes 52,515 metagenome-assembled genomes representing 12,556 novel candidate species-level operational taxonomic units spanning 135 phyla. The catalog expands the known phylogenetic diversity of bacteria and archaea by 44% and is broadly available for streamlined comparative analyses, interactive exploration, metabolic modeling and bulk download. We demonstrate the utility of this collection for understanding secondary-metabolite biosynthetic potential and for resolving thousands of new host linkages to uncultivated viruses. This resource underscores the value of genome-centric approaches for revealing genomic properties of uncultivated microorganisms that affect ecosystem processes. 
    more » « less
  4. CRISPR-Cas systems provide microbes with adaptive immunity to infectious nucleic acids and are widely employed as genome editing tools. These tools use RNA-guided Cas proteins whose large size (950 to 1400 amino acids) has been considered essential to their specific DNA- or RNA-targeting activities. Here we present a set of CRISPR-Cas systems from uncultivated archaea that contain Cas14, a family of exceptionally compact RNA-guided nucleases (400 to 700 amino acids). Despite their small size, Cas14 proteins are capable of targeted single-stranded DNA (ssDNA) cleavage without restrictive sequence requirements. Moreover, target recognition by Cas14 triggers nonspecific cutting of ssDNA molecules, an activity that enables high-fidelity single-nucleotide polymorphism genotyping (Cas14-DETECTR). Metagenomic data show that multiple CRISPR-Cas14 systems evolved independently and suggest a potential evolutionary origin of single-effector CRISPR-based adaptive immunity.

     
    more » « less
  5. null (Ed.)