Abstract Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92–97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP.
Reference-free structural variant detection in microbiomes via long-read co-assembly graphs
Abstract Motivation: The study of bacterial genome dynamics is vital for understanding the mechanisms underlying microbial adaptation, growth, and their impact on host phenotype. Structural variants (SVs), genomic alterations of 50 base pairs or more, play a pivotal role in driving evolutionary processes and maintaining genomic heterogeneity within bacterial populations. While SV detection in isolate genomes is relatively straightforward, metagenomes present broader challenges due to the absence of clear reference genomes and the presence of mixed strains. In response, our proposed method, rhea, forgoes reference genomes and metagenome-assembled genomes (MAGs) by encompassing all metagenomic samples in a series (time or other metric) into a single co-assembly graph. The log fold change in graph coverage between successive samples is then calculated to call SVs that are thriving or declining. Results: We show that rhea outperforms existing methods for SV and horizontal gene transfer (HGT) detection in two simulated mock metagenomes, particularly as the simulated reads diverge from reference genomes and strain diversity increases. We additionally demonstrate use cases for rhea on series metagenomic data of environmental and fermented food microbiomes to detect specific sequence alterations between successive time and temperature samples, suggesting host advantage. Our approach leverages previous work in assembly graph structural and coverage patterns to provide versatility in studying SVs across diverse and poorly characterized microbial communities for more comprehensive insights into microbial gene flux. Availability and implementation: rhea is open source and available at: https://github.com/treangenlab/rhea.
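The coverage comparison described in the abstract can be illustrated with a minimal sketch. This is not rhea's implementation: the function names `log_fold_changes` and `classify_changes`, the pseudocount of 1.0, the threshold of one log2 unit, and the example coverage values are all assumptions made for illustration. The sketch only shows how a log fold change between successive samples could label a graph node as thriving or declining.

```python
import math


def log_fold_changes(coverages, pseudocount=1.0):
    """Log2 fold change in coverage between each pair of successive samples.

    The pseudocount (an assumption, not taken from rhea) avoids division by
    zero when a node has no coverage in a sample.
    """
    return [
        math.log2((later + pseudocount) / (earlier + pseudocount))
        for earlier, later in zip(coverages, coverages[1:])
    ]


def classify_changes(node_coverage, threshold=1.0, pseudocount=1.0):
    """Label each successive-sample transition of each node.

    node_coverage maps a hypothetical node name to its coverage in each
    sample of the series, in order. The threshold is an illustrative choice.
    """
    calls = {}
    for node, coverages in node_coverage.items():
        labels = []
        for lfc in log_fold_changes(coverages, pseudocount):
            if lfc >= threshold:
                labels.append("thriving")
            elif lfc <= -threshold:
                labels.append("declining")
            else:
                labels.append("stable")
        calls[node] = labels
    return calls


# Made-up coverage values for two graph nodes across three samples.
example = {
    "node_a": [3.0, 7.0, 31.0],
    "node_b": [15.0, 7.0, 7.0],
}
calls = classify_changes(example)
```

With these values, `node_a` gains coverage at both steps while `node_b` loses coverage at the first step and then holds steady. A real tool would also have to account for sequencing depth differences between samples, which this sketch ignores.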
- PAR ID: 10518292
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Bioinformatics
- Volume: 40
- Issue: Supplement_1
- ISSN: 1367-4803
- Format(s): Medium: X Size: p. i58-i67
- Size(s): p. i58-i67
- Sponsoring Org: National Science Foundation
More Like this
Abstract Motivation: We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high-confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as that of SNVs, we show that a tumor phylogeny tree built from high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events. Results: To assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes in each. We also assessed Meltos on two real cancer datasets: multiple samples from a liposarcoma tumor and a multi-sample breast cancer dataset (Yates et al., 2015), where the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show that Meltos can place high-confidence validated SV calls on a refined tumor phylogeny tree. We also show the flexibility of Meltos to either estimate VAFs directly from genomic data or use copy number corrected estimates. Availability and implementation: Meltos is available at https://github.com/ih-lab/Meltos. Contact: imh2003@med.cornell.edu Supplementary information: Supplementary data are available at Bioinformatics online.
The occurrence and formation of genomic structural variants (SVs) is known to be influenced by the 3D chromatin architecture, but the extent and magnitude have been challenging to study. Here, we apply Hi-C to study chromatin organization before and after induction of chromothripsis in human cells. We use Hi-C to manually assemble the derivative chromosomes following the occurrence of massive complex rearrangements, which allows us to study the sources of SV formation and their consequences on gene regulation. We observe an action–reaction interplay whereby the 3D chromatin architecture directly impacts the location and formation of SVs. In turn, the SVs reshape the chromatin organization to alter the local topologies, replication timing, and gene regulation in cis. We show that SVs have a strong tendency to occur between similar chromatin compartments and replication timing regions. Moreover, we find that SVs frequently occur at 3D loop anchors, that SVs can cause a switch in chromatin compartments and replication timing, and that this is a major source of SV-mediated effects on nearby gene expression changes. Finally, we provide evidence for a general mechanistic bias of the 3D chromatin on SV occurrence using data from more than 2700 patient-derived cancer genomes.
Abstract Structural variants (SVs), including duplications, deletions, and inversions of DNA, can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single-nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well-documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single-nucleotide variants, with many identified SVs exhibiting population and species specificity. In this review, we highlight the importance of SVs in human evolution by discussing (1) how they have shaped great ape genomes, resulting in sensitized regions associated with traits and diseases, (2) their impact on gene functions and regulation, which has subsequently played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations in integrating existing data and biospecimens with the ever-expanding SV compendium propelled by biotechnology advancements.
The Antarctic marine environment is a dynamic ecosystem where microorganisms play an important role in key biogeochemical cycles. Despite the role that microbes play in this ecosystem, little is known about the genetic and metabolic diversity of Antarctic marine microbes. In this study we leveraged DNA samples collected by the Palmer Long Term Ecological Research (LTER) project to sequence shotgun metagenomes of 48 key samples collected across the marine ecosystem of the western Antarctic Peninsula (wAP). We developed an in silico metagenomics pipeline (iMAGine) for processing metagenomic data and constructing metagenome-assembled genomes (MAGs), identifying a diverse genomic repertoire related to the carbon, sulfur, and nitrogen cycles. A novel analytical approach based on gene coverage was used to understand the differences in microbial community functions across depth and region. Our results showed that microbial community functions were partitioned based on depth. Bacterial members harbored diverse genes for carbohydrate transformation, indicating the availability of processes to convert complex carbons into simpler bioavailable forms. We generated 137 dereplicated MAGs giving us a new perspective on the role of prokaryotes in the coastal wAP. In particular, the presence of mixotrophic prokaryotes capable of autotrophic and heterotrophic lifestyles indicated a metabolically flexible community, which we hypothesize enables survival under rapidly changing conditions. Overall, the study identified key microbial community functions and created a valuable sequence library collection for future Antarctic genomics research.
