skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: CoverM: read alignment statistics for metagenomics
Abstract SummaryGenome-centric analysis of metagenomic samples is a powerful method for understanding the function of microbial communities. Calculating read coverage is a central part of analysis, enabling differential coverage binning for recovery of genomes and estimation of microbial community composition. Coverage is determined by processing read alignments to reference sequences of either contigs or genomes. Per-reference coverage is typically calculated in an ad-hoc manner, with each software package providing its own implementation and specific definition of coverage. Here we present a unified software package CoverM which calculates several coverage statistics for contigs and genomes in an ergonomic and flexible manner. It uses “Mosdepth arrays” for computational efficiency and avoids unnecessary I/O overhead by calculating coverage statistics from streamed read alignment results. Availability and implementationCoverM is free software available at https://github.com/wwood/coverm. CoverM is implemented in Rust, with Python (https://github.com/apcamargo/pycoverm) and Julia (https://github.com/JuliaBinaryWrappers/CoverM_jll.jl) interfaces.  more » « less
Award ID(s):
2022070
PAR ID:
10593466
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Alkan, Can
Publisher / Repository:
Oxford Academic
Date Published:
Journal Name:
Bioinformatics
Volume:
41
Issue:
4
ISSN:
1367-4811
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract BackgroundExploring metagenomic contigs and “binning” them into metagenome-assembled genomes (MAGs) are essential for the delineation of functional and evolutionary guilds within microbial communities. Despite the advances in automated binning algorithms, their capabilities in recovering MAGs with accuracy and biological relevance are so far limited. Researchers often find that human involvement is necessary to achieve representative binning results. This manual process however is expertise demanding and labor intensive, and it deserves to be supported by software infrastructure. ResultsWe present BinaRena, a comprehensive and versatile graphic interface dedicated to aiding human operators to explore metagenome assemblies via customizable visualization and to associate contigs with bins. Contigs are rendered as an interactive scatter plot based on various data types, including sequence metrics, coverage profiles, taxonomic assignments, and functional annotations. Various contig-level operations are permitted, such as selection, masking, highlighting, focusing, and searching. Binning plans can be conveniently edited, inspected, and compared visually or using metrics including silhouette coefficient and adjusted Rand index. Completeness and contamination of user-selected contigs can be calculated in real time.In demonstration of BinaRena’s usability, we show that it facilitated biological pattern discovery, hypothesis generation, and bin refinement in a complex tropical peatland metagenome. It enabled isolation of pathogenic genomes within closely related populations from the gut microbiota of diarrheal human subjects. It significantly improved overall binning quality after curating results of automated binners using a simulated marine dataset. ConclusionsBinaRena is an installation-free, dependency-free, client-end web application that operates directly in any modern web browser, facilitating ease of deployment and accessibility for researchers of all skill levels. The program is hosted athttps://github.com/qiyunlab/binarena, together with documentation, tutorials, example data, and a live demo. It effectively supports human researchers in intuitive interpretation and fine tuning of metagenomic data. 
    more » « less
  2. Ouangraoua, Aida (Ed.)
    Abstract SummaryTRACK is a user-friendly Snakemake workflow designed to streamline the discovery and comparison of tandem repeats (TRs) across species. TRACK facilitates the cataloging and filtering of TRs based on reference genomes or T2T transcripts, and applies reciprocal LiftOver and sequence alignment methods to identify putative homologous TRs between species. For further analyses, TRACK can be used to genotype TRs and subsequently estimate and plot basic population genetic statistics. By incorporating key functionalities within an integrated workflow, TRACK enhances TR analysis accessibility and reproducibility, while offering flexibility for the user. Availability and implementationThe TRACK toolkit with step-by-step tutorial is freely available at https://github.com/caroladam/track. 
    more » « less
  3. Alkan, Can (Ed.)
    Abstract SummaryWith the rapid development of long-read sequencing technologies, the era of individual complete genomes is approaching. We have developed wgatools, a cross-platform, ultrafast toolkit that supports a range of whole-genome alignment formats, offering practical tools for conversion, processing, evaluation, and visualization of alignments, thereby facilitating population-level genome analysis and advancing functional and evolutionary genomics. Availability and implementationwgatools supports diverse formats and can process, filter, and statistically evaluate alignments, perform alignment-based variant calling, and visualize alignments both locally and genome-wide. Built with Rust for efficiency and safe memory usage, it ensures fast performance and can handle large datasets consisting of hundreds of genomes. wgatools is published as free software under the MIT open-source license, and its source code is freely available at https://github.com/wjwei-handsome/wgatools and https://zenodo.org/records/14882797. 
    more » « less
  4. Fiston-Lavier, Anna-Sophie (Ed.)
    Abstract MotivationHaplotagging is a method for linked-read sequencing, which leverages the cost-effectiveness and throughput of short-read sequencing while retaining part of the long-range haplotype information captured by long-read sequencing. Despite its utility and advantages over similar methods, existing linked-read analytical pipelines are incompatible with haplotagging data. ResultsWe describe Harpy, a modular and user-friendly software pipeline for processing all stages of haplotagged linked-read data, from raw sequence data to phased genotypes and structural variant detection. Availability and implementationhttps://github.com/pdimens/harpy. 
    more » « less
  5. Abstract Background:Despite the recent surge of viral metagenomic studies, it remains a significant challenge to recover complete virus genomes from metagenomic data. The majority of viral contigs generated fromde novoassembly programs are highly fragmented, presenting significant challenges to downstream analysis and inference.Methods:We have developed Virseqimprover, a computational pipeline that can extend assembled contigs to complete or nearly complete genomes while maintaining extension quality. Virseqimprover first examines whether there is any chimeric sequence based on read coverage, breaks the sequence into segments if there is, then extends the longest segment with uniform coverage, and repeats these procedures until the sequence cannot be extended. Finally, Virseqimprover annotates the gene content of the resulting sequence.Conclusion:Virseqimprover has good performance on correcting and extending viral contigs to their full lengths, hence can be a useful tool to improve the completeness and minimize the assembly errors of viral contigs. Both a web server and a conda package for Virseqimprover are provided to the research community free of charge. 
    more » « less