Abstract Motivation: Metagenomic binning aims to retrieve microbial genomes directly from ecosystems by clustering metagenomic contigs assembled from short reads into draft genomic bins. Traditional shotgun-based binning methods depend on the contigs’ composition and abundance profiles and are impaired when too few samples are available to construct reliable co-abundance profiles. When applied to a single sample, shotgun-based binning methods struggle to distinguish closely related species using composition information alone. As an alternative binning approach, Hi-C-based binning employs the metagenomic Hi-C technique to measure the proximity contacts between metagenomic fragments. However, spurious inter-species Hi-C contacts, inevitably generated by incorrect ligations of DNA fragments between species, link contigs from different genomes and reduce the purity of the final draft genomic bins. Therefore, it is imperative to develop a binning pipeline that overcomes the shortcomings of both types of binning methods on a single sample. Results: We develop HiFine, a novel binning pipeline that refines the binning results of metagenomic contigs by integrating both Hi-C-based and shotgun-based binning tools. HiFine applies a fragmentation strategy to the original bin sets derived from the Hi-C-based and shotgun-based binning methods, which considerably increases the purity of the initial bins, and then merges the fragmented bins and recruits unbinned contigs. We demonstrate that HiFine significantly improves the existing binning results of both types of binning methods and achieves better performance in constructing species genomes on publicly available datasets. To the best of our knowledge, HiFine is the first pipeline to integrate different types of tools for the binning of metagenomic contigs. Availability and implementation: HiFine is available at https://github.com/dyxstat/HiFine. Supplementary information: Supplementary data are available at Bioinformatics online.
SNIKT: sequence-independent adapter identification and removal in long-read shotgun sequencing data
Abstract Summary: Here, we introduce SNIKT, a command-line tool for sequence-independent visual confirmation and input-assisted removal of adapter contamination in whole-genome shotgun or metagenomic shotgun long-read sequencing DNA or RNA data. Availability and implementation: SNIKT is implemented in R and is compatible with Unix-like platforms. The source code, along with documentation, is freely available under an MIT license at https://github.com/piyuranjan/SNIKT. Supplementary information: Supplementary data are available at Bioinformatics online.
- Award ID(s): 2030454
- PAR ID: 10374335
- Editor(s): Alkan, Can
- Date Published:
- Journal Name: Bioinformatics
- Volume: 38
- Issue: 15
- ISSN: 1367-4803
- Page Range / eLocation ID: 3830 to 3832
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
metaviralSPAdes: Assembly of Viruses From Metagenomic Data. Abstract Motivation: Although the set of currently known viruses has been steadily expanding, only a tiny fraction of the Earth's virome has been sequenced so far. Shotgun metagenomic sequencing provides an opportunity to reveal novel viruses but faces the computational challenge of identifying viral genomes that are often difficult to detect in metagenomic assemblies. Results: We describe a metaviralSPAdes tool for identifying viral genomes in metagenomic assembly graphs that is based on analyzing variations in the coverage depth between viruses and bacterial chromosomes. We benchmarked metaviralSPAdes on diverse metagenomic datasets, verified our predictions using a set of virus-specific Hidden Markov Models, and demonstrated that it improves on the state-of-the-art viral identification pipelines. Availability: metaviralSPAdes includes viralAssembly, viralVerify, and viralComplete modules that are available as standalone packages: https://github.com/ablab/spades/tree/metaviral_publication, https://github.com/ablab/viralVerify/ and https://github.com/ablab/viralComplete/. Supplementary information: Supplementary data are available at Bioinformatics online.
HolistIC: leveraging Hi–C and whole genome shotgun sequencing for double minute chromosome discovery. Abstract Motivation: Double minute (DM) chromosomes are acentric extrachromosomal DNA artifacts that are frequently observed in the cells of numerous cancers. They are highly amplified and contain oncogenes and drug-resistance genes, making their presence a challenge for effective cancer treatment. Algorithmic discovery of DMs can potentially improve bench-derived therapies for cancer treatment. A hindrance to this task is that DMs evolve, yielding circular chromatin that shares segments from progenitor DMs. This creates DMs with overlapping amplicon coordinates. Existing DM discovery algorithms use whole genome shotgun sequencing (WGS) in isolation, which can lead them to misclassify DMs that share overlapping coordinates. Results: In this study, we describe an algorithm called ‘HolistIC’ that can predict DMs in tumor genomes by integrating WGS and Hi–C sequencing data. The consolidation of these sources of information resolves the ambiguity in DM amplicon prediction that arises when WGS data are used in isolation. We implemented and tested our algorithm on the tandem Hi–C and WGS datasets of three cancer datasets and a simulated dataset. Results on the cancer datasets demonstrated HolistIC’s ability to predict DMs from Hi–C and WGS data in tandem. The results on the simulated data showed that HolistIC can accurately distinguish DMs that have overlapping amplicon coordinates, an advance over methods that predict extrachromosomal amplification using WGS data in isolation. Availability and implementation: Our software, named ‘HolistIC’, is available at http://www.github.com/mhayes20/HolistIC. Supplementary information: Supplementary data are available at Bioinformatics online.
Abstract Summary: Differential Expression Gene Explorer (DrEdGE) is a web-based tool that guides genomicists through easily creating interactive online data visualizations, which colleagues can query according to their own conditions to discover genes, samples or patterns of interest. We demonstrate DrEdGE’s features with three example websites generated from publicly available datasets—human neuronal tissue, mouse embryonic tissue and Caenorhabditis elegans whole embryos. DrEdGE increases the utility of large genomics datasets by removing technical obstacles to independent exploration. Availability and implementation: Freely available at http://dredge.bio.unc.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Przytycka, Teresa (Ed.) Abstract Summary: We present StochSS Live!, a web-based service for modeling, simulation and analysis of a wide range of mathematical, biological and biochemical systems. Using an epidemiological model of COVID-19, we demonstrate the power of StochSS Live! to enable researchers to quickly develop a deterministic or a discrete stochastic model, infer its parameters and analyze the results. Availability and implementation: StochSS Live! is freely available at https://live.stochss.org/. Supplementary information: Supplementary data are available at Bioinformatics online.