Title: SNIKT: sequence-independent adapter identification and removal in long-read shotgun sequencing data
Abstract Summary: Here, we introduce SNIKT, a command-line tool for sequence-independent visual confirmation and input-assisted removal of adapter contamination in whole-genome shotgun or metagenomic shotgun long-read sequencing DNA or RNA data. Availability and Implementation: SNIKT is implemented in R and is compatible with Unix-like platforms. The source code, along with documentation, is freely available under an MIT license at https://github.com/piyuranjan/SNIKT. Supplementary information: Supplementary data are available at Bioinformatics online.
Award ID(s):
2030454
PAR ID:
10374335
Author(s) / Creator(s):
; ; ;
Editor(s):
Alkan, Can
Date Published:
Journal Name:
Bioinformatics
Volume:
38
Issue:
15
ISSN:
1367-4803
Page Range / eLocation ID:
3830 to 3832
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation: Metagenomic binning aims to retrieve microbial genomes directly from ecosystems by clustering metagenomic contigs assembled from short reads into draft genomic bins. Traditional shotgun-based binning methods depend on the contigs’ composition and abundance profiles and are impaired by the paucity of samples needed to construct reliable co-abundance profiles. When applied to a single sample, shotgun-based binning methods struggle to distinguish closely related species using only composition information. As an alternative binning approach, Hi-C-based binning employs the metagenomic Hi-C technique to measure the proximity contacts between metagenomic fragments. However, spurious inter-species Hi-C contacts, inevitably generated by incorrect ligations of DNA fragments between species, link the contigs from varying genomes, weakening the purity of final draft genomic bins. Therefore, it is imperative to develop a binning pipeline to overcome the shortcomings of both types of binning methods on a single sample. Results: We develop HiFine, a novel binning pipeline to refine the binning results of metagenomic contigs by integrating both Hi-C-based and shotgun-based binning tools. HiFine designs a strategy of fragmentation for the original bin sets derived from the Hi-C-based and shotgun-based binning methods, which considerably increases the purity of initial bins, followed by merging fragmented bins and recruiting unbinned contigs. We demonstrate that HiFine significantly improves the existing binning results of both types of binning methods and achieves better performance in constructing species genomes on publicly available datasets. To the best of our knowledge, HiFine is the first pipeline to integrate different types of tools for the binning of metagenomic contigs. Availability and implementation: HiFine is available at https://github.com/dyxstat/HiFine. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. metaviralSPAdes: Assembly of Viruses From Metagenomic Data Abstract Motivation: Although the set of currently known viruses has been steadily expanding, only a tiny fraction of the Earth's virome has been sequenced so far. Shotgun metagenomic sequencing provides an opportunity to reveal novel viruses but faces the computational challenge of identifying viral genomes that are often difficult to detect in metagenomic assemblies. Results: We describe a metaviralSPAdes tool for identifying viral genomes in metagenomic assembly graphs that is based on analyzing variations in the coverage depth between viruses and bacterial chromosomes. We benchmarked metaviralSPAdes on diverse metagenomic datasets, verified our predictions using a set of virus-specific Hidden Markov Models, and demonstrated that it improves on the state-of-the-art viral identification pipelines. Availability: metaviralSPAdes includes viralAssembly, viralVerify, and viralComplete modules that are available as standalone packages: https://github.com/ablab/spades/tree/metaviral_publication, https://github.com/ablab/viralVerify/ and https://github.com/ablab/viralComplete/. Supplementary information: Supplementary data are available at Bioinformatics online. 
  3. Abstract Motivation: Double minute (DM) chromosomes are acentric extrachromosomal DNA artifacts that are frequently observed in the cells of numerous cancers. They are highly amplified and contain oncogenes and drug-resistance genes, making their presence a challenge for effective cancer treatment. Algorithmic discovery of DMs can potentially improve bench-derived therapies for cancer treatment. A hindrance to this task is that DMs evolve, yielding circular chromatin that shares segments from progenitor DMs. This creates DMs with overlapping amplicon coordinates. Existing DM discovery algorithms use whole genome shotgun sequencing (WGS) in isolation, which can potentially misclassify DMs that share overlapping coordinates. Results: In this study, we describe an algorithm called ‘HolistIC’ that can predict DMs in tumor genomes by integrating WGS and Hi–C sequencing data. The consolidation of these sources of information resolves ambiguity in DM amplicon prediction that exists in DM prediction with WGS data used in isolation. We implemented and tested our algorithm on the tandem Hi–C and WGS datasets of three cancer datasets and a simulated dataset. Results on the cancer datasets demonstrated HolistIC’s ability to predict DMs from Hi–C and WGS data in tandem. The results on the simulated data showed that HolistIC can accurately distinguish DMs that have overlapping amplicon coordinates, an advance over methods that predict extrachromosomal amplification using WGS data in isolation. Availability and implementation: Our software, named ‘HolistIC’, is available at http://www.github.com/mhayes20/HolistIC. Supplementary information: Supplementary data are available at Bioinformatics online.
  4. Abstract Summary: Differential Expression Gene Explorer (DrEdGE) is a web-based tool that guides genomicists through easily creating interactive online data visualizations, which colleagues can query according to their own conditions to discover genes, samples or patterns of interest. We demonstrate DrEdGE’s features with three example websites generated from publicly available datasets—human neuronal tissue, mouse embryonic tissue and Caenorhabditis elegans whole embryos. DrEdGE increases the utility of large genomics datasets by removing technical obstacles to independent exploration. Availability and implementation: Freely available at http://dredge.bio.unc.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
  5. Valencia, Alfonso (Ed.)
    Abstract Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important in the analysis of large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset. Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis. Availability and implementation: projectR is available on Bioconductor and at https://github.com/genesofeve/projectR. Contact: gsteinobrien@jhmi.edu or ejfertig@jhmi.edu Supplementary information: Supplementary data are available at Bioinformatics online.