Title: Mapping Data - bacterial MAGs, bacterial contigs, and viral MAGs/contigs
Mapping data used in fractional abundance calculations for bacterial and viral communities. Mapping was performed with SMALT v0.7.6.
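The record does not state how fractional abundance was derived from the mapping data. As a minimal sketch, assuming a common length-normalized approach (mapped reads divided by sequence length, then scaled so all values sum to 1), the calculation could look like the following. The function name, input layout, and example values are hypothetical and are not taken from the dataset.

```python
from typing import Dict, Tuple


def fractional_abundance(mapped: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return length-normalized fractional abundances.

    mapped: sequence name -> (mapped read count, sequence length in bp).
    Assumption: abundance is proportional to reads per base pair.
    """
    density = {}
    for name, (reads, length) in mapped.items():
        if length <= 0:
            raise ValueError(f"sequence length must be positive: {name}")
        # Reads per base pair removes the bias toward longer sequences.
        density[name] = reads / length

    total = sum(density.values())
    if total == 0:
        # No reads mapped anywhere: report zero for every sequence.
        return {name: 0.0 for name in density}
    return {name: d / total for name, d in density.items()}


# Hypothetical counts for two bacterial MAGs and one viral contig.
example = {
    "bMAG_1": (3000, 3_000_000),
    "bMAG_2": (1000, 2_000_000),
    "vMAG_1": (50, 50_000),
}
abund = fractional_abundance(example)
for name, value in abund.items():
    print(f"{name}\t{value:.2f}")
```

Normalizing by length before scaling keeps a short viral contig comparable to a multi-megabase bacterial MAG; other normalizations (for example by total read count per sample) are equally plausible and would change the numbers.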
Award ID(s):
2424579
PAR ID:
10646031
Author(s) / Creator(s):
Publisher / Repository:
figshare
Date Published:
Subject(s) / Keyword(s):
Sequence analysis Genomics and transcriptomics
Format(s):
Medium: X
Size(s):
773156673 Bytes
Sponsoring Org:
National Science Foundation
More Like this
  1. Bacterial Metagenome Assembled Genomes (bMAGs) from stony coral metagenomes. Bins from MaxBin2 v2.2.7 (Wu et al., 2015), MetaBat2 v2.15 (Kang et al., 2019), and CONCOCT v1.1.0 (Alneberg et al., 2014) were consolidated and improved with metaWRAP v1.2.1 (Uritskiy et al., 2018). Bins were then quality filtered to ≥50% completeness and ≤10% contamination with CheckM2 v1.0.2 (Chklovski et al., 2023). This dataset has been dereplicated using fastANI (--similarity-threshold 0.95).
  2. Abstract. Summary: A chimeric contig is a contig that has been incorrectly assembled, i.e. a contig that contains one or more mis-joins. The detection of chimeric contigs can be carried out either by aligning assembled contigs to genome-wide maps (e.g. genetic, physical or optical maps) or by mapping sequenced reads to the assembled contigs. Here, we introduce a software tool called Chimericognizer that takes advantage of one or more Bionano Genomics optical maps to accurately detect and correct chimeric contigs. Experimental results show that Chimericognizer is very accurate, and significantly better than the chimeric detection method offered by the Bionano Hybrid Scaffold pipeline. Chimericognizer can also detect and correct chimeric optical molecules. Availability and implementation: https://github.com/ucrbioinfo/Chimericognizer Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Abstract. Sequence classification facilitates a fundamental understanding of the structure of microbial communities. Binary metagenomic sequence classifiers are insufficient because environmental metagenomes are typically derived from multiple sequence sources. Here we introduce a deep-learning-based sequence classifier, DeepMicroClass, that classifies metagenomic contigs into five sequence classes, i.e. viruses infecting prokaryotic or eukaryotic hosts, eukaryotic or prokaryotic chromosomes, and prokaryotic plasmids. DeepMicroClass achieved high performance for all sequence classes at various tested sequence lengths ranging from 500 bp to 100 kbp. By benchmarking on a synthetic dataset with variable sequence class composition, we showed that DeepMicroClass obtained better performance for eukaryotic, plasmid and viral contig classification than other state-of-the-art predictors. DeepMicroClass achieved comparable performance on viral sequence classification with geNomad and VirSorter2 when benchmarked on the CAMI II marine dataset. Using a coastal daily time-series metagenomic dataset as a case study, we showed that microbial eukaryotes and prokaryotic viruses are integral to microbial communities. By analyzing monthly metagenomes collected at HOT and BATS, we found relatively higher viral read proportions in the subsurface layer in late summer, consistent with the seasonal viral infection patterns prevalent in these areas. We expect DeepMicroClass will promote metagenomic studies of under-appreciated sequence types.
  4. METHODS: Soil samples (6 total) were collected at the Stordalen Mire site in 2019 from two depths (1-5 & 20-24 cm below ground) across three habitats (Palsa, Bog, and Fen). DNA was extracted based on the protocol described by Li et al. (2024). For short reads, libraries were prepared at the Joint Genome Institute (JGI) with the KAPA Hyperprep kit, and sequenced with Illumina NovaSeq 6000. For long reads, libraries were prepared with the SMRTbell Express Template Prep Kit 2.0 (PacBio), then sequenced using PacBio Sequel IIe at JGI. PacBio data was processed at JGI to form filtered CCS (Circular Consensus Sequencing) reads.  Assemblies were generated with short-only, long-only, and hybrid read sources: Short-only was assembled with metaSPAdes (v3.15.4) using Aviary (v0.5.3) with default parameters. Long-only was assembled with metaFlye (v2.9-b1768) using Aviary (v0.5.3) with default parameters. Hybrid assembly was performed using Aviary v0.5.3 with default parameters. This involved a step-down procedure with long-read assembly through metaFlye (v2.9-b1768), followed by short-read polishing by Racon (v1.4.3), Pilon (v1.24) and then Racon again. Next, reads that didn't map to high-quality metaFlye contigs were hybrid assembled with SPAdes (--meta option) and binned out with MetaBAT2 (v2.1.5). For each bin, the reads within the bin were hybrid assembled using Unicycler (v0.4.8). The high-coverage metaFlye contigs and Unicycler contigs were then combined to form the assembly fasta file. Genome recovery was performed using Aviary v0.5.3 with samples chosen for differential abundance binning by Bin Chicken (v0.4.2) using SingleM metapackage S3.0.5. This involved initial read mapping through CoverM (v0.6.1) using minimap2 (v2.18) and binning by MetaBAT, MetaBAT2 (v2.1.5), VAMB (v3.0.2), SemiBin (v1.3.1), Rosella (v0.4.2), CONCOCT (v1.1.0) and MaxBin2 (v2.2.7). Genomes were analyzed using CheckM2 (v1.0.2) and clustered at 95% ANI using Galah (v0.4.0).   
FILES: EMERGE_MAGs_2019_long-short-hybrid.tar.gz - Archive containing the MAG files (.fna). metadata_MAGs_2019_EMERGE.tsv - Table containing source sample names and accessions, GTDB classifications, CheckM2 quality information, NCBI GenomeBatch- and MIMAG(6.0)-formatted attributes, and other metadata for the MAGs.
FUNDING: This research is a contribution of the EMERGE Biology Integration Institute (https://emerge-bii.github.io/), funded by the National Science Foundation, Biology Integration Institutes Program, Award # 2022070. This study was also funded by the Genomic Science Program of the United States Department of Energy Office of Biological and Environmental Research, grant #s DE-SC0004632, DE-SC0010580, and DE-SC0016440. We thank the Swedish Polar Research Secretariat and SITES for the support of the work done at the Abisko Scientific Research Station. SITES is supported by the Swedish Research Council's grant 4.3-2021-00164. Data from the Joint Genome Institute (JGI) was collected under BER Support Science Proposal 503530 (DOI: 10.46936/10.25585/60001148), conducted by the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
  5. This release (MAGs v2) is a major new version of this metagenome-assembled genome (MAG) set. All previous releases on this page (which only differ in the metadata) are designated "MAGs v1." The current release (MAGs v2) uses CheckM2 v1.0.2 filtering (≥70% completeness, ≤10% contamination) to expand this dataset to include 36,419 MAGs, with the following subcategories: Cronin_v1: Manually curated subset of the "Field" category from MAGs v1. Cronin_v2: MAGs from raw bin filtering on the same assemblies used to generate Cronin_v1. Woodcroft_v2: MAGs from raw bin filtering on the same assemblies used to generate the MAGs reported in Woodcroft & Singleton et al. (2018). SIPS: Updated genomes from samples originating from a stable isotope probing (SIP) incubation experiment by Moira Hough et al. ("SIP" in MAGs v1), re-analyzed due to read truncation and sample linkage issues in MAGs v1. JGI: Expanded set of genomes from the Joint Genome Institute's metagenome annotation pipeline.
FILES: Emerge_MAGs_v2.tar.gz - Archive containing the MAG files (.fna). metadata_MAGs_v2_EMERGE.tsv - Table containing source sample names and accessions, GTDB taxonomy information, CheckM2 quality reports, NCBI GenomeBatch- and MIMAG(6.0)-formatted sample attributes and other metadata for the MAGs.
FUNDING: This research is a contribution of the EMERGE Biology Integration Institute (https://emerge-bii.github.io/), funded by the National Science Foundation, Biology Integration Institutes Program, Award # 2022070. This study was also funded by the Genomic Science Program of the United States Department of Energy Office of Biological and Environmental Research, grant #s DE-SC0004632, DE-SC0010580, and DE-SC0016440. We thank the Swedish Polar Research Secretariat and SITES for the support of the work done at the Abisko Scientific Research Station. SITES is supported by the Swedish Research Council's grant 4.3-2021-00164.
Data collected at the Joint Genome Institute was generated under the following awards: The majority of sequencing at JGI was supported by BER Support Science Proposal 503530 (DOI: 10.46936/10.25585/60001148), conducted by the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Sequencing of SIP samples was performed under the Facilities Integrating Collaborations for User Science (FICUS) initiative (proposal 503547; award DOI: 10.46936/fics.proj.2017.49950/60006215) and used resources at the DOE Joint Genome Institute (https://ror.org/04xm1d337) and the Environmental Molecular Sciences Laboratory (https://ror.org/04rc0xn13), which are DOE Office of Science User Facilities. Both facilities are sponsored by the Office of Biological and Environmental Research and operated under Contract Nos. DE-AC02-05CH11231 (JGI) and DE-AC05-76RL01830 (EMSL). 
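Items 1 and 5 in the list above both describe filtering bins by CheckM2 completeness and contamination, with different cutoffs (≥50% / ≤10% and ≥70% / ≤10%). The following is a minimal sketch of such a filter, assuming a tab-separated quality report with columns named Name, Completeness, and Contamination. The bin names and values are hypothetical, and this is not the code used to produce either dataset.

```python
import csv
import io
from typing import Dict, List

# Hypothetical quality report held in memory; a real run would read a
# TSV file from disk instead. Column names are an assumption.
REPORT_TSV = (
    "Name\tCompleteness\tContamination\n"
    "bin_A\t95.2\t1.3\n"
    "bin_B\t62.0\t4.8\n"
    "bin_C\t48.9\t0.5\n"
    "bin_D\t88.0\t12.4\n"
)


def load_report(text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated quality report into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def filter_bins(
    rows: List[Dict[str, str]],
    min_completeness: float,
    max_contamination: float,
) -> List[str]:
    """Return names of bins meeting both thresholds (inclusive)."""
    return [
        row["Name"]
        for row in rows
        if float(row["Completeness"]) >= min_completeness
        and float(row["Contamination"]) <= max_contamination
    ]


rows = load_report(REPORT_TSV)
medium_quality = filter_bins(rows, 50.0, 10.0)  # cutoffs stated in item 1
stricter = filter_bins(rows, 70.0, 10.0)        # cutoffs stated in item 5
print(medium_quality)
print(stricter)
```

Passing the thresholds as arguments lets the same function express both cutoffs, which makes it easy to see how many bins the stricter completeness requirement removes.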