The microbiomes of tropical corals are actively studied using 16S rRNA gene amplicons to understand microbial roles in coral health, metabolism, and disease resistance. However, due to the prokaryotic origins of mitochondria, primers targeting bacterial and archaeal 16S rRNA genes may also amplify homologous 12S mitochondrial rRNA genes from the host coral, associated microbial eukaryotes, and encrusting organisms. Standard microbial bioinformatics pipelines attempt to identify and remove these sequences by comparing them to reference taxonomies. However, commonly used tools have severely under-annotated mitochondrial sequences in 1440 coral microbiomes from the Global Coral Microbiome Project, preventing annotation of over 95% of reads in some samples. This issue persists when using Greengenes or SILVA prokaryotic reference taxonomies, and in other hosts, including 16S studies of vertebrates and of marine sponges. Worse, mitochondrial under-annotation varies between coral families and across coral compartments, biasing comparisons of α- and β-diversity. By supplementing existing reference taxonomies with over 3000 animal mitochondrial rRNA gene sequences, we resolved roughly 97% of unique unclassified sequences as mitochondrial. These additional sequences did not cause a false elevation in mitochondrial annotations in mock communities with known compositions. We recommend using these extended taxonomies for coral microbiome analysis and whenever eukaryotic contamination may be a concern.
Transcriptome analysis provides genome annotation and expression profiles in the central nervous system of Lymnaea stagnalis at different ages
Abstract Background The pond snail, Lymnaea stagnalis (L. stagnalis), has served as a valuable model organism for neurobiology studies due to its simple and easily accessible central nervous system (CNS). L. stagnalis has been widely used to study neuronal networks and has recently gained popularity for the study of aging and neurodegenerative diseases. However, previous transcriptome studies of the L. stagnalis CNS have been carried out exclusively on adults. As part of our ongoing effort to study L. stagnalis neuronal growth and connectivity at various developmental stages, we provide the first age-specific transcriptome analysis and gene annotation of young (3 months), adult (6 months), and old (18 months) L. stagnalis CNS. Results Using these three age cohorts, our study generated 55–69 million 150 bp paired-end RNA sequencing reads on the Illumina NovaSeq 6000 platform. Of these reads, ~74% were successfully mapped to the reference genome of L. stagnalis. Our reference-based transcriptome assembly predicted 42,478 gene loci, of which 37,661 encode coding sequences (CDS) of at least 100 codons. In addition, we provide gene annotations using Blast2GO and functional annotations using Pfam for ~95% of these sequences, contributing the largest number of annotated genes in the L. stagnalis CNS so far. Moreover, among 242 previously cloned L. stagnalis genes, we were able to match ~87% in our transcriptome assembly, indicating a high percentage of gene coverage. The expression differences for innexins, FMRFamide, and molluscan insulin peptide genes were validated by real-time qPCR. Lastly, our transcriptomic analyses revealed distinct, age-specific gene clusters, differentially expressed genes, and enriched pathways in young, adult, and old CNS. More specifically, our data show significant changes in the expression of critical genes related to transcription factors, metabolism (e.g. cytochrome P450), extracellular matrix constituents, and signaling receptors and transduction (e.g. receptors for acetylcholine, N-methyl-D-aspartic acid, and serotonin), as well as stress- and disease-related genes, in young compared to either adult or old snails. Conclusions Together, these datasets are the largest and most up-to-date L. stagnalis CNS transcriptomes, and will serve as a resource for future molecular studies and functional annotation of transcripts and genes in L. stagnalis.
- Award ID(s): 1916563
- PAR ID: 10376900
- Date Published:
- Journal Name: BMC Genomics
- Volume: 22
- Issue: 1
- ISSN: 1471-2164
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Holland, J. (Ed.) Abstract Douglas-fir (Pseudotsuga menziesii) is native to western North America. It grows in a wide range of environmental conditions and is an important timber tree. Although there are several studies on the gene expression responses of Douglas-fir to abiotic cues, the absence of high-quality transcriptome and genome data is a barrier to further investigation. As for most conifers, the available transcriptome and genome reference dataset for Douglas-fir remains fragmented and requires refinement. We aimed to generate a highly accurate and complete reference transcriptome and genome annotation. We deep-sequenced the transcriptome of Douglas-fir needles from seedlings that were grown under nonstress control conditions or a combination of heat and drought stress conditions using long-read (LR) and short-read (SR) sequencing platforms. We used 2 computational approaches, namely de novo and genome-guided LR transcriptome assembly. Using the LR de novo assembly, we identified 1.3X more high-quality transcripts, 1.85X more “complete” genes, and 2.7X more functionally annotated genes compared to the genome-guided assembly approach. We predicted 666 long noncoding RNAs and 12,778 unique protein-coding transcripts, including 2,016 putative transcription factors. We leveraged the LR de novo assembled transcriptome with paired-end SR and a published single-end SR transcriptome to generate an improved genome annotation. This was conducted with BRAKER2 and refined based on functional annotation, repetitive content, and transcriptome alignment. This high-quality genome annotation has 51,419 unique gene models derived from 322,631 initial predictions. Overall, our informatics approach provides a new reference Douglas-fir transcriptome assembly and genome annotation with considerably improved completeness and functional annotation.
-
Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome scale, it was derived from short reads and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of non-gap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2,000 genes that were previously unplaced. We also discovered more than 5,700 additional gene copies, facilitating the accurate annotation of functional gene duplications, including at the Ppd-B1 photoperiod response locus.
-
Abstract Single-cell RNA sequencing is a powerful technique that continues to expand across various biological applications. However, incomplete 3′-UTR annotations can impede single-cell analysis, resulting in genes that are partially or completely uncounted. Performing single-cell RNA sequencing with incomplete 3′-UTR annotations can hinder the identification of cell identities and gene expression patterns and lead to erroneous biological inferences. We demonstrate that performing single-cell isoform sequencing in tandem with single-cell RNA sequencing can rapidly improve 3′-UTR annotations. Using threespine stickleback fish (Gasterosteus aculeatus), we show that gene models resulting from a minimal embryonic single-cell isoform sequencing dataset retained 26.1% more single-cell RNA sequencing reads than gene models from Ensembl alone. Furthermore, pooling our single-cell sequencing isoforms with a previously published adult bulk Iso-Seq dataset from stickleback, and merging the annotation with the Ensembl gene models, resulted in a marginal improvement (+0.8%) over the single-cell isoform sequencing only dataset. In addition, isoforms identified by single-cell isoform sequencing included thousands of new splicing variants. The improved gene models obtained using single-cell isoform sequencing led to successful identification of cell types and increased the number of reads assigned to many genes in our single-cell RNA sequencing stickleback dataset. Our work illuminates single-cell isoform sequencing as a cost-effective and efficient mechanism to rapidly annotate genomes for single-cell RNA sequencing.
-
Abstract Suncus etruscus is one of the world’s smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew’s small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with a 2.472 Gbp primary pseudohaplotype and a 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control.

