Abstract The ability to profile transcriptomes and characterize global gene expression changes has been greatly enabled by the development of RNA sequencing technologies (RNA-seq). However, generating sequencing-compatible cDNA libraries from RNA samples can be time-consuming and expensive, especially for bacterial mRNAs, which lack the poly(A) tails that are often used to streamline this process for eukaryotic samples. Compared with the increasing throughput and decreasing cost of sequencing, library preparation has seen limited advances. Here, we describe bacterial-multiplexed-seq (BaM-seq), an approach that enables simple barcoding of many bacterial RNA samples and decreases the time and cost of library preparation. We also present targeted-bacterial-multiplexed-seq (TBaM-seq), which allows differential expression analysis of specific gene panels with over 100-fold enrichment in read coverage. In addition, we introduce the concept of transcriptome redistribution based on TBaM-seq, which dramatically reduces the required sequencing depth while still allowing quantification of both highly and lowly abundant transcripts. These methods accurately measure gene expression changes, with high technical reproducibility and agreement with gold-standard, lower-throughput approaches. Together, these library preparation protocols allow fast, affordable generation of sequencing libraries.
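The abstract does not specify BaM-seq's barcode layout. As a minimal sketch of what multiplexing many samples implies downstream, the Python below demultiplexes reads under a hypothetical layout: an 8-nt inline barcode at the start of each read, with up to one mismatch tolerated. The barcode sequences, sample names, and the `demultiplex` helper are illustrative assumptions, not part of the published protocol.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, bc_len=8, max_mismatch=1):
    """Assign reads to samples by a HYPOTHETICAL inline barcode at the read start.

    reads:    iterable of read sequences (str).
    barcodes: dict mapping barcode sequence (length bc_len) -> unique sample name.
    Returns a dict: sample name -> list of reads with the barcode trimmed off.
    Reads that are too short, exceed max_mismatch, or tie between samples
    are kept untrimmed under the key "undetermined".
    """
    if not barcodes:
        raise ValueError("at least one barcode is required")
    out = defaultdict(list)
    for read in reads:
        if len(read) < bc_len:
            out["undetermined"].append(read)
            continue
        prefix, insert = read[:bc_len], read[bc_len:]
        # Distance from this read's prefix to every known barcode.
        dists = {sample: hamming(prefix, bc) for bc, sample in barcodes.items()}
        best = min(dists.values())
        winners = [s for s, d in dists.items() if d == best]
        if best <= max_mismatch and len(winners) == 1:
            out[winners[0]].append(insert)
        else:
            out["undetermined"].append(read)
    return dict(out)
```

For example, with barcodes `{"ACGTACGT": "sample_A", "TTGGCCAA": "sample_B"}`, a read beginning `ACGTACGA` (one mismatch) is still assigned to sample_A. Tolerating one mismatch is only unambiguous when every pair of barcodes differs at three or more positions, so barcode sets would need to be designed with that minimum distance.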
peaksat: an R package for ChIP-seq peak saturation analysis
Abstract Background: Epigenomic profiling assays such as ChIP-seq have been widely used to map the genome-wide enrichment profiles of chromatin-associated proteins and post-translational histone modifications. Sequencing depth is a key parameter in experimental design and quality control. However, because sequencing depth requirements vary across experimental conditions, it can be challenging to determine the optimal sequencing depth, particularly for projects involving multiple targets or cell types. Results: We developed the peaksat R package to provide target read depth estimates for epigenomic experiments based on the analysis of peak saturation curves. We applied peaksat to establish the distinct read depth requirements for ChIP-seq studies of histone modifications in different cell lines. Using peaksat, we were able to estimate the target read depth required per library to obtain high-quality peak calls for downstream analysis. In addition, peaksat was applied to other sequence-enrichment methods, including CUT&RUN and ATAC-seq. Conclusion: peaksat addresses the need for researchers to decide whether their sequencing data have been generated to an adequate depth and yield sufficient meaningful peaks and, if not, how many more reads would be required per library. peaksat is applicable to other sequence-based methods that include peak calling in their analysis.
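The abstract does not give peaksat's saturation model or its R interface. As a hedged illustration of the general idea of reading a target depth off a peak saturation curve, the Python sketch below assumes that peak counts follow a hyperbolic (Michaelis-Menten-style) curve in read depth, fits it, and solves for the depth at which a chosen fraction of the asymptotic peak count is reached. The model, the 90% target, and the function names are assumptions for illustration; peaksat's actual method may differ.

```python
def fit_saturation(depths, peaks):
    """Fit the ASSUMED model peaks(d) = p_max * d / (k + d).

    Uses the Lineweaver-Burk linearisation
        1/peaks = (k / p_max) * (1/d) + 1/p_max
    and ordinary least squares on the reciprocals. Returns (p_max, k).
    """
    if len(depths) != len(peaks) or len(depths) < 2:
        raise ValueError("need at least two (depth, peaks) pairs")
    xs = [1.0 / d for d in depths]
    ys = [1.0 / p for p in peaks]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    p_max = 1.0 / intercept  # asymptotic number of peaks
    k = slope * p_max        # depth at which half of p_max is reached
    return p_max, k


def depth_for_fraction(k, fraction):
    """Depth at which the fitted curve reaches `fraction` of p_max: d = k*f/(1-f)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be in (0, 1)")
    return k * fraction / (1.0 - fraction)


# Illustrative, noise-free synthetic curve (not data from the paper):
demo_depths = [5e6, 10e6, 20e6, 40e6]
demo_peaks = [50_000 * d / (20e6 + d) for d in demo_depths]
p_max, k = fit_saturation(demo_depths, demo_peaks)
target_depth = depth_for_fraction(k, 0.9)  # reads needed for 90% of p_max
```

The reciprocal linearisation keeps the sketch dependency-free, but it overweights the shallowest subsamples; with noisy peak counts from real subsampled libraries, a nonlinear least-squares fit of the same curve would usually be more robust.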
- Award ID(s):
- 1826689
- PAR ID:
- 10392914
- Publisher / Repository:
- Springer Science + Business Media
- Date Published:
- Journal Name:
- BMC Genomics
- Volume:
- 24
- Issue:
- 1
- ISSN:
- 1471-2164
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. However, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard "multimapped" reads that align equally well to multiple genomic locations. Because multimapped reads arise predominantly from repeats, current analysis pipelines fail to detect a substantial portion of regulatory events that occur in repetitive regions. To address this shortcoming, we developed Allo, a new approach to allocate multimapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multimapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multimapped read assignment. Allo also provides read-level output in the form of a corrected alignment file, making it compatible with existing regulatory genomics analysis pipelines and downstream peak-finders. In a demonstration application on CTCF ChIP-seq data, we show that Allo results in the discovery of thousands of new CTCF peaks. Many of these peaks contain the expected cognate motif and/or serve as TAD boundaries. We additionally apply Allo to a diverse collection of ENCODE ChIP-seq data sets, revealing multiple previously unidentified interactions between transcription factors and repetitive element families. Finally, we show that Allo may be particularly beneficial in identifying ChIP-seq peaks at centromeres, near segmentally duplicated genes, and in younger TEs, enabling new regulatory analyses in these regions.
-
Abstract Metagenomics is a powerful tool for characterising viruses, with broad applications across diverse disciplines, from understanding the ecology and evolutionary history of viruses to identifying the causative agents of emerging outbreaks of unknown aetiology. Additionally, metagenomic data contain valuable information about the amount of virus present within samples. However, we have yet to leverage metagenomics to assess viral load, which is a key epidemiological parameter. To use sequencing outputs effectively to inform transmission, we need to understand the relationship between read depth and viral load across a diverse set of viruses. Here, using target enrichment sequencing, we investigated the detection and recovery of virus genomes by spiking known concentrations of DNA and RNA viruses into wild rodent faecal samples. In total, 15 experimental replicates were sequenced with target enrichment sequencing and compared with shotgun sequencing of the same background samples. Target enrichment sequencing recovered all spike-in viruses at every concentration (10², 10³, and 10⁵ ± 1 log genome copies) and showed a log-linear relationship between spike-in concentration and mean read depth. Background viruses (including Kobuvirus and Cardiovirus) were recovered consistently across all biological and technical replicates, but genome coverage varied between virus genera and likely reflected the composition of the target enrichment probe panel. Overall, our study highlights the strengths and weaknesses of using commercially available panels to quantify and characterise wildlife viromes, and underscores the importance of probe panel design for accurately interpreting coverage and read depth. To advance the use of metagenomics for understanding virus transmission, further research will be needed to elucidate how sequencing strategy (e.g. library depth, pooling), virome composition, and probe design influence viral read counts and genome coverage.
-
Abstract Summary: With the advancement of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the resulting gene expression data. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA velocity in single cells, etc. Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and simulated data from gene-count-level simulators. The impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start from a simulated count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level simulation framework for droplet-based single-cell RNA-sequencing (dscRNA-seq) experiments. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcode (CB) and UMI selection, sequence fragmentation, and sequencing. It also closely matches the gene-level ambiguity characteristics observed in real scRNA-seq experiments. Using minnow, we explore the performance of some common processing pipelines that produce gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification, and show a typical use case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment. Supplementary information: Supplementary data are available at Bioinformatics online.
-
Abstract Histone post-translational modifications (PTMs) play important roles in many biological processes, including gene regulation and chromatin dynamics, and are thus of high interest across many fields of biological research. Chromatin immunoprecipitation coupled with sequencing (ChIP-seq) is a powerful tool to profile histone PTMs in vivo. This method, however, depends heavily on the specificity and availability of suitable commercial antibodies. While mass spectrometry (MS)-based proteomic approaches to quantitatively measure histone PTMs have been developed in mammals and several other model organisms, such methods are currently not readily available in plants. One major challenge for implementing such methods in plants has been isolating sufficient amounts of pure, high-quality histones, a step made difficult by the presence of the cell wall. Here, we developed a high-yielding histone extraction and purification method optimized for Arabidopsis thaliana that can be used to obtain high-quality histones for MS. In contrast to other methods used in plants, this approach is relatively simple and does not require membranes or additional specialized steps, such as gel excision or chromatography, to extract highly purified histones. We also describe methods for producing MS-ready histone peptides through chemical labeling and digestion. Finally, we describe an optimized method to quantify and analyze the resulting histone PTM data using a modified version of EpiProfile 2.0 for Arabidopsis. In all, the workflow described here can be used to measure changes to histone PTMs resulting from various treatments, stresses, and time courses, as well as in different mutant lines. © 2022 Wiley Periodicals LLC.
Basic Protocol 1: Nuclear isolation and histone acid extraction
Basic Protocol 2: Peptide labeling, digestion, and desalting
Basic Protocol 3: Histone HPLC-MS/MS and data analysis
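The Allo abstract in the list above describes combining probabilistic mapping of multimapped reads with a convolutional neural network, but it does not give the probabilistic model. The sketch below shows only a simplified, widely used form of the probabilistic idea, named plainly as such: each multimapped read is placed at one of its candidate loci with probability proportional to the number of uniquely mapped reads nearby, plus a pseudocount. It omits the CNN peak-shape component entirely, and the function names, the neighbourhood window, and the pseudocount are assumptions.

```python
import random


def allocation_probs(candidates, unique_counts, pseudocount=1.0):
    """Probability of placing a multimapped read at each candidate locus.

    candidates:    list of locus identifiers the read aligns to equally well.
    unique_counts: dict locus -> number of uniquely mapped reads in a window
                   around that locus (window size chosen upstream; hypothetical).
    The pseudocount keeps loci with no unique reads possible and makes an
    all-zero neighbourhood fall back to a uniform choice.
    """
    weights = [unique_counts.get(loc, 0) + pseudocount for loc in candidates]
    total = sum(weights)
    return [w / total for w in weights]


def allocate(candidates, unique_counts, rng, pseudocount=1.0):
    """Sample one locus for the read according to allocation_probs."""
    probs = allocation_probs(candidates, unique_counts, pseudocount)
    return rng.choices(candidates, weights=probs, k=1)[0]
```

Passing an explicit `random.Random(seed)` as `rng` makes the allocation reproducible across runs, which matters if the reassigned reads are written back into an alignment file for downstream peak calling.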
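The viral metagenomics abstract in the list above reports a log-linear relationship between spike-in concentration and mean read depth, and it notes that metagenomics has not yet been used to assess viral load. A minimal sketch of how such a calibration could be fitted and then inverted to estimate genome copies from read depth is shown below. It assumes that "log-linear" means a straight line with both axes log10-transformed, which the abstract does not state. The depth values are synthetic, not results from the study, and any real calibration would likely be specific to each virus and probe panel.

```python
import math


def fit_log_linear(copies, mean_depths):
    """OLS fit of log10(mean depth) = intercept + slope * log10(copies).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(c) for c in copies]
    ys = [math.log10(d) for d in mean_depths]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def estimate_copies(mean_depth, slope, intercept):
    """Invert the fitted calibration to estimate genome copies from read depth."""
    return 10 ** ((math.log10(mean_depth) - intercept) / slope)


# Synthetic calibration points at the abstract's spike-in levels
# (depth values are made up for illustration, not results from the study):
spike_copies = [1e2, 1e3, 1e5]
spike_depths = [10 ** (0.5 + 0.9 * math.log10(c)) for c in spike_copies]
slope, intercept, r2 = fit_log_linear(spike_copies, spike_depths)
```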
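The minnow abstract in the list above describes a simulator for assessing how pipelines resolve UMI counts from raw reads. To make that step concrete, the sketch below shows a deliberately naive deduplication that a pipeline might apply to such simulated reads. It is neither minnow's code nor any specific pipeline's algorithm. Collapsing only exact UMI matches does not correct UMI sequencing errors, which tends to overcount molecules. Discarding reads compatible with several genes undercounts genes that share sequence. These are the kinds of effects a sequence-level simulator is designed to expose. The record layout and function name are assumptions.

```python
from collections import defaultdict


def umi_count_matrix(records):
    """Collapse reads into per-cell, per-gene UMI counts by exact UMI match.

    records: iterable of (cell_barcode, umi, genes), where genes is a tuple of
             the genes a read is compatible with (hypothetical layout).
    Reads compatible with zero or several genes are discarded (one simple
    policy for gene-level ambiguity; real pipelines handle this differently).
    Returns (counts, n_discarded) with counts as {cell: {gene: n_umis}}.
    """
    umis = defaultdict(set)  # (cell, gene) -> distinct UMIs seen
    n_discarded = 0
    for cell, umi, genes in records:
        if len(genes) != 1:
            n_discarded += 1
            continue
        umis[(cell, genes[0])].add(umi)
    counts = defaultdict(dict)
    for (cell, gene), seen in umis.items():
        counts[cell][gene] = len(seen)
    return dict(counts), n_discarded
```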
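The histone PTM abstract in the list above ends with quantification through a modified EpiProfile 2.0, without detailing the calculation. A common way to summarise such data is to express each modification state of a peptide as a fraction of the total signal for all states of that peptide. The sketch below implements only that normalisation. We do not claim it is exactly what the modified EpiProfile 2.0 computes. The example peptide (histone H3 residues 9 to 17, KSTGGKAPR), its state labels, and the peak areas are hypothetical.

```python
from collections import defaultdict


def relative_abundance(areas):
    """Relative abundance of each modification state within its peptide.

    areas: dict mapping (peptide, modification_state) -> integrated peak area.
    Returns a dict with the same keys, each value being that state's area
    divided by the summed area of all states of the same peptide.
    """
    totals = defaultdict(float)
    for (peptide, _state), area in areas.items():
        totals[peptide] += area
    return {
        (peptide, state): (area / totals[peptide] if totals[peptide] > 0 else 0.0)
        for (peptide, state), area in areas.items()
    }
```

Normalising within each peptide, rather than across the whole run, cancels differences in how much of each peptide was loaded or ionised. However, it means abundances are comparable between samples only for states of the same peptide.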
