

Title: peaksat: an R package for ChIP-seq peak saturation analysis
Abstract

Background

Epigenomic profiling assays such as ChIP-seq have been widely used to map the genome-wide enrichment profiles of chromatin-associated proteins and posttranslational histone modifications. Sequencing depth is a key parameter in experimental design and quality control. However, due to variable sequencing depth requirements across experimental conditions, it can be challenging to determine optimal sequencing depth, particularly for projects involving multiple targets or cell types.

Results

We developed the peaksat R package to provide target read depth estimates for epigenomic experiments based on the analysis of peak saturation curves. We applied peaksat to establish the distinct read depth requirements of ChIP-seq studies of histone modifications in different cell lines. Using peaksat, we were able to estimate the target read depth required per library to obtain high-quality peak calls for downstream analysis. In addition, peaksat was applied to other sequence-enrichment methods, including CUT&RUN and ATAC-seq.
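The idea of reading a target depth off a peak saturation curve can be illustrated with a short sketch. It assumes, purely for illustration, that peak counts from subsampled libraries follow a hyperbolic curve peaks(d) = p_max · d / (k + d); the abstract does not state which model peaksat fits, and the function names `fit_saturation` and `target_depth` are hypothetical. The depth at which the curve reaches a fraction f of p_max is then d = k · f / (1 − f).

```python
# Hypothetical sketch, not peaksat's implementation: fit a hyperbolic
# saturation curve peaks(d) = p_max * d / (k + d) to (read depth, peak count)
# points and solve for the depth that reaches a chosen fraction of p_max.
from typing import Sequence, Tuple


def fit_saturation(depths: Sequence[float], peaks: Sequence[float]) -> Tuple[float, float]:
    """Return (p_max, k) from least squares on the linearized curve
    1/peaks = (k / p_max) * (1 / d) + 1 / p_max."""
    if len(depths) != len(peaks) or len(depths) < 2:
        raise ValueError("need at least two (depth, peaks) pairs")
    xs = [1.0 / d for d in depths]
    ys = [1.0 / p for p in peaks]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("depths must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if intercept <= 0:
        raise ValueError("curve shows no saturation; p_max cannot be estimated")
    p_max = 1.0 / intercept
    return p_max, slope * p_max


def target_depth(k: float, fraction: float) -> float:
    """Depth at which the fitted curve reaches `fraction` of p_max."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return k * fraction / (1.0 - fraction)


if __name__ == "__main__":
    # Illustrative subsample depths and peak counts (made up, not from the paper).
    depths = [5e6, 10e6, 20e6, 40e6]
    peaks = [10_000, 16_667, 25_000, 33_333]
    p_max, k = fit_saturation(depths, peaks)
    needed = target_depth(k, 0.90)
    extra = max(0.0, needed - max(depths))
    print(f"p_max ~ {p_max:,.0f} peaks; 90% of p_max at ~ {needed:,.0f} reads; "
          f"~ {extra:,.0f} more reads than the deepest subsample")
```

The linearized (Lineweaver-Burk style) fit keeps the sketch dependency-free, but it weights low-depth points heavily; a nonlinear least-squares fit on the untransformed curve is usually the more robust choice for real data.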

Conclusion

peaksat addresses a need for researchers to make informed decisions about whether their sequencing data have been generated to an adequate depth, and thus yield sufficient meaningful peaks, and, failing that, how many more reads would be required per library. peaksat is also applicable to other sequence-based methods that include peak calling in their analysis.

Award ID(s):
1826689
NSF-PAR ID:
10392914
Publisher / Repository:
Springer Science + Business Media
Journal Name:
BMC Genomics
Volume:
24
Issue:
1
ISSN:
1471-2164
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Chromatin immunoprecipitation followed by next‐generation sequencing (ChIP‐seq) is a technique to detect genomic regions containing protein‐DNA interaction, such as transcription factor binding sites or regions containing histone modifications. One goal of the analysis of ChIP‐seq experiments is to identify genomic loci enriched for sequencing reads pertaining to DNA bound to the factor of interest. The accurate identification of such regions aids in the understanding of epigenomic marks and gene regulatory mechanisms. Given the reduction of massively parallel sequencing costs, methods to detect consensus regions of enrichment across multiple samples are of interest. Here, we present a statistical model to detect broad consensus regions of enrichment from ChIP‐seq technical or biological replicates through a class of zero‐inflated mixed‐effects hidden Markov models. We show that the proposed model outperforms existing methods for consensus peak calling in common epigenomic marks by accounting for the excess zeros and sample‐specific biases. We apply our method to data from the Encyclopedia of DNA Elements and Roadmap Epigenomics projects and also from an extensive simulation study.
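To make "accounting for the excess zeros and sample-specific biases" concrete, the following is a generic zero-inflated Poisson emission with a replicate-level random effect, written for the read count in bin j of replicate i under hidden state s (background or enriched). The abstract does not specify the count distribution or link function, so the Poisson choice and the symbols π_i, λ_{ijs}, β_s and b_i are illustrative assumptions, not the authors' model.

```latex
P(Y_{ij} = y \mid Z_j = s) =
\begin{cases}
\pi_i + (1 - \pi_i)\, e^{-\lambda_{ijs}}, & y = 0,\\[4pt]
(1 - \pi_i)\, \dfrac{\lambda_{ijs}^{\,y}\, e^{-\lambda_{ijs}}}{y!}, & y > 0,
\end{cases}
\qquad
\log \lambda_{ijs} = \beta_s + b_i, \quad b_i \sim \mathcal{N}(0, \sigma^2).
```

Here π_i is the probability of a structural zero in replicate i, β_s is the state-specific mean log count, and b_i absorbs sample-specific bias; the two cases sum to one over y ≥ 0.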

     
  2. Abstract

    Cleavage under targets and release using nuclease (CUT&RUN) is a recently developed chromatin profiling technique that uses a targeted micrococcal nuclease cleavage strategy to obtain high‐resolution binding profiles of protein factors or to map histones with specific post‐translational modifications. Due to its high sensitivity, CUT&RUN allows quality binding profiles to be obtained with only a fraction of the starting material and sequencing depth typically required for other chromatin profiling techniques such as chromatin immunoprecipitation. Although CUT&RUN has been widely adopted in multiple model systems, it has rarely been utilized in Caenorhabditis elegans, a model system of great importance to genomic research. Cell dissociation techniques, which are required for this approach, can be challenging in C. elegans due to the toughness of the worm's cuticle and the sensitivity of the cells themselves. Here, we describe a robust CUT&RUN protocol for use in C. elegans to determine the genome‐wide localization of protein factors and specific histone marks. With a simple protocol utilizing live, uncrosslinked tissue as the starting material, performing CUT&RUN in worms has the potential to produce physiologically relevant data at a higher resolution than chromatin immunoprecipitation. This protocol involves a simple dissociation step to uniformly permeabilize worms while avoiding sample loss or cell damage, resulting in high‐quality CUT&RUN profiles with as few as 100 worms and detectable signal with as few as 10 worms. This represents a significant advancement over chromatin immunoprecipitation, which typically uses thousands or hundreds of thousands of worms for a single experiment. The protocols presented here provide a detailed description of worm growth, sample preparation, CUT&RUN workflow, library preparation for high‐throughput sequencing, and a basic overview of data analysis, making CUT&RUN simple and accessible for any worm lab.
© 2022 Wiley Periodicals LLC.

    Basic Protocol 1: Growth and synchronization of C. elegans

    Basic Protocol 2: Worm dissociation, sample preparation, and optimization

    Basic Protocol 3: CUT&RUN chromatin profiling

    Alternate Protocol: Improving CUT&RUN signal using a secondary antibody

    Basic Protocol 4: CUT&RUN library preparation for Illumina high‐throughput sequencing

    Basic Protocol 5: Basic data analysis using Linux

     
  3. Abstract

    Histone post‐translational modifications (PTMs) play important roles in many biological processes, including gene regulation and chromatin dynamics, and are thus of high interest across many fields of biological research. Chromatin immunoprecipitation coupled with sequencing (ChIP‐seq) is a powerful tool to profile histone PTMs in vivo. This method, however, is largely dependent on the specificity and availability of suitable commercial antibodies. While mass spectrometry (MS)–based proteomic approaches to quantitatively measure histone PTMs have been developed in mammals and several other model organisms, such methods are currently not readily available in plants. One major challenge for the implementation of such methods in plants has been the difficulty in isolating sufficient amounts of pure, high‐quality histones, a step rendered difficult by the presence of the cell wall. Here, we developed a high‐yielding histone extraction and purification method optimized for Arabidopsis thaliana that can be used to obtain high‐quality histones for MS. In contrast to other methods used in plants, this approach is relatively simple, and does not require membranes or additional specialized steps, such as gel excision or chromatography, to extract highly purified histones. We also describe methods for producing MS‐ready histone peptides through chemical labeling and digestion. Finally, we describe an optimized method to quantify and analyze the resulting histone PTM data using a modified version of EpiProfile 2.0 for Arabidopsis. In all, the workflow described here can be used to measure changes to histone PTMs resulting from various treatments, stresses, and time courses, as well as in different mutant lines. © 2022 Wiley Periodicals LLC.

    Basic Protocol 1: Nuclear isolation and histone acid extraction

    Basic Protocol 2: Peptide labeling, digestion, and desalting

    Basic Protocol 3: Histone HPLC‐MS/MS and data analysis
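As a rough illustration of what quantifying histone PTM data can involve, the sketch below computes a common summary: the relative abundance of each modified form of a peptide as its peak area divided by the summed area of all forms of that peptide. This is an assumption about the general approach, not EpiProfile's code; the function name, peptide labels, and areas are hypothetical.

```python
# Hypothetical sketch: relative abundance of each modified form of a histone
# peptide as its peak area divided by the summed area of all forms of that
# peptide. Not EpiProfile's code; peptide names and areas are illustrative.
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def relative_abundance(
    areas: Iterable[Tuple[str, str, float]],
) -> Dict[str, Dict[str, float]]:
    """areas: (peptide, modification_form, peak_area) records."""
    records = list(areas)
    totals: Dict[str, float] = defaultdict(float)
    for peptide, _, area in records:
        totals[peptide] += area
    result: Dict[str, Dict[str, float]] = defaultdict(dict)
    for peptide, form, area in records:
        total = totals[peptide]
        share = area / total if total > 0 else 0.0
        # Sum in case the same form appears more than once (e.g. charge states).
        result[peptide][form] = result[peptide].get(form, 0.0) + share
    return dict(result)


if __name__ == "__main__":
    data = [
        ("H3_9_17", "unmodified", 6.0e6),
        ("H3_9_17", "K9me2", 3.0e6),
        ("H3_9_17", "K14ac", 1.0e6),
    ]
    for form, value in relative_abundance(data)["H3_9_17"].items():
        print(f"{form}: {value:.1%}")
```

Normalizing within each peptide makes the values comparable across runs with different total loading, which is why this kind of ratio is often preferred over raw areas when comparing treatments or mutant lines.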

     
  4. Abstract

    The ability to profile transcriptomes and characterize global gene expression changes has been greatly enabled by the development of RNA sequencing technologies (RNA-seq). However, the process of generating sequencing-compatible cDNA libraries from RNA samples can be time-consuming and expensive, especially for bacterial mRNAs, which lack the poly(A) tails that are often used to streamline this process for eukaryotic samples. Compared to the increasing throughput and decreasing cost of sequencing, library preparation has had limited advances. Here, we describe bacterial-multiplexed-seq (BaM-seq), an approach that enables simple barcoding of many bacterial RNA samples that decreases the time and cost of library preparation. We also present targeted-bacterial-multiplexed-seq (TBaM-seq) that allows for differential expression analysis of specific gene panels with over 100-fold enrichment in read coverage. In addition, we introduce the concept of transcriptome redistribution based on TBaM-seq that dramatically reduces the required sequencing depth while still allowing for quantification of both highly and lowly abundant transcripts. These methods accurately measure gene expression changes with high technical reproducibility and agreement with gold-standard, lower-throughput approaches. Together, use of these library preparation protocols allows for fast, affordable generation of sequencing libraries.
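Pooling barcoded samples implies a demultiplexing step after sequencing. The sketch below shows one generic way to do it, assuming each read begins with a fixed-length sample barcode and tolerating one mismatch; the abstract does not describe BaM-seq's actual barcode layout, so the layout, function names, and barcodes are hypothetical.

```python
# Hypothetical sketch of barcode demultiplexing. Assumes each read begins with
# a fixed-length sample barcode; the abstract does not specify BaM-seq's
# actual barcode layout, so names and layout here are illustrative.
from typing import Dict, Iterable, List, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign(prefix: str, barcodes: Dict[str, str], max_mismatches: int) -> Optional[str]:
    """Return the unique sample whose barcode is within max_mismatches, else None."""
    hits = {s for bc, s in barcodes.items() if hamming(prefix, bc) <= max_mismatches}
    return next(iter(hits)) if len(hits) == 1 else None


def demultiplex(
    reads: Iterable[str], barcodes: Dict[str, str], max_mismatches: int = 1
) -> Dict[str, List[str]]:
    """Group reads by sample, trimming the barcode; ambiguous reads are unassigned."""
    lengths = {len(bc) for bc in barcodes}
    if len(lengths) != 1:
        raise ValueError("barcodes must be non-empty and all the same length")
    size = lengths.pop()
    bins: Dict[str, List[str]] = {s: [] for s in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        sample = assign(read[:size], barcodes, max_mismatches) if len(read) >= size else None
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[size:])
    return bins


if __name__ == "__main__":
    barcodes = {"ACGTAC": "sample1", "TGCATG": "sample2"}
    reads = ["ACGTACGGGG", "ACGTTCAAAA", "TGCATGTTTT", "GGGGGGCCCC"]
    for sample, seqs in demultiplex(reads, barcodes).items():
        print(sample, seqs)
```

Rejecting reads that fall within the mismatch tolerance of more than one sample avoids cross-assignment; this only works well if the barcode set is designed with enough pairwise distance between barcodes.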

     
  5. Abstract

    The development of next-generation sequencing (NGS) enabled a shift from array-based genotyping to directly sequencing genomic libraries for high-throughput genotyping. Even though whole-genome sequencing was initially too costly for routine analysis in large populations such as breeding or genetic studies, continued advancements in genome sequencing and bioinformatics have provided the opportunity to capitalize on whole-genome information. As new sequencing platforms can routinely provide high-quality sequencing data for sufficient genome coverage to genotype various breeding populations, a limitation comes in the time and cost of library construction when multiplexing a large number of samples. Here we describe a high-throughput whole-genome skim-sequencing (skim-seq) approach that can be utilized for a broad range of genotyping and genomic characterization. Using optimized low-volume Illumina Nextera chemistry, we developed a skim-seq method and combined up to 960 samples in one multiplex library using dual index barcoding. With the dual-index barcoding, the number of samples for multiplexing can be adjusted depending on the amount of data required, and could be extended to 3,072 samples or more. Panels of doubled haploid wheat lines (Triticum aestivum, CDC Stanley × CDC Landmark), wheat-barley (T. aestivum × Hordeum vulgare) and wheat-wheatgrass (Triticum durum × Thinopyrum intermedium) introgression lines as well as known monosomic wheat stocks were genotyped using the skim-seq approach. Bioinformatics pipelines were developed for various applications where sequencing coverage ranged from 1× down to 0.01× per sample. Using reference genomes, we detected chromosome dosage, identified aneuploidy, and karyotyped introgression lines from the skim-seq data.
Leveraging the recent advancements in genome sequencing, skim-seq provides an effective and low-cost tool for routine genotyping and genetic analysis, which can track and identify introgressions and genomic regions of interest in genetics research and applied breeding programs.
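The coverage figures and the dosage analysis mentioned above can be illustrated with a small sketch. Coverage is total sequenced bases divided by genome size; for instance, 1 million 150 bp reads on a genome assumed here to be 15 Gb give 0.01×. For dosage, the sketch compares per-chromosome read counts with a euploid reference and scales by the median ratio, so a monosomic chromosome in an otherwise disomic line would be expected near 0.5. The median scaling, function names, and counts are assumptions for illustration, not the authors' pipeline.

```python
# Hypothetical sketch: per-chromosome dosage from low-coverage read counts,
# relative to a euploid reference sample. Median scaling is an assumption
# chosen so a few aneuploid chromosomes do not distort normalization.
from statistics import median
from typing import Dict


def coverage(n_reads: int, read_length: int, genome_size: float) -> float:
    """Mean sequencing depth: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def dosage_ratios(sample: Dict[str, int], reference: Dict[str, int]) -> Dict[str, float]:
    """Sample/reference count ratio per chromosome, scaled by the median ratio."""
    raw = {c: sample.get(c, 0) / n for c, n in reference.items() if n > 0}
    scale = median(raw.values())
    if scale == 0:
        raise ValueError("median ratio is zero; sample has too few reads")
    return {c: r / scale for c, r in raw.items()}


if __name__ == "__main__":
    print(f"coverage: {coverage(1_000_000, 150, 15e9):.3f}x")
    reference = {"1A": 1000, "2A": 1000, "3A": 1000, "4A": 1000}
    sample = {"1A": 500, "2A": 1000, "3A": 1000, "4A": 1000}
    for chrom, ratio in dosage_ratios(sample, reference).items():
        print(chrom, round(ratio, 2))
```

At very low coverage, counts are usually aggregated over whole chromosomes or large bins, as here, so that each ratio rests on enough reads to separate 0.5 from 1.0.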

     