Abstract Assessing the dynamics of chromatin features and transcription factor (TF) binding at scale remains a significant challenge in plants. Here, we present PHILO (Plant HIgh-throughput LOw input) ChIP-seq, a high-throughput ChIP-seq platform that enables the cost-effective and extensive capture of TF binding and genome-wide distributions of histone modifications. The PHILO ChIP-seq pipeline is adaptable to many plant species, requires very little starting material (1 mg), and provides the option to use MNase (micrococcal nuclease) for chromatin fragmentation. By employing H3K9ac PHILO ChIP-seq on eight Arabidopsis thaliana jasmonic acid (JA) pathway mutants, with the simultaneous processing of over 100 samples, we not only recapitulated but also expanded the current understanding of the intricate interplay between the master TFs MYC2/3/4 and various chromatin regulators. Additionally, our analyses brought to light previously unknown histone acetylation patterns within the regulatory regions of MYC2 target genes in Arabidopsis, which are also conserved in tomato (Solanum lycopersicum). In summary, our PHILO ChIP-seq platform is highly effective for investigating TF binding and chromatin dynamics on a large scale in plants, paving the way for the cost-efficient realization of complex experimental setups.
Characterizing DNA recognition preferences of transcription factors using global couplings and high-throughput sequencing
Abstract DNA–transcription factor (TF) interactions are essential for gene regulation. Fully characterizing TF recognition specificities and identifying their genomic binding targets are important to understand TF function and regulatory networks. Recently, the high-throughput sequencing technology HT-SELEX (high-throughput systematic evolution of ligands by exponential enrichment) has been used to measure hundreds of TFs, providing massive datasets that comprise TF binding preferences. However, existing computational models do not fully extract and characterize critical TF binding preferences and fail to distinguish genome-wide binding targets. In this study, we developed a global pairwise model called DCA-Scapes trained with experimental HT-SELEX data. Our approach uncovered high-resolution TF recognition specificity landscapes, enabled the prediction of in vivo binding sequences, and was validated with ChIP-seq (ChIP sequencing) data. In addition, the DCA-Scapes model was utilized to refine the locations of binding regions and accurately identify the binding sites within the ChIP-seq enriched peaks. Moreover, we extended our model to cover the entire human genome, uncovering potential TF target sites that exhibit tissue-specific TF recognition across various cellular environments.
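As a rough illustration of what a "global pairwise model" of binding sequences can look like, the sketch below scores a DNA sequence with a Potts-style energy, E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), and slides that score across a longer region to pick the most favorable window. This is an assumption about the general form of direct-coupling-style models, not the DCA-Scapes implementation; the function names (`pairwise_energy`, `best_window`) and all parameter values are hypothetical toy choices, not fitted to HT-SELEX data.

```python
# Hypothetical sketch of a Potts-style (DCA-like) pairwise sequence score.
# All parameters are toy numbers chosen for illustration only.
from itertools import combinations

ALPHABET = "ACGT"
IDX = {base: k for k, base in enumerate(ALPHABET)}
ZERO = [[0.0] * 4 for _ in range(4)]  # used for position pairs with no coupling


def pairwise_energy(seq, fields, couplings):
    """Return E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    fields[i][a]            : local field for base a at position i
    couplings[(i, j)][a][b] : coupling for base a at i and base b at j (i < j)
    Lower energy means a more favorable sequence under the model.
    """
    s = [IDX[b] for b in seq]
    energy = -sum(fields[i][a] for i, a in enumerate(s))
    for i, j in combinations(range(len(s)), 2):
        energy -= couplings.get((i, j), ZERO)[s[i]][s[j]]
    return energy


def best_window(region, length, fields, couplings):
    """Scan a region and return (start, energy) of the lowest-energy window."""
    scores = [
        (pairwise_energy(region[k:k + length], fields, couplings), k)
        for k in range(len(region) - length + 1)
    ]
    energy, start = min(scores)
    return start, energy


# Toy 3-bp model: fields favor A, C, G at positions 0, 1, 2,
# and one coupling rewards A at position 0 together with G at position 2.
FIELDS = [[0.0] * 4 for _ in range(3)]
FIELDS[0][IDX["A"]] = 1.0
FIELDS[1][IDX["C"]] = 1.0
FIELDS[2][IDX["G"]] = 1.0
COUPLINGS = {(0, 2): [[0.0] * 4 for _ in range(4)]}
COUPLINGS[(0, 2)][IDX["A"]][IDX["G"]] = 0.5

print(pairwise_energy("ACG", FIELDS, COUPLINGS))
print(best_window("TTACGTT", 3, FIELDS, COUPLINGS))
```

The window scan mirrors, in a very simplified way, the idea of locating a binding site inside a broader enriched region; a real model would learn the fields and couplings from sequencing data rather than set them by hand.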
- Award ID(s): 1943442
- PAR ID: 10612253
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Nucleic Acids Research
- Volume: 53
- Issue: 12
- ISSN: 0305-1048
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining them at a single-cell level remains elusive. Here, we report scFAN (single-cell factor analysis network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pretrained on genome-wide bulk assay for transposase-accessible chromatin sequencing (ATAC-seq), DNA sequence, and chromatin immunoprecipitation sequencing (ChIP-seq) data and uses single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by both studying sequence motifs enriched within predicted binding peaks and using predicted TFs for discovering cell types. We develop a new metric “TF activity score” to characterize each cell and show that activity scores can reliably capture cell identities. scFAN allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.
Komeili, Arash (Ed.) ABSTRACT Histone proteins are found across diverse lineages of Archaea, many of which package DNA and form chromatin. However, previous research has led to the hypothesis that the histone-like proteins of high-salt-adapted archaea, or halophiles, function differently. The sole histone protein encoded by the model halophilic species Halobacterium salinarum, HpyA, is nonessential and expressed at levels too low to enable genome-wide DNA packaging. Instead, HpyA mediates the transcriptional response to salt stress. Here we compare the features of genome-wide binding of HpyA to those of HstA, the sole histone of another model halophile, Haloferax volcanii. hstA, like hpyA, is a nonessential gene. To better understand HpyA and HstA functions, protein-DNA binding data (chromatin immunoprecipitation sequencing [ChIP-seq]) of these halophilic histones are compared to publicly available ChIP-seq data from DNA binding proteins across all domains of life, including transcription factors (TFs), nucleoid-associated proteins (NAPs), and histones. These analyses demonstrate that HpyA and HstA bind the genome infrequently in discrete regions, which is similar to TFs but unlike NAPs, which bind a much larger genomic fraction. However, unlike TFs that typically bind in intergenic regions, HpyA and HstA binding sites are located in both coding and intergenic regions. The genome-wide dinucleotide periodicity known to facilitate histone binding was undetectable in the genomes of both species. Instead, TF-like and histone-like binding sequence preferences were detected for HstA and HpyA, respectively. Taken together, these data suggest that halophilic archaeal histones are unlikely to facilitate genome-wide chromatin formation and that their function defies categorization as a TF, NAP, or histone.
IMPORTANCE Most cells in eukaryotic species—from yeast to humans—possess histone proteins that pack and unpack DNA in response to environmental cues. These essential proteins regulate genes necessary for important cellular processes, including development and stress protection. Although the histone fold domain originated in the domain of life Archaea, the function of archaeal histone-like proteins is not well understood relative to those of eukaryotes. We recently discovered that, unlike histones of eukaryotes, histones in hypersaline-adapted archaeal species do not package DNA and can act as transcription factors (TFs) to regulate stress response gene expression. However, the function of histones across species of hypersaline-adapted archaea still remains unclear. Here, we compare hypersaline histone function to a variety of DNA binding proteins across the tree of life, revealing histone-like behavior in some respects and specific transcriptional regulatory function in others.
SUMMARY The stilbenoid pathway is responsible for the production of resveratrol in grapevine (Vitis vinifera L.). A few transcription factors (TFs) have been identified as regulators of this pathway but the extent of this control has not been deeply studied. Here we show how DNA affinity purification sequencing (DAP‐Seq) allows for the genome‐wide TF‐binding site interrogation in grape. We obtained 5190 and 4443 binding events assigned to 4041 and 3626 genes for MYB14 and MYB15, respectively (approximately 40% of peaks located within −10 kb of transcription start sites). DAP‐Seq of MYB14/MYB15 was combined with aggregate gene co‐expression networks (GCNs) built from more than 1400 transcriptomic datasets from leaves, fruits, and flowers to narrow down bound genes to a set of high confidence targets. The analysis of MYB14, MYB15, and MYB13, a third uncharacterized member of Subgroup 2 (S2), showed that in addition to the few previously known stilbene synthase (STS) targets, these regulators bind to 30 of 47 STS family genes. Moreover, all three MYBs bind to several PAL, C4H, and 4CL genes, in addition to shikimate pathway genes, the WRKY03 stilbenoid co‐regulator and resveratrol‐modifying gene candidates among which ROMT2‐3 were validated enzymatically. A high proportion of DAP‐Seq bound genes were induced in the activated transcriptomes of transient MYB15‐overexpressing grapevine leaves, validating our methodological approach for delimiting TF targets. Overall, Subgroup 2 R2R3‐MYBs appear to play a key role in binding and directly regulating several primary and secondary metabolic steps leading to an increased flux towards stilbenoid production. The integration of DAP‐Seq and reciprocal GCNs offers a rapid framework for gene function characterization using genome‐wide approaches in the context of non‐model plant species and stands up as a valid first approach for identifying gene regulatory networks of specialized metabolism.
Abstract Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is widely used to identify factor binding to genomic DNA and chromatin modifications. ChIP-seq data analysis is affected by genomic regions that generate ultra-high artifactual signals. To remove these signals from ChIP-seq data, the Encyclopedia of DNA Elements (ENCODE) project developed comprehensive sets of regions defined by low mappability and ultra-high signals called blacklists for human, mouse (Mus musculus), nematode (Caenorhabditis elegans), and fruit fly (Drosophila melanogaster). However, blacklists are not currently available for many model and nonmodel species. Here, we describe an alternative approach for removing false-positive peaks called greenscreen. Greenscreen is easy to implement, requires few input samples, and uses analysis tools frequently employed for ChIP-seq. Greenscreen removes artifactual signals as effectively as blacklists in Arabidopsis thaliana and human ChIP-seq datasets while covering less of the genome, and it dramatically improves ChIP-seq peak calling and downstream analyses. Greenscreen filtering reveals true factor binding overlap and occupancy changes in different genetic backgrounds or tissues. Because it is effective with as few as two inputs, greenscreen is readily adaptable for use in any species or genome build. Although developed for ChIP-seq, greenscreen also identifies artifactual signals from other genomic datasets including Cleavage Under Targets and Release Using Nuclease. We present an improved ChIP-seq pipeline incorporating greenscreen that detects more true peaks than other methods.
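The filtering step common to blacklist- and greenscreen-style masking (dropping called peaks that overlap a set of artifact-prone regions) can be sketched as below. This is only an assumed, minimal interval-overlap filter and not the greenscreen pipeline itself: how the mask regions are derived from input samples is not shown, and the names `build_mask`, `overlaps`, and `filter_peaks` as well as the coordinates are hypothetical.

```python
# Hypothetical sketch: remove peaks that overlap masked (artifact-prone) regions.
# Intervals are (chrom, start, end) with half-open coordinates, as in BED files.
from collections import defaultdict


def build_mask(regions):
    """Group mask intervals by chromosome for quick lookup."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    return by_chrom


def overlaps(peak, mask):
    """Return True if the peak shares at least one base with any mask interval."""
    chrom, start, end = peak
    return any(start < m_end and m_start < end
               for m_start, m_end in mask.get(chrom, []))


def filter_peaks(peaks, regions):
    """Keep only peaks that do not overlap any masked region."""
    mask = build_mask(regions)
    return [p for p in peaks if not overlaps(p, mask)]


# Toy example with made-up coordinates.
mask_regions = [("Chr1", 100, 200)]
peaks = [("Chr1", 50, 100), ("Chr1", 150, 250), ("Chr2", 150, 250)]
kept = filter_peaks(peaks, mask_regions)
print(kept)
```

In this toy example the peak ending exactly at position 100 is kept because half-open intervals that merely touch do not overlap; for genome-scale data, an interval tree or a dedicated tool would be a more efficient choice than this linear scan.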
