A unified hypothesis-free feature extraction framework for diverse epigenomic data

Balcı, Ali Tuğrul (ORCID: 0000-0002-1461-6733); Chikina, Maria; Mahony, Shaun (ed.)

doi:10.1093/bioadv/vbaf013

Abstract

Motivation: Epigenetic assays using next-generation sequencing have furthered our understanding of the functional genomic regions and the mechanisms of gene regulation. However, a single assay produces billions of data points, with limited information about the biological process due to numerous sources of technical and biological noise. To draw biological conclusions, numerous specialized algorithms have been proposed to summarize the data into higher-order patterns, such as peak calling and the discovery of differentially methylated regions. The key principle underlying these approaches is the search for locally consistent patterns.

Results: We propose L0 segmentation as a universal framework for extracting locally coherent signals from diverse epigenetic data sources. L0 segmentation compresses the input signal by approximating it as a piecewise constant function. We implement a highly scalable L0 segmentation with additional loss functions designed for sequencing-based epigenetic data types, including Poisson loss for single tracks and binomial loss for methylation/coverage data. We show that the L0 segmentation approach retains the salient features of the data yet can identify subtle features, such as transcription end sites, that are missed by other analytic approaches.

Availability and implementation: Our approach is implemented as an R package “l01segmentation” with a C++ backend, which is available at https://github.com/boooooogey/l01segmentation.
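To make the idea of a piecewise constant approximation under a Poisson loss concrete, the following is a minimal Python sketch. It is not the algorithm or the API of the l01segmentation package: it uses a simple quadratic-time dynamic program (optimal partitioning), which is an assumption chosen for readability rather than scalability. The names `poisson_segment_cost` and `l0_segment` and the `penalty` parameter are hypothetical. Each segment is fitted with its mean count, and a fixed penalty is charged per segment, which is one common way to express an L0 penalty on the number of change points.

```python
import math


def poisson_segment_cost(total, length):
    """Poisson negative log-likelihood of a segment at its best constant rate.

    Terms that do not depend on the rate (the log-factorials) are dropped.
    """
    if total == 0:
        return 0.0
    rate = total / length
    return total - total * math.log(rate)


def l0_segment(counts, penalty):
    """Split `counts` into constant-rate segments (illustrative sketch only).

    Minimizes the summed Poisson cost plus `penalty` per segment using an
    O(n^2) dynamic program. Returns (start, end, mean) tuples, end exclusive.
    """
    n = len(counts)
    # Prefix sums give the total count of any interval in constant time.
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)

    best = [0.0] + [math.inf] * n  # best[j]: minimal cost of counts[:j]
    last_start = [0] * (n + 1)     # start index of the final segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            cost = (best[i]
                    + poisson_segment_cost(prefix[j] - prefix[i], j - i)
                    + penalty)
            if cost < best[j]:
                best[j] = cost
                last_start[j] = i

    # Walk back through the stored segment starts to recover the segmentation.
    segments = []
    j = n
    while j > 0:
        i = last_start[j]
        segments.append((i, j, (prefix[j] - prefix[i]) / (j - i)))
        j = i
    segments.reverse()
    return segments
```

On a toy track such as `[1, 1, 1, 1, 20, 20, 20, 20, 1, 1, 1, 1]` with `penalty=5.0`, this sketch returns three segments whose means are 1, 20, and 1. Raising the penalty merges segments, so the penalty controls how strongly the signal is compressed.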
