NSF PAR Search | NSF Public Access Repository


ChromGene: gene-based modeling of epigenomic data

https://doi.org/10.1186/s13059-023-03041-5

Jaroszewicz, Artur; Ernst, Jason (September 2023, Genome Biology)

Abstract Various computational approaches have been developed to annotate epigenomes on a per-position basis by modeling combinatorial and spatial patterns within epigenomic data. However, such annotations are less suitable for gene-based analyses. We present ChromGene, a method based on a mixture of learned hidden Markov models, to annotate genes based on multiple epigenomic maps across the gene body and flanks. We provide ChromGene assignments for over 100 cell and tissue types. We characterize the mixture components in terms of gene expression, constraint, and other gene annotations. The ChromGene method and annotations will provide a useful resource for gene-based epigenomic analyses.
Universal chromatin state annotation of the mouse genome

https://doi.org/10.1186/s13059-023-02994-x

Vu, Ha; Ernst, Jason (June 2023, Genome Biology)

Abstract A large-scale application of the “stacked modeling” approach for chromatin state discovery previously provided a single “universal” chromatin state annotation of the human genome based jointly on data from many cell and tissue types. Here, we produce an analogous chromatin state annotation for mouse based on 901 datasets assaying 14 chromatin marks in 26 cell or tissue types. To characterize each chromatin state, we relate the states to external annotations and compare them to analogously defined human states. We expect the universal chromatin state annotation for mouse to be a useful resource for studying this key model organism’s genome.
A framework for group-wise summarization and comparison of chromatin state annotations

https://doi.org/10.1093/bioinformatics/btac722

Vu, Ha; Koch, Zane; Fiziev, Petko; Ernst, Jason; Martelli, Pier Luigi (ed.) (November 2022, Bioinformatics)

Abstract Motivation: Genome-wide maps of epigenetic modifications are powerful resources for non-coding genome annotation. Maps of multiple epigenetic marks have been integrated into cell or tissue type-specific chromatin state annotations for many cell or tissue types. With the increasing availability of multiple chromatin state maps for biologically similar samples, there is a need for methods that can effectively summarize the information about chromatin state annotations within groups of samples and identify differences across groups of samples at a high resolution. Results: We developed CSREP, which takes as input chromatin state annotations for a group of samples. CSREP then probabilistically estimates the state at each genomic position and derives a representative chromatin state map for the group. CSREP uses an ensemble of multi-class logistic regression classifiers that predict the chromatin state assignment of each sample given the state maps from all other samples. The difference in CSREP’s probability assignments for two groups can be used to identify genomic locations with differential chromatin state assignments. Using groups of chromatin state maps of a diverse set of cell and tissue types, we demonstrate the advantages of using CSREP to summarize chromatin state maps and identify biologically relevant differences between groups at a high resolution. Availability and implementation: The CSREP source code and generated data are available at http://github.com/ernstlab/csrep. Supplementary information: Supplementary data are available at Bioinformatics online.
Universal annotation of the human genome through integration of over a thousand epigenomic datasets

https://doi.org/10.1186/s13059-021-02572-z

Vu, Ha; Ernst, Jason (January 2022, Genome Biology)

Abstract Background: Genome-wide maps of chromatin marks such as histone modifications and open chromatin sites provide valuable information for annotating the non-coding genome, including identifying regulatory elements. Computational approaches such as ChromHMM have been applied to discover and annotate chromatin states defined by combinatorial and spatial patterns of chromatin marks within the same cell type. An alternative “stacked modeling” approach was previously suggested, where chromatin states are defined jointly from datasets of multiple cell types to produce a single universal genome annotation based on all datasets. Despite its potential benefits for applications that are not specific to one cell type, such an approach was previously applied only for small-scale specialized purposes. Large-scale applications of stacked modeling have previously posed scalability challenges. Results: Using a version of ChromHMM enhanced for large-scale applications, we apply the stacked modeling approach to produce a universal chromatin state annotation of the human genome using over 1000 datasets from more than 100 cell types, with the learned model denoted as the full-stack model. The full-stack model states show distinct enrichments for external genomic annotations, which we use in characterizing each state. Compared to per-cell-type annotations, the full-stack annotations directly differentiate constitutive from cell type-specific activity and are more predictive of locations of external genomic annotations. Conclusions: The full-stack ChromHMM model provides a universal chromatin state annotation of the genome and a unified global view of over 1000 datasets. We expect this to be a useful resource that complements existing per-cell-type annotations for studying the non-coding human genome.
Single-nucleotide conservation state annotation of the SARS-CoV-2 genome

https://doi.org/10.1038/s42003-021-02231-w

Kwon, Soo Bin; Ernst, Jason (June 2021, Communications Biology)

Abstract Given the global impact and severity of COVID-19, there is a pressing need for a better understanding of the SARS-CoV-2 genome and mutations. Multi-strain sequence alignments of coronaviruses (CoV) provide important information for interpreting the genome and its variation. We apply a comparative genomics method, ConsHMM, to the multi-strain alignments of CoV to annotate every base of the SARS-CoV-2 genome with conservation states based on sequence alignment patterns among CoV. The learned conservation states show distinct enrichment patterns for genes, protein domains, and other regions of interest. Certain states are strongly enriched or depleted of SARS-CoV-2 mutations, which can be used to predict potentially consequential mutations. We expect the conservation states to be a resource for interpreting the SARS-CoV-2 genome and mutations.
Leveraging allelic imbalance to refine fine-mapping for eQTL studies

https://doi.org/10.1371/journal.pgen.1008481

Zou, Jennifer; Hormozdiari, Farhad; Jew, Brandon; Castel, Stephane E.; Lappalainen, Tuuli; Ernst, Jason; Sul, Jae Hoon; Eskin, Eleazar; Wen, Xiaoquan (December 2019, PLOS Genetics)

DNA methylation networks underlying mammalian traits

https://doi.org/10.1126/science.abq5693

Haghani, Amin; Li, Caesar Z.; Robeck, Todd R.; Zhang, Joshua; Lu, Ake T.; Ablaeva, Julia; Acosta-Rodríguez, Victoria A.; Adams, Danielle M.; Alagaili, Abdulaziz N.; Almunia, Javier; et al. (August 2023, Science)

Using DNA methylation profiles (n = 15,456) from 348 mammalian species, we constructed phyloepigenetic trees that bear marked similarities to traditional phylogenetic ones. Using unsupervised clustering across all samples, we identified 55 distinct cytosine modules, of which 30 are related to traits such as maximum life span, adult weight, age, sex, and human mortality risk. Maximum life span is associated with methylation levels in HOXL subclass homeobox genes and developmental processes and is potentially regulated by pluripotency transcription factors. The methylation state of some modules responds to perturbations such as caloric restriction, ablation of growth hormone receptors, consumption of high-fat diets, and expression of Yamanaka factors. This study reveals an intertwined evolution of the genome and epigenome that mediates the biological characteristics and traits of different mammalian species.
