skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: PeakMatcher facilitates updated Aedes aegypti embryonic cis-regulatory element map
Abstract Background The Aedes aegypti mosquito is a threat to human health across the globe. The A. aegypti genome was recently re-sequenced and re-assembled. Due to a combination of long-read PacBio and Hi-C sequencing, the AaegL5 assembly is chromosome complete and significantly improves the assembly in key areas such as the M/m sex-determining locus. Release of the updated genome assembly has precipitated the need to reprocess historical functional genomic data sets, including cis -regulatory element (CRE) maps that had previously been generated for A. aegypti. Results We re-processed and re-analyzed the A. aegypti whole embryo FAIRE seq data to create an updated embryonic CRE map for the AaegL5 genome. We validated that the new CRE map recapitulates key features of the original AaegL3 CRE map. Further, we built on the improved assembly in the M/m locus to analyze overlaps of open chromatin regions with genes. To support the validation, we created a new method (PeakMatcher) for matching peaks from the same experimental data set across genome assemblies. Conclusion Use of PeakMatcher software, which is available publicly under an open-source license, facilitated the release of an updated and validated CRE map, which is available through the NIH GEO. These findings demonstrate that PeakMatcher software will be a useful resource for validation and transferring of previous annotations to updated genome assemblies.  more » « less
Award ID(s):
1947257
PAR ID:
10276798
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Hereditas
Volume:
158
Issue:
1
ISSN:
1601-5223
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Shao, Mingfu (Ed.)
    Third-generation sequencing technologies can generate very long reads with relatively high error rates. The lengths of the reads, which sometimes exceed one million bases, make them invaluable for resolving complex repeats that cannot be assembled using shorter reads. Many high-quality genome assemblies have already been produced, curated, and annotated using the previous generation of sequencing data, and full re-assembly of these genomes with long reads is not always practical or cost-effective. One strategy to upgrade existing assemblies is to generate additional coverage using long-read data, and add that to the previously assembled contigs. SAMBA is a tool that is designed to scaffold and gap-fill existing genome assemblies with additional long-read data, resulting in substantially greater contiguity. SAMBA is the only tool of its kind that also computes and fills in the sequence for all spanned gaps in the scaffolds, yielding much longer contigs. Here we compare SAMBA to several similar tools capable of re-scaffolding assemblies using long-read data, and we show that SAMBA yields better contiguity and introduces fewer errors than competing methods. SAMBA is open-source software that is distributed at https://github.com/alekseyzimin/masurca . 
    more » « less
  2. Valencia, Alfonso (Ed.)
    Abstract Motivation Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however, for most species, only the reference genome is well-annotated. Results One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously annotated reference genome. Here, we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.3% of human protein-coding genes to a chimpanzee genome assembly with 98.2% sequence identity. Availability and implementation Liftoff can be installed via bioconda and PyPI. In addition, the source code for Liftoff is available at https://github.com/agshumate/Liftoff. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  3. Improvements in DNA sequencing technology and computational methods have led to a substantial increase in the creation of high-quality genome assemblies of many species. To understand the biology of these genomes, annotation of gene features and other functional elements is essential; however for most species, only the reference genome is well-annotated. One strategy to annotate new or improved genome assemblies is to map or ‘lift over’ the genes from a previously-annotated reference genome. Here we describe Liftoff, a new genome annotation lift-over tool capable of mapping genes between two assemblies of the same or closely-related species. Liftoff aligns genes from a reference genome to a target genome and finds the mapping that maximizes sequence identity while preserving the structure of each exon, transcript, and gene. We show that Liftoff can accurately map 99.9% of genes between two versions of the human reference genome with an average sequence identity >99.9%. We also show that Liftoff can map genes across species by successfully lifting over 98.4% of human protein-coding genes to a chimpanzee genome assembly with 98.7% sequence identity. Availability The source code for Liftoff is available at https://github.com/agshumate/Liftoff 
    more » « less
  4. Genomic characterization of Cannabis sativa has accelerated rapidly in the last decade as sequencing costs have decreased and public and private interest in the species has increased. Here, we present seven new chromosome-level haplotype-phased genomes of C. sativa. All of these genotypes were alive at the time of publication, and several have numerous years of associated phenotype data. We performed a k-mer-based pangenome analysis to contextualize these assemblies within over 200 existing assemblies. This allowed us to identify unique haplotypes and genomic diversity among Cannabis sativa genotypes. We leveraged linkage maps constructed from F2 progeny of two of the assembled genotypes to characterize the recombination rate across the genome showing strong periphery-biased recombination. Lastly, we re-aligned a bulk segregant analysis dataset for the major-effect flowering locus Early1 to several of the new assemblies to evaluate the impact of reference bias on the mapping results and narrow the locus to a smaller region of the chromosome. These new assemblies, combined with the continued propagation of the genotypes, will contribute to the growing body of genomic resources for C. sativa to accelerate future research efforts. 
    more » « less
  5. Pyhäjärvi, T (Ed.)
    Abstract Blackberries (Rubus spp.) are the fourth most economically important berry crop worldwide. Genome assemblies and annotations have been developed for Rubus species in subgenus Idaeobatus, including black raspberry (R. occidentalis), red raspberry (R. idaeus), and R. chingii, but very few genomic resources exist for blackberries and their relatives in subgenus Rubus. Here we present a chromosome-length assembly and annotation of the diploid blackberry germplasm accession “Hillquist” (R. argutus). “Hillquist” is the only known source of primocane-fruiting (annual-fruiting) in tetraploid fresh-market blackberry breeding programs and is represented in the pedigree of many important cultivars worldwide. The “Hillquist” assembly, generated using Pacific Biosciences long reads scaffolded with high-throughput chromosome conformation capture sequencing, consisted of 298 Mb, of which 270 Mb (90%) was placed on 7 chromosome-length scaffolds with an average length of 38.6 Mb. Approximately 52.8% of the genome was composed of repetitive elements. The genome sequence was highly collinear with a novel maternal haplotype-resolved linkage map of the tetraploid blackberry selection A-2551TN and genome assemblies of R. chingii and red raspberry. A total of 38,503 protein-coding genes were predicted, of which 72% were functionally annotated. Eighteen flowering gene homologs within a previously mapped locus aligning to an 11.2 Mb region on chromosome Ra02 were identified as potential candidate genes for primocane-fruiting. The utility of the “Hillquist” genome has been demonstrated here by the development of the first genotyping-by-sequencing-based linkage map of tetraploid blackberry and the identification of possible candidate genes for primocane-fruiting. This chromosome-length assembly will facilitate future studies in Rubus biology, genetics, and genomics and strengthen applied breeding programs. 
    more » « less