Highly accurate long-read HiFi sequencing data for five complex genomes

Hon, Ting; Mars, Kristin; Young, Greg; Tsai, Yu-Chih (ORCID: 0000-0002-2958-0278); Karalius, Joseph W. (ORCID: 0000-0003-3592-1339); Landolin, Jane M.; Maurer, Nicholas; Kudrna, David; Hardigan, Michael A.; Steiner, Cynthia C.; Knapp, Steven J. (ORCID: 0000-0001-6498-5409); Ware, Doreen (ORCID: 0000-0002-8125-3821); Shapiro, Beth (ORCID: 0000-0002-2733-7776); Peluso, Paul; Rank, David R. (ORCID: 0000-0001-9213-6965)

doi:10.1038/s41597-020-00743-4

Abstract

The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can improve results for complex applications such as single-nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Sample datasets are currently needed both to evaluate the benefits of these long, accurate reads and to develop bioinformatic tools, including genome assemblers, variant callers, and haplotyping algorithms. We present deep-coverage HiFi datasets for five complex samples, including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, the octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and to explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.

NSF-PAR ID: 10202066

Publisher / Repository: Nature Publishing Group

Date Published: 2020-11-17

Journal Name: Scientific Data

Volume: 7

Issue: 1

ISSN: 2052-4463

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1038/s41597-020-00743-4