Efficient assembly of nanopore reads via highly accurate and intact error correction

Chen, Ying; Nie, Fan; Xie, Shang-Qian; Zheng, Ying-Feng; Dai, Qi; Bray, Thomas; Wang, Yao-Xin; Xing, Jian-Feng; Huang, Zhi-Jian (ORCID:0000000316013802); Wang, De-Peng; He, Li-Juan; Luo, Feng (ORCID:0000000248132403); Wang, Jian-Xin (ORCID:0000000315160480); Liu, Yi-Zhi; Xiao, Chuan-Le (ORCID:0000000246800682)

doi:10.1038/s41467-020-20236-7

Citation Details

Abstract

Long nanopore reads are advantageous in de novo genome assembly. However, nanopore reads usually have a broad error distribution and high-error-rate subsequences. Existing error correction tools cannot correct nanopore reads efficiently and effectively. Most methods trim high-error-rate subsequences during error correction, which reduces both the length of the reads and the contiguity of the final assembly. Here, we develop an error correction and de novo assembly tool designed to overcome complex errors in nanopore reads. We propose an adaptive read selection and two-step progressive method to quickly correct nanopore reads to high accuracy. We introduce a two-stage assembler to utilize the full length of nanopore reads. Our tool achieves superior performance in both error correction and de novo assembly of nanopore reads. It requires only 8122 hours to assemble a 35X coverage human genome and achieves a 2.47-fold improvement in NG50. Furthermore, our assembly of the human WERI cell line shows an NG50 of 22 Mbp. The high-quality assembly of nanopore reads can significantly reduce false positives in structural variation detection.

Award ID(s): 1759856

NSF-PAR ID: 10208577

Author(s) / Creator(s): Chen, Ying; Nie, Fan; Xie, Shang-Qian; Zheng, Ying-Feng; Dai, Qi; Bray, Thomas; Wang, Yao-Xin; Xing, Jian-Feng; Huang, Zhi-Jian; Wang, De-Peng; He, Li-Juan; Luo, Feng; Wang, Jian-Xin; Liu, Yi-Zhi; Xiao, Chuan-Le

Publisher / Repository: Nature Publishing Group

Date Published: 2021-01-04

Journal Name: Nature Communications

Volume: 12

Issue: 1

ISSN: 2041-1723

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1038/s41467-020-20236-7
