Semi-automated assembly of high-quality diploid human reference genomes

Jarvis, Erich D.; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R.; Porubsky, David; Cheng, Haoyu; Asri, Mobin; Logsdon, Glennis A.; Carnevali, Paolo; Chaisson, Mark J.; Chin, Chen-Shan; Cody, Sarah; Collins, Joanna; Ebert, Peter; Escalona, Merly; Fedrigo, Olivier; Fulton, Robert S.; Fulton, Lucinda L.; Garg, Shilpa; Gerton, Jennifer L.; Ghurye, Jay; Granat, Anastasiya; Green, Richard E.; Harvey, William; Hasenfeld, Patrick; Hastie, Alex; Haukness, Marina; Jaeger, Erich B.; Jain, Miten; Kirsche, Melanie; Kolmogorov, Mikhail; Korbel, Jan O.; Koren, Sergey; Korlach, Jonas; Lee, Joyce; Li, Daofeng; Lindsay, Tina; Lucas, Julian; Luo, Feng; Marschall, Tobias; Mitchell, Matthew W.; McDaniel, Jennifer; Nie, Fan; Olsen, Hugh E.; Olson, Nathan D.; Pesout, Trevor; Potapova, Tamara; Puiu, Daniela; Regier, Allison; Ruan, Jue; Salzberg, Steven L.; Sanders, Ashley D.; Schatz, Michael C.; Schmitt, Anthony; Schneider, Valerie A.; Selvaraj, Siddarth; Shafin, Kishwar; Shumate, Alaina; Stitziel, Nathan O.; Stober, Catherine; Torrance, James; Wagner, Justin; Wang, Jianxin; Wenger, Aaron; Xiao, Chuanle; Zimin, Aleksey V.; Zhang, Guojie; Wang, Ting; Li, Heng; Garrison, Erik; Haussler, David; Hall, Ira; Zook, Justin M.; Eichler, Evan E.; Phillippy, Adam M.; Paten, Benedict; Howe, Kerstin; Miga, Karen H.

doi:10.1038/s41586-022-05325-5

Abstract: The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society 1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals 3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome 5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity 6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Award ID(s): 1350041; 1920103

PAR ID: 10389480

Date Published: 2022-11-17

Journal Name: Nature

Volume: 611

Issue: 7936

ISSN: 0028-0836

Page Range / eLocation ID: 519 to 531

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article: https://doi.org/10.1038/s41586-022-05325-5
