Graph pangenome captures missing heritability and empowers tomato breeding

Zhou, Yao; Zhang, Zhiyang; Bao, Zhigui; Li, Hongbo; Lyu, Yaqing; Zan, Yanjun; Wu, Yaoyao; Cheng, Lin; Fang, Yuhan; Wu, Kun; Zhang, Jinzhe; Lyu, Hongjun; Lin, Tao; Gao, Qiang; Saha, Surya; Mueller, Lukas; Fei, Zhangjun; Städler, Thomas; Xu, Shizhong; Zhang, Zhiwu; Speed, Doug; Huang, Sanwen

doi:10.1038/s41586-022-04808-9

Abstract
Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits 1,2. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions 3,4. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
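
The 24% figure above is a relative increase, not an absolute one: the absolute gain in average estimated heritability is 0.08 (0.41 minus 0.33). A minimal sketch of that arithmetic follows, using only the two rounded averages quoted in the abstract; the variable and function names are illustrative, and the published percentage may have been computed from unrounded values.

```python
# Average estimated heritabilities as quoted (rounded) in the abstract.
h2_graph = 0.41   # using the graph pangenome
h2_linear = 0.33  # using the single linear reference genome


def relative_increase(new, old):
    """Return the change of `new` over `old` as a fraction of `old`."""
    return (new - old) / old


increase = relative_increase(h2_graph, h2_linear)
print(f"{increase:.1%}")  # prints "24.2%"
```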
