NSF PAR Search | NSF Public Access Repository

Graph pangenome captures missing heritability and empowers tomato breeding

https://doi.org/10.1038/s41586-022-04808-9

Zhou, Yao; Zhang, Zhiyang; Bao, Zhigui; Li, Hongbo; Lyu, Yaqing; Zan, Yanjun; Wu, Yaoyao; Cheng, Lin; Fang, Yuhan; Wu, Kun; et al (June 2022, Nature)

Abstract: Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits [1,2]. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions [3,4]. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
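
The 24% figure in the abstract is a relative increase, not an absolute one: the average heritability rises by 0.08 (from 0.33 to 0.41), which is about 24% of the linear-reference baseline. A minimal sketch of that arithmetic follows; the function and variable names are illustrative and do not come from the paper.

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Return the increase of `improved` over `baseline` as a fraction of `baseline`."""
    return (improved - baseline) / baseline


# Average estimated heritability values quoted in the abstract.
linear_reference_h2 = 0.33
graph_pangenome_h2 = 0.41

pct = relative_increase(linear_reference_h2, graph_pangenome_h2) * 100
print(f"Relative increase: {pct:.0f}%")
```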
Full Text Available
Conserved noncoding sequences provide insights into regulatory sequence and loss of gene expression in maize

https://doi.org/10.1101/gr.266528.120

Song, Baoxing; Buckler, Edward S.; Wang, Hai; Wu, Yaoyao; Rees, Evan; Kellogg, Elizabeth A.; Gates, Daniel J.; Khaipho-Burch, Merritt; Bradbury, Peter J.; Ross-Ibarra, Jeffrey; et al (July 2021, Genome Research)
Full Text Available
A multiple alignment workflow shows the effect of repeat masking and parameter tuning on alignment in plants

https://doi.org/10.1002/tpg2.20204

Wu, Yaoyao; Johnson, Lynn; Song, Baoxing; Romay, Cinta; Stitzer, Michelle; Siepel, Adam; Buckler, Edward; Scheben, Armin (April 2022, The Plant Genome)

Abstract: Alignments of multiple genomes are a cornerstone of comparative genomics, but generating these alignments remains technically challenging and often impractical. We developed the msa_pipeline workflow (https://bitbucket.org/bucklerlab/msa_pipeline) to allow practical and sensitive multiple alignment of diverged plant genomes and calculation of conservation scores with minimal user inputs. As high repeat content and genomic divergence are substantial challenges in plant genome alignment, we also explored the effect of different masking approaches and parameters of the LAST aligner using genome assemblies of 33 grass species. Compared with conventional masking with RepeatMasker, a masking approach based on k‐mers (nucleotide sequences of k length) increased the alignment rate of coding sequence and noncoding functional regions by 25% and 14%, respectively. We further found that default alignment parameters generally perform well, but parameter tuning can increase the alignment rate for noncoding functional regions by over 52% compared with default LAST settings. Finally, by increasing alignment sensitivity from the default baseline, parameter tuning can increase the number of noncoding sites that can be scored for conservation by over 76%. Overall, tuning of masking and alignment parameters can generate optimized multiple alignments to drive biological discovery in plants.
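
To make the idea of k‐mer-based masking concrete, the sketch below soft-masks (lowercases) every base covered by a k‐mer that occurs more often than a chosen threshold within a single sequence. This is a toy illustration of the general technique only. It is not the msa_pipeline implementation, and the function name, the parameters `k` and `max_count`, and the within-sequence counting are all assumptions made for the example.

```python
from collections import Counter


def soft_mask_repeats(seq: str, k: int, max_count: int) -> str:
    """Lowercase every base covered by a k-mer seen more than `max_count` times.

    Hypothetical helper for illustration; real pipelines count k-mers
    genome-wide and typically use dedicated tools.
    """
    seq = seq.upper()
    n_kmers = len(seq) - k + 1

    # Count how often each k-mer occurs in the sequence.
    counts = Counter(seq[i:i + k] for i in range(n_kmers))

    # Flag every position covered by an over-represented k-mer.
    masked = [False] * len(seq)
    for i in range(n_kmers):
        if counts[seq[i:i + k]] > max_count:
            for j in range(i, i + k):
                masked[j] = True

    # Soft-mask: flagged bases become lowercase, the rest stay uppercase.
    return "".join(base.lower() if m else base for base, m in zip(seq, masked))


example = soft_mask_repeats("ACGTACGTACGTTTGCA", k=4, max_count=2)
print(example)
```

Soft-masking keeps the sequence intact, so an aligner can still extend alignments through masked bases while avoiding seeding in them; whether a given aligner treats lowercase this way depends on its settings.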
