Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits 1,2 . The solution to this problem is to identify all causal genetic variants and to measure their individual contributions 3,4 . Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.more » « less
-
null (Ed.)Abstract House mice from 4 replicate lines selectively bred for 61 generations for voluntary wheel-running behavior were compared with 4 non-selected control lines using multiple genome-wide analytical techniques on both haplotype and single nucleotide polymorphism data......more » « less
-
Abstract The interactions among genes and between genes and environment contribute significantly to the phenotypic variation of complex traits and may be possible explanations for missing heritability. However, to our knowledge no existing tool can address the two kinds of interactions. Here we propose a novel linear mixed model that considers not only the additive effects of biological markers but also the interaction effects of marker pairs. Interaction effect is demonstrated as a 2D association. Based on this linear mixed model, we developed a pipeline, namely PATOWAS. PATOWAS can be used to study transcriptome-wide and metabolome-wide associations in addition to genome-wide associations. Our case analysis with real rice recombinant inbred lines (RILs) at three omics levels demonstrates that 2D association mapping and integrative omics are able to provide a systems biology view into the analyzed traits, leading toward an answer about how genes, transcripts, proteins, and metabolites work together to produce an observable phenotype.
-
Abstract Mendel's law of segregation explains why genetic variation can be maintained over time. In diploid organisms, an offspring receives one allele from each parent, not just half of the blended genetic material of the parents. Which of the two alleles is received is purely random. This stochastic process generates genetic variation among members of the same family, called Mendelian segregation variance or within‐family variance. In statistics, the genetic value of a quantitative trait for an offspring follows a mixture distribution consisting of the four alleles of the two parents, guided by a Mendelian variable from each parent. The mixture model allows us to partition the total genetic variance into between‐family and within‐family variances. In the absence of inbreeding, the genetic variance splits half to the between‐family variance and half to the within‐family variance. With inbreeding, however, the between‐family variance is increased at the cost of the within‐family variance, leading to a net increase of the total genetic variance. This study defines multiple Mendelian variables and develops a mixture model of quantitative genetics. The phenomenon that allelic variance is maintained over time is guided by “the law of conservation of allelic variance” in biology, which is comparable to “the law of conservation of mass” in physics.
-
Abstract Motivation Joint reconstruction of multiple gene regulatory networks (GRNs) using gene expression data from multiple tissues/conditions is very important for understanding common and tissue/condition-specific regulation. However, there are currently no computational models and methods available for directly constructing such multiple GRNs that not only share some common hub genes but also possess tissue/condition-specific regulatory edges.
Results In this paper, we proposed a new graphic Gaussian model for joint reconstruction of multiple gene regulatory networks (JRmGRN), which highlighted hub genes, using gene expression data from several tissues/conditions. Under the framework of Gaussian graphical model, JRmGRN method constructs the GRNs through maximizing a penalized log likelihood function. We formulated it as a convex optimization problem, and then solved it with an alternating direction method of multipliers (ADMM) algorithm. The performance of JRmGRN was first evaluated with synthetic data and the results showed that JRmGRN outperformed several other methods for reconstruction of GRNs. We also applied our method to real Arabidopsis thaliana RNA-seq data from two light regime conditions in comparison with other methods, and both common hub genes and some conditions-specific hub genes were identified with higher accuracy and precision.
Availability and implementation JRmGRN is available as a R program from: https://github.com/wenpingd.
Supplementary information Supplementary data are available at Bioinformatics online.
-
Abstract Summary We present GWASpro, a high-performance web server for the analyses of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data, coupled with complex replicated experimental designs such as found in plant science investigations and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times, can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators.
Availability and implementation GWASpro is freely available at https://bioinfo.noble.org/GWASPRO.
Supplementary information Supplementary data are available at Bioinformatics online.
-
Summary Hybrid breeding is the main strategy for improving productivity in many crops, especially in rice and maize. Genomic hybrid breeding is a technology that uses whole‐genome markers to predict future hybrids. Predicted superior hybrids are then field evaluated and released as new hybrid cultivars after their superior performances are confirmed. This will increase the opportunity of selecting true superior hybrids with minimum costs. Here, we used genomic best linear unbiased prediction to perform hybrid performance prediction using an existing rice population of 1495 hybrids. Replicated 10‐fold cross‐validations showed that the prediction abilities on ten agronomic traits ranged from 0.35 to 0.92. Using the 1495 rice hybrids as a training sample, we predicted six agronomic traits of 100 hybrids derived from half diallel crosses involving 21 parents that are different from the parents of the hybrids in the training sample. The prediction abilities were relatively high, varying from 0.54 (yield) to 0.92 (grain length). We concluded that the current population of 1495 hybrids can be used to predict hybrids from seemingly unrelated parents. Eventually, we used this training population to predict all potential hybrids of cytoplasm male sterile lines from 3000 rice varieties from the 3K Rice Genome Project. Using a breeding index combining 10 traits, we identified the top and bottom 200 predicted hybrids. SNP genotypes of the training population and parameters estimated from this training population are available for general uses and further validation in genomic hybrid prediction of all potential hybrids generated from all varieties of rice.