Search for: All records

Award ID contains: 1822330

  1. Abstract Background

    Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for the fitness effect of mutations.


    Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants.


    Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach, Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC), could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse.

  2. Societal Impact Statement

    The current rate of global biodiversity loss creates a pressing need to increase efficiency and throughput of extinction risk assessments in plants. We must assess as many plant species as possible, working with imperfect knowledge, to address the habitat loss and extinction threats of the Anthropocene. Using the biodiversity database, Botanical Information and Ecology Network (BIEN), and the Andropogoneae grass tribe as a case study, we demonstrate that large‐scale, preliminary conservation assessments can play a fundamental role in accelerating plant conservation pipelines and setting priorities for more in‐depth investigations.


    The International Union for the Conservation of Nature (IUCN) Red List criteria are widely used to determine extinction risks of plant and animal life. Here, we used The Red List's criterion B, Geographic Range Size, to provide preliminary conservation assessments of the members of a large tribe of grasses, the Andropogoneae, with ~1100 species, including maize, sorghum, and sugarcane and their wild relatives.

    We used georeferenced occurrence data from the Botanical Information and Ecology Network (BIEN) and automated individual species assessments using ConR to demonstrate efficacy and accuracy in using time‐saving tools for conservation research. We validated our results with those from the IUCN‐recommended assessment tool, GeoCAT.

    We discovered a remarkably large gap in digitized information, with slightly more than 50% of the Andropogoneae lacking sufficient information for assessment. ConR and GeoCAT largely agree on which taxa are of least concern (>90%) or possibly threatened (<10%), highlighting that automating assessments with ConR is a viable strategy for preliminary conservation assessments of large plant groups. Results for crop wild relatives are similar to those for the entire dataset.

    Increasing digitization and collection needs to be a high priority. Available rapid assessment tools can then be used to identify species that warrant more comprehensive investigation.
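
    Criterion B assessments of the kind summarized above rest in part on the extent of occurrence (EOO), commonly measured as the area of the smallest convex polygon enclosing a species' occurrence records. The following is a minimal sketch of that idea, not the ConR or GeoCAT implementation. It assumes the coordinates have already been projected to a planar, equal-area system in kilometres, it applies only the IUCN B1 area thresholds (100, 5,000, and 20,000 km²), and the function names are hypothetical. A full assessment also requires further subcriteria (for example, number of locations and continuing decline), so the labels below are preliminary at best.

```python
def convex_hull(points):
    """Andrew's monotone chain: return hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def eoo_km2(points_km):
    """Area of the convex hull (shoelace formula); None if fewer than 3 unique points."""
    if len(set(points_km)) < 3:
        return None  # assumption: too few records to draw a polygon
    hull = convex_hull(points_km)
    if len(hull) < 3:
        return 0.0  # all records lie on one line
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def preliminary_b1_category(eoo):
    """Map an EOO in km^2 to a preliminary label using only the B1 area thresholds."""
    if eoo is None:
        return "insufficient data"
    if eoo < 100:
        return "CR"
    if eoo < 5000:
        return "EN"
    if eoo < 20000:
        return "VU"
    return "LC or NT"


# Toy occurrence records (km), invented for illustration only.
records = [(0, 0), (100, 0), (100, 100), (0, 100), (50, 50)]
print(eoo_km2(records), preliminary_b1_category(eoo_km2(records)))
```

    Returning a separate "insufficient data" label for species with too few records is a design choice of this sketch; it mirrors the observation above that many species lack enough digitized information to be assessed at all.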

  3. Abstract

    Poa pratensis, commonly known as Kentucky bluegrass, is a popular cool-season grass species used as turf in lawns and recreation areas globally. Despite its substantial economic value, a reference genome had not previously been assembled due to the genome’s relatively large size and biological complexity that includes apomixis, polyploidy, and interspecific hybridization. We report here a fortuitous de novo assembly and annotation of a P. pratensis genome. Instead of sequencing the genome of a C4 grass, we accidentally sampled and sequenced tissue from a weedy P. pratensis whose stolon was intertwined with that of the C4 grass. The draft assembly consists of 6.09 Gbp with an N50 scaffold length of 65.1 Mbp, and a total of 118 scaffolds, generated using PacBio long reads and Bionano optical map technology. We annotated 256K gene models and found 58% of the genome to be composed of transposable elements. To demonstrate the applicability of the reference genome, we evaluated population structure and estimated genetic diversity in P. pratensis collected from three North American prairies, two in Manitoba, Canada and one in Colorado, USA. Our results support previous studies that found high genetic diversity and population structure within the species. The reference genome and annotation will be an important resource for turfgrass breeding and study of bluegrasses.
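
    For readers unfamiliar with the N50 statistic quoted above: it is the scaffold length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of the calculation follows. The function name and the toy lengths are illustrative assumptions and are not taken from the P. pratensis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with invented lengths (arbitrary units).
print(n50([2, 3, 4, 5, 6]))
```

    In the toy example the total is 20, and the two longest scaffolds (6 and 5) already cover 11 of those 20 units, so the N50 is 5.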

  4. Abstract

    Alignments of multiple genomes are a cornerstone of comparative genomics, but generating these alignments remains technically challenging and often impractical. We developed the msa_pipeline workflow to allow practical and sensitive multiple alignment of diverged plant genomes and calculation of conservation scores with minimal user inputs. As high repeat content and genomic divergence are substantial challenges in plant genome alignment, we also explored the effect of different masking approaches and parameters of the LAST aligner using genome assemblies of 33 grass species. Compared with conventional masking with RepeatMasker, a masking approach based on k‐mers (nucleotide sequences of k length) increased the alignment rate of coding sequence and noncoding functional regions by 25% and 14%, respectively. We further found that default alignment parameters generally perform well, but parameter tuning can increase the alignment rate for noncoding functional regions by over 52% compared with default LAST settings. Finally, by increasing alignment sensitivity from the default baseline, parameter tuning can increase the number of noncoding sites that can be scored for conservation by over 76%. Overall, tuning of masking and alignment parameters can generate optimized multiple alignments to drive biological discovery in plants.
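
    The general idea behind k‐mer-based masking mentioned above can be illustrated with a toy example: count every k‐mer in a sequence and soft-mask (lowercase) the positions covered by k‐mers that occur more often than a chosen cutoff. This is only a sketch of the concept. It is not the implementation used in the workflow described above, and the function name, `k`, and `max_count` values are assumptions chosen for illustration; real genomes call for much larger k and dedicated k‐mer counting tools.

```python
from collections import Counter


def kmer_softmask(seq, k=4, max_count=2):
    """Lowercase every position covered by a k-mer seen more than max_count times."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    masked = [False] * len(seq)
    for start, kmer in enumerate(kmers):
        if counts[kmer] > max_count:
            # Mark all k positions spanned by this over-represented k-mer.
            for pos in range(start, start + k):
                masked[pos] = True
    return "".join(c.lower() if m else c for c, m in zip(seq, masked))


# Toy sequence: "ACGT" occurs three times, so its positions are soft-masked.
print(kmer_softmask("ACGTACGTACGTTTGCA"))
```

    Soft-masking keeps the bases in the sequence, so an aligner can still extend alignments through masked regions while avoiding seeding in them.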

  5. Summary

    Inflorescence architecture in plants is often complex and challenging to quantify, particularly for inflorescences of cereal grasses. Methods for capturing inflorescence architecture and for analyzing the resulting data are limited to a few easily captured parameters that may miss the rich underlying diversity.

    Here, we apply X‐ray computed tomography combined with detailed morphometrics, offering new imaging and computational tools to analyze three‐dimensional inflorescence architecture. To show the power of this approach, we focus on the panicles of Sorghum bicolor, which vary extensively in numbers, lengths, and angles of primary branches, as well as the three‐dimensional shape, size, and distribution of the seed.

    We imaged and comprehensively evaluated the panicle morphology of 55 sorghum accessions that represent the five botanical races in the most common classification system of the species, defined by genetic data. We used our data to determine the reliability of the morphological characters for assigning specimens to race and found that seed features were particularly informative.

    However, the extensive overlap between botanical races in multivariate trait space indicates that the phenotypic range of each group extends well beyond its overall genetic background, suggesting an unexpectedly weak correlation between morphology, genetic identity, and domestication history.

  6. Free, publicly-accessible full text available December 31, 2024
  7. The origins of maize were the topic of vigorous debate for nearly a century, but neither the current genetic model nor earlier archaeological models account for the totality of available data, and recent work has highlighted the potential contribution of a wild relative, Zea mays ssp. mexicana. Our population genetic analysis reveals that the origin of modern maize can be traced to an admixture between ancient maize and Zea mays ssp. mexicana in the highlands of Mexico some 4000 years after domestication began. We show that variation in admixture is a key component of maize diversity, both at individual loci and for additive genetic variation underlying agronomic traits. Our results clarify the origin of modern maize and raise new questions about the anthropogenic mechanisms underlying dispersal throughout the Americas.

    Free, publicly-accessible full text available December 1, 2024
  8. Free, publicly-accessible full text available September 1, 2024
  9. Qu, Li-Jia (Ed.)

    Pleiotropy—when a single gene controls two or more seemingly unrelated traits—has been shown to impact genes with effects on flowering time, leaf architecture, and inflorescence morphology in maize. However, the genome-wide impact of biological pleiotropy across all maize phenotypes is largely unknown. Here, we investigate the extent to which biological pleiotropy impacts phenotypes within maize using GWAS summary statistics reanalyzed from previously published metabolite, field, and expression phenotypes across the Nested Association Mapping population and Goodman Association Panel. Through phenotypic saturation of 120,597 traits, we obtain over 480 million significant quantitative trait nucleotides. We estimate that only 1.56–32.3% of intervals show some degree of pleiotropy. We then assess the relationship between pleiotropy and various biological features such as gene expression, chromatin accessibility, sequence conservation, and enrichment for gene ontology terms. We find very little relationship between pleiotropy and these variables when compared to permuted pleiotropy. We hypothesize that biological pleiotropy of common alleles is not widespread in maize and is highly impacted by nuisance terms such as population structure and linkage disequilibrium. Natural selection on large standing natural variation in maize populations may target wide and large effect variants, leaving the prevalence of detectable pleiotropy relatively low.
