Title: On the potential of Angiosperms353 for population genomic studies
PREMISE The successful application of universal targeted sequencing markers, such as those developed for the Angiosperms353 probe set, within populations could reduce or eliminate the need for specific marker development, while retaining the benefits of full-gene sequences in population-level analyses. However, whether the Angiosperms353 markers provide sufficient variation within species to calculate demographic parameters is untested. METHODS Using herbarium specimens from a 50-year-old floristic survey in Texas, we sequenced 95 samples from 24 species using the Angiosperms353 probe set. Our data workflow calls variants within species and prepares data for population genetic analysis using standard metrics. In our case study, gene recovery was affected by genomic library concentration only at low concentrations and displayed limited phylogenetic bias. RESULTS We identified over 1000 segregating variants with zero missing data for 92% of species and demonstrate that Angiosperms353 markers contain sufficient variation to estimate pairwise nucleotide diversity (π), typically between 0.002 and 0.010, with most variation found in flanking non-coding regions. In a subset of variants that were filtered to reduce linkage, we uncovered high heterozygosity in many species, suggesting that denser sampling within species should permit estimation of gene flow and population dynamics. DISCUSSION Angiosperms353 should benefit conservation genetic studies by providing universal repeatable markers, low missing data, and haplotype information, while permitting inclusion of decades-old herbarium specimens.
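For readers unfamiliar with the statistic, pairwise nucleotide diversity (π) can be illustrated with a minimal sketch. This is not the authors' workflow: the function name `nucleotide_diversity` and the toy sequences are hypothetical, and the sketch assumes pre-aligned haplotype sequences of equal length, skipping any site where either sequence has a non-ACGT character.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average proportion of differing sites over all pairs of aligned sequences.

    Hypothetical helper for illustration only; sites with a gap or
    ambiguous base in either sequence of a pair are excluded from that pair.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    differences = 0   # mismatching sites summed over all pairs
    compared = 0      # sites actually compared, summed over all pairs
    for a, b in combinations(seqs, 2):
        for x, y in zip(a, b):
            if x in "ACGT" and y in "ACGT":
                compared += 1
                differences += x != y
    return differences / compared if compared else 0.0


# Toy alignment (made-up data): four haplotypes, ten sites each.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy)
```

In this toy alignment the six pairs differ at 6 of 60 compared sites, so π is 0.1, far higher than the 0.002 to 0.010 range reported above for real within-species data.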
Award ID(s):
1753800 1902078
Publication Date:
Journal Name:
Applications in Plant Sciences
Sponsoring Org:
National Science Foundation
More Like this
  1. Background

    Vestimentiferan tubeworms are some of the most recognizable fauna found at deep-sea cold seeps, isolated environments where hydrocarbon-rich fluids fuel biological communities. Several studies have investigated tubeworm population structure; however, much is still unknown about larval dispersal patterns at Gulf of Mexico (GoM) seeps. As such, researchers have applied microsatellite markers as a measure for documenting the transport of vestimentiferan individuals. In the present study, we investigate the utility of microsatellites to be cross-amplified within the escarpiid clade of seep vestimentiferans, by determining if loci originally developed for Escarpia spp. could be amplified in the GoM seep tubeworm, Seepiophila jonesi. Additionally, we determine if cross-amplified loci can reliably uncover the same signatures of high gene flow seen in a previous investigation of S. jonesi.


    Seventy-seven S. jonesi individuals were collected from eight seep sites across the upper Louisiana slope (<1,000 m) in the GoM. Forty-eight microsatellite loci that were originally developed for Escarpia laminata (18 loci) and Escarpia southwardae (30 loci) were tested to determine if they were homologous and polymorphic in S. jonesi. Loci found to be both polymorphic and of high quality were used to test for significant population structuring in S. jonesi.


    Microsatellite pre-screening showed that 13 (27%) of the Escarpia loci were homologous and polymorphic in S. jonesi, revealing that microsatellites can be amplified within the escarpiid clade of vestimentiferans. Our findings uncovered low levels of heterozygosity and a lack of genetic differentiation amongst S. jonesi from various sites and regions, in line with previous investigations that employed species-specific polymorphic loci on S. jonesi individuals retrieved from both the same and different seep sites. The lack of genetic structure identified from these populations supports the presence of significant gene flow via larval dispersal in mixed oceanic currents.


    The ability to develop “universal” microsatellites reduces the costs associated with these analyses and allows researchers to track and investigate a wider array of taxa, which is particularly useful for organisms living at inaccessible locations such as the deep sea. Our study highlights that non-species-specific microsatellites can be amplified across large evolutionary distances and still yield similar findings as species-specific loci. Further, these results show that S. jonesi collected from various localities in the GoM represents a single panmictic population, suggesting that dispersal of lecithotrophic larvae by deep sea currents is sufficient to homogenize populations. These data are consistent with the high levels of gene flow seen in Escarpia spp., which advocates that differences in microhabitats of seep localities lead to variation in biogeography of separate species.

  2. Charleston, Michael (Ed.)
    Abstract We present a 517-gene phylogenetic framework for the breadfruit genus Artocarpus (ca. 70 spp., Moraceae), making use of silica-dried leaves from recent fieldwork and herbarium specimens (some up to 106 years old) to achieve 96% taxon sampling. We explore issues relating to assembly, paralogous loci, partitions, and analysis method to reconstruct a phylogeny that is robust to variation in data and available tools. Although codon partitioning did not result in any substantial topological differences, the inclusion of flanking noncoding sequence in analyses significantly increased the resolution of gene trees. We also found that increasing the size of data sets increased convergence between analysis methods but did not reduce gene-tree conflict. We optimized the HybPiper targeted-enrichment sequence assembly pipeline for short sequences derived from degraded DNA extracted from museum specimens. Although the subgenera of Artocarpus were monophyletic, revision is required at finer scales, particularly with respect to widespread species. We expect our results to provide a basis for further studies in Artocarpus and provide guidelines for future analyses of data sets based on target enrichment data, particularly those using sequences from both fresh and museum material, counseling careful attention to the potential of off-target sequences to improve resolution. [Artocarpus; Moraceae; noncoding sequences; phylogenomics; target enrichment.]
  3. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations.
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
  4. Abstract Background

    Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for fitness effect of mutations.


    Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants.


    Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach, Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC), could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (

  5. Abstract

    As genomic-scale data sets become economically feasible for most organisms, a key question for conservation biology is whether the increased resolution offered by new genomic approaches justifies repeating earlier studies based on traditional markers, rather than investing those same time and monetary resources in less-known species. Genomic studies offer clear advantages when the objective is to identify adaptive loci that may be critical to conservation policy-makers. However, the answer is far less certain for the population and landscape studies based on neutral loci that dominate the conservation genetics research agenda. We used Restriction-site Associated DNA sequencing (RADseq) to revisit earlier molecular studies of the IUCN Critically Endangered Magdalena River turtle (Podocnemis lewyana), documenting the conservation insights gained by increasing the number of neutral markers by several orders of magnitude. Earlier research indicated that P. lewyana has the lowest genetic diversity known for any chelonian, and little or no population differentiation among independent rivers. In contrast, the RADseq data revealed discrete population structure with isolation-by-distance within river segments and identified precise population breaks clearly delineating management units. It also confirmed that the species does not have extremely low heterozygosity and that effective population sizes are probably sufficient to maintain long-term evolutionary potential. Contrary to earlier inferences from more limited population genetic markers, our genomic data suggest that management strategies should shift from active genetic rescue to more passive protection without extreme interventions. We conclude with a list of examples of conservation studies in other vertebrates indicating that for many systems a genomic update is worth the investment.
