Title: An updated and extended version of the Melastomataceae probe set for target capture
Abstract:
Premise: A probe set was previously designed to target 384 nuclear loci in the Melastomataceae family; however, when trying to use it, we encountered several practical and conceptual problems, such as the presence of sequences in reverse complement, intronic regions with stop codons, and other issues. This raised concerns regarding the use of this probe set for sequence recovery in Melastomataceae.
Methods: In order to correct these issues, we cleaned the Melastomataceae probe set, extended it with additional sequences, and compared its performance with the original version.
Results: The final probe set targets 396 putative nuclear loci represented by 6009 template sequences. The probe set has been made available, along with details on the cleaning process, for reproducibility. We show that the new probe set performs better than the original version in terms of sequence recovery.
Discussion: This updated, extended, and cleaned probe set will improve the availability of phylogenomic resources across the Melastomataceae family. It is fully compatible with sequence recovery and extraction pipelines. The cleaning process can also be applied to any plant‐targeting probe set that would need to be cleaned or updated if new genomic resources for the targeted taxa become available.
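Two of the problems named in the abstract, template sequences stored in reverse complement and regions containing stop codons, can be screened for with a simple reading-frame check. The sketch below is only an illustration and is not the authors' cleaning procedure: the function names and the rule used (a sequence counts as "forward" if at least one of its three forward frames has no internal stop codon) are assumptions made for this example, and short sequences will often pass such a check by chance.

```python
# Illustrative orientation screen for probe template sequences.
# Assumption: a coding template should have at least one reading frame
# without internal stop codons. This is a heuristic, not the published method.

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_internal_stop(seq, frame=0):
    """True if a stop codon occurs before the last complete codon of the frame."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # A stop in the final codon is treated as a normal terminator.
    return any(codon in STOP_CODONS for codon in codons[:-1])


def open_frames(seq):
    """List the forward frames (0, 1, 2) that contain no internal stop codon."""
    return [frame for frame in range(3) if not has_internal_stop(seq, frame)]


def classify_orientation(seq):
    """Label a sequence by the strand on which an open frame is found."""
    if open_frames(seq):
        return "forward"
    if open_frames(reverse_complement(seq)):
        return "reverse_complement"
    return "no_open_frame"
```

A sequence labelled `reverse_complement` or `no_open_frame` would be a candidate for manual inspection, for example to check whether it is stored on the wrong strand or includes intronic sequence.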
Award ID(s): 2001357, 2002270
PAR ID: 10519499
Publisher / Repository: Botanical Society of America
Journal Name: Applications in Plant Sciences
ISSN: 2168-0450
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract:
     Premise: Rubiaceae is among the most species‐rich plant families, as well as one of the most morphologically and geographically diverse. Currently available phylogenies have mostly relied on few genomic and plastid loci, as opposed to large‐scale genomic data. Target enrichment provides the ability to generate sequence data for hundreds to thousands of phylogenetically informative, single‐copy loci, which often leads to improved phylogenetic resolution at both shallow and deep taxonomic scales; however, a publicly accessible Rubiaceae‐specific probe set that allows for comparable phylogenetic inference across clades is lacking.
     Methods: Here, we use publicly accessible genomic resources to identify putatively single‐copy nuclear loci for target enrichment in two Rubiaceae groups: tribe Hillieae (Cinchonoideae) and tribal complex Palicoureeae+Psychotrieae (Rubioideae). We sequenced 2270 exonic regions corresponding to 1059 loci in our target clades and generated in silico target enrichment sequences for other Rubiaceae taxa using our designed probe set. To test the utility of our probe set for phylogenetic inference across Rubiaceae, we performed a coalescent‐aware phylogenetic analysis using a subset of 27 Rubiaceae taxa from 10 different tribes and three subfamilies, and one outgroup in Apocynaceae.
     Results: We recovered an average of 75% and 84% of targeted exons and loci, respectively, per Rubiaceae sample. Probes designed using genomic resources from a particular subfamily were most efficient at targeting sequences from taxa in that subfamily. The number of paralogs recovered during assembly varied for each clade. Phylogenetic inference of Rubiaceae with our target regions resolves relationships at various scales. Relationships are largely consistent with previous studies of relationships in the family with high support (≥0.98 local posterior probability) at nearly all nodes and evidence of gene tree discordance.
     Discussion: Our probe set, which we call Rubiaceae2270x, was effective for targeting loci in species across and even outside of Rubiaceae. This probe set will facilitate phylogenomic studies in Rubiaceae and advance systematics and macroevolutionary studies in the family.
  2. Abstract:
     Premise: Target sequence capture (Hyb‐Seq) is a cost‐effective sequencing strategy that employs RNA probes to enrich for specific genomic sequences. By targeting conserved low‐copy orthologs, Hyb‐Seq enables efficient phylogenomic investigations. Here, we present Asparagaceae1726, a Hyb‐Seq probe set targeting 1726 low‐copy nuclear genes for phylogenomics in the angiosperm family Asparagaceae, which will aid the often‐challenging delineation and resolution of evolutionary relationships within Asparagaceae.
     Methods: Here we describe and validate the Asparagaceae1726 probe set (https://github.com/bentzpc/Asparagaceae1726) in six of the seven subfamilies of Asparagaceae. We perform phylogenomic analyses with these 1726 loci and evaluate how inclusion of paralogs and bycatch plastome sequences can enhance phylogenomic inference with target‐enriched data sets.
     Results: We recovered at least 82% of target orthologs from all sampled taxa, and phylogenomic analyses resulted in strong support for all subfamilial relationships. Additionally, topology and branch support were congruent between analyses with and without inclusion of target paralogs, suggesting that paralogs had limited effect on phylogenomic inference.
     Discussion: Asparagaceae1726 is effective across the family and enables the generation of robust data sets for phylogenomics of any Asparagaceae taxon. Asparagaceae1726 establishes a standardized set of loci for phylogenomic analysis in Asparagaceae, which we hope will be widely used for extensible and reproducible investigations of diversification in the family.
  3. Abstract:
     Premise: A family‐specific probe set for sunflowers, Compositae‐1061, enables family‐wide phylogenomic studies and investigations at lower taxonomic levels, but may lack resolution at genus to species levels, especially in groups complicated by polyploidy and hybridization.
     Methods: We developed a Hyb‐Seq probe set, Compositae‐ParaLoss‐1272, that targets orthologous loci in Asteraceae. We tested its efficiency across the family by simulating target enrichment sequencing in silico. Additionally, we tested its effectiveness at lower taxonomic levels in the historically complex genus Packera. We performed Hyb‐Seq with Compositae‐ParaLoss‐1272 for 19 Packera taxa that were previously studied using Compositae‐1061. The resulting sequences from each probe set, plus a combination of both, were used to generate phylogenies, compare topologies, and assess node support.
     Results: We report that Compositae‐ParaLoss‐1272 captured loci across all tested Asteraceae members, had less gene tree discordance, and retained longer loci than Compositae‐1061. Most notably, Compositae‐ParaLoss‐1272 recovered substantially fewer paralogous sequences than Compositae‐1061, with only ~5% of the recovered loci reporting as paralogous, compared to ~59% with Compositae‐1061.
     Discussion: Given the complexity of plant evolutionary histories, assigning orthology for phylogenomic analyses will continue to be challenging. However, we anticipate Compositae‐ParaLoss‐1272 will provide improved resolution and utility for studies of complex groups and lower taxonomic levels in the sunflower family.
  4. Abstract:
     Background: Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information with high fidelity.
     Results: We present EMMA (Extending Multiple alignments using MAFFT--add) for the problem of adding a set of unaligned sequences into a multiple sequence alignment (i.e., a constraint alignment). EMMA builds on MAFFT--add, which is also designed to add sequences into a given constraint alignment. EMMA improves on MAFFT--add methods by using a divide-and-conquer framework to scale its most accurate version, MAFFT-linsi--add, to constraint alignments with many sequences. We show that EMMA has an accuracy advantage over other techniques for adding sequences into alignments under many realistic conditions and can scale to large datasets with high accuracy (hundreds of thousands of sequences). EMMA is available at https://github.com/c5shen/EMMA.
     Conclusions: EMMA is a new tool that provides high accuracy and scalability for adding sequences into an existing alignment.
  5. Robinson, Peter (Ed.)
     Abstract:
     Motivation: The Jaccard similarity on k-mer sets has been shown to be a convenient proxy for sequence identity. By avoiding expensive base-level alignments and comparing reduced sequence representations, tools such as MashMap can scale to massive numbers of pairwise comparisons while still providing useful similarity estimates. However, due to their reliance on minimizer winnowing, previous versions of MashMap were shown to be biased and inconsistent estimators of Jaccard similarity. This directly impacts downstream tools that rely on the accuracy of these estimates.
     Results: To address this, we propose the minmer winnowing scheme, which generalizes the minimizer scheme by use of a rolling minhash with multiple sampled k-mers per window. We show both theoretically and empirically that minmers yield an unbiased estimator of local Jaccard similarity, and we implement this scheme in an updated version of MashMap. The minmer-based implementation is over 10 times faster than the minimizer-based version under the default ANI threshold, making it well-suited for large-scale comparative genomics applications.
     Availability and implementation: MashMap3 is available at https://github.com/marbl/MashMap.
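     As a point of reference for the quantity discussed in item 5, the exact Jaccard similarity between the k-mer sets of two sequences can be computed directly. The sketch below is a plain set-based calculation written for illustration; it is not MashMap's code and does not implement minimizer or minmer winnowing, which are sampling schemes for estimating this value at scale. The function names are hypothetical.

```python
# Exact Jaccard similarity on k-mer sets (illustrative only).
# Winnowing schemes estimate this quantity from a sample of the k-mers;
# here every k-mer is used, which is only practical for small inputs.


def kmer_set(seq, k):
    """Return the set of all length-k substrings of seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(seq_a, seq_b, k):
    """|A ∩ B| / |A ∪ B| for the k-mer sets A and B of the two sequences."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        # Both sequences are shorter than k, so the similarity is undefined;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return len(a & b) / len(union)
```

     For example, with k = 2 the sequences `ACGT` and `ACGA` share the k-mers `AC` and `CG` out of four distinct k-mers in total, giving a similarity of 0.5.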