Title: Syotti: scalable bait design for DNA enrichment
Abstract. Motivation: Bait enrichment is a protocol that is becoming increasingly ubiquitous, as it has been shown to successfully amplify regions of interest in metagenomic samples. In this method, a set of synthetic probes (‘baits’) is designed, manufactured and applied to fragmented metagenomic DNA. The probes bind to the fragmented DNA and any unbound DNA is rinsed away, leaving the bound fragments to be amplified for sequencing. Metsky et al. demonstrated that bait enrichment is capable of detecting a large number of human viral pathogens within metagenomic samples. Results: We formalize the problem of designing baits by defining the Minimum Bait Cover problem, show that the problem is NP-hard even under very restrictive assumptions, and design an efficient heuristic that takes advantage of succinct data structures. We refer to our method as Syotti. The running time of Syotti shows linear scaling in practice, running at least an order of magnitude faster than state-of-the-art methods, including the method of Metsky et al. At the same time, our method produces bait sets that are smaller than the ones produced by the competing methods, while also leaving fewer positions uncovered. Lastly, we show that Syotti requires only 25 min to design baits for a dataset composed of 3 billion nucleotides from 1000 related bacterial substrains, whereas the method of Metsky et al. shows clearly super-linear running time and fails to process even a 17% subset of the data in 72 h. Availability and implementation: https://github.com/jnalanko/syotti. Supplementary information: Supplementary data are available at Bioinformatics online.
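The abstract names the Minimum Bait Cover problem without stating it, so the following is only an illustrative sketch under stated assumptions, not Syotti's implementation. It assumes that a bait is a string of fixed length, that a bait covers every window of a sequence within a given Hamming distance of it, and that the goal is to cover every position with few baits. The function names and parameters (`greedy_bait_cover`, `bait_len`, `max_mismatches`) are hypothetical, reverse complements are ignored, and a brute-force scan stands in for the succinct data structures mentioned in the abstract.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def greedy_bait_cover(seqs: List[str], bait_len: int, max_mismatches: int) -> List[str]:
    """Greedy left-to-right heuristic for a simplified bait cover problem.

    Assumption (not taken from the abstract): a bait covers a window of
    length bait_len when their Hamming distance is at most max_mismatches.
    """
    if any(len(s) < bait_len for s in seqs):
        raise ValueError("every sequence must be at least bait_len long")

    # covered[i][p] is True once position p of sequence i lies in a covered window.
    covered = [[False] * len(s) for s in seqs]
    baits: List[str] = []

    for i, s in enumerate(seqs):
        for pos in range(len(s)):
            if covered[i][pos]:
                continue
            # Start the new bait at the uncovered position, shifting it left
            # when it would otherwise run past the end of the sequence.
            start = min(pos, len(s) - bait_len)
            bait = s[start:start + bait_len]
            baits.append(bait)
            # Brute-force scan: mark every window that the new bait covers.
            for j, t in enumerate(seqs):
                for k in range(len(t) - bait_len + 1):
                    if hamming(bait, t[k:k + bait_len]) <= max_mismatches:
                        covered[j][k:k + bait_len] = [True] * bait_len
    return baits


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTACGA"]
    print(greedy_bait_cover(toy, bait_len=4, max_mismatches=1))
    print(greedy_bait_cover(toy, bait_len=4, max_mismatches=0))
```

On the toy input, allowing one mismatch lets a single bait cover both sequences, while requiring exact matches needs a second bait. The brute-force scan is quadratic in the input size, which is why an index-based search would be needed at the scale reported in the abstract.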
Award ID(s):
2118251 2118252
PAR ID:
10368242
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
38
Issue:
Supplement_1
ISSN:
1367-4803
Format(s):
Medium: X
Size(s):
p. i177-i184
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Motivation: Metagenomic binning aims to retrieve microbial genomes directly from ecosystems by clustering metagenomic contigs assembled from short reads into draft genomic bins. Traditional shotgun-based binning methods depend on the contigs’ composition and abundance profiles and are impaired by the paucity of samples needed to construct reliable co-abundance profiles. When applied to a single sample, shotgun-based binning methods struggle to distinguish closely related species using composition information alone. As an alternative binning approach, Hi-C-based binning employs the metagenomic Hi-C technique to measure the proximity contacts between metagenomic fragments. However, spurious inter-species Hi-C contacts, inevitably generated by incorrect ligations of DNA fragments between species, link contigs from different genomes, weakening the purity of the final draft genomic bins. Therefore, it is imperative to develop a binning pipeline that overcomes the shortcomings of both types of binning methods on a single sample. Results: We develop HiFine, a novel binning pipeline to refine the binning results of metagenomic contigs by integrating both Hi-C-based and shotgun-based binning tools. HiFine designs a fragmentation strategy for the original bin sets derived from the Hi-C-based and shotgun-based binning methods, which considerably increases the purity of the initial bins, followed by merging fragmented bins and recruiting unbinned contigs. We demonstrate that HiFine significantly improves the existing binning results of both types of binning methods and achieves better performance in constructing species genomes on publicly available datasets. To the best of our knowledge, HiFine is the first pipeline to integrate different types of tools for the binning of metagenomic contigs. Availability and implementation: HiFine is available at https://github.com/dyxstat/HiFine. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Abstract. Premise: The genus Calceolaria (Calceolariaceae) is emblematic of the Andes, is hypothesized to have originated as a recent, rapid radiation, and has important taxonomic needs. Additionally, the genus is a model for the study of specialized pollination systems, as its flowers are nectarless and many offer floral oils as a pollination reward collected by specialist bees. Despite the evolutionary and ecological significance of the genus, obtaining a resolved phylogeny for the group has proved difficult. To address this challenge, we present a new bait set for targeted sequencing of nuclear loci in Calceolariaceae and close relatives. Methods: We developed a bioinformatic workflow to use incomplete, low‐coverage genomes of 10 Calceolaria species to identify single‐copy loci suitable for phylogenetic studies and design baits for targeted sequencing. Results: Our approach resulted in the identification of 809 single‐copy loci (733 noncoding and 76 coding regions) and the development of 39,937 baits, which we validated in silico (10 specimens) and in vitro (29 Calceolariaceae and six outgroups). In both cases, the data allowed us to recover robust phylogenetic estimates. Discussion: Our results demonstrate the appropriateness of the bait set for sequencing recent and historic specimens of Calceolariaceae and close relatives, and open new doors for further investigation of the evolutionary history of this hyperdiverse genus.
  3. Abstract. Motivation: As genome-wide reconstruction of phylogenetic trees becomes more widespread, limitations of available data are being appreciated more than ever before. One issue is that phylogenomic datasets are riddled with missing data, and gene trees, in particular, almost always lack representatives from some species otherwise available in the dataset. Since many downstream applications of gene trees require or can benefit from access to complete gene trees, it will be beneficial to algorithmically complete gene trees. Also, gene trees are often unrooted, and rooting them is useful for downstream applications. While completing and rooting a gene tree with respect to a given species tree has been studied, those problems have not been studied in depth for the case where such a reference species tree is lacking. Results: We study completion of gene trees without the need for a reference species tree. We formulate an optimization problem to complete the gene trees while minimizing their quartet distance to the given set of gene trees. We extend a seminal algorithm by Brodal et al. to solve this problem in quasi-linear time. In simulated studies and on a large empirical dataset, we show that completion of gene trees using other gene trees is relatively accurate and, unlike the case where a species tree is available, is unbiased. Availability and implementation: Our method, tripVote, is available at https://github.com/uym2/tripVote. Supplementary information: Supplementary data are available at Bioinformatics online.
  4. ABSTRACT Alluaud's little yellow ant, Plagiolepis alluaudi Emery 1894 (Hymenoptera: Formicidae), is an emerging nuisance species in floriculture and residential areas around the globe. Originally described from Madagascar, it ranks among the smallest widespread formicine pests. To date, no evaluations of management protocols for this species have been reported. In ants, feeding preference is related to ant body size and to the viscosity and nutritional content of the food source. Optimizing these factors could lead to improved bait performance. To assess population management implications of various bait parameters on a small pest ant species, four commercial ant baits of varying viscosities, active ingredient (AI) group and concentration, and nutritional content were evaluated in laboratory and field assays against P. alluaudi. All four products negatively affected P. alluaudi survival compared to the untreated control, and all products were associated with greater visitation compared to the control, suggesting all AIs tested are viable candidates for P. alluaudi management. However, their direct use for population management in the field may be limited, as feeding cessation was eventually observed on all four baits. When baits were diluted with water, viscosity was reduced and survival was initially higher compared with undiluted baits. However, similarly low levels of survival were maintained over time. Most importantly, we found in a 2‐year observational field study involving sustained baiting within an infested structure that only the bait formulation with the lowest overall viscosity was able to alleviate P. alluaudi nuisance indoors. Our results suggest that diluting baits may be a viable strategy for targeting very small pest ant species, and the greater time to lethality of diluted baits, resulting from reduced toxicant concentration, may be a reasonable trade‐off allowing smaller ant species to continue feeding for a sufficient duration on a bait formulation.
  5. Abstract. Premise: The preservation of plant tissues in ethanol is conventionally viewed as problematic. Here, we show that leaf preservation in ethanol combined with proteinase digestion can provide high‐quality DNA extracts. Additionally, as a pretreatment, ethanol can facilitate DNA extraction for recalcitrant samples. Methods: DNA was isolated from leaves preserved with 96% ethanol or from silica‐desiccated leaf samples and herbarium fragments that were pretreated with ethanol. DNA was extracted from herbarium tissues using a special ethanol pretreatment protocol, and these extracts were compared with those obtained using the standard cetyltrimethylammonium bromide (CTAB) method. Results: DNA extracted from tissue preserved in, or pretreated with, ethanol was less fragmented than DNA from tissues without pretreatment. Adding proteinase digestion to the lysis step increased the amount of DNA obtained from the ethanol‐pretreated tissues. The combination of the ethanol pretreatment with liquid nitrogen freezing and a sorbitol wash prior to cell lysis greatly improved the quality and yield of DNA from the herbarium tissue samples. Discussion: This study critically reevaluates the consequences of ethanol for plant tissue preservation and expands the utility of pretreatment methods for molecular and phylogenomic studies.