Abstract Background: Given a sequencing read, the broad goal of read mapping is to find the location(s) in the reference genome that have a “similar sequence”. Traditionally, “similar sequence” was defined as having a high alignment score, and read mappers were viewed as heuristic solutions to this well-defined problem. For sketch-based mappers, however, there has been no problem formulation that captures what an exact sketch-based mapping algorithm should solve. Moreover, no existing sketch-based method can find all possible mapping positions for a read above a certain score threshold. Results: In this paper, we formulate the problem of read mapping at the level of sequence sketches. We give an exact dynamic programming algorithm that finds all hits above a given similarity threshold. It runs in $$\mathcal{O}(|t| + |p| + \ell^2)$$ time and $$\mathcal{O}(\ell \log \ell)$$ space, where $$|t|$$ is the number of $$k$$-mers in the sketch of the reference, $$|p|$$ is the number of $$k$$-mers in the read’s sketch, and $$\ell$$ is the number of times that $$k$$-mers from the pattern sketch occur in the sketch of the text. We evaluate our algorithm’s performance in mapping long reads to the T2T assembly of human chromosome Y, where ampliconic regions make it desirable to find all good mapping positions. At a level of precision equivalent to that of minimap2, the recall of our algorithm is 0.88, compared with only 0.76 for minimap2.
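To illustrate where an $$\ell^2$$ term can come from, here is a minimal sketch under loudly stated assumptions: sketches are plain Python lists of $$k$$-mer strings, and the window score (shared $$k$$-mers minus a `penalty` per unshared $$k$$-mer) is a hypothetical stand-in, not the paper's similarity score. The helper names `hit_positions` and `windows_above_threshold` are illustrative. It is brute-force enumeration over pairs of hit positions, not the paper's dynamic program, and it does not reproduce the $$\mathcal{O}(\ell \log \ell)$$ space bound. Restricting windows to start and end on one of the $$\ell$$ hits and updating the score incrementally gives $$\mathcal{O}(\ell^2)$$ windows with constant expected work each, after $$\mathcal{O}(|t| + |p|)$$ preprocessing.

```python
from collections import Counter


def hit_positions(text_sketch, pattern_sketch):
    """Indices of the text sketch holding a k-mer that also occurs in the pattern sketch.

    The number of returned indices plays the role of ell in the abstract.
    """
    wanted = set(pattern_sketch)
    return [i for i, kmer in enumerate(text_sketch) if kmer in wanted]


def windows_above_threshold(text_sketch, pattern_sketch, threshold, penalty=1.0):
    """Report every window text_sketch[a..b] that starts and ends on a hit and scores >= threshold.

    Hypothetical score: number of shared k-mers (multiset intersection with the
    pattern sketch) minus `penalty` times the number of unshared k-mers in the window.
    """
    pattern_counts = Counter(pattern_sketch)
    hits = hit_positions(text_sketch, pattern_sketch)
    results = []
    for i, a in enumerate(hits):
        window_counts = Counter()
        shared = 0
        # extend the right end hit by hit; non-hit k-mers in between only add length
        for b in hits[i:]:
            kmer = text_sketch[b]
            window_counts[kmer] += 1
            if window_counts[kmer] <= pattern_counts[kmer]:
                shared += 1  # still within the pattern's multiplicity for this k-mer
            length = b - a + 1
            score = shared - penalty * (length - shared)
            if score >= threshold:
                results.append((a, b, score))
    return results
```

For example, with text sketch `["x", "a", "b", "y", "a", "c"]`, pattern sketch `["a", "b", "c"]` and threshold 2, the reported windows are (1, 2), (2, 5) and (4, 5), each with score 2.0.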
Maptcha: an efficient parallel workflow for hybrid genome scaffolding
Abstract Background: Genome assembly, which involves reconstructing a target genome, relies on scaffolding methods to organize and link partially assembled fragments. The rapid evolution of long read sequencing technologies toward more accurate long reads, coupled with the continued use of short read technologies, has created a unique need for hybrid assembly workflows. The construction of accurate genomic scaffolds in hybrid workflows is complicated by scale, by the diversity of sequencing technologies (e.g., short vs. long reads, contigs or partial assemblies), and by repetitive regions within a target genome. Results: In this paper, we present a new parallel workflow for hybrid genome scaffolding that allows pre-constructed partial assemblies to be combined with newly sequenced long reads toward an improved assembly. More specifically, the workflow, called Maptcha, is aimed at generating long scaffolds of a target genome from two sets of input sequences: an already constructed partial assembly of contigs, and a set of newly sequenced long reads. Our scaffolding approach internally uses an alignment-free mapping step to build a $$\langle \text{contig}, \text{contig} \rangle$$ graph, using long reads as linking information. This graph is then used to generate scaffolds. We present and evaluate a graph-theoretic “wiring” heuristic to perform this scaffolding step. To enable efficient workload management in a parallel setting, we use a batching technique that partitions the scaffolding tasks so that the more expensive alignment-based assembly step at the end can be efficiently parallelized. This step also allows any standalone assembler to be used for generating the final scaffolds. Conclusions: Our experiments with Maptcha on a variety of input genomes, and comparison against two state-of-the-art hybrid scaffolders, demonstrate that Maptcha is able to generate longer and more accurate scaffolds substantially faster.
In almost all cases, the scaffolds produced by Maptcha are at least an order of magnitude longer (in some cases two orders) than those produced by state-of-the-art tools. Maptcha also runs significantly faster, reducing time-to-solution from hours to minutes for most input cases. We also performed a coverage experiment by varying the sequencing coverage depth of the long reads, which demonstrated the potential of Maptcha to generate significantly longer scaffolds in low-coverage settings ($$1\times$$ to $$10\times$$).
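To make the idea of a $$\langle \text{contig}, \text{contig} \rangle$$ graph with long reads as linking information concrete, here is a generic greedy scaffolding sketch. It is not Maptcha's wiring heuristic, mapper, or batching scheme. It assumes a hypothetical input `read_hits`: for each long read, the ordered list of contig IDs it was mapped to. Consecutive distinct contigs on a read add support to an edge; edges are then accepted heaviest first as long as each contig keeps at most two neighbours and no cycle forms, so every connected component is a path that reads off as a scaffold. Orientation and contig ends are ignored, and `min_support` is an illustrative parameter.

```python
from collections import Counter


def build_contig_graph(read_hits):
    """Count <contig, contig> links: consecutive distinct contigs on the same read."""
    links = Counter()
    for contigs in read_hits:
        for u, v in zip(contigs, contigs[1:]):
            if u != v:
                # unordered pair, since this sketch ignores orientation
                links[tuple(sorted((u, v)))] += 1
    return links


def greedy_wiring(links, min_support=2):
    """Accept links heaviest first; each contig keeps at most two neighbours, no cycles."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    degree = Counter()
    adjacency = {}
    # descending support; ties broken by contig IDs so the result is deterministic
    for (u, v), weight in sorted(links.items(), key=lambda kv: (-kv[1], kv[0])):
        if weight < min_support or degree[u] == 2 or degree[v] == 2:
            continue
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue  # u and v already share a path: this link would close a cycle
        parent[root_u] = root_v
        degree[u] += 1
        degree[v] += 1
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency


def scaffolds(adjacency, all_contigs):
    """Walk every path from one end; contigs with no accepted link become singletons."""
    seen, result = set(), []
    for start in sorted(all_contigs):
        if start in seen or len(adjacency.get(start, [])) == 2:
            continue  # begin walks only at path ends (0 or 1 neighbours)
        path, prev, cur = [], None, start
        while cur is not None:
            path.append(cur)
            seen.add(cur)
            nxt = [n for n in adjacency.get(cur, []) if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        result.append(path)
    return result
```

The union-find structure keeps the cycle check close to constant time per edge. Production scaffolders additionally track which end and strand of each contig a read links, which this sketch deliberately omits.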
- PAR ID: 10531940
- Publisher / Repository: Springer Science + Business Media
- Date Published:
- Journal Name: BMC Bioinformatics
- Volume: 25
- Issue: 1
- ISSN: 1471-2105
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract In this paper we prove a higher-dimensional analogue of Carleson’s $$\varepsilon^{2}$$ conjecture. Given two arbitrary disjoint Borel sets $$\Omega^{+},\Omega^{-}\subset \mathbb{R}^{n+1}$$, and $$x\in \mathbb{R}^{n+1}$$, $$r>0$$, we denote $$ \varepsilon_{n}(x,r) := \frac{1}{r^{n}}\, \inf_{H^{+}} \mathcal{H}^{n} \left( ((\partial B(x,r)\cap H^{+}) \setminus \Omega^{+}) \cup ((\partial B(x,r)\cap H^{-}) \setminus \Omega^{-})\right), $$ where the infimum is taken over all open affine half-spaces $$H^{+}$$ such that $$x \in \partial H^{+}$$, and we define $$H^{-}= \mathbb{R}^{n+1} \setminus \overline{H^{+}}$$. Our first main result asserts that the set of points $$x\in \mathbb{R}^{n+1}$$ where $$ \int_{0}^{1} \varepsilon_{n}(x,r)^{2} \, \frac{dr}{r}< \infty $$ is $$n$$-rectifiable. For our second main result we assume that $$\Omega^{+}$$, $$\Omega^{-}$$ are open and that $$\Omega^{+}\cup \Omega^{-}$$ satisfies the capacity density condition. For each $$x \in \partial \Omega^{+} \cup \partial \Omega^{-}$$ and $$r>0$$, we denote by $$\alpha^{\pm}(x,r)$$ the characteristic constant of the (spherical) open sets $$\Omega^{\pm}\cap \partial B(x,r)$$. We show that, up to a set of $$\mathcal{H}^{n}$$ measure zero, $$x$$ is a tangent point for both $$\partial \Omega^{+}$$ and $$\partial \Omega^{-}$$ if and only if $$ \int_{0}^{1} \min(1,\alpha^{+}(x,r) + \alpha^{-}(x,r) -2) \frac{dr}{r} < \infty . $$ The first result is new even in the plane, and the second one improves and extends to higher dimensions the $$\varepsilon^{2}$$ conjecture of Carleson.
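As a sanity check on both quantities (a worked example, not taken from the paper), let $$\Omega^{\pm}$$ be the two open half-spaces bounded by a hyperplane through $$x$$. Choosing $$H^{+}=\Omega^{+}$$ in the infimum makes both set differences empty. Each $$\Omega^{\pm}\cap\partial B(x,r)$$ is an open hemisphere, whose characteristic constant is 1 under the usual normalization in which the first Dirichlet eigenvalue of a spherical domain equals $$\alpha(\alpha+n-1)$$ (a hemisphere of the unit sphere $$\mathbb{S}^{n}$$ has eigenvalue $$n$$).

```latex
\begin{aligned}
&\Omega^{+} = H_0^{+},\quad \Omega^{-} = \mathbb{R}^{n+1}\setminus\overline{H_0^{+}},\quad x\in\partial H_0^{+},\\
&H^{+} = H_0^{+}\ \Rightarrow\ (\partial B(x,r)\cap H^{+})\setminus\Omega^{+} = \varnothing,
  \quad (\partial B(x,r)\cap H^{-})\setminus\Omega^{-} = \varnothing,\\
&\Rightarrow\ \varepsilon_{n}(x,r) = 0\ \ \forall r>0,
  \qquad \int_{0}^{1}\varepsilon_{n}(x,r)^{2}\,\frac{dr}{r} = 0 < \infty,\\
&\lambda_1(\text{hemisphere}) = n = \alpha(\alpha+n-1) = (\alpha-1)(\alpha+n) + n
  \ \Rightarrow\ \alpha^{\pm}(x,r) = 1,
  \qquad \min(1,\alpha^{+}+\alpha^{-}-2) = 0.
\end{aligned}
```

So a flat interface sits at the zero end of both criteria. The Friedland–Hayman inequality $$\alpha^{+}+\alpha^{-}\ge 2$$ for disjoint open subsets of the sphere, with equality for complementary hemispheres, is what keeps the second integrand nonnegative.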
Abstract We develop a two-timing perturbation analysis to provide quantitative insights into the existence of temporal ratchets in an exemplary system: a particle moving in a tank of fluid in response to an external vibration of the tank. We consider two-mode vibrations with angular frequencies $$\omega$$ and $$\alpha\omega$$, where $$\alpha$$ is a rational number. If $$\alpha$$ is a ratio of odd and even integers (e.g., $$\tfrac{2}{1},\,\tfrac{3}{2},\,\tfrac{4}{3}$$), the system yields a net response: here, a nonzero time-averaged particle velocity. Our first-order perturbation solution predicts the existence of temporal ratchets for $$\alpha=2$$. Furthermore, we demonstrate, for a reduced model, that the temporal ratcheting effect for $$\alpha=\tfrac{3}{2}$$ and $$\tfrac{4}{3}$$ appears in the third-order perturbation solution. More importantly, we find closed-form formulas for the magnitude and direction of the induced net velocities for these $$\alpha$$ values. On a broader scale, our methodology offers a new mathematical approach to studying the complicated nature of temporal ratchets in physical systems.
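The odd/even condition on $$\alpha$$ can be illustrated with a toy symmetry check on the forcing signal alone; this is not the paper's particle-in-fluid model or its two-timing analysis. For $$f(t) = \cos(\omega t) + \cos(\alpha\omega t + \phi)$$ with $$\alpha = p/q$$, if $$p$$ and $$q$$ are both odd then $$f(t + q\pi/\omega) = -f(t)$$, so every odd power of $$f$$ averages to zero; when one of them is even this antisymmetry is absent. The function `mean_power` and its parameters are illustrative assumptions, with $$\omega = 1$$.

```python
import math


def mean_power(p, q, m, phase=0.0, samples=4096):
    """Time-average of f(t)**m over one common period, where
    f(t) = cos(t) + cos(alpha*t + phase), alpha = p/q, and omega = 1.

    2*pi*q is a common period of both modes; a uniform rectangle rule over a
    full period is exact (up to rounding) for trigonometric polynomials whose
    highest frequency index is below `samples`.
    """
    alpha = p / q
    period = 2 * math.pi * q
    total = 0.0
    for k in range(samples):
        t = period * k / samples
        f = math.cos(t) + math.cos(alpha * t + phase)
        total += f ** m
    return total / samples
```

`mean_power(2, 1, 3)` gives 0.75 (analytically $$\tfrac{3}{4}\cos\phi$$, so the sign flips with the relative phase), `mean_power(3, 1, m)` vanishes for every odd `m`, and for $$\alpha = \tfrac{3}{2}$$ the cubic mean vanishes while `mean_power(3, 2, 5)` equals 0.625. That the lowest odd power with a nonzero mean rises from 3 to 5 is only suggestive of, not a derivation of, the higher-order ratcheting described in the abstract.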
Abstract We consider integral area-minimizing 2-dimensional currents $$T$$ in $$U\subset \mathbf{R}^{2+n}$$ with $$\partial T = Q[\![\Gamma]\!]$$, where $$Q\in \mathbf{N} \setminus \{0\}$$ and $$\Gamma$$ is sufficiently smooth. We prove that, if $$q\in \Gamma$$ is a point where the density of $$T$$ is strictly below $$\frac{Q+1}{2}$$, then the current is regular at $$q$$. The regularity is understood in the following sense: there is a neighborhood of $$q$$ in which $$T$$ consists of a finite number of regular minimal submanifolds meeting transversally at $$\Gamma$$ (and counted with the appropriate integer multiplicity). In view of well-known examples, our result is optimal, and it is the first nontrivial generalization of a classical theorem of Allard for $$Q=1$$. As a corollary, if $$\Omega \subset \mathbf{R}^{2+n}$$ is a bounded uniformly convex set and $$\Gamma \subset \partial \Omega$$ a smooth 1-dimensional closed submanifold, then any area-minimizing current $$T$$ with $$\partial T = Q[\![\Gamma]\!]$$ is regular in a neighborhood of $$\Gamma$$.
Abstract We establish a first general partial regularity theorem for area-minimizing currents $${\mathrm{mod}}(p)$$, for every $$p$$, in any dimension and codimension. More precisely, we prove that the Hausdorff dimension of the interior singular set of an $$m$$-dimensional area-minimizing current $${\mathrm{mod}}(p)$$ cannot be larger than $$m-1$$. Additionally, we show that, when $$p$$ is odd, the interior singular set is $$(m-1)$$-rectifiable with locally finite $$(m-1)$$-dimensional measure.
