Title: Fast and efficient Rmap assembly using the Bi-labelled de Bruijn graph
Abstract

Genome wide optical maps are high resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. There exists only one publicly-available non-proprietary method for assembly and one proprietary software that is available via an executable. Furthermore, the publicly-available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm, and therefore, is unable to scale for relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method was able to successfully run on all three genomes. The method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) only successfully ran on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under GNU General Public License at https://github.com/kingufl/Rmapper.

 
NSF-PAR ID:
10230865
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Algorithms for Molecular Biology
Volume:
16
Issue:
1
ISSN:
1748-7188
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Well-resolved measurements of the small-scale dissipation statistics within turbulent channel flow are reported for a range of Reynolds numbers from $Re_{\tau}\approx 500$ to 4000. In this flow, the local large-scale Reynolds number based on the longitudinal integral length scale is found to poorly describe the Reynolds number dependence of the small-scale statistics. When a length scale based on Townsend’s attached-eddy hypothesis is used to define the local large-scale Reynolds number, the Reynolds number scaling behaviour was found to be more consistent with that observed in homogeneous, isotropic turbulence. The Reynolds number scaling of the dissipation moments up to the sixth moment was examined and the results were found to be in good agreement with predicted scaling behaviour (Schumacher et al., Proc. Natl Acad. Sci. USA, vol. 111, 2014, pp. 10961–10965). The probability density functions of the local dissipation scales (Yakhot, Physica D, vol. 215 (2), 2006, pp. 166–174) were also determined and, when the revised local large-scale Reynolds number is used for normalization, provide support for the existence of a universal distribution which scales differently for inner and outer regions.
  2. In less than 25 y, the field of animal genome science has transformed from a discipline seeking its first glimpses into genome sequences across the Tree of Life to a global enterprise with ambitions to sequence genomes for all of Earth’s eukaryotic diversity [H. A. Lewin et al., Proc. Natl. Acad. Sci. U.S.A. 115, 4325–4333 (2018)]. As the field rapidly moves forward, it is important to take stock of the progress that has been made to best inform the discipline’s future. In this Perspective, we provide a contemporary, quantitative overview of animal genome sequencing. We identified the best available genome assemblies in GenBank, the world’s most extensive genetic database, for 3,278 unique animal species across 24 phyla. We assessed taxonomic representation, assembly quality, and annotation status for major clades. We show that while tremendous taxonomic progress has occurred, stark disparities in genomic representation exist, highlighted by a systemic overrepresentation of vertebrates and underrepresentation of arthropods. In terms of assembly quality, long-read sequencing has dramatically improved contiguity, whereas gene annotations are available for just 34.3% of taxa. Furthermore, we show that animal genome science has diversified in recent years with an ever-expanding pool of researchers participating. However, the field still appears to be dominated by institutions in the Global North, which have been listed as the submitting institution for 77% of all assemblies. We conclude by offering recommendations for improving genomic resource availability and research value while also broadening global representation.
  3. Quantum key distribution (QKD) has established itself as a groundbreaking technology, showcasing inherent security features that are fundamentally proven. Qubit-based QKD protocols that rely on binary encoding encounter an inherent constraint related to the secret key capacity. This limitation restricts the maximum secret key capacity to one bit per photon. On the other hand, qudit-based QKD protocols have their advantages in scenarios where photons are scarce and noise is present, as they enable the transmission of more than one secret bit per photon. While proof-of-principle entangled-based qudit QKD systems have been successfully demonstrated over the years, the current limitation lies in the maximum distribution distance, which remains at 20 km fiber distance. Moreover, in these entangled high-dimensional QKD systems, the witness and distribution of quantum steering have not been shown before. Here we present a high-dimensional time-bin QKD protocol based on energy-time entanglement that generates a secure finite-length key capacity of 2.39 bit/coincidences and secure cryptographic finite-length keys at 0.24 Mbits s−1 in a 50 km optical fiber link. Our system is built entirely using readily available commercial off-the-shelf components, and secured by nonlocal dispersion cancellation technique against collective Gaussian attacks. Furthermore, we set new records for witnessing both energy-time entanglement and quantum steering over different fiber distances. When operating with a quantum channel loss of 39 dB, our system retains its inherent characteristic of utilizing large-alphabet. This enables us to achieve a secure key rate of 0.30 kbits s−1 and a secure key capacity of 1.10 bit/coincidences, considering finite-key effects. Our experimental results closely match the theoretical upper bound limit of secure cryptographic keys in high-dimensional time-bin QKD protocols (Mower et al 2013 Phys. Rev. A 87 062322; Zhang et al 2014 Phys. Rev. Lett. 112 120506), and outperform recent state-of-the-art qubit-based QKD protocols in terms of secure key throughput using commercial single-photon detectors (Wengerowsky et al 2019 Proc. Natl Acad. Sci. 116 6684; Wengerowsky et al 2020 npj Quantum Inf. 6 5; Zhang et al 2014 Phys. Rev. Lett. 112 120506; Zhang et al 2019 Nat. Photon. 13 839; Liu et al 2019 Phys. Rev. Lett. 122 160501; Zhang et al 2020 Phys. Rev. Lett. 125 010502; Wei et al 2020 Phys. Rev. X 10 031030). The simple and robust entanglement-based high-dimensional time-bin protocol presented here provides potential for practical long-distance quantum steering and QKD with multiple secure bits-per-coincidence, and higher secure cryptographic keys compared to mature qubit-based QKD protocols.

     
  4. It was recently shown in Wechsung et al (2022 Proc. Natl Acad. Sci. USA 119 e2202084119) that there exist electromagnetic coils that generate magnetic fields, which are excellent approximations to quasi-symmetric fields and have very good particle confinement properties. Using a Gaussian process-based model for coil perturbations, we investigate the impact of manufacturing errors on the performance of these coils. We show that even fairly small errors result in noticeable performance degradation. While stochastic optimization yields minor improvements, it is not possible to mitigate these errors significantly. As an alternative to stochastic optimization, we then formulate a new optimization problem for computing optimal adjustments of the coil positions and currents without changing the shapes of the coil. These a-posteriori adjustments are able to reduce the impact of coil errors by an order of magnitude, providing a new perspective for dealing with manufacturing tolerances in stellarator design.
  5. Global aridification is projected to intensify. Yet, our knowledge of its potential impacts on species ranges remains limited. Here, we investigate global aridity velocity and its overlap with three sectors (natural protected areas, agricultural areas, and urban areas) and terrestrial biodiversity in historical (1979 through 2016) and future periods (2050 through 2099), with and without considering vegetation physiological response to rising CO2. Both agricultural and urban areas showed a mean drying velocity in history, although the concurrent global aridity velocity was on average +0.05/+0.20 km/yr−1 (no CO2 effects/with CO2 effects; “+” denoting wetting). Moreover, in drylands, the shifts of vegetation greenness isolines were found to be significantly coupled with the tracks of aridity velocity. In the future, the aridity velocity in natural protected areas is projected to change from wetting to drying across RCP (representative concentration pathway) 2.6, RCP6.0, and RCP8.5 scenarios. When accounting for spatial distribution of terrestrial taxa (including plants, mammals, birds, and amphibians), the global aridity velocity would be -0.15/-0.02 km/yr−1 (“-” denoting drying; historical), -0.12/-0.15 km/yr−1 (RCP2.6), -0.36/-0.10 km/yr−1 (RCP6.0), and -0.75/-0.29 km/yr−1 (RCP8.5), with amphibians particularly negatively impacted. Under all scenarios, aridity velocity shows much higher multidirectionality than temperature velocity, which is mainly poleward. These results suggest that aridification risks may significantly influence the distribution of terrestrial species besides warming impacts and further impact the effectiveness of current protected areas in future, especially under RCP8.5, which best matches historical CO2 emissions [C. R. Schwalm et al., Proc. Natl. Acad. Sci. U.S.A. 117, 19656–19657 (2020)].

     