

Title: Dynamic time warping of palaeomagnetic secular variation data
Summary: We present and make publicly available a dynamic programming algorithm to simultaneously align the inclination and declination vector directions of sedimentary palaeomagnetic secular variation data. This algorithm generates a library of possible alignments by systematically varying assumptions about the relative accumulation rate and shared temporal overlap of two or more time-series. The palaeomagnetist can then evaluate this library of reproducible and objective alignments using available geological constraints, statistical methods and expert knowledge. We apply the algorithm to align previously (visually) correlated, medium- to high-accumulation-rate northern North Atlantic Holocene deposits (10¹–10² cm ka⁻¹) with strong radiocarbon control. The algorithm generates plausible alignments that largely conform to radiocarbon and magnetic acquisition process uncertainty. These alignments illustrate the strengths and limitations of this numerical approach.
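The summary describes aligning inclination and declination jointly by dynamic programming. As a rough illustration of the underlying idea only, the Python sketch below applies textbook dynamic time warping (DTW) to directions; it is not the authors' published algorithm. It assumes each sample is an (inclination, declination) pair in degrees, converts it to a unit vector, uses the angle between vectors as the local cost, and applies no accumulation-rate or overlap constraints, so it returns one alignment rather than a library. The function names are hypothetical.

```python
import math

def to_unit_vector(inc_deg, dec_deg):
    """Convert inclination/declination (degrees) to a unit vector (north, east, down)."""
    inc, dec = math.radians(inc_deg), math.radians(dec_deg)
    return (math.cos(inc) * math.cos(dec),
            math.cos(inc) * math.sin(dec),
            math.sin(inc))

def angular_distance(a, b):
    """Angle in degrees between two unit vectors; the dot product is clamped against rounding."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def dtw_align(series_a, series_b):
    """Plain DTW over two (inc, dec) series. Returns (total angular cost, warping path)."""
    va = [to_unit_vector(i, d) for i, d in series_a]
    vb = [to_unit_vector(i, d) for i, d in series_b]
    n, m = len(va), len(vb)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning va[:i] with vb[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = angular_distance(va[i - 1], vb[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance series_a only (b sample reused)
                                     cost[i][j - 1],      # advance series_b only (a sample reused)
                                     cost[i - 1][j - 1])  # advance both
    # Backtrack from the end to recover the point-to-point correspondences.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

For example, aligning `[(60, 0), (65, 10), (70, 5), (62, -5)]` with `[(60, 0), (60, 1), (65, 10), (70, 5), (62, -5)]` pairs the first sample of the shorter record with the first two samples of the longer one and matches the rest one-to-one. Producing a library as the summary describes would involve repeating an alignment like this under varied assumptions about relative accumulation rate and temporal overlap.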
Award ID(s):
1645411
PAR ID:
10176613
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Geophysical Journal International
Volume:
221
Issue:
1
ISSN:
0956-540X
Page Range / eLocation ID:
706 to 721
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Stratigraphic correlation underpins all understanding of Earth’s history, yet few geoscientists have access to, or expertise in, numerical codes that can generate reproducible, optimal (in a least-squares framework) alignments between two stratigraphic time-series data sets. Here we introduce Align, a user-friendly computer app that makes accessible a published dynamic time warping (DTW) algorithm that, in a minute or less, catalogs a library of alignments between two time-series data sets by systematically exploring assumptions about the temporal overlap and relative sedimentation rates between the two stratigraphic sections. The Align app, written in the free, open-source R programming language, utilizes a graphical user interface (e.g., drop-down menus for data upload and sliding bars for parameter exploration) such that no coding is required. In addition to generating alignment libraries, a user can employ Align to visualize, explore, and cull each alignment library according to thresholds on Pearson’s correlation coefficient and/or temporal overlap. Here we demonstrate Align with time-series records of carbonate stable carbon isotope composition, though Align can, in principle, align any two quantitative stratigraphic time-series data sets. 
  2. While it remains uncertain whether excursions in the stable carbon isotopic composition of Ediacaran marine carbonate (δ¹³C_carb) represent globally synchronous events (or a direct measure of ocean carbon cycling), the absence of widely distributed and readily preservable fauna, and the presence of several iconic carbon isotope excursions (CIEs), has sustained δ¹³C_carb correlation as the primary means to establish relative time relationships for Ediacaran successions. Here we present an Ediacaran global δ¹³C_carb composite built with a dynamic time warping (DTW) time-normalization algorithm that generates libraries of least-squares alignments between chemostratigraphic records of unequal length and distinct sediment accumulation rates. When developing a δ¹³C_carb composite for each of 16 globally distributed Ediacaran paleo-depositional regions, we selected alignments with high Pearson r that conformed with published geological guidance about the correlation of constituent sections. When applying DTW to align these regional algorithmic composites into one global δ¹³C_carb stack, we selected alignments that matched up the excursions that field workers have established (or speculated) to be the Marinoan cap carbonate excursion, the Shuram excursion, and/or the basal Cambrian excursion. There are strengths and weaknesses to making explicit the temporal relationships (point-to-point correspondences) often left implicit in visual correlation. One strength is the ability to extrapolate depositional ages by means of isotopic correlation; here we explored this with a Bayesian Markov chain Monte Carlo age model that predicts a median age, and its uncertainty, for every carbonate stratum in the global Ediacaran δ¹³C_carb composite. Yet one must caution against a false accuracy that can arise from selecting one alignment among many possibilities: the likelihood that time-uncertain time series can be stretched and squeezed into one unequivocal alignment is low.
Thus, while these alignments are grounded in the expert assessment of the field worker, this global Ediacaran δ¹³C_carb–Bayesian age model should be viewed as a working hypothesis to enrich, but not arbitrate, discussions of the correlation, synchrony, and completeness of Ediacaran successions.
  3. Motivation: Read alignment is central to many aspects of modern genomics. Most aligners use heuristics to accelerate processing, but these heuristics can fail to find the optimal alignments of reads. Alignment accuracy is typically measured through simulated reads; however, the simulated location may not be the (only) location with the optimal alignment score. Results: Vargas implements a heuristic-free algorithm guaranteed to find the highest-scoring alignment for real sequencing reads to a linear or graph genome. With semiglobal and local alignment modes and affine gap and quality-scaled mismatch penalties, it can implement the scoring functions of commonly used aligners to calculate optimal alignments. While this is computationally intensive, Vargas uses multi-core parallelization and vectorized (SIMD) instructions to make it practical to optimally align large numbers of reads, achieving a maximum speed of 456 billion cell updates per second. We demonstrate how these “gold standard” Vargas alignments can be used to improve heuristic alignment accuracy by optimizing command-line parameters in Bowtie 2, BWA-MEM, and vg to align more reads correctly. Availability and implementation: Source code implemented in C++ and compiled binary releases are available at https://github.com/langmead-lab/vargas under the MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.
  4. The SpTransformer (SpTrf) gene family encodes a set of proteins that function in the sea urchin immune system. The gene sequences have a series of internal repeats in a mosaic pattern that is characteristic of this family. This mosaic pattern necessitates the insertion of large gaps, which has made alignments of the deduced protein sequences computationally difficult, such that only manual alignments have been reported previously. Because manual alignments are time-consuming for evaluating newly available SpTrf sequences, computational approaches were evaluated using the previously reported sequences. Furthermore, because the multiple internal repeats make two different manual alignments of the SpTrf sequences feasible, it is not known whether additional alternative alignments can be identified using different approaches. The bioinformatic program PRANK was used because it was designed to align sequences with large gaps and indels. The results from PRANK show that the alignments of the internal repeats are similar to those made manually, suggesting multiple feasible alignments for some regions. GUIDANCE-based analysis of the alignments identified regions that were excellent and other regions that failed to align. This suggests that computational approaches have limits for aligning SpTrf sequences that include multiple repeats and require inserted gaps. Furthermore, it is unlikely that alternative alignments for the full-length SpTrf sequences will be identified.
  5. Alkan, Can (Ed.)
    Motivation: Pangenome variation graphs model the mutual alignment of collections of DNA sequences. A set of pairwise alignments implies a variation graph, but there are no scalable methods to generate such a graph from these alignments. Existing related approaches depend on a single reference, a specific ordering of genomes or a de Bruijn model based on a fixed k-mer length. A scalable, self-contained method to build pangenome graphs without such limitations would be a key step in pangenome construction and manipulation pipelines. Results: We design the seqwish algorithm, which builds a variation graph from a set of sequences and alignments between them. We first transform the alignment set into an implicit interval tree. To build up the variation graph, we query this tree-based representation of the alignments to reduce transitive matches into single DNA segments in a sequence graph. By recording the mapping from input sequence to output graph, we can trace the original paths through this graph, yielding a pangenome variation graph. We present an implementation that operates in external memory, using disk-backed data structures and lock-free parallel methods to drive the core graph induction step. We demonstrate that our method scales to very large graph induction problems by applying it to build pangenome graphs for several species. Availability and implementation: seqwish is published as free software under the MIT open source license. Source code and documentation are available at https://github.com/ekg/seqwish. seqwish can be installed via Bioconda https://bioconda.github.io/recipes/seqwish/README.html or GNU Guix https://github.com/ekg/guix-genomics/blob/master/seqwish.scm.
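Item 1 describes culling an alignment library by thresholds on Pearson's correlation coefficient and temporal overlap. Align itself is written in R; the Python sketch below is not its code and only illustrates that filtering step. It assumes each alignment is a list of (i, j) index pairs into the two series, and it uses a hypothetical definition of overlap (the fraction of the shorter series that takes part in the alignment). The function names and threshold defaults are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / (sx * sy)

def overlap_fraction(path, len_a, len_b):
    """Assumed overlap measure: fraction of the shorter series used by the alignment."""
    used_a = {i for i, _ in path}
    used_b = {j for _, j in path}
    if len_a <= len_b:
        return len(used_a) / len_a
    return len(used_b) / len_b

def cull_library(library, series_a, series_b, min_r=0.8, min_overlap=0.5):
    """Keep (path, r, overlap) for alignments passing both thresholds."""
    kept = []
    for path in library:
        xs = [series_a[i] for i, _ in path]
        ys = [series_b[j] for _, j in path]
        r = pearson_r(xs, ys)
        ov = overlap_fraction(path, len(series_a), len(series_b))
        # Comparisons with NaN are False, so an undefined r is rejected.
        if r >= min_r and ov >= min_overlap:
            kept.append((path, r, ov))
    return kept
```

With the defaults, an alignment that pairs the two records in reverse order (negative r) is discarded even if it covers the whole shorter record, while a one-to-one alignment of matching values is kept.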
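Item 2 notes that explicit point-to-point correspondences make it possible to extrapolate depositional ages by correlation, which the authors do with a Bayesian Markov chain Monte Carlo age model. The sketch below swaps in something much simpler, piecewise-linear interpolation between tie points, purely to show how correspondences translate into an age for every stratum; it produces no uncertainty estimate. The tie-point heights and ages are assumed to come from an alignment to dated horizons, and the function name `interpolate_ages` is hypothetical.

```python
from bisect import bisect_right

def interpolate_ages(tie_heights, tie_ages, query_heights):
    """Piecewise-linear ages at query heights; extends the end segments beyond the tie points."""
    pairs = sorted(zip(tie_heights, tie_ages))
    hs = [h for h, _ in pairs]
    ages = [a for _, a in pairs]
    if len(hs) < 2:
        raise ValueError("need at least two tie points")
    if any(h1 <= h0 for h0, h1 in zip(hs, hs[1:])):
        raise ValueError("tie-point heights must be distinct")
    out = []
    for q in query_heights:
        # Segment [hs[k], hs[k + 1]] containing q, clamped to the first/last segment.
        k = min(max(bisect_right(hs, q) - 1, 0), len(hs) - 2)
        h0, h1 = hs[k], hs[k + 1]
        a0, a1 = ages[k], ages[k + 1]
        out.append(a0 + (a1 - a0) * (q - h0) / (h1 - h0))
    return out
```

Beyond the outermost tie points the sketch simply extends the nearest segment, and every age inherits the assumption that the chosen alignment is correct; quantifying that uncertainty is what a probabilistic age model adds.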
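Item 3 mentions semiglobal alignment with affine gap penalties. The sketch below uses the textbook Gotoh recurrences to compute the best score for aligning an entire read to any substring of a reference. It is a scalar, score-only illustration and not Vargas's SIMD implementation: it omits quality-scaled mismatch penalties and graph genomes, and the scoring values and function name are illustrative assumptions.

```python
def semiglobal_affine_score(read, ref, match=2, mismatch=-6, gap_open=-5, gap_extend=-3):
    """Best score aligning all of `read` to a substring of `ref` with affine gaps.

    A gap of length L scores gap_open + L * gap_extend (one common convention).
    """
    NEG = float("-inf")
    n, m = len(read), len(ref)
    # H: best score ending at (i, j); E: ending in a gap in the read (consumes ref);
    # F: ending in a gap in the ref (consumes read).
    H = [[NEG] * (m + 1) for _ in range(n + 1)]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        H[0][j] = 0  # free leading reference overhang: the read may start anywhere
    for i in range(1, n + 1):
        F[i][0] = gap_open + i * gap_extend
        H[i][0] = F[i][0]
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open + gap_extend)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open + gap_extend)
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, E[i][j], F[i][j])
    return max(H[n])  # free trailing reference overhang: the read may end anywhere
```

The separate `E` and `F` matrices are what make affine gaps work: they record whether a gap is already open, so extending it costs only `gap_extend`. Every cell must be filled for an exhaustive answer, which is why a throughput figure such as the "cell updates per second" in item 3 is the natural measure of speed for this kind of dynamic programming.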
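Item 5 describes inducing a variation graph by collapsing transitive matches from pairwise alignments. The toy Python sketch below conveys only that transitive-closure idea, using a union-find structure over individual bases. It is not how seqwish is implemented (seqwish uses an implicit interval tree, disk-backed data structures and lock-free parallelism), and it does not compact runs of bases into longer segments. The input format, with matches given as pairs of (sequence name, position) tuples, and the function name are assumptions.

```python
def build_variation_graph(seqs, matches):
    """Toy graph induction: merge transitively matched bases into shared nodes.

    seqs:    dict mapping name -> sequence string
    matches: iterable of ((name_a, pos_a), (name_b, pos_b)) pairs of aligned, identical bases
    Returns (node_bases, edges, paths) with one base per node.
    """
    # Give every base of every sequence a global integer id.
    offsets, total = {}, 0
    for name, s in seqs.items():
        offsets[name] = total
        total += len(s)
    parent = list(range(total))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (na, pa), (nb, pb) in matches:
        ra, rb = find(offsets[na] + pa), find(offsets[nb] + pb)
        if ra != rb:
            parent[rb] = ra
    # Each equivalence class (transitive closure of matches) becomes one node;
    # each input sequence is recorded as a path through those nodes.
    node_of_root, node_bases, paths, edges = {}, [], {}, set()
    for name, s in seqs.items():
        path = []
        for pos, base in enumerate(s):
            root = find(offsets[name] + pos)
            if root not in node_of_root:
                node_of_root[root] = len(node_bases)
                node_bases.append(base)
            path.append(node_of_root[root])
        paths[name] = path
        edges.update(zip(path, path[1:]))
    return node_bases, edges, paths
```

Because union-find merges classes transitively, if a base in sequence a is matched to one in b, and that base in b is matched to one in c, all three land in the same node even when no a–c alignment was supplied.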