skip to main content

Title: Somtimes: self organizing maps for time series clustering and its application to serious illness conversations

There is demand for scalable algorithms capable of clustering and analyzing large time series data. The Kohonen self-organizing map (SOM) is an unsupervised artificial neural network for clustering, visualizing, and reducing the dimensionality of complex data. Like all clustering methods, it requires a measure of similarity between input data (in this work time series). Dynamic time warping (DTW) is one such measure, and a top performer that accommodates distortions when aligning time series. Despite its popularity in clustering, DTW is limited in practice because the runtime complexity is quadratic with the length of the time series. To address this, we present a new a self-organizing map for clustering TIME Series, called SOMTimeS, which uses DTW as the distance measure. The method has similar accuracy compared with other DTW-based clustering algorithms, yet scales better and runs faster. The computational performance stems from the pruning of unnecessary DTW computations during the SOM’s training phase. For comparison, we implement a similar pruning strategy for K-means, and call the latter K-TimeS. SOMTimeS and K-TimeS pruned 43% and 50% of the total DTW computations, respectively. Pruning effectiveness, accuracy, execution time and scalability are evaluated using 112 benchmark time series datasets from the UC Riverside classification archive, and show that for similar accuracy, a 1.8$$\times$$×speed-up on average for SOMTimeS and K-TimeS, respectively with that rates vary between 1$$\times$$×and 18$$\times$$×depending on the dataset. We also apply SOMTimeS to a healthcare study of patient-clinician serious illness conversations to demonstrate the algorithm’s utility with complex, temporally sequenced natural language.

more » « less
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Data Mining and Knowledge Discovery
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Coronal Holes (CHs) are regions of open magnetic-field lines, resulting in high-speed solar wind. Accurate detection of CHs is vital for space-weather prediction. This paper presents an intramethod ensemble for coronal-hole detection based on the Active Contours Without Edges (ACWE) segmentation algorithm. The purpose of this ensemble is to develop a confidence map that defines, for all ondisk regions of a solar extreme ultraviolet (EUV) image, the likelihood that each region belongs to a CH based on that region’s proximity to, and homogeneity with, the core of identified CH regions. By relying on region homogeneity, and not intensity (which can vary due to various factors, including line-of-sight changes and stray light from nearby bright regions), to define the final confidence of any given region, this ensemble is able to provide robust, consistent delineations of the CH regions. Using the metrics of global consistency error (GCE), local consistency error (LCE), intersection over union (IOU), and the structural similarity index measure (SSIM), the method is shown to be robust to different spatial resolutions maintaining a median IOU$>0.75$>0.75and minimum SSIM$>0.93$>0.93even when the segmentation process was performed on an EUV image decimated from$4096\times 4096$4096×4096pixels down to$512\times 512$512×512pixels. Furthermore, using the same metrics, the method is shown to be robust across short timescales, producing segmentation with a mean IOU of 0.826 from EUV images taken at a 1-h cadence, and showing a smooth decay in similarity across all metrics as a function of time, indicating self-consistent segmentations even when corrections for exposure time have not been applied to the data. Finally, the accuracy of the segmentations and confidence maps are validated by considering the skewness (i.e., unipolarity) of the underlying magnetic field.

    more » « less
  2. Abstract

    As the use of spectral/hpelement methods, and high-order finite element methods in general, continues to spread, community efforts to create efficient, optimized algorithms associated with fundamental high-order operations have grown. Core tasks such as solution expansion evaluation at quadrature points, stiffness and mass matrix generation, and matrix assembly have received tremendous attention. With the expansion of the types of problems to which high-order methods are applied, and correspondingly the growth in types of numerical tasks accomplished through high-order methods, the number and types of these core operations broaden. This work focuses on solution expansion evaluation at arbitrary points within an element. This operation is core to many postprocessing applications such as evaluation of streamlines and pathlines, as well as to field projection techniques such as mortaring. We expand barycentric interpolation techniques developed on an interval to 2D (triangles and quadrilaterals) and 3D (tetrahedra, prisms, pyramids, and hexahedra) spectral/hpelement methods. We provide efficient algorithms for their implementations, and demonstrate their effectiveness using the spectral/hpelement libraryNektar++by running a series of baseline evaluations against the ‘standard’ Lagrangian method, where an interpolation matrix is generated and matrix-multiplication applied to evaluate a point at a given location. We present results from a rigorous series of benchmarking tests for a variety of element shapes, polynomial orders and dimensions. We show that when the point of interest is to be repeatedly evaluated, the barycentric method performs at worst$$50\%$$50%slower, when compared to a cached matrix evaluation. However, when the point of interest changes repeatedly so that the interpolation matrix must be regenerated in the ‘standard’ approach, the barycentric method yields far greater performance, with a minimum speedup factor of$$7\times $$7×. Furthermore, when derivatives of the solution evaluation are also required, the barycentric method in general slightly outperforms the cached interpolation matrix method across all elements and orders, with an up to$$30\%$$30%speedup. Finally we investigate a real-world example of scalar transport using a non-conformal discontinuous Galerkin simulation, in which we observe around$$6\times $$6×speedup in computational time for the barycentric method compared to the matrix-based approach. We also explore the complexity of both interpolation methods and show that the barycentric interpolation method requires$${\mathcal {O}}(k)$$O(k)storage compared to a best case space complexity of$${\mathcal {O}}(k^2)$$O(k2)for the Lagrangian interpolation matrix method.

    more » « less
  3. Abstract

    The question of global existence versus finite-time singularity formation is considered for the generalized Constantin–Lax–Majda equation with dissipationΛσ, whereΛσˆ=|k|σ, both for the problem on the circlex[π,π]and the real line. In the periodic geometry, two complementary approaches are used to prove global-in-time existence of solutions forσ1and all real values of an advection parameterawhen the data is small. We also derive new analytical solutions in both geometries whena = 0, and on the real line whena=1/2, for various values ofσ. These solutions exhibit self-similar finite-time singularity formation, and the similarity exponents and conditions for singularity formation are fully characterized. We revisit an analytical solution on the real line due to Schochet fora = 0 andσ = 2, and reinterpret it terms of self-similar finite-time collapse. The analytical solutions on the real line allow finite-time singularity formation for arbitrarily small data, even for values ofσthat are greater than or equal to one, thereby illustrating a critical difference between the problems on the real line and the circle. The analysis is complemented by accurate numerical simulations, which are able to track the formation and motion of singularities in the complex plane. The computations validate and build upon the analytical theory.

    more » « less
  4. Abstract

    Database peptide search is the primary computational technique for identifying peptides from the mass spectrometry (MS) data. Graphical Processing Units (GPU) computing is now ubiquitous in the current-generation of high-performance computing (HPC) systems, yet its application in the database peptide search domain remains limited. Part of the reason is the use of sub-optimal algorithms in the existing GPU-accelerated methods resulting in significantly inefficient hardware utilization. In this paper, we design and implement a new-age CPU-GPU HPC framework, calledGiCOPS, for efficient and complete GPU-acceleration of the modern database peptide search algorithms on supercomputers. Our experimentation shows that the GiCOPS exhibits between 1.2 to 5$$\times$$×speed improvement over its CPU-only predecessor, HiCOPS, and over 10$$\times$$×improvement over several existing GPU-based database search algorithms for sufficiently large experiment sizes. We further assess and optimize the performance of our framework using the Roofline Model and report near-optimal results for several metrics including computations per second, occupancy rate, memory workload, branch efficiency and shared memory performance. Finally, the CPU-GPU methods and optimizations proposed in our work for complex integer- and memory-bounded algorithmic pipelines can also be extended to accelerate the existing and future peptide identification algorithms. GiCOPS is now integrated with our umbrella HPC framework HiCOPS and is available at:

    more » « less
  5. Abstract

    We report on a series of detailed Breit-Pauli and Dirac B-spline R-matrix (DBSR) differential cross section (DCS) calculations for excitation of the$$5\,^2\textrm{S}_{1/2} \rightarrow 5\,^2\textrm{P}_{1/2}$$52S1/252P1/2and$$5\,^2\textrm{S}_{1/2}\rightarrow 5\,^2\textrm{P}_{3/2}$$52S1/252P3/2states in rubidium by 40 eV incident electrons. The early BP computations shown here were carried out with both 5 states and 12 states, while the DBSR models coupled 150 and 325 states, respectively. We also report corresponding results from a limited set of DCS measurements on the unresolved$$5\,^2\textrm{P}_{1/2,3/2}$$52P1/2,3/2states, with the experimental data being restricted to the scattered electron angular range 2–$$10^\circ $$10. Typically, good agreement is found between our calculated and measured DCS for excitation of the unresolved$$5\,^2\textrm{P}_{1/2,3/2}$$52P1/2,3/2states, with best accord being found between the DBSR predictions and the measured data. The present theoretical and experimental results are also compared with predictions from earlier 40 eV calculations using the nonrelativistic Distorted-Wave Born Approximation and a Relativistic Distorted-Wave model.

    Graphic abstract 
    more » « less