
Title: TS-MIoU: A Time Series Similarity Metric Without Mapping
Quantifying the similarity or distance between time series, processes, signals, and trajectories is a task-specific problem and remains a challenge for many applications. The simplest measure, the Euclidean distance, is often dismissed because of its sensitivity to noise and the curse of dimensionality. Therefore, elastic mappings (such as DTW, LCSS, and ED) are often used instead. However, these measures are not metric functions and, more importantly, must contend with the challenges intrinsic to point-to-point mappings, such as pathological alignment. In this paper, we adopt an object-similarity measure, namely Multiscale Intersection over Union (MIoU), for measuring the distance/similarity between time series. We call the new measure TS-MIoU. Unlike the most popular time series similarity measures, TS-MIoU does not rely on a point-to-point mapping and therefore circumvents the respective challenges. We show that TS-MIoU is indeed a metric function, and in particular that it satisfies the triangle inequality, so it can take advantage of indexing algorithms without requiring a lower bound. We further show that its sensitivity to noise is adjustable, which makes it a strong alternative to the Euclidean distance while not suffering from the curse of dimensionality. Our proof-of-concept experiments on over 100 UCR datasets show that TS-MIoU can fill the gap between the unforgiving strictness of the ℓp-norm measures and the mapping challenges of elastic measures.
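The object-based idea, comparing whole series as shapes rather than aligning individual points, can be illustrated with a toy sketch. This is not the paper's multiscale construction; it simply rasterizes each series onto a shared time-value grid and returns one minus the intersection-over-union of the occupied cells (the Jaccard distance, which is a metric on sets). The grid resolution plays the role of the adjustable sensitivity, and all names below are our own:

```python
import numpy as np

def rasterize(series, t_bins, v_bins, v_range):
    """Map a 1-D series to the set of (time, value) grid cells it occupies."""
    t = np.linspace(0.0, 1.0, len(series))
    lo, hi = v_range
    ti = np.minimum((t * t_bins).astype(int), t_bins - 1)
    vi = np.clip(((series - lo) / (hi - lo) * v_bins).astype(int), 0, v_bins - 1)
    return set(zip(ti.tolist(), vi.tolist()))

def iou_distance(a, b, t_bins=32, v_bins=32):
    """Jaccard distance (1 - IoU) between the cells occupied by two series."""
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    ca = rasterize(a, t_bins, v_bins, (lo, hi))
    cb = rasterize(b, t_bins, v_bins, (lo, hi))
    return 1.0 - len(ca & cb) / len(ca | cb)

x = np.sin(np.linspace(0.0, 2.0 * np.pi, 200))
assert iou_distance(x, x) == 0.0  # identical series occupy the same cells
```

Coarsening the grid makes small perturbations fall into the same cells, which mirrors the adjustable noise sensitivity the abstract describes.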
Award ID(s):
1931555
NSF-PAR ID:
10402091
Editor(s):
Amini, MR.; Canu, S.; Fischer, A.; Guns, T.; Kralj Novak, P.; Tsoumakas, G.
Journal Name:
Lecture Notes in Computer Science
Volume:
13718
ISSN:
1611-3349
Page Range / eLocation ID:
87–102
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The quantification of Hutchinson's n‐dimensional hypervolume has enabled substantial progress in community ecology, species niche analysis and beyond. However, most existing methods do not support a partitioning of the different components of hypervolume. Such a partitioning is crucial to address the ‘curse of dimensionality’ in hypervolume measures and interpret the metrics on the original niche axes instead of principal components. Here, we propose the use of multivariate normal distributions for the comparison of niche hypervolumes and introduce this as the multivariate‐normal hypervolume (MVNH) framework (R package available on https://github.com/lvmuyang/MVNH).

    The framework provides parametric measures of the size and dissimilarity of niche hypervolumes, each of which can be partitioned into biologically interpretable components. Specifically, the determinant of the covariance matrix (i.e. the generalized variance) of a MVNH is a measure of total niche size, which can be partitioned into univariate niche variance components and a correlation component (a measure of dimensionality, i.e. the effective number of independent niche axes standardized by the number of dimensions). The Bhattacharyya distance (BD; a function of the integral of the geometric mean of two probability densities) between two MVNHs is a measure of niche dissimilarity. The BD partitions total dissimilarity into the components of Mahalanobis distance (standardized Euclidean distance with correlated variables) between hypervolume centroids and the determinant ratio, which measures hypervolume size difference. The Mahalanobis distance and determinant ratio can be further partitioned into univariate divergences and a correlation component.

    We use empirical examples of community‐ and species‐level analysis to demonstrate the new insights provided by these metrics. We show that the newly proposed framework enables us to quantify the relative contributions of different hypervolume components and to connect these analyses to the ecological drivers of functional diversity and environmental niche variation.

    Our approach overcomes several operational and computational limitations of popular nonparametric methods and provides a partitioning framework that has wide implications for understanding functional diversity, niche evolution, niche shifts and expansion during biotic invasions, etc.

     
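    For two multivariate normal hypervolumes, the quantities described above have closed forms. A minimal numpy sketch (function names are ours, not the MVNH package's API):

    ```python
    import numpy as np

    def niche_size(cov):
        """Total niche size: the generalized variance det(cov)."""
        return float(np.linalg.det(cov))

    def bhattacharyya(mu1, cov1, mu2, cov2):
        """Bhattacharyya distance between two multivariate normals, returned
        together with its two components: a Mahalanobis term (centroid
        separation) and a determinant-ratio term (size/shape difference)."""
        cov = (cov1 + cov2) / 2.0
        diff = mu1 - mu2
        maha = float(diff @ np.linalg.solve(cov, diff)) / 8.0
        det_term = 0.5 * float(np.log(np.linalg.det(cov)
                               / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))))
        return maha + det_term, maha, det_term
    ```

    For identical distributions both components vanish; for two unit-covariance normals the determinant-ratio term is zero and the distance reduces to the Mahalanobis term alone.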
  2. This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields, and other forms of structured data. The measure is based on the idea of matching distributions that overlap only over a portion of their region of support. It is inspired by MPdist, which was previously proposed for such datasets and is constructed using Euclidean metrics, whereas kdiff is constructed using non-linear kernel distances. In addition, kdiff accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution. Comparing the cross similarity to the self similarity yields measures that are more robust to noise and partial occlusions of the relevant signals. kdiff is a more general form of the well-known kernel-based Maximum Mean Discrepancy distance estimated over the embeddings. Theoretical results are provided for separability conditions when kdiff is used as a distance measure in clustering and classification problems where the embedding distributions can be modeled as two-component mixtures. Applications are demonstrated for clustering synthetic and real-life time series and image data, and the performance of kdiff is compared to competing distance measures for clustering.
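    One plausible reading of the cross-versus-self construction, using an RBF kernel over sliding windows, can be sketched as follows. This is an illustration under our own assumptions, not the authors' exact definition of kdiff:

    ```python
    import numpy as np

    def windows(x, w):
        """All length-w sliding windows of a 1-D series (rows of the result)."""
        return np.lib.stride_tricks.sliding_window_view(x, w)

    def rbf_dist(X, Y, gamma):
        """Kernel-induced distances sqrt(k(x,x) + k(y,y) - 2k(x,y)) for an
        RBF kernel (where k(x,x) = 1), between all window pairs of X and Y."""
        sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        return np.sqrt(np.maximum(2.0 - 2.0 * np.exp(-gamma * sq), 0.0))

    def kdiff_like(x, y, w=16, q=0.1, gamma=1.0):
        """Lower quantile of cross distances minus the lower quantile of
        self distances: a rough stand-in for the cross-vs-self comparison."""
        Wx, Wy = windows(x, w), windows(y, w)
        cross = np.quantile(rbf_dist(Wx, Wy, gamma), q)
        self_ = min(np.quantile(rbf_dist(Wx, Wx, gamma), q),
                    np.quantile(rbf_dist(Wy, Wy, gamma), q))
        return float(cross - self_)
    ```

    Comparing a lower quantile of the cross distances against the same quantile of the self distances is what tolerates partial overlap: only the best-matching portions of the two series need to agree.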
  3. Objective

    This study explores subjective and objective driving style similarity to identify how similarity can be used to develop driver-compatible vehicle automation.

    Background

    Similarity in the ways that interaction partners perform tasks can be measured subjectively, through questionnaires, or objectively by characterizing each agent’s actions. Although subjective measures have advantages in prediction, objective measures are more useful when operationalizing interventions based on these measures. Showing how objective and subjective similarity are related is therefore prudent for aligning future machine performance with human preferences.

    Methods

    A driving simulator study was conducted with stop-and-go scenarios. Participants experienced conservative, moderate, and aggressive automated driving styles and rated the similarity between their own driving style and that of the automation. Objective similarity between the manual and automated driving speed profiles was calculated using three distance measures: dynamic time warping, Euclidean distance, and time alignment measure. Linear mixed effects models were used to examine how different components of the stopping profile and the three objective similarity measures predicted subjective similarity.

    Results

    Objective similarity using Euclidean distance best predicted subjective similarity. However, this was only observed for participants’ approach to the intersection and not their departure.

    Conclusion

    Developing driving styles that drivers perceive to be similar to their own is an important step toward driver-compatible automation. In determining what constitutes similarity, it is important to (a) use measures that reflect the driver’s perception of similarity, and (b) understand what elements of the driving style govern subjective similarity.

     
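    As a concrete illustration of the best-performing objective measure, the Euclidean distance between two equal-length speed profiles can be computed separately for the approach and departure segments. The traces and split index below are made up for illustration:

    ```python
    import numpy as np

    def profile_distance(manual, automated):
        """Euclidean distance between two equal-length speed profiles."""
        manual = np.asarray(manual, dtype=float)
        automated = np.asarray(automated, dtype=float)
        return float(np.linalg.norm(manual - automated))

    # Hypothetical stop-and-go speed traces (m/s), split at the full stop.
    manual    = np.array([12.0, 9.0, 5.0, 0.0, 4.0, 9.0, 12.0])
    automated = np.array([12.0, 8.0, 4.0, 0.0, 5.0, 10.0, 12.0])
    stop = 3  # index of the stop point (hypothetical)

    approach  = profile_distance(manual[:stop + 1], automated[:stop + 1])
    departure = profile_distance(manual[stop:], automated[stop:])
    ```

    Scoring the approach and departure separately matches the study's finding that the two segments predict subjective similarity differently.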
  4. Abstract

    There is demand for scalable algorithms capable of clustering and analyzing large time series data. The Kohonen self-organizing map (SOM) is an unsupervised artificial neural network for clustering, visualizing, and reducing the dimensionality of complex data. Like all clustering methods, it requires a measure of similarity between input data (in this work, time series). Dynamic time warping (DTW) is one such measure, and a top performer that accommodates distortions when aligning time series. Despite its popularity in clustering, DTW is limited in practice because its runtime complexity is quadratic in the length of the time series. To address this, we present a new self-organizing map for clustering TIME Series, called SOMTimeS, which uses DTW as the distance measure. The method has accuracy similar to other DTW-based clustering algorithms, yet scales better and runs faster. The computational performance stems from pruning unnecessary DTW computations during the SOM's training phase. For comparison, we implement a similar pruning strategy for K-means, and call the latter K-TimeS. SOMTimeS and K-TimeS pruned 43% and 50% of the total DTW computations, respectively. Pruning effectiveness, accuracy, execution time, and scalability are evaluated using 112 benchmark time series datasets from the UC Riverside classification archive; for similar accuracy, SOMTimeS and K-TimeS achieve an average 1.8× speed-up, with rates varying between 1× and 18× depending on the dataset. We also apply SOMTimeS to a healthcare study of patient-clinician serious illness conversations to demonstrate the algorithm's utility with complex, temporally sequenced natural language.

     
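    The pruning idea, abandoning a DTW computation as soon as it can no longer beat the best distance found so far, can be sketched with a generic early-abandoning DTW. This is an illustration, not the SOMTimeS implementation:

    ```python
    import numpy as np

    def dtw_abandon(a, b, best_so_far=np.inf):
        """Squared-cost DTW that abandons early once every cell in a row
        exceeds best_so_far, so a full alignment is computed only when it
        could still beat the incumbent distance."""
        n, m = len(a), len(b)
        prev = np.full(m + 1, np.inf)
        prev[0] = 0.0
        for i in range(n):
            cur = np.full(m + 1, np.inf)
            for j in range(m):
                cost = (a[i] - b[j]) ** 2
                cur[j + 1] = cost + min(prev[j], prev[j + 1], cur[j])
            if cur[1:].min() > best_so_far:
                return np.inf  # pruned: cannot beat the incumbent
            prev = cur
        return prev[m]
    ```

    During SOM training, best_so_far would be the distance to the closest prototype found so far, so most candidate comparisons can terminate after a few rows.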
  5.
    We consider the problem of estimating the Wasserstein distance between the empirical measure and a set of probability measures whose expectations over a class of functions (hypothesis class) are constrained. If this class is sufficiently rich to characterize a particular distribution (e.g., all Lipschitz functions), then our formulation recovers the Wasserstein distance to such a distribution. We establish a strong duality result that generalizes the celebrated Kantorovich-Rubinstein duality. We also show that our formulation can be used to beat the curse of dimensionality, which is well known to affect the rates of statistical convergence of the empirical Wasserstein distance. In particular, examples of infinite-dimensional hypothesis classes are presented, informed by a complex correlation structure, for which it is shown that the empirical Wasserstein distance to such classes converges to zero at the standard parametric rate. Our formulation provides insights that help clarify why, despite the curse of dimensionality, the Wasserstein distance enjoys favorable empirical performance across a wide range of statistical applications. 
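    For reference, the classical Kantorovich-Rubinstein duality that this formulation generalizes expresses the 1-Wasserstein distance as a supremum over 1-Lipschitz test functions; the paper's constrained formulation replaces this Lipschitz ball with a general hypothesis class:

    ```latex
    W_1(\mu, \nu) = \sup_{\|f\|_{\mathrm{Lip}} \le 1}
        \left( \mathbb{E}_{X \sim \mu}[f(X)] - \mathbb{E}_{Y \sim \nu}[f(Y)] \right)
    ```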