
Title: Detecting Traffic Incidents Using Persistence Diagrams
We introduce a novel methodology for anomaly detection in time-series data, based on persistence diagrams and bottleneck distances. Specifically, we generate multiple predictors by randomly bagging the data (reference bags) and then, for each data point, substituting that point for a randomly chosen point in each bag (modified bags). The predictors are then the set of bottleneck distances for the reference/modified bag pairs. We prove the stability of the predictors as the number of bags increases. We apply our methodology to traffic data and measure its performance in identifying known incidents.
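To make the construction concrete, the following is a minimal sketch of the bagged bottleneck-distance predictors, assuming the Python packages ripser (persistence diagrams) and persim (bottleneck distance). The sliding-window embedding, bag sizes, and all names are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from ripser import ripser          # Vietoris–Rips persistence diagrams
    from persim import bottleneck      # bottleneck distance between diagrams

    rng = np.random.default_rng(0)

    def sliding_window(series, dim=3):
        # Embed a 1-D time series in R^dim so persistent homology can be computed.
        return np.stack([series[i:i + dim] for i in range(len(series) - dim + 1)])

    def diagram(cloud):
        # Degree-0 persistence diagram with the infinite bar dropped, so the
        # bottleneck distance below is always well defined.
        dgm = ripser(cloud)["dgms"][0]
        return dgm[np.isfinite(dgm[:, 1])]

    def predictors(series, t, n_bags=20, bag_size=50, dim=3):
        # Predictors for the window starting at index t: for each reference bag,
        # swap a randomly chosen point for the query window (the modified bag)
        # and record the bottleneck distance between the two bags' diagrams.
        cloud = sliding_window(np.asarray(series, dtype=float), dim)
        point = cloud[t]
        dists = []
        for _ in range(n_bags):
            ref = cloud[rng.choice(len(cloud), size=bag_size, replace=False)]
            mod = ref.copy()
            mod[rng.integers(bag_size)] = point
            dists.append(bottleneck(diagram(ref), diagram(mod)))
        return np.array(dists)

Consistently large distances, relative to the rest of the series, mark the point as anomalous; the stability result above says the predictors settle as the number of bags grows, which justifies thresholding or averaging them.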
Award ID(s):
1830254 1934884
NSF-PAR ID:
10275086
Journal Name:
Algorithms
Volume:
13
Issue:
9
Page Range or eLocation-ID:
222
ISSN:
1999-4893
Sponsoring Org:
National Science Foundation
More Like this
  1. Learning from label proportions (LLP) is a weakly supervised setting for classification in which unlabeled training instances are grouped into bags, and each bag is annotated with the proportion of each class occurring in that bag. Prior work on LLP has yet to establish a consistent learning procedure, nor does there exist a theoretically justified, general purpose training criterion. In this work we address these two issues by posing LLP in terms of mutual contamination models (MCMs), which have recently been applied successfully to study various other weak supervision settings. In the process, we establish several novel technical results for MCMs, including unbiased losses and generalization error bounds under non-iid sampling plans. We also point out the limitations of a common experimental setting for LLP, and propose a new one based on our MCM framework. (A minimal code sketch of the bag-and-proportion setting appears after this list.)
  2. Rare events arising in nonlinear atmospheric dynamics remain hard to predict and attribute. We address the problem of forecasting rare events in a prototypical example, sudden stratospheric warmings (SSWs). Approximately once every other winter, the boreal stratospheric polar vortex rapidly breaks down, shifting midlatitude surface weather patterns for months. We focus on two key quantities of interest: the probability of an SSW occurring, and the expected lead time if it does occur, as functions of initial condition. These optimal forecasts concretely measure the event's progress. Direct numerical simulation can estimate them in principle but is prohibitively expensive in practice: each rare event requires a long integration to observe, and the cost of each integration grows with model complexity. We describe an alternative approach using integrations that are short compared to the time scale of the warming event. We compute the probability and lead time efficiently by solving equations involving the transition operator, which encodes all information about the dynamics. We relate these optimal forecasts to a small number of interpretable physical variables, suggesting optimal measurements for forecasting. We illustrate the methodology on a prototype SSW model developed by Holton and Mass and modified by stochastic forcing. While highly idealized, this model captures the essential nonlinear dynamics of SSWs and exhibits the key forecasting challenge: the dramatic separation in time scales between a single event and the return time between successive events. Our methodology is designed to fully exploit high-dimensional data from models and observations, and has the potential to identify detailed predictors of many complex rare events in meteorology. (A hedged sketch of the committor-type equations behind such forecasts appears after this list.)
  3. Background: Ensemble modeling aims to boost forecasting performance by systematically integrating the predictive accuracy across individual models. Here we introduce a simple yet powerful ensemble methodology for forecasting the trajectory of dynamic growth processes that are defined by a system of non-linear differential equations, with applications to infectious disease spread. Methods: We propose and assess the performance of two ensemble modeling schemes with different parametric bootstrapping procedures for trajectory forecasting and uncertainty quantification. Specifically, we conduct sequential probabilistic forecasts to evaluate their forecasting performance using simple dynamical growth models with good track records, including the Richards model, the generalized-logistic growth model, and the Gompertz model. We first test and verify the functionality of the method using simulated data from phenomenological models and a mechanistic transmission model. Next, the performance of the method is demonstrated using a diversity of epidemic datasets, including scenario outbreak data from the Ebola Forecasting Challenge and real-world outbreak data for influenza, plague, Zika, and COVID-19. Results: We found that the ensemble method that randomly selects a model from the set of individual models for each time point of the epidemic trajectory frequently outcompeted the individual models, as well as an alternative ensemble method based on the weighted combination of the individual models, and yields broader and more realistic uncertainty bounds for the trajectory envelope, achieving not only a better coverage rate of the 95% prediction interval but also improved mean interval scores across a diversity of epidemic datasets. Conclusion: Our new ensemble forecasting methodology outcompetes its component models and an alternative ensemble model that differs in how the variance is evaluated when generating the prediction intervals of the forecasts. (A minimal sketch of the random-selection ensemble appears after this list.)
  4. Equimolal tris (2-amino-2-hydroxymethyl-propane-1,3-diol) buffer in artificial seawater is a well characterized and commonly used standard for oceanographic pH measurements. We evaluated the stability of tris pH when stored in purportedly gas-impermeable bags across a variety of experimental conditions, including bag type and storage in air vs. seawater over 300 d. Bench-top spectrophotometric pH analysis revealed that the pH of tris stored in bags decreased at a rate of 0.0058 ± 0.0011 yr−1 (mean slope ± 95% confidence interval of slope). The upper and lower bounds of expected pH change at t = 365 d, calculated using the averages and confidence intervals of the slope and intercept of the measured pH change vs. time data, were −0.0042 and −0.0076 from the initial pH. Analyses of total dissolved inorganic carbon confirmed that a combination of CO2 infiltration and/or microbial respiration led to the observed decrease in pH. Eliminating the change in pH of bagged tris remains a goal, yet the rate of pH change is lower than many processes of interest and demonstrates the potential of bagged tris for sensor calibration and validation of autonomous in situ pH measurements.
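Sketch for item 1: a minimal illustration of the LLP setting itself (instances grouped into bags, each annotated only with class proportions), using a simple proportion-matching loss as a stand-in. The paper's MCM-based unbiased losses and bounds are not reproduced here, and all names are illustrative.

    import numpy as np

    def bag_loss(instance_probs, proportion):
        # instance_probs: a model's predicted P(y = 1) for each instance in a
        # bag. Only the bag-level proportion is observed, so the loss compares
        # the mean prediction against the annotated positive-class fraction.
        return (np.mean(instance_probs) - proportion) ** 2

    def llp_objective(predict, bags, proportions):
        # bags: list of (n_i, d) feature arrays; proportions: one fraction per bag.
        return sum(bag_loss(predict(X), p) for X, p in zip(bags, proportions))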
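Sketch for item 2: the "equations involving the transition operator" are commonly written as boundary-value problems for the generator of the dynamics. This standard committor formulation is an assumption about the abstract's wording, with A the set of typical vortex states and B the set of SSW states.

    % q(x): probability, starting from state x, of reaching the SSW set B
    % before returning to the typical-vortex set A; \mathcal{L} is the
    % generator of the dynamics.
    \[
      \mathcal{L} q = 0 \ \ \text{for } x \notin A \cup B, \qquad
      q|_{A} = 0, \qquad q|_{B} = 1 .
    \]
    % Conditional expected lead time: solve a second problem for m and
    % divide by q.
    \[
      \mathcal{L} m = -q \ \ \text{for } x \notin A \cup B, \qquad
      m|_{A \cup B} = 0, \qquad \eta(x) = \frac{m(x)}{q(x)} .
    \]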
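Sketch for item 3: the better-performing scheme draws, at each time point of the trajectory, the forecast from one individual model chosen at random. The model interfaces and names below are illustrative assumptions, not the authors' code.

    import numpy as np

    rng = np.random.default_rng(0)

    def ensemble_trajectories(models, horizon, n_samples=500):
        # models: bootstrapped fits (e.g., Richards, generalized-logistic,
        # Gompertz), each a callable mapping a time index to a forecast value.
        samples = np.empty((n_samples, horizon))
        for s in range(n_samples):
            for t in range(horizon):
                m = models[rng.integers(len(models))]  # random model per time point
                samples[s, t] = m(t)
        return samples

    def prediction_interval(samples, level=0.95):
        # Pointwise envelope quantiles, e.g., the 95% prediction interval.
        a = (1 - level) / 2
        return np.quantile(samples, [a, 1 - a], axis=0)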