Title: Sequential change‐point detection: Computation versus statistical performance
Abstract: Change‐point detection studies the problem of detecting changes in the underlying distribution of a data stream as soon as possible after the change happens. Modern large‐scale, high‐dimensional, and complex streaming data call for sequential change‐point detection algorithms that are computationally and memory efficient as well as statistically powerful. This gives rise to a computation-versus-statistical-power trade‐off, an aspect less emphasized in the classic literature. This tutorial takes this new perspective and reviews several sequential change‐point detection procedures, ranging from classic sequential change‐point detection algorithms to more recent non‐parametric procedures that consider computation, memory efficiency, and model robustness in the algorithm design. Our survey also contains classic performance analysis, which provides useful techniques for analyzing new procedures.
This article is categorized under:
Statistical Models > Time Series Models
Algorithms and Computational Methods > Algorithms
Data: Types and Structure > Time Series, Stochastic Processes, and Functional Data
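As a concrete illustration of the classic procedures the tutorial surveys, below is a minimal sketch of a CUSUM (Page's) detector for a known Gaussian mean shift; the pre-/post-change means, noise level, threshold, and simulated stream are illustrative assumptions, not values from the article.

```python
import numpy as np

def cusum_detect(stream, mu0, mu1, sigma, b):
    """Minimal CUSUM sketch for a known mean shift mu0 -> mu1.

    Runs Page's recursion W_t = max(W_{t-1} + L_t, 0), where L_t is the
    log-likelihood ratio of the post- vs. pre-change Gaussian models, and
    raises an alarm the first time W_t exceeds the threshold b.
    Returns the detection time (1-indexed) or None if no alarm is raised.
    """
    w = 0.0
    for t, x in enumerate(stream, start=1):
        # Log-likelihood ratio of N(mu1, sigma^2) vs. N(mu0, sigma^2) at x.
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma**2
        w = max(w + llr, 0.0)  # reflect at zero
        if w > b:
            return t
    return None

# Hypothetical usage: the mean shifts from 0 to 1 at time 500.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, 500), rng.normal(1, 1, 200)])
print(cusum_detect(stream, mu0=0.0, mu1=1.0, sigma=1.0, b=10.0))
```

The O(1) memory of this recursion is exactly the kind of computational property the trade-off in the abstract concerns: each step needs only the current statistic, not the full history of the stream.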
Award ID(s):
1650913, 2220495
PAR ID:
10492374
Author(s) / Creator(s):
Publisher / Repository:
Wiley
Date Published:
Journal Name:
WIREs Computational Statistics
Volume:
16
Issue:
1
ISSN:
1939-5108
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Many statistical models currently used in ecology and evolution account for covariances among random errors. Here, I address five points: (i) correlated random errors unite many types of statistical models, including spatial, phylogenetic, and time‐series models; (ii) random errors are neither unpredictable nor mistakes; (iii) diagnostics for correlated random errors are not useful, but simulations are; (iv) model predictions can be made with random errors; and (v) can random errors be causal? These five points are illustrated by applying statistical models to analyse simulated spatial, phylogenetic, and time‐series data. The three simulation studies are paired with three types of predictions that can be made using information from covariances among random errors: predictions for goodness‐of‐fit, interpolation, and forecasting. In the simulation studies, models incorporating covariances among random errors improve inference about the relationship between dependent and independent variables. They also imply the existence of unmeasured variables that generate the covariances among random errors. Understanding the covariances among random errors therefore gives information about possible processes underlying the data. Random errors are caused by something. Therefore, to extract full information from data, covariances among random errors should not just be included in statistical models; they should also be studied in their own right. Data are hard won, and appropriate statistical analyses can make the most of them.
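A minimal sketch (not from the paper) of the kind of comparison such simulations make: a regression with AR(1)-correlated errors is fit by ordinary least squares, which ignores the error covariance, and by generalized least squares, which models it; all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, rho = 200, 2.0, 0.8

# Simulate a covariate and AR(1)-correlated random errors.
x = rng.normal(size=n)
e = np.empty(n)
e[0] = rng.normal() / np.sqrt(1 - rho**2)  # stationary start
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal()
y = beta * x + e

# OLS ignores the error covariance.
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# GLS uses the (here, known) AR(1) covariance: Sigma_ij = rho^|i-j| / (1 - rho^2).
idx = np.arange(n)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho**2)
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

print("OLS slope:", beta_ols[1], "GLS slope:", beta_gls[1])
```

Both estimators are unbiased here, but GLS exploits the error covariance and is more efficient; the same covariance structure is what supports the interpolation and forecasting predictions the abstract describes.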
  2. Abstract: Missing values are ubiquitous in ecological time series. Methods like linear interpolation, k‐nearest neighbour (kNN) imputation, or regression‐based imputation are commonly used to repair these gaps, but they may be unsuitable when the data are infrequently sampled or have nonlinear dynamics. We introduce multiview cross‐mapping (MVCM), a novel method based on empirical dynamic modelling (EDM) that exploits shared information between dynamically coupled time series. Rather than using points nearby in time, MVCM uses similar system states on an attractor to estimate the value of a missing data point. MVCM works best where other dynamically coupled variables have been observed, but it can also predict into short gaps where all variables are missing (a data void). Using model data from a coupled five‐species system and observational data from a long‐term plankton survey in Lake Zurich, Switzerland, we show that MVCM is robust and performs significantly better than linear methods (linear interpolation, linear regression‐based imputation) and kNN imputation. Crucially, this approach differs from methods based on a purely statistical paradigm because it assumes that the time series are generated by underlying deterministic rules. This dynamical framework allows us to exploit information shared between time series from a mechanistically coupled system, making complexity an asset for the analysis of imperfect observational data.
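A heavily simplified, single-series stand-in for the state-space idea behind MVCM (the paper's multiview, multi-series algorithm is more involved): embed the series with time delays and estimate the missing value from where the nearest library states on the reconstructed attractor led one step later. The embedding dimension, delay, neighbour count, and logistic-map test series are all illustrative assumptions.

```python
import numpy as np

def edm_impute(x, t_miss, E=3, tau=1, k=4):
    """Simplified state-space (EDM-style) imputation sketch.

    Builds delay vectors [x_t, x_{t-tau}, ..., x_{t-(E-1)tau}] from fully
    observed times, finds the k library states nearest to the state just
    before the gap, and averages where those states led one step later.
    A single-view stand-in for multiview cross-mapping, not the paper's
    algorithm.
    """
    # State just before the missing point (assumed fully observed).
    query = np.array([x[t_miss - 1 - j * tau] for j in range(E)])

    # Library of delay vectors and one-step-ahead targets, skipping the gap.
    states, targets = [], []
    for t in range((E - 1) * tau, len(x) - 1):
        idxs = [t - j * tau for j in range(E)]
        if t_miss in idxs or t + 1 == t_miss:
            continue  # exclude library points that touch the missing index
        states.append([x[i] for i in idxs])
        targets.append(x[t + 1])
    states, targets = np.array(states), np.array(targets)

    # Average the targets of the k nearest library states.
    d = np.linalg.norm(states - query, axis=1)
    return targets[np.argsort(d)[:k]].mean()

# Hypothetical usage: a logistic-map series with one gap at t = 60.
x = np.empty(200)
x[0] = 0.4
for t in range(1, 200):
    x[t] = 3.8 * x[t - 1] * (1 - x[t - 1])
true_val, x[60] = x[60], np.nan
print(edm_impute(x, 60), "true:", true_val)
```

Because neighbours are chosen by similarity of system state rather than proximity in time, such an estimate can track nonlinear dynamics that defeat linear interpolation.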
  3. Abstract: A fundamental problem in functional data analysis is to classify a functional observation based on training data. Functional data classification has gained immense popularity and utility across a wide array of disciplines, encompassing biology, engineering, environmental science, medical science, neurology, social science, and beyond. This phenomenal growth indicates an urgent need for a systematic approach to developing efficient classification methods and scalable algorithmic implementations. We therefore conduct a comprehensive review of classification methods for functional data. The review aims to bridge the gap between the functional data analysis community and the machine learning community, and to inspire new principles for functional data classification.
This article is categorized under:
Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification
Statistical Models > Classification Models
Data: Types and Structure > Time Series, Stochastic Processes, and Functional Data
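As a minimal sketch of one common route such reviews cover (basis expansion followed by a multivariate classifier; illustrative only, not a method attributed to this article), each curve is reduced to a few Fourier basis coefficients and a nearest-centroid rule classifies in coefficient space. The simulated curves and all settings are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
tgrid = np.linspace(0, 1, 100)

def make_curves(n, shift):
    """Simulate n noisy functional observations around a phase-shifted sine."""
    return np.sin(2 * np.pi * (tgrid + shift)) + 0.3 * rng.normal(size=(n, tgrid.size))

# Two classes of curves, 50 training observations each.
X0, X1 = make_curves(50, 0.0), make_curves(50, 0.15)

# Project curves onto a small Fourier basis (functional -> multivariate).
K = 5
basis = np.stack([np.ones_like(tgrid)] +
                 [f(2 * np.pi * k * tgrid) for k in range(1, K) for f in (np.sin, np.cos)])

def coef(X):
    """Approximate basis coefficients by discretized inner products."""
    return X @ basis.T / tgrid.size

c0, c1 = coef(X0).mean(axis=0), coef(X1).mean(axis=0)

def classify(curve):
    """Nearest-centroid rule in basis-coefficient space."""
    z = coef(curve[None, :])[0]
    return int(np.linalg.norm(z - c1) < np.linalg.norm(z - c0))

test = make_curves(1, 0.15)[0]
print("predicted class:", classify(test))  # expect 1
```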
  4. Abstract: Relationships between plant biodiversity and productivity are highly variable across studies in managed grasslands, partly because of the challenge of accounting for confounding factors and reciprocal relationships between biodiversity and productivity in observational data collected at a single point in time. Identifying causal effects in the presence of these challenges requires new analytical approaches and repeated observations to determine the temporal ordering of effects. Though rarely available, data collected at multiple time points within a growing season can help to disentangle the effects of biodiversity on productivity and vice versa. Here we advance this understanding using seasonal grassland surveys from 150 managed grassland sites repeated over 2 years, along with statistical methods that are relatively new in ecology and that aim to infer causal relationships from observational data. We compare our approach to common methods used in ecology (i.e., mixed‐effects models) and to analyses that use observations from only one point in time within the growing season. We find that mixed models overestimated the effect of biodiversity on productivity by two standard errors compared with our main models, which find no evidence for a strong positive effect. For the effect of productivity on biodiversity, mixed models found a negative effect that was highly sensitive to the time at which the data were collected within the growing season; in contrast, our main models found no evidence for an effect. Conventional models thus overestimated the effects between biodiversity and productivity, likely due to confounding variables. Synthesis: Understanding biodiversity‐productivity relationships is a focal topic in ecology, but unravelling their reciprocal nature remains challenging. We demonstrate that higher‐resolution longitudinal data, along with methods that control for a broader suite of confounding variables, can be used to resolve reciprocal relationships. We highlight future data needs and methods that can help to resolve biodiversity‐productivity relationships, which is crucial for reconciling a long‐running debate in ecology and, ultimately, for understanding how biodiversity and ecosystem functioning respond to global change.
  5. Abstract: Model calibration is crucial for optimizing the performance of complex computer models across various disciplines. In the era of Industry 4.0, which symbolizes rapid technological advancement through the integration of advanced digital technologies into industrial processes, model calibration plays a key role in advancing digital twin technology, ensuring alignment between digital representations and real‐world systems. This comprehensive review focuses on the Kennedy and O'Hagan (KOH) framework (Kennedy and O'Hagan, Journal of the Royal Statistical Society: Series B 2001; 63(3):425–464). In particular, we explore recent advancements addressing the unidentifiability issue while accommodating model inadequacy within the KOH framework. We also review recent work adapting the KOH framework to complex scenarios, including those involving multivariate outputs and functional calibration parameters, and delve into experimental design strategies tailored to the unique demands of model calibration. By offering a comprehensive analysis of the KOH approach and its diverse applications, this review serves as a valuable resource for researchers and practitioners aiming to enhance the accuracy and reliability of their computer models.
This article is categorized under:
Statistical Models > Semiparametric Models
Statistical Models > Simulation Models
Statistical Models > Bayesian Models
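For reference, the KOH framework cited above models each field observation as computer-model output plus a systematic discrepancy and observation noise; a standard statement of the model (following Kennedy and O'Hagan, 2001) is:

```latex
% Kennedy--O'Hagan calibration model: field data y at inputs x_i combine
% the computer model \eta evaluated at the unknown calibration parameter
% \theta, a model-discrepancy term \delta, and observation noise.
\[
  y(x_i) \;=\; \eta(x_i, \theta) \;+\; \delta(x_i) \;+\; \varepsilon_i,
  \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),
\]
% with Gaussian-process priors typically placed on \eta and \delta.
```

The unidentifiability the review discusses arises because the model term η(·, θ) and the discrepancy δ(·) can trade off against each other while fitting the data equally well.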