This content will become publicly available on July 14, 2024

Title: Sequential change‐point detection: Computation versus statistical performance
Abstract

Change‐point detection studies the problem of detecting changes in the underlying distribution of a data stream as soon as possible after they occur. Modern large‐scale, high‐dimensional, and complex streaming data call for sequential change‐point detection algorithms that are efficient in computation and memory while remaining statistically powerful. This gives rise to a trade‐off between computation and statistical power, an aspect that the classic literature has emphasized less. This tutorial takes this new perspective and reviews several sequential change‐point detection procedures, ranging from classic algorithms to more recent non‐parametric procedures that consider computation, memory efficiency, and model robustness in the algorithm design. Our survey also covers classic performance analysis, which provides useful techniques for analyzing new procedures.
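
To make the classic end of that range concrete, here is a minimal sketch of a CUSUM-style recursion, a standard sequential change-point procedure. It is an illustration only and is not taken from the article: the Gaussian mean-shift model with known variance, the function name `cusum_gaussian`, and the parameters `mu0`, `mu1`, `sigma`, and `threshold` are all assumptions. The statistic is updated in constant time and memory per sample, which is the kind of computational property the abstract refers to.

```python
from typing import Iterable, Optional


def cusum_gaussian(
    stream: Iterable[float],
    mu0: float = 0.0,       # assumed pre-change mean (hypothetical parameter)
    mu1: float = 1.0,       # assumed post-change mean (hypothetical parameter)
    sigma: float = 1.0,     # assumed known standard deviation
    threshold: float = 5.0  # detection threshold; larger means fewer false alarms
) -> Optional[int]:
    """Return the 1-based index at which a change is declared, or None."""
    w = 0.0  # running CUSUM statistic, the only state kept between samples
    for t, x in enumerate(stream, start=1):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2) at x.
        llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2.0)
        # Accumulate evidence for a change, resetting at zero.
        w = max(0.0, w + llr)
        if w >= threshold:
            return t
    return None


if __name__ == "__main__":
    # Noise-free toy stream: the mean jumps from 0 to 1 after sample 50.
    data = [0.0] * 50 + [1.0] * 50
    print(cusum_gaussian(data))
```

On this noise-free toy stream each post-change sample adds 0.5 to the statistic, so the threshold of 5 is reached ten samples after the change. Raising `threshold` lengthens the detection delay in exchange for fewer false alarms on noisy data.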

This article is categorized under:

Statistical Models > Time Series Models

Algorithms and Computational Methods > Algorithms

Data: Types and Structure > Time Series, Stochastic Processes, and Functional Data

 
Award ID(s): 1650913
NSF-PAR ID: 10431924
Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
Journal Name: WIREs Computational Statistics
ISSN: 1939-5108
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Bayesian hierarchical models allow ecologists to account for uncertainty and make inference at multiple scales. However, hierarchical models are often computationally intensive to fit, especially with large datasets, and researchers face trade‐offs between capturing ecological complexity in statistical models and implementing these models.

    We present a recursive Bayesian computing (RB) method that can be used to fit Bayesian models efficiently in sequential MCMC stages to ease computation and streamline hierarchical inference. We also introduce transformation‐assisted RB (TARB) to create unsupervised MCMC algorithms and improve interpretability of parameters. We demonstrate TARB by fitting a hierarchical animal movement model to obtain inference about individual‐ and population‐level migratory characteristics.

    Our recursive procedure reduced computation time for fitting our hierarchical movement model by half compared to fitting the model with a single MCMC algorithm. We obtained the same inference fitting our model using TARB as we obtained fitting the model with a single algorithm.

    For complex ecological statistical models, like those for animal movement, multi‐species systems, or large spatial and temporal scales, the computational demands of fitting models with conventional computing techniques can limit model specification, thus hindering scientific discovery. Transformation‐assisted RB is one of the most accessible methods for reducing these limitations, enabling us to implement new statistical models and advance our understanding of complex ecological phenomena.

     
  2. Abstract

    Since the very first detection of gravitational waves from the coalescence of two black holes in 2015, Bayesian statistical methods have been routinely applied by LIGO and Virgo to extract the signal out of noisy interferometric measurements, obtain point estimates of the physical parameters responsible for producing the signal, and rigorously quantify their uncertainties. Different computational techniques have been devised depending on the source of the gravitational radiation and the gravitational waveform model used. Prominent sources of gravitational waves are binary black hole or neutron star mergers, the only objects that have been observed by detectors to date. However, gravitational waves from core‐collapse supernovae, rapidly rotating neutron stars, and the stochastic gravitational‐wave background also lie in the sensitivity band of the ground‐based interferometers and are expected to be observable in future observation runs. As nonlinearities of the complex waveforms and the high‐dimensional parameter spaces preclude analytic evaluation of the posterior distribution, posterior inference for all these sources relies on computer‐intensive simulation techniques such as Markov chain Monte Carlo methods. A review of state‐of‐the‐art Bayesian statistical parameter estimation methods will be given for researchers in this cross‐disciplinary area of gravitational wave data analysis.

    This article is categorized under:

    Applications of Computational Statistics > Signal and Image Processing and Coding

    Statistical and Graphical Methods of Data Analysis > Markov Chain Monte Carlo (MCMC)

    Statistical Models > Time Series Models

     
  3. Abstract

    Historical museum records provide potentially useful data for identifying drivers of change in species occupancy. However, because museum records are typically obtained via many collection methods, methodological developments are needed to enable robust inferences. Occupancy–detection models, a relatively new and powerful suite of statistical methods, are a potentially promising avenue because they can account for changes in collection effort through space and time.

    We use simulated datasets to identify how and when patterns in data and/or modelling decisions can bias inference. We focus primarily on the consequences of contrasting methodological approaches for dealing with species' ranges and inferring species' non‐detections in both space and time.

    We find that not all datasets are suitable for occupancy–detection analysis but, under the right conditions (namely, datasets that are broken into more time periods for occupancy inference and that contain a high fraction of community‐wide collections, or collection events that focus on communities of organisms), models can accurately estimate trends. Finally, we present a case study on eastern North American odonates where we calculate long‐term trends of occupancy using our most robust workflow.

    These results indicate that occupancy–detection models are a suitable framework for some research cases and expand the suite of tools for macroecological analysis available to researchers, especially where structured datasets are unavailable.

     
  4. Abstract

    Monitoring wildlife abundance across space and time is essential for studying population dynamics and informing effective management. Acoustic recording units are a promising technology for efficiently monitoring bird populations and communities. While current acoustic data models provide information on the presence/absence of individual species, new approaches are needed to monitor population abundance, ideally across large spatio‐temporal regions.

    We present an integrated modelling framework that combines high‐quality but temporally sparse bird point count survey data with acoustic recordings. Our models account for imperfect detection in both data types and false positive errors in the acoustic data. Using simulations, we compare the accuracy and precision of abundance estimates using differing amounts of acoustic vocalizations obtained from a clustering algorithm, point count data, and a subset of manually validated acoustic vocalizations. We also use our modelling framework in a case study to estimate abundance of the Eastern Wood‐Pewee (Contopus virens) in Vermont, USA.

    The simulation study reveals that combining acoustic and point count data via an integrated model improves accuracy and precision of abundance estimates compared with models informed by either acoustic or point count data alone. Improved estimates are obtained across a wide range of scenarios, with the largest gains occurring when detection probability for the point count data is low. Combining acoustic data with only a small number of point count surveys yields estimates of abundance without the need for validating any of the identified vocalizations from the acoustic data. Within our case study, the integrated models provided moderate support for a decline of the Eastern Wood‐Pewee in this region.

    Our integrated modelling approach combines dense acoustic data with few point count surveys to deliver reliable estimates of species abundance without the need for manual identification of acoustic vocalizations or a prohibitively expensive large number of repeated point count surveys. Our proposed approach offers an efficient monitoring alternative for large spatio‐temporal regions when point count data are difficult to obtain or when monitoring is focused on rare species with low detection probability.

     
  5. Abstract

    Searching for patterns in data is important because it can lead to the discovery of sequence segments that play a functional role. The complexity of the pattern statistics used in data analysis, and the need for the sampling distribution of those statistics for inference, make efficient computation methods paramount. This article gives an overview of the main methods used to compute distributions of statistics of overlapping pattern occurrences, specifically, generating functions, correlation functions, the Goulden‐Jackson cluster method, recursive equations, and Markov chain embedding. The underlying data sequence will be assumed to be higher‐order Markovian, which includes sparse Markov models and variable length Markov chains as special cases. Also considered will be recent developments for extending the computational capabilities of the Markov chain‐based method through an algorithm for minimizing the size of the chain's state space, as well as improved data modeling capabilities through sparse Markov models. An application to compute a distribution used as a test statistic in sequence alignment will serve to illustrate the usefulness of the methodology.

    This article is categorized under:

    Statistical Learning and Exploratory Methods of the Data Sciences > Pattern Recognition

    Data: Types and Structure > Categorical Data

    Statistical and Graphical Methods of Data Analysis > Modeling Methods and Algorithms

     