

Title: Bayesian sequential monitoring of density estimates
Abstract

In this paper, we consider sequentially estimating the density of univariate data. We utilize Pólya trees to develop a statistical process control (SPC) methodology. Our proposed methodology monitors the distribution of the sequentially observed data and detects when the generating density differs from an in‐control standard. We also propose an approximation that merges the probability mass of multiple possible changepoints to curb computational complexity while maintaining the accuracy of the monitoring procedure. We show in simulation experiments that our approach is capable of quickly detecting when a changepoint has occurred while controlling the number of false alarms, and performs well relative to competing methods. We then use our methodology to detect changepoints in high‐frequency foreign exchange (Forex) return data.
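
The Pólya-tree idea can be illustrated with a minimal sketch, under these assumptions:

- The observations have already been mapped to [0, 1), for example through the in-control CDF.
- The tree is centered on Uniform(0, 1) and truncated at a fixed depth.
- Branch probabilities use the common Beta(c·m², c·m²) priors at level m.

The class `PolyaTree`, the function `sequential_log_bayes_factor`, and the defaults `depth=8` and `c=1.0` are hypothetical choices, not the paper's. The monitoring statistic is a plain sequential Bayes factor against the in-control density. It is not the changepoint-merging procedure described in the abstract.

```python
import math
from collections import defaultdict


class PolyaTree:
    """Pólya tree on [0, 1) centered on Uniform(0, 1), truncated at a fixed depth.

    Every interval at level m - 1 is split into two halves at level m; the probability
    of the left half has a Beta(c * m**2, c * m**2) prior, so after observing data the
    posterior is Beta(c * m**2 + n_left, c * m**2 + n_right).
    """

    def __init__(self, depth=8, c=1.0):
        self.depth = depth
        self.c = c
        # (level, index) -> number of observations in that dyadic interval;
        # (0, 0) is the whole unit interval.
        self.counts = defaultdict(int)

    def _path(self, x):
        """Yield (level, index) of the dyadic intervals containing x for levels 1..depth."""
        for m in range(1, self.depth + 1):
            yield m, min(int(x * 2 ** m), 2 ** m - 1)

    def update(self, x):
        """Add one observation x in [0, 1)."""
        self.counts[(0, 0)] += 1
        for key in self._path(x):
            self.counts[key] += 1

    def predictive_density(self, x):
        """Posterior predictive density at x (piecewise constant on depth-level intervals)."""
        dens = 1.0
        for m, idx in self._path(x):
            a = self.c * m ** 2
            n_child = self.counts.get((m, idx), 0)
            n_parent = self.counts.get((m - 1, idx // 2), 0)
            # Posterior mean branch probability divided by the half-interval's base mass (1/2).
            dens *= 2.0 * (a + n_child) / (2.0 * a + n_parent)
        return dens


def sequential_log_bayes_factor(xs, depth=8, c=1.0):
    """Running log Bayes factor of a Pólya-tree alternative against Uniform(0, 1).

    The in-control density is 1 on [0, 1), so each increment is log p_PT(x_i | x_1..x_{i-1}).
    """
    tree = PolyaTree(depth, c)
    log_bf, path = 0.0, []
    for x in xs:
        log_bf += math.log(tree.predictive_density(x))  # minus log f0(x) = log 1 = 0
        tree.update(x)
        path.append(log_bf)
    return path
```

An alarm could be raised when the running value crosses a chosen threshold. Calibrating that threshold against false alarms is outside the scope of this sketch.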

 
NSF-PAR ID: 10446293
Author(s) / Creator(s):
Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name: Quality and Reliability Engineering International
Volume: 38
Issue: 4
ISSN: 0748-8017
Page Range / eLocation ID: pp. 1826–1849
Sponsoring Org: National Science Foundation
More like this
  1. Abstract

    Changepoint detection methods are used in many areas of science and engineering, for example, in the analysis of copy number variation data to detect abnormalities in copy numbers along the genome. Despite the broad array of available tools, methodology for quantifying our uncertainty in the strength (or the presence) of given changepoints post‐selection is lacking. Post‐selection inference offers a framework to fill this gap, but the most straightforward application of these methods results in low‐powered hypothesis tests and leaves open several important questions about practical usability. In this work, we carefully tailor post‐selection inference methods toward changepoint detection, focusing on copy number variation data. To accomplish this, we study commonly used changepoint algorithms: binary segmentation, two of its most popular variants (wild and circular binary segmentation), and the fused lasso. We implement some of the latest developments in post‐selection inference theory, most notably auxiliary randomization. This improves power but requires Markov chain Monte Carlo algorithms (importance sampling and hit‐and‐run sampling) to carry out our tests. We also provide recommendations for improving practical usability, detailed simulations, and example analyses on array comparative genomic hybridization as well as sequencing data.
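
Binary segmentation, the base algorithm studied here, can be sketched for a piecewise-constant mean as follows. The sketch makes several assumptions:

- The noise variance is 1. In practice the data would first be scaled by an estimate of σ.
- The threshold is chosen by the user.
- The function names and the `min_size` parameter are hypothetical.

The sketch covers only the segmentation step, not the randomized post-selection tests.

```python
import math


def cusum_stats(x):
    """Standardized CUSUM statistics |C_k| for a single mean change after index k (k = 1..n-1)."""
    n = len(x)
    total = sum(x)
    prefix = 0.0
    stats = []
    for k in range(1, n):
        prefix += x[k - 1]
        # Scaled difference between the mean of x[:k] and the mean of x[k:].
        c = math.sqrt(k * (n - k) / n) * (prefix / k - (total - prefix) / (n - k))
        stats.append(abs(c))
    return stats


def binary_segmentation(x, threshold, min_size=2, offset=0):
    """Recursively split x where the maximal CUSUM statistic exceeds threshold.

    Returns sorted changepoint locations (index of the first observation of each new segment).
    """
    n = len(x)
    if n < 2 * min_size:
        return []
    stats = cusum_stats(x)
    # Only consider splits that leave at least min_size points on each side.
    candidates = range(min_size, n - min_size + 1)
    k = max(candidates, key=lambda j: stats[j - 1])
    if stats[k - 1] <= threshold:
        return []
    return sorted(
        binary_segmentation(x[:k], threshold, min_size, offset)
        + [offset + k]
        + binary_segmentation(x[k:], threshold, min_size, offset + k)
    )
```

Wild and circular binary segmentation change how candidate intervals are searched. The recursive split-and-test structure stays the same.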

     
  2. Abstract

    Climate changepoint (homogenization) methods abound today, with myriad techniques in both the climate and statistics literature. Unfortunately, the appropriate changepoint technique to use remains unclear to many. Further complicating matters, changepoint conclusions are not robust to perturbations in assumptions; for example, allowing for a trend or correlation in the series can drastically change changepoint conclusions. This paper is a review of the topic, with an emphasis on illuminating the models and techniques that allow the scientist to make reliable conclusions. Pitfalls to avoid are demonstrated via actual applications. The discussion begins by describing the salient statistical features of most climate time series. Thereafter, single- and multiple-changepoint problems are considered. Several pitfalls are discussed en route and good practices are recommended. While most of our applications involve temperatures, a sea ice series is also considered.

    Significance Statement

    This paper reviews the methods used to identify and analyze changepoints in climate data, with a focus on helping scientists make reliable conclusions. It discusses common mistakes and pitfalls to avoid in changepoint analysis and recommends best practices. The paper also provides examples of how these methods have been applied to temperature and sea ice data. Its main goal is to provide guidance on how to effectively identify changepoints in climate time series and homogenize the series.
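
The trend pitfall can be made concrete by fitting a single least-squares mean shift with and without a linear trend term. This is a hypothetical illustration, not a procedure taken from the review. The function names, `min_size`, and the simulated series are all assumptions.

On a series that is a pure trend plus noise, the shift-only model typically attributes a large share of the variation to a spurious shift. The model that includes a trend does not.

```python
import numpy as np


def _rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def best_single_shift(y, with_trend=False, min_size=5):
    """Least-squares single mean shift, optionally alongside a linear trend.

    Returns (k, frac): k is the first index of the new regime and frac is the
    share of the no-shift model's RSS removed by adding the shift at k.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)
    base = [np.ones(n)] + ([t] if with_trend else [])
    rss0 = _rss(np.column_stack(base), y)  # null model: mean (+ trend), no shift
    best_k, best_rss = None, np.inf
    for k in range(min_size, n - min_size + 1):
        X = np.column_stack(base + [(t >= k).astype(float)])
        rss = _rss(X, y)
        if rss < best_rss:
            best_k, best_rss = k, rss
    return best_k, (rss0 - best_rss) / rss0


# Usage: a pure linear trend plus noise, with no real shift.
rng = np.random.default_rng(0)
series = 0.1 * np.arange(100) + rng.normal(0.0, 1.0, size=100)
print(best_single_shift(series))                   # shift-only model
print(best_single_shift(series, with_trend=True))  # trend + shift model
```

The same comparison could be extended to correlated errors, the other assumption the abstract flags. That extension is not shown here.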

     
  3. Abstract

    We propose the multiple changepoint isolation (MCI) method for detecting multiple changes in the mean and covariance of a functional process. We first introduce a pair of projections to represent the variability “between” and “within” the functional observations. We then present an augmented fused lasso procedure that robustly splits the projections into multiple regions. These regions isolate each changepoint from the others so that the powerful univariate CUSUM statistic can be applied region‐wise to identify the changepoints. Simulations show that our method accurately detects the number and locations of changepoints under many different scenarios. These include light‐ and heavy‐tailed data, data with symmetric and skewed distributions, sparsely and densely sampled changepoints, and mean and covariance changes. We show that our method outperforms a recent multiple functional changepoint detector and several univariate changepoint detectors applied to our proposed projections. We also show that MCI is more robust than existing approaches and scales linearly with sample size. Finally, we demonstrate our method on a large time series of water vapor mixing ratio profiles from atmospheric emitted radiance interferometer measurements.
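
Once each changepoint has been isolated in its own region, the region-wise CUSUM step reduces to a single-change search within each region. The sketch below makes these assumptions:

- The region boundaries are already given.
- The input is one univariate projection with unit noise variance.
- The function names and `threshold` are hypothetical.

The projections and the augmented fused lasso step are not shown.

```python
import numpy as np


def cusum_argmax(x):
    """Split k (x[:k] | x[k:]) maximizing the standardized CUSUM statistic, and its value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(1, n)
    prefix = np.cumsum(x)[:-1]
    total = x.sum()
    stat = np.sqrt(k * (n - k) / n) * np.abs(prefix / k - (total - prefix) / (n - k))
    best = int(np.argmax(stat))
    return best + 1, float(stat[best])


def regionwise_changepoints(x, boundaries, threshold):
    """Apply a single-change CUSUM test inside each region.

    boundaries: sorted indices [0, b1, ..., n], assumed to isolate at most one change per
    region (in MCI these would come from the fused lasso step; here they are supplied by hand).
    """
    x = np.asarray(x, dtype=float)
    cps = []
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        if hi - lo < 2:
            continue  # too short to contain a split
        k, stat = cusum_argmax(x[lo:hi])
        if stat > threshold:
            cps.append(lo + k)
    return cps
```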

     
  4. Abstract

    Proximity to roads is one of the main determinants of deforestation in the Amazon basin. Determining the construction year of roads (CYR) is critical to improve the understanding of the drivers of road construction and to enable predictions of the expansion of the road network and its consequent impact on ecosystems. While recent artificial intelligence approaches have been successfully used for road extraction, they have typically relied on high spatial‐resolution imagery, precluding their adoption for the determination of CYR for older roads. In this article, we develop a new approach to automate the determination of CYR that relies on the approximate position of the current road network and a time series of the proportion of exposed soil derived from multidecadal Landsat remote sensing imagery. Starting with these inputs, our methodology uses the Least Cost Path algorithm to co‐register the road network and a Before‐After Control‐Impact design to circumvent the inherent image‐to‐image variability in the estimated amount of exposed soil. We demonstrate this approach for a 357 000 km² area around the Transamazon highway (BR‐230) in the Brazilian Amazon, encompassing 36 240 road segments. The reliability of this approach is assessed by comparing the CYR estimated by our approach with the CYR observed in a time series of Landsat images. This exercise reveals a close correspondence between the estimated and observed CYR (). Finally, we show how these data can be used to assess the effectiveness of protected areas (PAs) in reducing the yearly rate of road construction and thus their vulnerability to future degradation. In particular, we find that integral protection PAs in this region were generally more effective in reducing the expansion of the road network than sustainable use PAs.
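
A Before‐After Control‐Impact contrast for a candidate construction year can be sketched as a difference of differences: the change in exposed soil along the road minus the change at nearby control pixels. The sketch picks the year with the largest contrast.

The function names, `min_years`, and the rule of maximizing the contrast are assumptions for illustration. They are not necessarily how the authors turn the BACI design into a CYR estimate.

```python
import numpy as np


def baci_effect(impact, control, split):
    """BACI contrast when index `split` is the first 'after' year."""
    impact = np.asarray(impact, dtype=float)
    control = np.asarray(control, dtype=float)
    impact_change = impact[split:].mean() - impact[:split].mean()
    control_change = control[split:].mean() - control[:split].mean()
    # Year-to-year image effects shared by road and control pixels cancel in the difference.
    return impact_change - control_change


def estimate_construction_year(years, impact, control, min_years=2):
    """Candidate year with the largest BACI increase in exposed soil (hypothetical rule)."""
    splits = range(min_years, len(years) - min_years + 1)
    best = max(splits, key=lambda s: baci_effect(impact, control, s))
    return years[best]
```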

     
  5. Abstract

    Online algorithms for detecting changepoints, or abrupt shifts in the behavior of a time series, are often deployed with limited resources, e.g., in edge computing settings such as mobile phones or industrial sensors. In these scenarios it may be beneficial to trade the cost of collecting an environmental measurement against the quality or "fidelity" of this measurement and how the measurement affects changepoint estimation. For instance, one might choose between inertial measurements and GPS to determine changepoints in motion. A Bayesian approach to changepoint detection is particularly appealing because it represents posterior uncertainty about changepoints and supports active, cost-sensitive decisions about data fidelity that reduce this uncertainty. Moreover, active fidelity switching could dramatically lower the total cost while remaining robust to changes in the data distribution. We propose a multi-fidelity approach that decides which data fidelity to collect by maximizing information gain with respect to changepoints per unit cost. We evaluate this framework on synthetic, video, and audio data and show that this information-based approach yields accurate predictions while reducing total cost.
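
The information-gain criterion can be illustrated in a deliberately small setting:

- There is a single mean shift from `mu0` to `mu1` with known Gaussian noise.
- The changepoint index has a uniform prior.
- The fidelities differ only in noise level and cost.

The next observation depends on the changepoint only through whether the change has already happened. Its information gain is therefore the entropy of a two-component Gaussian mixture minus that of one component.

Everything here is an assumed sketch, not the authors' multi-fidelity algorithm. That includes the function names, the `fidelities` dictionary, and gain per unit cost as the decision rule.

```python
import numpy as np

LOG_2PI_E = np.log(2 * np.pi * np.e)


def changepoint_posterior(x, sds, horizon, mu0=0.0, mu1=1.0):
    """Posterior over tau in {0, ..., horizon} under a uniform prior.

    Observation i has mean mu1 if i >= tau else mu0 and Gaussian noise sd sds[i];
    tau == horizon means no change among the first `horizon` observations.
    """
    x = np.asarray(x, dtype=float)
    sds = np.asarray(sds, dtype=float)
    idx = np.arange(len(x))
    loglik = np.empty(horizon + 1)
    for tau in range(horizon + 1):
        means = np.where(idx >= tau, mu1, mu0)
        # Normalizing terms in the sds do not depend on tau, so they are dropped.
        loglik[tau] = -0.5 * np.sum(((x - means) / sds) ** 2)
    w = np.exp(loglik - loglik.max())
    return w / w.sum()


def information_gain(p_changed, sd, mu0=0.0, mu1=1.0, grid_points=4001):
    """Mutual information (nats) between the next observation and tau.

    I(x; tau) = I(x; z) with z = 1{change already happened}, P(z = 1) = p_changed,
    i.e. the entropy of the two-Gaussian mixture minus the entropy of one Gaussian.
    """
    lo, hi = min(mu0, mu1) - 8 * sd, max(mu0, mu1) + 8 * sd
    grid, dx = np.linspace(lo, hi, grid_points, retstep=True)

    def normal_pdf(m):
        return np.exp(-0.5 * ((grid - m) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

    f = (1 - p_changed) * normal_pdf(mu0) + p_changed * normal_pdf(mu1)
    h_mix = -np.sum(f * np.log(np.maximum(f, 1e-300))) * dx  # numerical entropy
    h_cond = 0.5 * (LOG_2PI_E + 2 * np.log(sd))               # entropy of N(m, sd^2)
    return max(h_mix - h_cond, 0.0)


def choose_fidelity(x, sds, horizon, fidelities, mu0=0.0, mu1=1.0):
    """Pick the fidelity with the largest information gain per unit cost.

    fidelities: dict name -> (noise_sd, cost), a hypothetical menu of sensors.
    Returns (chosen name, P(change at or before the next observation)).
    """
    post = changepoint_posterior(x, sds, horizon, mu0, mu1)
    t = len(x)  # index of the next observation; assumes t < horizon
    p_changed = float(post[: t + 1].sum())  # P(tau <= t)
    scores = {
        name: information_gain(p_changed, sd, mu0, mu1) / cost
        for name, (sd, cost) in fidelities.items()
    }
    return max(scores, key=scores.get), p_changed
```

With equal costs, the lower-noise option wins whenever the posterior is not already certain. Raising its cost shifts the choice toward the cheaper sensor, which is the trade-off the abstract describes.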