This content will become publicly available on May 4, 2025

Title: Distributionally Robust Quickest Change Detection using Wasserstein Uncertainty Sets
The problem of quickest detection of a change in the distribution of streaming data is considered. It is assumed that the pre-change distribution is known, while the only information about the post-change distribution is through a (small) set of labeled data. This post-change data is used in a data-driven minimax robust framework, where an uncertainty set for the post-change distribution is constructed. The robust change detection problem is studied in an asymptotic setting where the mean time to false alarm goes to infinity. It is shown that the least favorable distribution (LFD) is an exponentially tilted version of the pre-change density and can be obtained efficiently. A Cumulative Sum (CuSum) test based on the LFD, referred to as the distributionally robust (DR) CuSum test, is then shown to be asymptotically robust. The results are extended to the case of multiple post-change uncertainty sets and validated using synthetic and real data examples.
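To make the detection rule concrete, here is a minimal sketch of a CuSum test driven by an exponentially tilted post-change model. The standard-normal pre-change density, the tilt parameter theta, and the threshold b are illustrative assumptions; in the paper, the tilt is determined by the Wasserstein uncertainty set built from the post-change data.

```python
import numpy as np

def dr_cusum(stream, theta, b):
    """CuSum with the LLR of N(theta, 1) vs N(0, 1); returns alarm time or None."""
    w = 0.0
    for n, x in enumerate(stream, start=1):
        # For f0 = N(0,1), exponential tilting with parameter theta gives
        # f1 = N(theta, 1), so log(f1(x)/f0(x)) = theta*x - theta**2/2.
        llr = theta * x - 0.5 * theta**2
        w = max(0.0, w + llr)  # CuSum recursion
        if w >= b:
            return n           # first crossing of the threshold
    return None

rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0.0, 1.0, 200),   # pre-change samples
                         rng.normal(0.8, 1.0, 200)])  # post-change samples
print(dr_cusum(stream, theta=0.5, b=8.0))
```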
Award ID(s):
2033900
PAR ID:
10560860
Author(s) / Creator(s):
; ;
Editor(s):
Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen
Publisher / Repository:
PMLR
Date Published:
ISSN:
2640-3498
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The problem of sequential change diagnosis is considered, where a sequence of independent random elements is observed sequentially, there is an abrupt change in its distribution at some unknown time, and there are two main operational goals: to quickly detect the change and to accurately identify the post-change distribution among a finite set of alternatives. A standard algorithm is considered, which does not explicitly address the isolation task and raises an alarm as soon as any of the CuSum statistics corresponding to the post-change alternatives exceeds a certain threshold. It is shown that in certain cases, such as the so-called multichannel problem, this algorithm controls the worst-case conditional probability of false isolation and minimizes Lorden's criterion, for every possible post-change distribution, to a first-order asymptotic approximation as the false alarm rate goes to zero sufficiently faster than the worst-case conditional probability of false isolation. These theoretical results are also illustrated with a numerical study.
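As a rough illustration of this detection rule, the sketch below runs one CuSum statistic per post-change alternative and stops the first time any of them crosses the threshold, isolating the alternative that crossed. The Gaussian mean-shift alternatives and the threshold b are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def multichannel_cusum(stream, llr_fns, b):
    """Alarm at the first time any per-alternative CuSum statistic crosses b.

    llr_fns: one log-likelihood-ratio function per post-change alternative.
    Returns (alarm_time, isolated_alternative) or (None, None).
    """
    w = np.zeros(len(llr_fns))
    for n, x in enumerate(stream, start=1):
        for i, llr in enumerate(llr_fns):
            w[i] = max(0.0, w[i] + llr(x))   # one CuSum recursion per channel
        if w.max() >= b:
            return n, int(w.argmax())        # detect and isolate at once
    return None, None

# Two Gaussian mean-shift alternatives (means 0.5 and 1.0) as examples.
llr_fns = [lambda x, m=m: m * x - 0.5 * m**2 for m in (0.5, 1.0)]
rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(0, 1, 150), rng.normal(1.0, 1, 150)])
print(multichannel_cusum(stream, llr_fns, b=8.0))
```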
  2. This paper considers the change point detection problem under dependent samples. In particular, we provide performance guarantees for the MMD-CUSUM test under exponentially α- and β-mixing and fast ϕ-mixing processes, which significantly expands its utility beyond the i.i.d. and Markovian cases used in previous studies. We obtain lower bounds on the average run length (ARL) and upper bounds on the average detection delay (ADD) in terms of the threshold parameter. We show that the MMD-CUSUM test enjoys the same level of performance as in the i.i.d. case under fast ϕ-mixing processes. The MMD-CUSUM test also achieves strong performance under exponentially α/β-mixing processes, whose assumptions are significantly more relaxed than those in existing results. The MMD-CUSUM test statistic adapts to different settings without modification, rendering it a completely data-driven, dependence-agnostic change point detection scheme. Numerical simulations are provided at the end to evaluate our findings.
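One simple way to instantiate an MMD-based CuSum recursion is sketched below: block-wise squared-MMD scores against a pre-change reference sample are accumulated with a small negative drift. The RBF kernel, block size, drift, and threshold are illustrative assumptions and not necessarily the exact statistic analyzed in the paper.

```python
import numpy as np

def mmd2(a, b, gamma=1.0):
    """Biased estimate of squared MMD between 1-d samples a and b (RBF kernel)."""
    k = lambda u, v: np.exp(-gamma * (u[:, None] - v[None, :]) ** 2)
    return k(a, a).mean() + k(b, b).mean() - 2.0 * k(a, b).mean()

def mmd_cusum(stream, reference, block=20, drift=0.1, b=0.5, gamma=1.0):
    """CuSum-type recursion on block-wise MMD scores; returns alarm time or None."""
    s = 0.0
    for start in range(0, len(stream) - block + 1, block):
        score = mmd2(stream[start:start + block], reference, gamma)
        s = max(0.0, s + score - drift)  # drift keeps s near 0 before the change
        if s >= b:
            return start + block         # sample index at the end of the block
    return None

rng = np.random.default_rng(2)
reference = rng.normal(0, 1, 200)        # sample from the pre-change regime
stream = np.concatenate([rng.normal(0, 1, 200), rng.normal(1.5, 1, 200)])
print(mmd_cusum(stream, reference))
```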
  3. Cumulative sum (CUSUM) statistics are widely used in change point inference and identification. For the problem of testing for the existence of a change point in an independent sample generated from the mean-shift model, we introduce a Gaussian multiplier bootstrap to calibrate critical values of the CUSUM test statistics in high dimensions. The proposed bootstrap CUSUM test is fully data-dependent and has strong theoretical guarantees under arbitrary dependence structures and mild moment conditions. Specifically, we show that with a boundary removal parameter the bootstrap CUSUM test enjoys uniform validity in size under the null and achieves the minimax separation rate under sparse alternatives when the dimension p can be larger than the sample size n. Once a change point is detected, we estimate the change point location by maximising the ℓ∞-norm of the generalised CUSUM statistics at two different weighting scales, corresponding to covariance-stationary and non-stationary CUSUM statistics. For both estimators, we derive their rates of convergence and show that dimension impacts the rates only through logarithmic factors, which implies that consistency of the CUSUM estimators is possible when p is much larger than n. In the presence of multiple change points, we propose a principled bootstrap-assisted binary segmentation (BABS) algorithm to dynamically adjust the change point detection rule and recursively estimate their locations. We derive its rate of convergence under suitable signal separation and strength conditions. The results derived in this paper are non-asymptotic, and we provide extensive simulation studies to assess the finite sample performance. The empirical evidence shows encouraging agreement with our theoretical results.
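The following is a minimal sketch of the multiplier-bootstrap calibration idea, assuming an unweighted CUSUM process and omitting the paper's boundary-removal parameter and second weighting scale. Names and tuning values are illustrative.

```python
import numpy as np

def cusum_stat(X):
    """Max over time of the l_inf norm of the (unweighted) CUSUM process."""
    n = X.shape[0]
    csum = np.cumsum(X, axis=0)
    proc = (csum - np.outer(np.arange(1, n + 1) / n, csum[-1])) / np.sqrt(n)
    return np.abs(proc).max()

def multiplier_bootstrap_cv(X, n_boot=500, alpha=0.05, rng=None):
    """Gaussian multiplier bootstrap critical value for the CUSUM statistic."""
    rng = np.random.default_rng(0) if rng is None else rng
    Xc = X - X.mean(axis=0)                # center under the null
    stats = np.empty(n_boot)
    for i in range(n_boot):
        e = rng.normal(size=(X.shape[0], 1))
        stats[i] = cusum_stat(Xc * e)      # perturb rows with i.i.d. N(0,1) weights
    return np.quantile(stats, 1 - alpha)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 50))             # n = 100 samples, p = 50 dimensions
cv = multiplier_bootstrap_cv(X, rng=rng)
print(cusum_stat(X) > cv)                  # reject H0 (no change) if True
```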
  4. Astley, Susan M; Chen, Weijie (Ed.)
    Devices enabled by artificial intelligence (AI) and machine learning (ML) are being introduced for clinical use at an accelerating pace. In a dynamic clinical environment, these devices may encounter conditions different from those they were developed for. The statistical mismatch between training/initial-testing data and production data is often referred to as data drift. Detecting and quantifying data drift is important for ensuring that an AI model performs as expected in clinical environments. A drift detector signals that a corrective action is needed when performance changes. In this study, we investigate how a change in the performance of an AI model due to data drift can be detected and quantified using a cumulative sum (CUSUM) control chart. To study the properties of CUSUM, we first simulate different scenarios that change the performance of an AI model. We simulate a sudden change in the mean of the performance metric at a change-point (change day) in time. The task is to detect the change quickly while raising few false alarms before the change-point, which may be caused by the statistical variation of the performance metric over time. Subsequently, we simulate data drift by denoising the Emory Breast Imaging Dataset (EMBED) after a pre-defined change-point. We detect the change-point by studying the pre- and post-change specificity of a mammographic CAD algorithm. Our results indicate that, with an appropriate choice of parameters, CUSUM is able to quickly detect relatively small drifts with a small number of false-positive alarms.
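A two-sided CUSUM control chart of the kind described above can be sketched as follows; the in-control mean mu0, allowance k, and decision interval h are illustrative tuning parameters, not values from the study.

```python
def cusum_chart(metric_series, mu0, k, h):
    """Two-sided CUSUM control chart; returns the first alarm day or None.

    mu0: in-control mean of the daily performance metric (e.g. specificity),
    k:   allowance (often half the smallest shift worth detecting),
    h:   decision interval.
    """
    c_hi = c_lo = 0.0
    for day, x in enumerate(metric_series, start=1):
        c_hi = max(0.0, c_hi + (x - mu0 - k))   # accumulates upward drift
        c_lo = max(0.0, c_lo + (mu0 - x - k))   # accumulates downward drift
        if c_hi >= h or c_lo >= h:
            return day
    return None

# Toy example: daily specificity drops from 0.90 to 0.85 on day 31
# (real metrics would also fluctuate day to day).
series = [0.90] * 30 + [0.85] * 30
print(cusum_chart(series, mu0=0.90, k=0.01, h=0.1))
```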
  5. The generalization ability of machine learning models degrades significantly when the test distribution shifts away from the training distribution. We investigate the problem of training models that are robust to shifts caused by changes in the distribution of class-priors or group-priors. The presence of skewed training priors can often lead to models overfitting to spurious features. Unlike existing methods, which optimize for either the worst or the average performance over classes or groups, our work is motivated by the need for finer control over the robustness properties of the model. We present an extremely lightweight post-hoc approach that performs scaling adjustments to the predictions of a pre-trained model, with the goal of minimizing a distributionally robust loss around a chosen target distribution. These adjustments are computed by solving a constrained optimization problem on a validation set and applied to the model at test time. Our constrained optimization objective is inspired by a natural notion of robustness to controlled distribution shifts. Our method comes with provable guarantees and empirically makes a strong case for distributionally robust post-hoc classifiers. An implementation is available at https://github.com/weijiaheng/Drops.
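To illustrate the flavor of post-hoc scaling adjustments, the sketch below applies a simple log-prior correction to a pre-trained model's logits before the softmax. This is a simplified stand-in: the paper's method instead chooses the per-class adjustments by constrained optimization on a validation set (see the linked repository), and all names here are illustrative.

```python
import numpy as np

def prior_adjusted_probs(logits, train_priors, target_priors):
    """Shift logits by log(target/train) class-prior ratios, then softmax."""
    adj = logits + np.log(np.asarray(target_priors) / np.asarray(train_priors))
    z = adj - adj.max(axis=1, keepdims=True)      # numerically stable softmax
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

# A model trained with skewed priors (90/10), evaluated under balanced priors.
logits = np.array([[2.0, 1.0], [0.2, 0.1]])
print(prior_adjusted_probs(logits, train_priors=[0.9, 0.1],
                           target_priors=[0.5, 0.5]))
```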