skip to main content

Title: A Sharp Error Analysis for the Fused Lasso, with Application to Approximate Changepoint Screening
In the 1-dimensional multiple changepoint detection problem, we derive a new fast error rate for the fused lasso estimator, under the assumption that the mean vector has a sparse number of changepoints. This rate is seen to be suboptimal (compared to the minimax rate) by only a factor of loglogn. Our proof technique is centered around a novel construction that we call a lower interpolant. We extend our results to misspecified models and exponential family distributions. We also describe the implications of our error analysis for the approximate screening of changepoints.
Authors:
; ; ;
Award ID(s):
1712996
Publication Date:
NSF-PAR ID:
10061394
Journal Name:
Advances in neural information processing systems
Volume:
30
Page Range or eLocation-ID:
6884 - 6893
ISSN:
1049-5258
Sponsoring Org:
National Science Foundation
More Like this
  1. Online algorithms for detecting changepoints, or abrupt shifts in the behavior of a time series, are often deployed with limited resources, e.g., to edge computing settings such as mobile phones or industrial sensors. In these scenarios it may be beneficial to trade the cost of collecting an environmental measurement against the quality or "fidelity" of this measurement and how the measurement affects changepoint estimation. For instance, one might decide between inertial measurements or GPS to determine changepoints for motion. A Bayesian approach to changepoint detection is particularly appealing because we can represent our posterior uncertainty about changepoints and make active, cost-sensitive decisions about data fidelity to reduce this posterior uncertainty. Moreover, the total cost could be dramatically lowered through active fidelity switching, while remaining robust to changes in data distribution. We propose a multi-fidelity approach that makes cost-sensitive decisions about which data fidelity to collect based on maximizing information gain with respect to changepoints. We evaluate this framework on synthetic, video, and audio data and show that this information-based approach results in accurate predictions while reducing total cost.
  2. Summary This paper deals with the detection and identification of changepoints among covariances of high-dimensional longitudinal data, where the number of features is greater than both the sample size and the number of repeated measurements. The proposed methods are applicable under general temporal-spatial dependence. A new test statistic is introduced for changepoint detection, and its asymptotic distribution is established. If a changepoint is detected, an estimate of the location is provided. The rate of convergence of the estimator is shown to depend on the data dimension, sample size, and signal-to-noise ratio. Binary segmentation is used to estimate the locations of possibly multiple changepoints, and the corresponding estimator is shown to be consistent under mild conditions. Simulation studies provide the empirical size and power of the proposed test and the accuracy of the changepoint estimator. An application to a time-course microarray dataset identifies gene sets with significant gene interaction changes over time.
  3. Abstract Across the social sciences, scholars regularly pool effects over substantial periods of time, a practice that produces faulty inferences if the underlying data generating process is dynamic. To help researchers better perform principled analyses of time-varying processes, we develop a two-stage procedure based upon techniques for permutation testing and statistical process monitoring. Given time series cross-sectional data, we break the role of time through permutation inference and produce a null distribution that reflects a time-invariant data generating process. The null distribution then serves as a stable reference point, enabling the detection of effect changepoints. In Monte Carlo simulations, our randomization technique outperforms alternatives for changepoint analysis. A particular benefit of our method is that, by establishing the bounds for time-invariant effects before interacting with actual estimates, it is able to differentiate stochastic fluctuations from genuine changes. We demonstrate the method’s utility by applying it to a popular study on the relationship between alliances and the initiation of militarized interstate disputes. The example illustrates how the technique can help researchers make inferences about where changes occur in dynamic relationships and ask important questions about such changes.
  4. To explore the prevalence of abrupt changes (changepoints) in open source project activity, we assembled a dataset of 8,919 projects from the World of Code. Projects were selected based on age, number of commits, and number of authors. Using the nonparametric PELT algorithm, we identified changepoints in project activity time series, finding that more than 90% of projects had between one and six changepoints. Increases and decreases in project activity occurred with roughly equal frequency. While most changes are relatively small, on the order of a few authors or few dozen commits per month, there were long tails of much larger project activity changes. In future work, we plan to focus on larger changes to search for common open source lifecycle patterns as well as common responses to external events.
  5. A paper by Zhang et al. in 2001, “On the Constancy of Internet Path Properties” [1] examined the constancy of end- to-end packet loss, latency, and throughput using a modest set of hosts deployed in the Internet. In the time since that work, the Internet has changed dramatically, including the flattening of the autonomous system hierarchy and increased deployment of IPv6, among other developments. In this paper, we investigate the constancy of end-to-end Internet latency, revisiting findings of the earlier study. We use latency measurements from RIPE Atlas, choosing a set of 124 anchors with broad geographic distribution and drawn from 112 distinct autonomous systems. The earlier work of Zhang et al. relies on changepoint detection methods to identify mathematically constant time periods. We reimplement the two methods described in that earlier work and use them on the RIPE Atlas latency measurements. We also use a recently- published method (HMM-HDP) that has direct support in a RIPE Atlas API. Comparing the three changepoint detection methods, we find that the two methods used in the earlier work may miss many changepoints caused by common level-shift events. Overall, we find that the recently proposed HMM-HDP method performs substantially better. Moreover, we findmore »that delay spikes—as defined by the earlier work—are an order of magnitude less prevalent than 20 years ago. We also find that maximum change- free regions (CFRs) along paths that we observe in today’s Internet are substantially longer than what was observed in 2001, regardless of the changepoint detection method used. In particular, the 50th percentile maximum CFR was on the order of 30 minutes in the earlier study, but our analysis reveals it to be on the order of 3 days or longer. Moreover, we find that CFR durations appear to have steadily increased over the past 5 years.« less