We propose a new approach, the calibrated nonparametric scan statistic (CNSS), for more accurate detection of anomalous patterns in large-scale, real-world graphs. Scan statistics identify connected subgraphs that are interesting or unexpected through maximization of a likelihood ratio statistic; in particular, nonparametric scan statistics (NPSSs) identify subgraphs with a higher than expected proportion of individually significant nodes. However, we show that recently proposed NPSS methods are miscalibrated, failing to account for the maximization of the statistic over the multiplicity of subgraphs. This results in both reduced detection power for subtle signals, and low precision of the detected subgraph even for stronger signals. Thus we develop a new statistical approach to recalibrate NPSSs, correctly adjusting for multiple hypothesis testing and taking the underlying graph structure into account. While the recalibration, based on randomization testing, is computationally expensive, we propose both an efficient (approximate) algorithm and new, closed-form lower bounds (on the expected maximum proportion of significant nodes for subgraphs of a given size, under the null hypothesis of no anomalous patterns). These advances, along with the integration of recent core-tree decomposition methods, enable CNSS to scale to large real-world graphs, with substantial improvement in the accuracy of detected subgraphs. Extensive experiments on both semi-synthetic and real-world datasets are demonstrated to validate the effectiveness of our proposed methods, in comparison with state-of-the-art counterparts.
more »
« less
Learning Patterns for Detection with Multiscale Scan Statistics
This paper addresses detecting anomalous patterns in images, time-series, and tensor data when the location and scale of the pattern and the pattern itself is unknown a priori. The multiscale scan statistic convolves the proposed pattern with the image at various scales and returns the maximum of the resulting tensor. Scale corrected multiscale scan statistics apply different standardizations at each scale, and the limiting distribution under the null hypothesis---that the data is only noise---is known for smooth patterns. We consider the problem of simultaneously learning and detecting the anomalous pattern from a dictionary of smooth patterns and a database of many tensors. To this end, we show that the multiscale scan statistic is a subexponential random variable, and prove a chaining lemma for standardized suprema, which may be of independent interest. Then by averaging the statistics over the database of tensors we can learn the pattern and obtain Bernstein-type error bounds. We will also provide a construction of an epsilon-net of the location and scale parameters, providing a computationally tractable approximation with similar error bounds.
more »
« less
- Award ID(s):
- 1712996
- NSF-PAR ID:
- 10061398
- Date Published:
- Journal Name:
- Proceedings of Machine Learning Research (31st Annual Conference on Learning Theory))
- Volume:
- 75
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
In many real-world applications of monitoring multivariate spatio-temporal data that are non-stationary over time, one is often interested in detecting hot-spots with spatial sparsity and temporal consistency, instead of detecting system-wise changes as in traditional statistical process control (SPC) literature. In this paper, we propose an efficient method to detect hot-spots through tensor decomposition, and our method has three steps. First, we fit the observed data into a Smooth Sparse Decomposition Tensor (SSD-Tensor) model that serves as a dimension reduction and de-noising technique: it is an additive model decomposing the original data into: smooth but non-stationary global mean, sparse local anomalies, and random noises. Next, we estimate model parameters by the penalized framework that includes Least Absolute Shrinkage and Selection Operator (LASSO) and fused LASSO penalty. An efficient recursive optimization algorithm is developed based on Fast Iterative Shrinkage Thresholding Algorithm (FISTA). Finally, we apply a Cumulative Sum (CUSUM) Control Chart to monitor model residuals after removing global means, which helps to detect when and where hot-spots occur. To demonstrate the usefulness of our proposed SSD-Tensor method, we compare it with several other methods including scan statistics, LASSO-based, PCA-based, T2-based control chart in extensive numerical simulation studies and a real crime rate dataset.more » « less
-
We generalize the spatial and subset scan statistics from the single to the multiple subset case. The two main approaches to defining the log-likelihood ratio statistic in the single subset case—the population-based and expectation-based scan statistics—are considered, leading to risk partitioning and multiple cluster detection scan statistics, respectively. We show that, for distributions in a separable exponential family, the risk partitioning scan statistic can be expressed as a scaled f-divergence of the normalized count and baseline vectors, and the multiple cluster detection scan statistic as a sum of scaled Bregman divergences. In either case, however, maximization of the scan statistic by exhaustive search over all partitionings of the data requires exponential time. To make this optimization computationally feasible, we prove sufficient conditions under which the optimal partitioning is guaranteed to be consecutive. This Consecutive Partitions Property generalizes the linear-time subset scanning property from two partitions (the detected subset and the remaining data elements) to the multiple partition case. While the number of consecutive partitionings of n elements into t partitions scales as O(n^(t−1)), making it computationally expensive for large t, we present a dynamic programming approach which identifies the optimal consecutive partitioning in O(n^2 t) time, thus allowing for the exact and efficient solution of large-scale risk partitioning and multiple cluster detection problems. Finally, we demonstrate the detection performance and practical utility of partition scan statistics using simulated and real-world data. Supplementary materials for this article are available online.more » « less
-
null (Ed.)Abstract Higher-order tensors can represent scores in a rating system, frames in a video, and images of the same subject. In practice, the measurements are often highly quantized due to the sampling strategies or the quality of devices. Existing works on tensor recovery have focused on data losses and random noises. Only a few works consider tensor recovery from quantized measurements but are restricted to binary measurements. This paper, for the first time, addresses the problem of tensor recovery from multi-level quantized measurements by leveraging the low CANDECOMP/PARAFAC (CP) rank property. We study the recovery of both general low-rank tensors and tensors that have tensor singular value decomposition (TSVD) by solving nonconvex optimization problems. We provide the theoretical upper bounds of the recovery error, which diminish to zero when the sizes of dimensions increase to infinity. We further characterize the fundamental limit of any recovery algorithm and show that our recovery error is nearly order-wise optimal. A tensor-based alternating proximal gradient descent algorithm with a convergence guarantee and a TSVD-based projected gradient descent algorithm are proposed to solve the nonconvex problems. Our recovery methods can also handle data losses and do not necessarily need the information of the quantization rule. The methods are validated on synthetic data, image datasets, and music recommender datasets.more » « less
-
Byrka, Jaroslaw ; Meka, Raghu (Ed.)In this work, we prove new relations between the bias of multilinear forms, the correlation between multilinear forms and lower degree polynomials, and the rank of tensors over F₂. We show the following results for multilinear forms and tensors. Correlation bounds. We show that a random d-linear form has exponentially low correlation with low-degree polynomials. More precisely, for d = 2^{o(k)}, we show that a random d-linear form f(X₁,X₂, … , X_d) : (F₂^{k}) ^d → F₂ has correlation 2^{-k(1-o(1))} with any polynomial of degree at most d/2 with high probability. This result is proved by giving near-optimal bounds on the bias of a random d-linear form, which is in turn proved by giving near-optimal bounds on the probability that a sum of t random d-dimensional rank-1 tensors is identically zero. Tensor rank vs Bias. We show that if a 3-dimensional tensor has small rank then its bias, when viewed as a 3-linear form, is large. More precisely, given any 3-dimensional tensor T: [k]³ → F₂ of rank at most t, the bias of the 3-linear form f_T(X₁, X₂, X₃) : = ∑_{(i₁, i₂, i₃) ∈ [k]³} T(i₁, i₂, i₃)⋅ X_{1,i₁}⋅ X_{2,i₂}⋅ X_{3,i₃} is at least (3/4)^t. This bias vs tensor-rank connection suggests a natural approach to proving nontrivial tensor-rank lower bounds. In particular, we use this approach to give a new proof that the finite field multiplication tensor has tensor rank at least 3.52 k, which is the best known rank lower bound for any explicit tensor in three dimensions over F₂. Moreover, this relation between bias and tensor rank holds for d-dimensional tensors for any fixed d.more » « less