The spatial distribution of disease cases can provide important insights into disease spread and its potential risk factors. Correctly identifying disease clusters can help us discover new risk factors and inform interventions to control and prevent the spread of disease as quickly as possible. In this study, we propose a novel scan method, the Prefiltered Component-based Greedy (PreCoG) scan method, which efficiently and accurately detects irregularly shaped clusters using a prefiltered component-based algorithm. The PreCoG scan method's flexibility allows it to perform well in detecting both regularly and irregularly shaped clusters. It is also fast to apply while providing high power, sensitivity, and positive predictive value for the detected clusters compared to other scan methods. To confirm the effectiveness of the PreCoG method, we compare its performance to many other scan methods. We have implemented this method in the smerc R package to make it publicly available to other researchers. Our proposed PreCoG scan method presents a unique and innovative process for detecting disease clusters and can improve the accuracy of disease surveillance systems.
Flexible-Elliptical Spatial Scan Method
The detection of disease clusters in spatial data analysis plays a crucial role in public health. While the circular scan method is widely used for this purpose, accurately identifying non-circular (irregular) clusters remains challenging, which reduces detection accuracy. To overcome this limitation, various extensions have been proposed to effectively detect arbitrarily shaped clusters. In this paper, we combine the strengths of two well-known methods, the flexible and elliptic scan methods, which are specifically designed for detecting irregularly shaped clusters. We leverage the unique characteristics of these methods to create candidate zones capable of accurately detecting irregularly shaped clusters, along with a modified likelihood ratio test statistic. By inheriting the advantages of the flexible and elliptic methods, our proposed approach represents a practical addition to the existing repertoire of spatial data analysis techniques.
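For orientation, scan methods of this family score each candidate zone with a likelihood ratio statistic. The sketch below shows the standard Poisson log-likelihood ratio commonly used with the circular scan, not the modified statistic proposed in this paper, whose form the abstract does not give. The function name `poisson_llr` and its arguments are illustrative assumptions, and expected counts are assumed to be scaled so that they sum to the total number of cases.

```python
import math

def poisson_llr(cases_in, expected_in, total_cases):
    """Standard Poisson log-likelihood ratio for one candidate zone.

    Assumes expected_in > 0 and that expected counts over the whole
    study area sum to total_cases (hypothetical inputs for illustration).
    """
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    # Only zones with more cases than expected are treated as potential clusters.
    if cases_in <= expected_in:
        return 0.0
    llr = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr
```

The zone with the largest statistic is the most likely cluster; its significance is usually assessed by recomputing the maximum on data simulated under the null hypothesis.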
- Award ID(s): 1915277
- PAR ID: 10470488
- Publisher / Repository: Mathematics
- Date Published:
- Journal Name: Mathematics
- Volume: 11
- Issue: 17
- ISSN: 2227-7390
- Page Range / eLocation ID: 3627
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Estimating the optimal population upper bound for scan methods in retrospective disease surveillance
Abstract: Correctly and quickly identifying disease patterns and clusters is a vital aspect of public health and epidemiology so that disease outbreaks can be mitigated as effectively as possible. The circular scan method is one of the most commonly used methods for detecting disease outbreaks and clusters in retrospective and prospective disease surveillance. The circular scan method requires a population upper bound in order to construct the set of candidate zones to be scanned, which is usually set to 50% of the total population. The performance of the circular scan method is affected by the choice of the population upper bound, and choosing an upper bound different from the default value can improve the method's performance. Recently, the Gini coefficient based on the Lorenz curve, which was originally used in economics, was proposed to determine a better population upper bound. We present the elbow method, a new method for choosing the population upper bound, which seeks to address some of the limitations of the Gini-based method while improving the performance of the circular scan method over the default value. To evaluate the proposed approach, we compare the sensitivity and positive predictive value of the circular scan method on publicly available benchmark data under the default value, the Gini coefficient method, and the elbow method.
-
We consider the problem of clustering with the longest-leg path distance (LLPD) metric, which is informative for elongated and irregularly shaped clusters. We prove finite-sample guarantees on the performance of clustering with respect to this metric when random samples are drawn from multiple intrinsically low-dimensional clusters in high-dimensional space, in the presence of a large number of high-dimensional outliers. By combining these results with spectral clustering with respect to LLPD, we provide conditions under which the Laplacian eigengap statistic correctly determines the number of clusters for a large class of data sets, and prove guarantees on the labeling accuracy of the proposed algorithm. Our methods are quite general and provide performance guarantees for spectral clustering with any ultrametric. We also introduce an efficient, easy-to-implement approximation algorithm for the LLPD based on a multiscale analysis of adjacency graphs, which allows the runtime of LLPD spectral clustering to be quasilinear in the number of data points.
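To make the longest-leg path distance concrete: the LLPD between two points is the smallest possible "longest hop" over all paths that connect them through the data. The sketch below computes it exactly with a cubic-time minimax variant of the Floyd-Warshall recursion. It is a brute-force illustration only, not the quasilinear multiscale approximation the abstract describes, and the function name `llpd_matrix` is an assumption made for this example.

```python
import math

def llpd_matrix(points):
    """Exact longest-leg path distances between all pairs of points.

    Brute-force O(n^3) illustration; starts from Euclidean distances and
    repeatedly checks whether routing through point k shortens the longest leg.
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Longest leg of the path i -> k -> j.
                via_k = max(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d
```

For points at x = 0, 1, 2, and 10 on a line, the LLPD between the first and third points is 1 (two hops of length 1) even though their Euclidean distance is 2, while the isolated fourth point stays 8 away from every other point. This is why the metric suits elongated clusters.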
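The role of the population upper bound in the circular scan method, described in the first related abstract above, can be sketched as follows. Each region centroid is grown outward through its nearest neighbours, and growth stops once the zone's population would exceed the bound. The function name `circular_zones`, its arguments, and the toy data in the note below are assumptions for illustration; neither the Gini-based nor the elbow method for choosing the bound is implemented here.

```python
import math

def circular_zones(coords, pops, upper_bound=0.5):
    """Return circular candidate zones as sorted tuples of region indices.

    A zone is a centroid plus its nearest neighbours, grown until adding the
    next region would push the population above upper_bound * total.
    """
    total = sum(pops)
    zones = set()
    for centre in coords:
        # Regions ordered by distance from the current centroid.
        order = sorted(range(len(coords)),
                       key=lambda j: math.dist(centre, coords[j]))
        zone, pop = [], 0
        for j in order:
            if pop + pops[j] > upper_bound * total:
                break
            zone.append(j)
            pop += pops[j]
            zones.add(tuple(sorted(zone)))
    return zones
```

With four regions of equal population placed along a line, the default bound of 0.5 yields seven distinct zones (four single regions and three adjacent pairs), while a bound of 0.25 leaves only the four single-region zones. The choice of bound therefore directly controls which clusters the scan is able to report.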

