skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Second‐order preserving point process permutations
While random permutations of point processes are useful for generating counterfactuals in bivariate interaction tests, such permutations require that the underlying intensity be separable. In many real‐world datasets where clustering or inhibition is present, such an assumption does not hold. Here, we introduce a simple combinatorial optimization algorithm that generates second‐order preserving (SOP) point process permutations, for example, permutations of the times of events such that the function of the permuted process matches the function of the data. We apply the algorithm to synthetic data generated by a self‐exciting Hawkes process and a self‐avoiding point process, along with data from Los Angeles on earthquakes and arsons and data from Indianapolis on law enforcement drug seizures and overdoses. In all cases, we are able to generate a diverse sample of permuted point processes where the distribution of the functions closely matches that of the data. We then show how SOP point process permutations can be used in two applications: (1) bivariate Knox tests and (2) data augmentation to improve deep learning‐based space‐time forecasts.  more » « less
Award ID(s):
2317397 2124313
PAR ID:
10408975
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Stat
Volume:
12
Issue:
1
ISSN:
2049-1573
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Unlabeled sensing is a linear inverse problem with permuted measurements. We propose an alternating minimization (AltMin) algorithm with a suitable initialization for two widely considered permutation models: partially shuffled/k-sparse permutations and r-local/block diagonal permutations. Key to the performance of the AltMin algorithm is the initialization. For the exact unlabeled sensing problem, assuming either a Gaussian measurement matrix or a sub-Gaussian signal, we bound the initialization error in terms of the number of blocks and the number of shuffles. Experimental results show that our algorithm is fast, applicable to both permutation models, and robust to choice of measurement matrix. We also test our algorithm on several real datasets for the ‘linked linear regression’ problem and show superior performance compared to baseline methods. 
    more » « less
  2. null (Ed.)
    We present approximation and exact algorithms for piecewise regression of univariate and bivariate data using fixed-degree polynomials. Specifically, given a set S of n data points (x1, y1), . . . , (xn, yn) ∈ Rd × R where d ∈ {1, 2}, the goal is to segment xi’s into some (arbitrary) number of disjoint pieces P1, . . . , Pk, where each piece Pj is associated with a fixed-degree polynomial fj : Rd → R, to minimize the total loss function λk+􏰄ni=1(yi −f(xi))2, where λ ≥ 0 is a regularization term that penalizes model complexity (number of pieces) and f : 􏰇kj=1 Pj → R is the piecewise polynomial function defined as f|Pj = fj. The pieces P1,...,Pk are disjoint intervals of R in the case of univariate data and disjoint axis-aligned rectangles in the case of bivariate data. Our error approximation allows use of any fixed-degree polynomial, not just linear functions. Our main results are the following. For univariate data, we present a (1 + ε)-approximation algorithm with time complexity O(nε log1ε), assuming that data is presented in sorted order of xi’s. For bivariate data, we √ present three results: a sub-exponential exact algorithm with running time nO( n); a polynomial-time constant- approximation algorithm; and a quasi-polynomial time approximation scheme (QPTAS). The bivariate case is believed to be NP-hard in the folklore but we could not find a published record in the literature, so in this paper we also present a hardness proof for completeness. 
    more » « less
  3. We propose a general framework of using a multi-level log-Gaussian Cox process to model repeatedly observed point processes with complex structures; such type of data have become increasingly available in various areas including medical research, social sciences, economics, and finance due to technological advances. A novel nonparametric approach is developed to efficiently and consistently estimate the covariance functions of the latent Gaussian processes at all levels. To predict the functional principal component scores, we propose a consistent estimation procedure by maximizing the conditional likelihood of super-positions of point processes. We further extend our procedure to the bivariate point process case in which potential correlations between the processes can be assessed. Asymptotic properties of the proposed estimators are investigated, and the effectiveness of our procedures is illustrated through a simulation study and an application to a stock trading dataset. 
    more » « less
  4. null (Ed.)
    We propose a general framework of using a multi-level log-Gaussian Cox process to model repeatedly observed point processes with complex structures; such type of data has become increasingly available in various areas including medical research, social sciences, economics, and finance due to technological advances. A novel nonparametric approach is developed to efficiently and consistently estimate the covariance functions of the latent Gaussian processes at all levels. To predict the functional principal component scores, we propose a consistent estimation procedure by maximizing the conditional likelihood of super-positions of point processes. We further extend our procedure to the bivariate point process case in which potential correlations between the processes can be assessed. Asymptotic properties of the proposed estimators are investigated, and the effectiveness of our procedures is illustrated through a simulation study and an application to a stock trading dataset. 
    more » « less
  5. Summary Combining dependent $ p $-values poses a long-standing challenge in statistical inference, particularly when aggregating findings from multiple methods to enhance signal detection. Recently, $ p $-value combination tests based on regularly-varying-tailed distributions, such as the Cauchy combination test and harmonic mean $ p $-value, have attracted attention for their robustness to unknown dependence. This paper provides a theoretical and empirical evaluation of these methods under an asymptotic regime where the number of $ p $-values is fixed and the global test significance level approaches zero. We examine two types of dependence among the $ p $-values. First, when $ p $-values are pairwise asymptotically independent, such as with bivariate normal test statistics with no perfect correlation, we prove that these combination tests are asymptotically valid. However, they become equivalent to the Bonferroni test as the significance level tends to zero for both one-sided and two-sided $ p $-values. Empirical investigations suggest that this equivalence can emerge at moderately small significance levels. Second, under pairwise quasi-asymptotic dependence, such as with bivariate $ t $-distributed test statistics, our simulations suggest that these combination tests can remain valid and exhibit notable power gains over the Bonferroni test, even as the significance level diminishes. These findings highlight the potential advantages of these combination tests in scenarios where $ p $-values exhibit substantial dependence. Our simulations also examine how test performance depends on the support and tail heaviness of the underlying distributions. 
    more » « less