Title: Erlang mixture modeling for Poisson process intensities
Abstract

We develop a prior probability model for temporal Poisson process intensities through structured mixtures of Erlang densities with a common scale parameter, mixing on the integer shape parameters. The mixture weights are constructed through increments of a cumulative intensity function, which is modeled nonparametrically with a gamma process prior. This model specification provides a novel extension of Erlang mixtures for density estimation to the intensity estimation setting. The prior model structure supports general shapes for the point process intensity function, and it also enables effective handling of the Poisson process likelihood normalizing term, resulting in efficient posterior simulation. The Erlang mixture modeling approach is further elaborated to develop an inference method for spatial Poisson processes. The methodology is examined relative to existing Bayesian nonparametric modeling approaches, including empirical comparison with Gaussian process prior-based models, and is illustrated with synthetic and real data examples.
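
The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the intensity is a weighted sum of Erlang densities with shapes 1, ..., J and a common scale, and that the weights are independent gamma-process increments over intervals of width equal to that scale. The linear prior mean with slope 1/b, the precision c0, and all function names are hypothetical choices made for illustration only. The sketch shows why the likelihood normalizing term is easy to handle: the integral of the intensity over (0, T) reduces to a weighted sum of Erlang distribution functions.

```python
import math
import random


def erlang_pdf(t, shape, scale):
    """Erlang density with integer shape and common scale, evaluated at t."""
    if t <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(t) - t / scale
               - shape * math.log(scale) - math.lgamma(shape))
    return math.exp(log_pdf)


def erlang_cdf(t, shape, scale):
    """Erlang CDF: 1 - sum_{k < shape} exp(-x) x^k / k!, with x = t / scale."""
    if t <= 0:
        return 0.0
    x = t / scale
    term, total = math.exp(-x), 0.0
    for k in range(shape):
        total += term
        term *= x / (k + 1)
    return 1.0 - total


def sample_weights(n_components, scale, c0, b, rng):
    """Draw weights as gamma-process increments over ((j-1)*scale, j*scale].

    Assumption: prior mean cumulative intensity H0(t) = t / b, so every
    increment is gamma with shape c0 * scale / b and rate c0.
    """
    shape = c0 * scale / b
    return [rng.gammavariate(shape, 1.0 / c0) for _ in range(n_components)]


def intensity(t, weights, scale):
    """Mixture intensity: sum over j of weight_j * Erlang(t | shape j, scale)."""
    return sum(w * erlang_pdf(t, j, scale)
               for j, w in enumerate(weights, start=1))


def integrated_intensity(T, weights, scale):
    """Normalizing term: integral of the intensity over (0, T), in closed form."""
    return sum(w * erlang_cdf(T, j, scale)
               for j, w in enumerate(weights, start=1))


def log_likelihood(events, T, weights, scale):
    """Poisson process log-likelihood for event times observed in (0, T)."""
    log_terms = sum(math.log(intensity(t, weights, scale)) for t in events)
    return log_terms - integrated_intensity(T, weights, scale)


if __name__ == "__main__":
    rng = random.Random(1)
    weights = sample_weights(n_components=40, scale=0.5, c0=2.0, b=0.1, rng=rng)
    print(intensity(5.0, weights, 0.5))
    print(log_likelihood([1.2, 3.4, 7.9], 10.0, weights, 0.5))
```

Because the normalizing term is linear in the weights, no numerical integration of the intensity is needed when evaluating the likelihood in this sketch.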

 
Award ID(s):
1950902
NSF-PAR ID:
10363460
Author(s) / Creator(s):
;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Statistics and Computing
Volume:
32
Issue:
1
ISSN:
0960-3174
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The simultaneous testing of multiple hypotheses is common to the analysis of high‐dimensional data sets. The two‐group model, first proposed by Efron, identifies significant comparisons by allocating observations to a mixture of an empirical null and an alternative distribution. In the Bayesian nonparametrics literature, many approaches have suggested using mixtures of Dirichlet processes in the two‐group model framework. Here, we investigate employing mixtures of two‐parameter Poisson‐Dirichlet processes instead, and show how they provide a more flexible and effective tool for large‐scale hypothesis testing. Our model further employs nonlocal prior densities to allow separation between the two mixture components. We obtain a closed‐form expression for the exchangeable partition probability function of the two‐group model, which leads to a straightforward Markov chain Monte Carlo implementation. We compare the performance of our method for large‐scale inference in a simulation study and illustrate its use on both a prostate cancer data set and a case‐control microbiome study of the gastrointestinal tracts of children from underdeveloped countries who have recently been diagnosed with moderate‐to‐severe diarrhea.

     
  2. We extend network tomography to traffic flows that are not necessarily Poisson random processes. The Poisson assumption has governed the field since its inception by Y. Vardi in 1996. We allow the distribution of the packet count of each traffic flow in a given time interval to be a mixture of Poisson random variables. Both discrete and continuous mixtures are studied. For the latter case, we focus on mixed Poisson distributions with a gamma mixing distribution. As is well known, this mixed Poisson distribution is the negative binomial distribution. Other mixing distributions, such as the Wald or inverse Gaussian distribution, can also be used. Mixed Poisson distributions are overdispersed, with variance larger than the mean, and are therefore more suitable for Internet traffic than the Poisson model. We develop a second-order moment matching approach for estimating the mean traffic rate for each source-destination pair using least squares and the minimum I-divergence iterative procedure. We demonstrate the performance of the proposed approach with several numerical examples. The results show that the averaged normalized mean squared error in rate estimation is of the same order as in classic Poisson-based network tomography. Furthermore, no degradation in performance was observed when traffic rates are Poisson but Poisson mixtures are assumed.
  3. Abstract

    The field of forensic statistics offers a unique hierarchical data structure in which a population is composed of several subpopulations of sources and a sample is collected from each source. This subpopulation structure creates an additional layer of complexity. Hence, the data have a hierarchical structure in addition to the existence of underlying subpopulations. Finite mixtures are known for modeling heterogeneity; however, previous parameter estimation procedures assume that the data are generated through a simple random sampling process. We propose a semi‐supervised mixture modeling approach to model the subpopulation structure, which leverages the fact that a collection of samples is known to come from the same source but from an unknown subpopulation. A simulation study and a real data analysis based on famous glass datasets and a keystroke dynamic typing data set show that the proposed approach performs better than other approaches that have been used previously in practice.

     
  4. In this paper, we develop structure-assisted nonnegative matrix factorization (NMF) methods for blind source separation of degenerate data. The motivation originates from nuclear magnetic resonance (NMR) spectroscopy, where multiple mixture NMR spectra are recorded to identify chemical compounds with similar structures. Considering the linear mixing model (LMM), we aim to identify the chemical compounds involved when the mixing process is known to be nearly singular. We first consider a class of data with dominant interval(s) (DI), where each source signal has dominant peaks over the others. In addition, a nearly singular mixing process produces degenerate mixtures. The DI condition implies clustering structures in the data points. Hence, the mixing matrix can be estimated by data clustering. Because of noise and the degeneracy of the data, a small deviation in the estimation may introduce errors in the output. To resolve this problem and improve the robustness of the separation, methods are developed in two directions. One is to find a better estimate of the mixing matrix by allowing a constrained perturbation of the clustering output, which can be achieved by quadratic programming. The other is to seek sparse source signals by exploiting the DI condition, which amounts to solving an ℓ1 optimization problem. If no source information is available, we propose to adopt the nonnegative matrix factorization approach by incorporating the matrix structure (parallel columns of the mixing matrix) into the cost function, and we develop multiplicative iteration rules for the numerical solutions. We present experimental results on NMR data to show the performance and reliability of the method in applications arising in NMR spectroscopy.
  5. Summary

    In functional data analysis, curves or surfaces are observed, up to measurement error, at a finite set of locations, for, say, a sample of n individuals. Often, the curves are homogeneous, except perhaps for individual-specific regions that provide heterogeneous behaviour (e.g. ‘damaged’ areas of irregular shape on an otherwise smooth surface). Motivated by applications with functional data of this nature, we propose a Bayesian mixture model, with the aim of dimension reduction, by representing the sample of n curves through a smaller set of canonical curves. We propose a novel prior on the space of probability measures for a random curve which extends the popular Dirichlet priors by allowing local clustering: non-homogeneous portions of a curve can be allocated to different clusters and the n individual curves can be represented as recombinations (hybrids) of a few canonical curves. More precisely, the proposed prior envisions a conceptual hidden factor with k levels that acts locally on each curve. We discuss several models incorporating this prior and illustrate its performance with simulated and real data sets. We examine theoretical properties of the proposed finite hybrid Dirichlet mixtures, specifically, their behaviour as the number of mixture components goes to ∞ and their connection with Dirichlet process mixtures.

     