A new approach to clustering multivariate data, based on a multilevel linear mixed model, is proposed. A key feature of the model is that observations from the same cluster are correlated, because they share cluster-specific random effects. The inclusion of cluster-specific random effects allows parsimonious departure from an assumed base model for cluster mean profiles. This departure is captured statistically via the posterior expectation, or best linear unbiased predictor. One of the parameters in the model is the true underlying partition of the data, and the posterior distribution of this parameter, which is known up to a normalizing constant, is used to cluster the data. The problem of finding partitions with high posterior probability is not amenable to deterministic methods such as the EM algorithm. Thus, we propose a stochastic search algorithm driven by a Markov chain that is a mixture of two Metropolis–Hastings algorithms: one that makes small-scale changes to individual objects and another that performs large-scale moves involving entire clusters. The methodology proposed is fundamentally different from the well-known finite mixture model approach to clustering, which does not explicitly include the partition as a parameter and assumes an independent and identically distributed structure.
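The two-move stochastic search described above can be caricatured in a few lines of code. This is an illustrative sketch only, not the authors' implementation: the `log_post` callback, move probabilities, and starting partition are all our own choices, and the proposal-asymmetry correction in the acceptance ratio is omitted for brevity.

```python
import math
import random

def mh_partition_search(log_post, n_objects, n_iters=1000, p_small=0.8, seed=0):
    """Stochastic search over partitions via a mixture of two moves:
    a small-scale move that reassigns one object, and a large-scale move
    that merges one whole cluster into another.  `log_post` is a
    user-supplied unnormalized log posterior of a partition, encoded as a
    list of integer cluster labels."""
    rng = random.Random(seed)
    labels = [0] * n_objects            # start with everything in one cluster
    lp = log_post(labels)
    best, best_lp = labels[:], lp
    for _ in range(n_iters):
        prop = labels[:]
        if rng.random() < p_small:
            # small-scale move: send one object to a (possibly new) cluster
            i = rng.randrange(n_objects)
            prop[i] = rng.randrange(max(labels) + 2)
        else:
            # large-scale move: merge one entire cluster into another
            ks = list(set(labels))
            if len(ks) > 1:
                a, b = rng.sample(ks, 2)
                prop = [b if l == a else l for l in labels]
        lp_prop = log_post(prop)
        # Metropolis accept step (proposal asymmetry ignored in this sketch)
        if math.log(rng.random()) < lp_prop - lp:
            labels, lp = prop, lp_prop
            if lp > best_lp:
                best, best_lp = labels[:], lp
    return best, best_lp
```

The search keeps the highest-posterior partition visited, so it can be used as a stochastic optimizer even though the chain itself explores the full posterior over partitions.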
In the classical biased sampling problem, we have k densities π1(·), …, πk(·), each known up to a normalizing constant; that is, for l = 1, …, k, πl(·) = νl(·)/ml, where νl(·) is a known function and ml is an unknown constant. For each l, we have an independent and identically distributed sample from πl, and the problem is to estimate the ratios ml/ms for all l and s. This problem arises frequently in both frequentist and Bayesian inference. An estimate of the ratios was developed and studied by Vardi and his coworkers over two decades ago, and there has been much subsequent work on the problem from many perspectives. In spite of this, there are no rigorous results in the literature on how to estimate the standard error of the estimate. We present a class of estimates of the ratios of normalizing constants that are appropriate for the case where the samples from the πl are not necessarily independent and identically distributed sequences but are Markov chains. We also develop an approach based on regenerative simulation for obtaining standard errors for the estimates of ratios of normalizing constants. These standard error estimates are valid both in the independent and identically distributed case and in the Markov chain case.
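In the special case k = 2 with a single iid sample from π2, the identity E_{π2}[ν1(X)/ν2(X)] = m1/m2 gives a plug-in estimate with a naive iid standard error in a few lines. This toy version is ours, not Vardi's estimator, and the paper's regenerative standard errors for Markov chain samples are far more general.

```python
import math
import random

def ratio_of_constants(nu1, nu2, draws):
    """Estimate m1/m2 where pi_l = nu_l / m_l, given iid draws from pi2,
    using E_{pi2}[nu1(X)/nu2(X)] = m1/m2.  Returns the estimate and its
    naive iid standard error (not valid for Markov chain draws)."""
    w = [nu1(x) / nu2(x) for x in draws]
    n = len(w)
    est = sum(w) / n
    var = sum((wi - est) ** 2 for wi in w) / (n - 1)
    return est, math.sqrt(var / n)
```

For correlated (Markov chain) draws, the sample-variance standard error above is biased; that is precisely the gap the regenerative approach in the abstract is designed to fill.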
NSF-PAR ID: 10401327
Publisher / Repository: Oxford University Press
Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume: 76
Issue: 4
ISSN: 1369-7412
Format(s): Medium: X
Size: p. 683–712
Sponsoring Org: National Science Foundation
More Like this

Summary 
The analyses of interior penalty discontinuous Galerkin methods of any order k for solving elliptic and parabolic problems with Dirac line sources are presented. For the steady-state case, we prove convergence of the method by deriving a priori error estimates in the L2 norm and in weighted energy norms. In addition, we prove almost optimal local error estimates in the energy norm for any approximation order. Further, almost optimal local error estimates in the L2 norm are obtained for the case of piecewise linear approximations, whereas suboptimal error bounds in the L2 norm are shown for any polynomial degree. For the time-dependent case, convergence of the semidiscrete scheme and of the backward Euler fully discrete scheme is established by proving error estimates in L2 in time and in space. Numerical results for the elliptic problem are added to support the theoretical results.

We propose a strategy for computing estimators in some nonstandard M-estimation problems, where the data are distributed across different servers and the observations across servers, though independent, can come from heterogeneous subpopulations, thereby violating the identically distributed assumption. Our strategy fixes the superefficiency phenomenon observed in prior work on distributed computing in (i) the isotonic regression framework, where averaging several isotonic estimates (each computed at a local server) on a central server produces superefficient estimates that do not replicate the properties of the global isotonic estimator, i.e., the isotonic estimate that would be constructed by transferring all the data to a single server, and (ii) certain types of M-estimation problems involving optimization of discontinuous criterion functions, where M-estimates converge at the cube-root rate. The new estimators proposed in this paper work by smoothing the data on each local server, communicating the smoothed summaries to the central server, and then solving a nonlinear optimization problem at the central server. They are shown to replicate the asymptotic properties of the corresponding global estimators and to overcome the superefficiency phenomenon exhibited by existing estimators.
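The "smooth locally, optimize centrally" pipeline can be caricatured as follows. This is a deliberately crude sketch under our own assumptions (a shared evaluation grid, Gaussian kernel weights, a grid search at the center), not the paper's estimator, whose construction and asymptotic theory are considerably more involved.

```python
import math

def local_summary(data, grid, bandwidth):
    """Each server kernel-smooths its sample onto a shared grid and
    communicates only the normalized grid weights (the smoothed summary)."""
    w = [0.0] * len(grid)
    for x in data:
        for j, g in enumerate(grid):
            w[j] += math.exp(-0.5 * ((g - x) / bandwidth) ** 2)
    s = sum(w)
    return [wj / s for wj in w]

def central_mestimate(summaries, grid, rho):
    """The central server pools the smoothed summaries and minimizes the
    resulting criterion (here by a crude grid search over `grid`), even
    when rho itself is non-smooth, e.g. rho = abs for a median-type fit."""
    pooled = [sum(s[j] for s in summaries) / len(summaries)
              for j in range(len(grid))]
    best_theta, best_val = None, float("inf")
    for theta in grid:
        val = sum(w * rho(g - theta) for w, g in zip(pooled, grid))
        if val < best_val:
            best_theta, best_val = theta, val
    return best_theta
```

The point of the caricature is only the communication pattern: raw data never leave the local servers, yet the central optimization acts on a smoothed surrogate of the pooled sample rather than on an average of locally solved estimates.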

Abstract The paper addresses an error analysis of an Eulerian finite element method used for solving a linearized Navier–Stokes problem in a time-dependent domain. In this study, the domain's evolution is assumed to be known and independent of the solution to the problem at hand. The numerical method combines a standard backward differentiation formula-type time-stepping procedure with a geometrically unfitted finite element discretization technique. Additionally, Nitsche's method is utilized to enforce the boundary conditions. The paper presents a convergence estimate for several velocity–pressure elements that are inf-sup stable. The estimate demonstrates optimal-order convergence in the energy norm for the velocity component and in a scaled $L^{2}(H^{1})$-type norm for the pressure component.

Markov chain Monte Carlo (MCMC) methods generate samples that are asymptotically distributed from a target distribution of interest as the number of iterations goes to infinity. Various theoretical results provide upper bounds on the distance between the target and the marginal distribution after a fixed number of iterations. These upper bounds are derived on a case-by-case basis and typically involve intractable quantities, which limits their use for practitioners. We introduce L-lag couplings to generate computable, non-asymptotic upper bound estimates for the total variation or the Wasserstein distance of general Markov chains. We apply L-lag couplings to the tasks of (i) determining MCMC burn-in, (ii) comparing different MCMC algorithms with the same target, and (iii) comparing exact and approximate MCMC. Lastly, we (iv) assess the bias of sequential Monte Carlo and self-normalized importance samplers.
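A minimal sketch of the idea for a univariate random-walk Metropolis chain: run one chain L steps ahead, couple the pair so they eventually meet exactly, and average max(0, ceil((tau - L - t)/L)) over replicates as a bound on total variation at iteration t. The coupling construction used here (maximal coupling of the proposals plus a common acceptance uniform) is one standard choice, and all function names, step sizes, and starting distributions are our own assumptions.

```python
import math
import random

def max_coupling_normals(mu1, mu2, sigma, rng):
    """Sample (X, Y) with X ~ N(mu1, sigma^2) and Y ~ N(mu2, sigma^2),
    maximizing P(X == Y), via the standard rejection construction."""
    logp = lambda z, m: -0.5 * ((z - m) / sigma) ** 2
    x = rng.gauss(mu1, sigma)
    if math.log(rng.random()) + logp(x, mu1) <= logp(x, mu2):
        return x, x
    while True:
        y = rng.gauss(mu2, sigma)
        if math.log(rng.random()) + logp(y, mu2) > logp(y, mu1):
            return x, y

def coupled_mh_step(x, y, log_target, step, rng):
    """One coupled random-walk Metropolis step: maximally coupled
    proposals and a shared uniform in both accept decisions, so equal
    chains stay equal forever after meeting."""
    px, py = max_coupling_normals(x, y, step, rng)
    logu = math.log(rng.random())
    x_new = px if logu < log_target(px) - log_target(x) else x
    y_new = py if logu < log_target(py) - log_target(y) else y
    return x_new, y_new

def tv_upper_bound(log_target, t, lag, n_reps=100, step=1.0, seed=0):
    """Monte Carlo estimate of the L-lag coupling upper bound on
    ||pi_t - pi||_TV: E[max(0, ceil((tau - lag - t)/lag))], where tau is
    the meeting time of the lagged coupled chains."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        x = rng.gauss(0.0, 5.0)           # overdispersed start for X
        for _ in range(lag):              # advance X by the lag first
            px = rng.gauss(x, step)
            if math.log(rng.random()) < log_target(px) - log_target(x):
                x = px
        y = rng.gauss(0.0, 5.0)           # Y starts from the same law
        s = lag
        while x != y:                     # run the coupled pair until meeting
            x, y = coupled_mh_step(x, y, log_target, step, rng)
            s += 1
        total += max(0, math.ceil((s - lag - t) / lag))
    return total / n_reps
```

Because the bound is built from meeting times only, the same simulated replicates give a bound at every iteration t, which is what makes the burn-in diagnostic cheap.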