
Title: Binary Models for Marginal Independence

Log-linear models are a classical tool for the analysis of contingency tables. In particular, the subclass of graphical log-linear models provides a general framework for modelling conditional independences. However, with the exception of special structures, marginal independence hypotheses cannot be accommodated by these traditional models. Focusing on binary variables, we present a model class that provides a framework for modelling marginal independences in contingency tables. The approach that is taken is graphical and draws on analogies with multivariate Gaussian models for marginal independence. For the graphical model representation we use bidirected graphs, which are in the tradition of path diagrams. We show how the models can be parameterized in a simple fashion, and how maximum likelihood estimation can be performed by using a version of the iterated conditional fitting algorithm. Finally we consider combining these models with symmetry restrictions.
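The distinction the abstract draws between marginal and conditional independence can be illustrated with a small numerical sketch. This is not the parameterization or fitting algorithm of the paper. The three binary variables, the probabilities, and the helper `odds_ratio` are hypothetical choices made only for illustration. In the constructed 2x2x2 table, X1 and X3 are independent in their two-way margin but dependent once X2 is held fixed.

```python
import numpy as np

def odds_ratio(table):
    """Odds ratio of a 2x2 table of probabilities or counts."""
    return (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])

# Hypothetical marginal distributions of X1 and X3 (assumed independent).
p1 = np.array([0.6, 0.4])
p3 = np.array([0.7, 0.3])

# Hypothetical P(X2 = 1 | X1 = x1, X3 = x3): X2 depends on both variables.
p2_given = np.array([[0.1, 0.5],
                     [0.5, 0.9]])

# Build the joint table indexed as joint[x1, x2, x3].
joint = np.zeros((2, 2, 2))
for x1 in range(2):
    for x3 in range(2):
        base = p1[x1] * p3[x3]
        joint[x1, 0, x3] = base * (1.0 - p2_given[x1, x3])
        joint[x1, 1, x3] = base * p2_given[x1, x3]

# Marginal table of (X1, X3): sum out X2.
marginal_13 = joint.sum(axis=1)
marginal_or = odds_ratio(marginal_13)

# Conditional tables of (X1, X3) within each level of X2.
conditional_ors = [odds_ratio(joint[:, x2, :]) for x2 in range(2)]

print("marginal odds ratio (X1, X3):", marginal_or)
print("conditional odds ratios given X2:", conditional_ors)
```

The marginal odds ratio equals one (up to floating-point error) while the conditional odds ratios do not, so the independence here is a constraint on a margin rather than on the conditional association within levels of X2.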

Publisher / Repository:
Oxford University Press
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Medium: X Size: p. 287-309
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Marginal log-linear (MLL) models provide a flexible approach to multivariate discrete data. MLL parameterizations under linear constraints induce a wide variety of models, including models that are defined by conditional independences. We introduce a subclass of MLL models which correspond to acyclic directed mixed graphs under the usual global Markov property. We characterize for precisely which graphs the resulting parameterization is variation independent. The MLL approach provides the first description of acyclic directed mixed graph models in terms of a minimal list of constraints. The parameterization is also easily adapted to sparse modelling techniques, which we illustrate by using several examples of real data.

  2. Summary

    Human rights data presents challenges for capture–recapture methodology. Lists of violent acts provided by many different groups create large, sparse tables of data for which saturated models are difficult to fit and for which simple models may be misspecified. We analyze data on killings and disappearances in Casanare, Colombia, during the years 1998 to 2007. Our estimates differ depending on whether we model marginal reporting probabilities and odds ratios or model the full reporting pattern in a conditional (log-linear) model. With 2629 observed killings, a marginal model we consider estimates over 9000 killings, while conditional models we consider estimate 6000–7000 killings. The latter agree with previous estimates, also from a conditional model. We see a twofold difference between the high sample coverage estimate of over 10,000 killings and low sample coverage lower bound estimate of 5200 killings. We use a simulation study to compare marginal and conditional models with at most two-way interactions and sample coverage estimators. The simulation results together with model selection criteria lead us to believe the previous estimates of total killings in Casanare may have been biased downward, suggesting that the violence was worse than previously thought. Model specification is an important consideration when interpreting population estimates from capture–recapture analysis, and the Casanare data is a prototypical example of how that manifests.

  3. Summary

    Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach.

  4. Summary

    For multivariate spatial Gaussian process models, customary specifications of cross-covariance functions do not exploit relational inter-variable graphs to ensure process-level conditional independence between the variables. This is undesirable, especially in highly multivariate settings, where popular cross-covariance functions, such as multivariate Matérn functions, suffer from a curse of dimensionality as the numbers of parameters and floating-point operations scale up in quadratic and cubic order, respectively, with the number of variables. We propose a class of multivariate graphical Gaussian processes using a general construction called stitching that crafts cross-covariance functions from graphs and ensures process-level conditional independence between variables. For the Matérn family of functions, stitching yields a multivariate Gaussian process whose univariate components are Matérn Gaussian processes, and which conforms to process-level conditional independence as specified by the graphical model. For highly multivariate settings and decomposable graphical models, stitching offers massive computational gains and parameter dimension reduction. We demonstrate the utility of the graphical Matérn Gaussian process to jointly model highly multivariate spatial data using simulation examples and an application to air-pollution modelling.
  5. Summary

    We introduce a flexible marginal modelling approach for statistical inference for clustered and longitudinal data under minimal assumptions. This estimated estimating equations approach is semiparametric and the proposed models are fitted by quasi-likelihood regression, where the unknown marginal means are a function of the fixed effects linear predictor with unknown smooth link, and variance–covariance is an unknown smooth function of the marginal means. We propose to estimate the nonparametric link and variance–covariance functions via smoothing methods, whereas the regression parameters are obtained via the estimated estimating equations. These are score equations that contain nonparametric function estimates. The proposed estimated estimating equations approach is motivated by its flexibility and easy implementation. Moreover, if data follow a generalized linear mixed model, with either a specified or an unspecified distribution of random effects and link function, the model proposed emerges as the corresponding marginal (population-average) version and can be used to obtain inference for the fixed effects in the underlying generalized linear mixed model, without the need to specify any other components of this generalized linear mixed model. Among marginal models, the estimated estimating equations approach provides a flexible alternative to modelling with generalized estimating equations. Applications of estimated estimating equations include diagnostics and link selection. The asymptotic distribution of the proposed estimators for the model parameters is derived, enabling statistical inference. Practical illustrations include Poisson modelling of repeated epileptic seizure counts and simulations for clustered binomial responses.
