

Title: Joint Gaussian graphical model estimation: A survey
Abstract

Graphs representing complex systems often share part of their underlying structure across domains while retaining individual features. Identifying this common structure can therefore shed light on the underlying signal, for instance in scientific discovery or clinical diagnosis. Furthermore, growing evidence shows that exploiting the structure shared across domains improves graph estimation, particularly for high‐dimensional data. However, building a joint estimator that extracts the common structure may be more complicated than it seems, most often because of data heterogeneity across sources. This manuscript surveys recent work on statistical inference for joint Gaussian graphical models and identifies model structures that fit various data‐generating processes.

This article is categorized under:

Data: Types and Structure > Graph and Network Data

Statistical Models > Graphical Models
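In a Gaussian graphical model, the graph is encoded by the zero pattern of the precision matrix Θ = Σ⁻¹: variables i and j are conditionally independent given all the others exactly when θ_ij = 0. The sketch below illustrates what "common structure" means once per-domain precision matrices are available, by splitting edges into those shared by every domain and those unique to one. It assumes the precision matrices have already been estimated (in practice, by one of the joint estimators the survey covers); the function names, the tolerance `tol`, and the example matrices are hypothetical, and this post hoc comparison is not itself a joint estimation method.

```python
from itertools import combinations

import numpy as np


def edge_set(precision, tol=1e-8):
    """Return the edges (i, j), i < j, of a Gaussian graphical model.

    Variables i and j are conditionally independent given all others
    exactly when the (i, j) entry of the precision matrix is zero, so an
    edge is drawn wherever that entry is (numerically) nonzero.
    """
    p = precision.shape[0]
    return {(i, j) for i, j in combinations(range(p), 2)
            if abs(precision[i, j]) > tol}


def shared_and_specific(precisions, tol=1e-8):
    """Split edges into those common to every domain and those unique to one."""
    edge_sets = [edge_set(theta, tol) for theta in precisions]
    shared = set.intersection(*edge_sets)
    specific = []
    for k, edges in enumerate(edge_sets):
        # Union of the edge sets of all *other* domains.
        others = set().union(*(e for m, e in enumerate(edge_sets) if m != k))
        specific.append(edges - others)
    return shared, specific


# Two hypothetical 4-variable precision matrices (e.g., two tissues or cohorts).
theta1 = np.array([[2.0, -0.5, 0.0, 0.3],
                   [-0.5, 2.0, 0.4, 0.0],
                   [0.0, 0.4, 2.0, 0.0],
                   [0.3, 0.0, 0.0, 2.0]])
theta2 = np.array([[2.0, -0.6, 0.0, 0.0],
                   [-0.6, 2.0, 0.4, 0.0],
                   [0.0, 0.4, 2.0, 0.5],
                   [0.0, 0.0, 0.5, 2.0]])

shared, specific = shared_and_specific([theta1, theta2])
print(sorted(shared))                 # [(0, 1), (1, 2)]
print([sorted(s) for s in specific])  # [[(0, 3)], [(2, 3)]]
```

Note that the edge weights differ across domains (θ₀₁ is −0.5 versus −0.6) while the edge itself is shared; joint estimators differ in whether they encourage shared supports, shared values, or both.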

 
Award ID(s): 2046795
PAR ID: 10379699
Author(s) / Creator(s):
Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name: WIREs Computational Statistics
Volume: 14
Issue: 6
ISSN: 1939-5108
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Networks effectively capture interactions among components of complex systems and have thus become a mainstay in many scientific disciplines. Growing evidence, especially from biology, suggests that networks change over time and in response to external stimuli. In biology and medicine, these changes have been found to be predictive of complex diseases. They have also been used to gain insight into mechanisms of disease initiation and progression. Primarily motivated by biological applications, this article reviews recent statistical machine learning methods for inferring networks and identifying changes in their structures.

    This article is categorized under:

    Data: Types and Structure > Graph and Network Data

    Statistical Models > Graphical Models
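
    For the network-change setting of item 1, a minimal sketch of a naive differential-network comparison: convert each condition's precision matrix to partial correlations ρ_ij = −θ_ij / √(θ_ii θ_jj), then flag pairs whose partial correlation changes by more than a threshold. The threshold `delta`, the function names, and the example matrices are hypothetical; this plug-in comparison is only illustrative and is not one of the methods reviewed in that article.

```python
from itertools import combinations

import numpy as np


def partial_correlations(precision):
    """Partial correlations rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)."""
    d = np.sqrt(np.diag(precision))
    rho = -precision / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


def differential_edges(theta_a, theta_b, delta=0.1):
    """Pairs whose partial correlation changes by more than delta from A to B."""
    rho_a = partial_correlations(theta_a)
    rho_b = partial_correlations(theta_b)
    p = rho_a.shape[0]
    changes = {}
    for i, j in combinations(range(p), 2):
        diff = rho_b[i, j] - rho_a[i, j]
        if abs(diff) > delta:
            changes[(i, j)] = float(diff)
    return changes


# Hypothetical condition A: variables 0 and 1 are linked.
theta_a = np.array([[1.0, -0.5, 0.0],
                    [-0.5, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
# Hypothetical condition B: the 0-1 link is gone and a 1-2 link appears.
theta_b = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.4],
                    [0.0, 0.4, 1.0]])

changes = differential_edges(theta_a, theta_b)
print(changes)  # {(0, 1): -0.5, (1, 2): -0.4}
```

    Comparing partial correlations rather than raw precision entries puts the two conditions on a common scale, since partial correlations do not depend on the variances of the individual variables.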

     
  2. Abstract

    The rapid development of modeling techniques has brought many opportunities for data‐driven discovery and prediction. However, it also raises the challenge of selecting the most appropriate model for a particular data task. Information criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), form a general class of model selection methods with deep connections to foundational ideas in statistics and information theory. Many perspectives and theoretical justifications have been developed to understand when and how to use information criteria, and the answers often depend on the particular data setting. This review revisits information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent advancements, and common misconceptions, to enrich the understanding of model selection in general.

    This article is categorized under:

    Data: Types and Structure > Traditional Statistical Data

    Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods

    Statistical and Graphical Methods of Data Analysis > Information Theoretic Methods

    Statistical Models > Model Selection
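
    For item 2, the two criteria it names have standard forms: AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂, where L̂ is the maximized likelihood, k the number of estimated parameters, and n the sample size. A minimal sketch, assuming a Gaussian linear regression in which k counts the polynomial coefficients plus the noise variance, uses both to choose a polynomial degree. The function names and the simulated data are hypothetical illustrations, not taken from the review.

```python
import math

import numpy as np


def gaussian_max_loglik(rss, n):
    """Maximized log-likelihood of a Gaussian linear model.

    Uses the maximum-likelihood noise variance sigma^2 = rss / n.
    """
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * loglik


def select_degree(x, y, degrees):
    """Score polynomial fits of each degree; lower AIC/BIC is preferred."""
    n = len(y)
    scores = {}
    for d in degrees:
        coeffs = np.polyfit(x, y, d)
        rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
        k = d + 2  # d + 1 polynomial coefficients plus the noise variance
        ll = gaussian_max_loglik(rss, n)
        scores[d] = (aic(ll, k), bic(ll, k, n))
    best_aic = min(scores, key=lambda deg: scores[deg][0])
    best_bic = min(scores, key=lambda deg: scores[deg][1])
    return scores, best_aic, best_bic


# Simulated data from a straight line plus Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)

scores, best_aic, best_bic = select_degree(x, y, degrees=[1, 2, 3, 4])
print(best_aic, best_bic)
```

    The only difference between the two criteria is the per-parameter penalty: 2 for AIC versus ln n for BIC. Since ln n > 2 once n ≥ 8, BIC penalizes extra parameters more heavily than AIC for all but very small samples and therefore tends to favor smaller models.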

     
  3. Abstract

    Graph theory has a long history in chemistry. Yet as the breadth and variety of chemical data change rapidly, so too do the graph encoding methods and analyses that yield qualitative and quantitative insights. Using illustrative cases within a basic mathematical framework, we showcase the utility of modern chemical graph theory in chemists' toolkit for analysis and model development. The encoding of both experimental and simulation data is discussed at various levels of granularity. This is followed by a discussion of the two major classes of graph‐theoretical analyses: identifying connectivity patterns and partitioning methods. Measures, metrics, descriptors, and topological indices are then introduced, with an emphasis on interpretability and on incorporation into physical models. Challenging data cases are described, including strategies for studying time dependence. Throughout, we incorporate recent advancements in computer science and applied mathematics that are propelling chemical graph theory into new domains of chemical study.

    This article is categorized under:

    Molecular and Statistical Mechanics > Molecular Dynamics and Monte‐Carlo Methods

    Structure and Mechanism > Computational Materials Science

    Structure and Mechanism > Molecular Structures
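
    Item 3 mentions topological indices. One classic example is the Wiener index: the sum of shortest‐path distances (in bonds) over all unordered pairs of atoms in a hydrogen‐suppressed molecular graph. The sketch below computes it by breadth‐first search from every atom. It assumes a connected graph given as a plain adjacency dictionary; the representation and function names are illustrative choices, not taken from the article.

```python
from collections import deque


def distances_from(adjacency, source):
    """Breadth-first search distances (in bonds) from one atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered pairs of atoms.

    Assumes a connected graph; each pair is counted twice in the
    loop over sources, hence the division by 2.
    """
    total = sum(sum(distances_from(adjacency, u).values()) for u in adjacency)
    return total // 2


# Hydrogen-suppressed carbon skeletons (atoms numbered arbitrarily).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

    The two C₄H₁₀ isomers have the same atoms but different indices, which is why such descriptors can distinguish branching and correlate with physical properties.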

     
  4. Abstract

    Searching for patterns in data is important because it can lead to the discovery of sequence segments that play a functional role. Because the pattern statistics used in data analysis are complex, and inference requires their sampling distributions, efficient computational methods are paramount. This article gives an overview of the main methods used to compute distributions of statistics of overlapping pattern occurrences, specifically generating functions, correlation functions, the Goulden‐Jackson cluster method, recursive equations, and Markov chain embedding. The underlying data sequence is assumed to be higher‐order Markovian, which includes sparse Markov models and variable‐length Markov chains as special cases. Also considered are recent developments that extend the computational capabilities of the Markov chain‐based method, through an algorithm that minimizes the size of the chain's state space, and that improve data modeling capabilities through sparse Markov models. An application that computes the distribution of a test statistic used in sequence alignment illustrates the usefulness of the methodology.

    This article is categorized under:

    Statistical Learning and Exploratory Methods of the Data Sciences > Pattern Recognition

    Data: Types and Structure > Categorical Data

    Statistical and Graphical Methods of Data Analysis > Modeling Methods and Algorithms
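
    For item 4, a minimal sketch of Markov chain embedding: the embedded chain's state is the pair (state of a pattern‐matching automaton, number of occurrences so far), and its distribution is propagated one symbol at a time to give the exact distribution of the number of overlapping occurrences. For simplicity it assumes i.i.d. symbols (the zeroth‐order special case); the higher‐order Markov sources treated in the article would require augmenting the state with recent symbols. The function names are hypothetical, and the automaton is built by brute force rather than with the state‐space minimization the article describes.

```python
from collections import defaultdict
from fractions import Fraction


def next_state(pattern, state, symbol):
    """Length of the longest pattern prefix that is a suffix of the text read so far."""
    text = pattern[:state] + symbol
    for k in range(min(len(pattern), len(text)), -1, -1):
        if text.endswith(pattern[:k]):
            return k
    return 0  # not reached: the empty prefix always matches


def count_distribution(pattern, probs, n):
    """Exact distribution of the number of overlapping occurrences of
    `pattern` in an i.i.d. sequence of length n with symbol probabilities `probs`."""
    m = len(pattern)
    # Embedded chain: (automaton state, occurrences so far) -> probability.
    current = {(0, 0): Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for (state, count), p in current.items():
            for symbol, q in probs.items():
                s = next_state(pattern, state, symbol)
                # Reaching state m completes an occurrence; the automaton
                # keeps the matched suffix, so overlaps are counted.
                nxt[(s, count + (s == m))] += p * q
        current = nxt
    result = defaultdict(Fraction)
    for (_, count), p in current.items():
        result[count] += p
    return dict(sorted(result.items()))


fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
print(count_distribution("HH", fair_coin, 3))
# {0: Fraction(5, 8), 1: Fraction(1, 4), 2: Fraction(1, 8)}
```

    The result can be checked by enumerating all eight sequences: only HHH contains two (overlapping) occurrences of HH, HHT and THH contain one each, and the remaining five contain none.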

     
  5. Abstract

    Comprehensive and accurate analysis of respiratory and metabolic data is crucial to modelling congenital, pathogenic and degenerative diseases converging on autonomic control failure. A lack of tools for high‐throughput analysis of respiratory datasets remains a major challenge. We present Breathe Easy, a novel open‐source pipeline for processing raw recordings and associated metadata into operative outcomes, publication‐worthy graphs and robust statistical analyses, including QQ and residual plots for checking assumptions and guiding data transformations. The pipeline provides an easy‐to‐use graphical user interface for uploading data files, setting waveform feature thresholds and defining experimental variables. Breathe Easy was validated against manual selection by experts, which represents the current standard in the field. We demonstrate Breathe Easy's utility by examining a 2‐year longitudinal study of an Alzheimer's disease mouse model to assess the contribution of forebrain pathology to disordered breathing. Whole‐body plethysmography has become an important experimental outcome measure for a variety of diseases with primary and secondary respiratory indications. Respiratory dysfunction, while not an initial symptom in many of these disorders, often drives disability or death. Breathe Easy provides an open‐source respiratory analysis tool for all respiratory datasets and represents a necessary improvement upon current analytical methods in the field.

    Key points

    Respiratory dysfunction is a common endpoint for disability and mortality in many disorders throughout life.

    Whole‐body plethysmography in rodents is a method with high face validity for measuring respiratory outcomes in models of these diseases and disorders.

    Analysis of key respiratory variables remains hindered by manual annotation and analysis, which limit throughput and often exclude most of the recorded data.

    Here we present a software suite, Breathe Easy, that automates both the selection of data from raw plethysmography recordings and the analysis of these data into operative outcomes and publication‐worthy graphs with statistics.

    We validate Breathe Easy with a terabyte‐scale Alzheimer's dataset that examines the effects of forebrain pathology on respiratory function over 2 years of degeneration.

     