Title: Joint Gaussian graphical model estimation: A survey
Abstract: Graphs representing complex systems often share a partial underlying structure across domains while retaining individual features. Thus, identifying common structures can shed light on the underlying signal, for instance, when applied to scientific discovery or clinical diagnoses. Furthermore, growing evidence shows that the shared structure across domains boosts the estimation power of graphs, particularly for high‐dimensional data. However, building a joint estimator to extract the common structure may be more complicated than it seems, most often due to data heterogeneity across sources. This manuscript surveys recent work on statistical inference of joint Gaussian graphical models, identifying model structures that fit various data generation processes. This article is categorized under: Data: Types and Structure > Graph and Network Data; Statistical Models > Graphical Models.
Award ID(s):
2046795
PAR ID:
10379699
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
WIREs Computational Statistics
Volume:
14
Issue:
6
ISSN:
1939-5108
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: The rapid development of modeling techniques has brought many opportunities for data‐driven discovery and prediction. However, this also leads to the challenge of selecting the most appropriate model for any particular data task. Information criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), have been developed as a general class of model selection methods with profound connections with foundational thoughts in statistics and information theory. Many perspectives and theoretical justifications have been developed to understand when and how to use information criteria, which often depend on particular data circumstances. This review article will revisit information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent advancements, and common misconceptions to enrich the understanding of model selection in general. This article is categorized under: Data: Types and Structure > Traditional Statistical Data; Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods; Statistical and Graphical Methods of Data Analysis > Information Theoretic Methods; Statistical Models > Model Selection.
  2. Abstract: Changes in land use and climate change threaten global biodiversity and ecosystems, calling for the urgent development of effective conservation strategies. Recognizing landscape heterogeneity, which refers to the variation in natural features within an area, is crucial for these strategies. While remote sensing images quantify landscape heterogeneity, they might fail to detect ecological patterns in moderately disturbed areas, particularly at minor spatial scales. This is partly because satellite imagery may not effectively capture undergrowth conditions due to its resolution constraints. In contrast, soundscape analysis, which studies environmental acoustic signals, emerges as a novel tool for understanding ecological patterns, providing reliable information on habitat conditions and landscape heterogeneity in complex environments across diverse scales and serving as a complement to remote sensing methods. We propose an unsupervised approach using passive acoustic monitoring data and network inference methods to analyse acoustic heterogeneity patterns based on biophony composition. This method uses sonotypes, unique acoustic entities characterized by their specific time‐frequency spaces, to establish the acoustic structure of a site through sonotype occurrences, focusing on general biophony rather than specific species and providing information on the acoustic footprint of a site. From a sonotype composition matrix, we use the Graphical Lasso method, a sparse Gaussian graphical model, to identify acoustic similarities across sites, map ecological complexity relationships through the nodes (sites) and edges (similarities), and transform acoustic data into a graphical representation of ecological interactions and landscape acoustic diversity. We implemented the proposed method across 17 sites within an oil palm plantation in Santander, Colombia. The resulting inferred graphs visualize the acoustic similarities among sites, reflecting the biophony achieved by characterizing the landscape through its acoustic structures. Correlating our findings with ecological metrics like the Bray–Curtis dissimilarity index and satellite imagery indices reveals significant insights into landscape heterogeneity. This unsupervised approach offers a new perspective on understanding ecological and biological interactions and advances soundscape analysis. The soundscape decomposition into sonotypes underscores the method's advantage, offering the possibility to associate sonotypes with species and identify their contribution to the similarity proposed by the graph.
  3. Abstract: Population ecology and biogeography applications often necessitate the transfer of models across spatial and/or temporal dimensions to make predictions outside the bounds of the data used for model fitting. However, ecological data are often spatiotemporally unbalanced such that the spatial or the temporal dimension tends to contain more data than the other. This unbalance frequently leads model transfers to become substitutions, which are predictions to a different dimension than the predictive model was built on. Despite the prevalence of substitutions in ecology, studies validating their performance and their underlying assumptions are scarce. Here, we present a case study demonstrating both space‐for‐time and time‐for‐space substitutions (TFSS) using emperor penguins (Aptenodytes forsteri) as the focal species. Using an abundance‐based species distribution model (aSDM) of adult emperor penguins in attendance during spring across 50 colonies, we predict long‐term annual fluctuations in fledgling abundance and breeding success at a single colony, Pointe Géologie. Subsequently, we construct statistical models from time series of extended counts on Pointe Géologie to predict average colony abundance distribution across 50 colonies. Our analysis reveals that the distance to nearest open water (NOW) exhibits the strongest association with both temporal and spatial data. Space‐for‐time substitution performance of the aSDM, as measured by the Pearson correlation coefficient, was 0.63 and 0.56 when predicting breeding success and fledgling abundance time series, respectively. Linear regression of fledgling abundance on NOW yields similar TFSS performance when predicting the abundance distribution of emperor penguin colonies with a correlation coefficient of 0.58. We posit that such space–time equivalence arises because: (1) emperor penguin colonies conform to their existing fundamental niche; (2) there is not yet any environmental novelty when comparing the spatial versus temporal variation of distance to the nearest open water; and (3) models of more specific components of life histories, such as fledgling abundance, rather than total population abundance, are more transferable. Identifying these conditions empirically can enhance the qualitative validation of substitutions in cases where direct validation data are lacking.
  4. Abstract: Change‐point detection studies the problem of detecting the changes in the underlying distribution of the data stream as soon as possible after the change happens. Modern large‐scale, high‐dimensional, and complex streaming data call for computationally (memory) efficient sequential change‐point detection algorithms that are also statistically powerful. This gives rise to a computation versus statistical power trade‐off, an aspect less emphasized in the past in classic literature. This tutorial takes this new perspective and reviews several sequential change‐point detection procedures, ranging from classic sequential change‐point detection algorithms to more recent non‐parametric procedures that consider computation, memory efficiency, and model robustness in the algorithm design. Our survey also contains classic performance analysis, which provides useful techniques for analyzing new procedures. This article is categorized under: Statistical Models > Time Series Models; Algorithms and Computational Methods > Algorithms; Data: Types and Structure > Time Series, Stochastic Processes, and Functional Data.
  5. Abstract: Fusion learning methods, developed for the purpose of analyzing datasets from many different sources, have become a popular research topic in recent years. Individualized inference approaches through fusion learning extend fusion learning approaches to individualized inference problems over a heterogeneous population, where similar individuals are fused together to enhance the inference over the target individual. Both classical fusion learning and individualized inference approaches through fusion learning are established based on weighted aggregation of individual information, but the weight used in the latter is localized to the target individual. This article provides a review on two individualized inference methods through fusion learning, iFusion and iGroup, that are developed under different asymptotic settings. Both procedures guarantee optimal asymptotic theoretical performance and computational scalability. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Manifold Learning; Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods; Statistical and Graphical Methods of Data Analysis > Nonparametric Methods; Data: Types and Structure > Massive Data.