Title: Functional Bayesian Networks for Discovering Causality from Multivariate Functional Data
Abstract

Multivariate functional data arise in a wide range of applications. One fundamental task is to understand the causal relationships among these functional objects of interest. In this paper, we develop a novel Bayesian network (BN) model for multivariate functional data where conditional independencies and causal structure are encoded by a directed acyclic graph. Specifically, we allow the functional objects to deviate from Gaussian processes, which is the key to unique causal structure identification even when the functions are measured with noise. A fully Bayesian framework is designed to infer the functional BN model with natural uncertainty quantification through posterior summaries. Simulation studies and real data examples demonstrate the practical utility of the proposed model.

 
Award ID(s):
1940107 1918854
NSF-PAR ID:
10448255
Publisher / Repository:
Oxford University Press
Journal Name:
Biometrics
Volume:
79
Issue:
4
ISSN:
0006-341X
Format(s):
Medium: X Size: p. 3279-3293
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Bayesian networks have been widely used to generate causal hypotheses from multivariate data. Despite their popularity, the vast majority of existing causal discovery approaches make the strong assumption of a (partially) homogeneous sampling scheme. However, such an assumption can be seriously violated, causing significant biases when the underlying population is inherently heterogeneous. To this end, we propose a novel causal Bayesian network model, termed BN-LTE, that embeds heterogeneous samples onto a low-dimensional manifold and builds Bayesian networks conditional on the embedding. This new framework allows for more precise network inference by improving the estimation resolution from the population level to the observation level. Moreover, while causal Bayesian networks are in general not identifiable with purely observational, cross-sectional data due to Markov equivalence, with the blessing of causal effect heterogeneity, we prove that the proposed BN-LTE is uniquely identifiable under relatively mild assumptions. Through extensive experiments, we demonstrate the superior performance of BN-LTE in causal structure learning as well as inferring observation-specific gene regulatory networks from observational data.

     
  2. Abstract

    Functional data are often extremely high-dimensional and exhibit strong dependence structures but can prove valuable for both prediction and inference. The literature on functional data analysis is well developed; however, there has been very little work involving functional data in complex survey settings. Motivated by physical activity monitor data from the National Health and Nutrition Examination Survey (NHANES), we develop a Bayesian model for functional covariates that can properly account for the survey design. Our approach is intended for non-Gaussian data and can be applied in multivariate settings. In addition, we make use of a variety of Bayesian modeling techniques to ensure that the model is fit in a computationally efficient manner. We illustrate the value of our approach through two simulation studies as well as an example of mortality estimation using NHANES data.

     
  3. Abstract

    The leaf economic spectrum is a widely studied axis of plant trait variability that defines a trade‐off between leaf longevity and productivity. While this has been investigated at the global scale, where it is robust, and at local scales, where deviations from it are common, it has received less attention at the intermediate scale of plant functional types (PFTs). We investigated whether global leaf economic relationships are also present within the scale of PFTs commonly used by Earth System models, and the extent to which this global‐PFT hierarchy can be used to constrain trait estimates. We developed a hierarchical multivariate Bayesian model that assumes separate means and covariance structures within and across PFTs and fit this model to seven leaf traits from the TRY database related to leaf longevity, morphology, biochemistry, and photosynthetic metabolism. Although patterns of trait covariation were generally consistent with the leaf economic spectrum, we found three approximate tiers to this consistency. Relationships among morphological and biochemical traits (specific leaf area [SLA], N, P) were the most robust within and across PFTs, suggesting that covariation in these traits is driven by universal leaf construction trade‐offs and stoichiometry. Relationships among metabolic traits (dark respiration [Rd], maximum RuBisCo carboxylation rate [Vc,max], maximum electron transport rate [Jmax]) were slightly less consistent, reflecting in part their much sparser sampling (especially for high‐latitude PFTs), but also pointing to more flexible plasticity in plant metabolism. Finally, relationships involving leaf lifespan were the least consistent, indicating that leaf economic relationships related to leaf lifespan are dominated by across‐PFT differences and that within‐PFT variation in leaf lifespan is more complex and idiosyncratic. Across all traits, this covariance was an important source of information, as evidenced by the improved imputation accuracy and reduced predictive uncertainty in multivariate models compared to univariate models. Ultimately, our study reaffirms the value of studying not just individual traits but the multivariate trait space and the utility of hierarchical modeling for studying the scale dependence of trait relationships.

     
  4. Recent advances in post-windstorm reconnaissance have accelerated the collection of perishable building performance data after extreme windstorms, necessitating better frameworks for knowledge discovery from the data. One particularly promising approach to this need is Bayesian Networks (BN), whose application in natural hazards research has grown due to their ability to explicitly model causal factors. In this study, a Naïve Bayes Network (NBN) was first developed to observe the influence of wind speed ratio, roof shape, number of stories, roof cover, and pre/post-IBC (2002) on the damage class of a structure and predict the probability of each damage class given a specified scenario. This initial model was derived solely from empirical data, and the parameters of influence are modelled as conditionally independent, which limits the model’s use. An illustrative hybrid Bayesian Network is also proposed, which combines empirical data, known wind engineering theory, and expert opinion to formulate a more holistic model of structural performance in windstorms that is better suited for parameter inference and building performance predictions.
  5. Abstract

    Multivariate functional data present theoretical and practical complications that are not found in univariate functional data. One of these is a situation where the component functions of multivariate functional data are positive and are subject to mutual time warping. That is, the component processes exhibit a common shape but are subject to systematic phase variation across their domains in addition to subject‐specific time warping, where each subject has its own internal clock. This motivates a novel model for multivariate functional data that connects such mutual time warping to a latent‐deformation‐based framework by exploiting a novel time‐warping separability assumption. This separability assumption allows for meaningful interpretation and dimension reduction. The resulting latent deformation model is shown to be well suited to represent commonly encountered functional vector data. The proposed approach combines a random amplitude factor for each component with population‐based registration across the components of a multivariate functional data vector and includes a latent population function, which corresponds to a common underlying trajectory. We propose estimators for all components of the model, enabling implementation of the proposed data‐based representation for multivariate functional data and downstream analyses such as Fréchet regression. Rates of convergence are established when curves are fully observed or observed with measurement error. The usefulness of the model, interpretations, and practical aspects are illustrated in simulations and with application to multivariate human growth curves and multivariate environmental pollution data.

     