Abstract Bayesian networks have been widely used to generate causal hypotheses from multivariate data. Despite their popularity, the vast majority of existing causal discovery approaches make the strong assumption of a (partially) homogeneous sampling scheme. However, such assumption can be seriously violated, causing significant biases when the underlying population is inherently heterogeneous. To this end, we propose a novel causal Bayesian network model, termed BN-LTE, that embeds heterogeneous samples onto a low-dimensional manifold and builds Bayesian networks conditional on the embedding. This new framework allows for more precise network inference by improving the estimation resolution from the population level to the observation level. Moreover, while causal Bayesian networks are in general not identifiable with purely observational, cross-sectional data due to Markov equivalence, with the blessing of causal effect heterogeneity, we prove that the proposed BN-LTE is uniquely identifiable under relatively mild assumptions. Through extensive experiments, we demonstrate the superior performance of BN-LTE in causal structure learning as well as inferring observation-specific gene regulatory networks from observational data. 
                        more » 
                        « less   
                    
                            
                            Functional Bayesian Networks for Discovering Causality from Multivariate Functional Data
                        
                    
    
            Abstract Multivariate functional data arise in a wide range of applications. One fundamental task is to understand the causal relationships among these functional objects of interest. In this paper, we develop a novel Bayesian network (BN) model for multivariate functional data where conditional independencies and causal structure are encoded by a directed acyclic graph. Specifically, we allow the functional objects to deviate from Gaussian processes, which is the key to unique causal structure identification even when the functions are measured with noises. A fully Bayesian framework is designed to infer the functional BN model with natural uncertainty quantification through posterior summaries. Simulation studies and real data examples demonstrate the practical utility of the proposed model. 
        more » 
        « less   
        
    
    
                            - PAR ID:
- 10448255
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Biometrics
- Volume:
- 79
- Issue:
- 4
- ISSN:
- 0006-341X
- Format(s):
- Medium: X Size: p. 3279-3293
- Size(s):
- p. 3279-3293
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Abstract Functional data are often extremely high-dimensional and exhibit strong dependence structures but can often prove valuable for both prediction and inference. The literature on functional data analysis is well developed; however, there has been very little work involving functional data in complex survey settings. Motivated by physical activity monitor data from the National Health and Nutrition Examination Survey (NHANES), we develop a Bayesian model for functional covariates that can properly account for the survey design. Our approach is intended for non-Gaussian data and can be applied in multivariate settings. In addition, we make use of a variety of Bayesian modeling techniques to ensure that the model is fit in a computationally efficient manner. We illustrate the value of our approach through two simulation studies as well as an example of mortality estimation using NHANES data.more » « less
- 
            Abstract Multivariate functional data present theoretical and practical complications that are not found in univariate functional data. One of these is a situation where the component functions of multivariate functional data are positive and are subject to mutual time warping. That is, the component processes exhibit a common shape but are subject to systematic phase variation across their domains in addition to subject‐specific time warping, where each subject has its own internal clock. This motivates a novel model for multivariate functional data that connect such mutual time warping to a latent‐deformation‐based framework by exploiting a novel time‐warping separability assumption. This separability assumption allows for meaningful interpretation and dimension reduction. The resulting latent deformation model is shown to be well suited to represent commonly encountered functional vector data. The proposed approach combines a random amplitude factor for each component with population‐based registration across the components of a multivariate functional data vector and includes a latent population function, which corresponds to a common underlying trajectory. We propose estimators for all components of the model, enabling implementation of the proposed data‐based representation for multivariate functional data and downstream analyses such as Fréchet regression. Rates of convergence are established when curves are fully observed or observed with measurement error. The usefulness of the model, interpretations, and practical aspects are illustrated in simulations and with application to multivariate human growth curves and multivariate environmental pollution data.more » « less
- 
            Abstract Survey questionnaires are commonly used by psychologists and social scientists to measure various latent traits of study subjects. Various causal inference methods such as the potential outcome framework and structural equation models have been used to infer causal effects. However, the majority of these methods assume the knowledge of true causal structure, which is unknown for many applications in psychological and social sciences. This calls for alternative causal approaches for analyzing such questionnaire data. Bayesian networks are a promising option as they do not require causal structure to be knowna prioribut learn it objectively from data. Although we have seen some recent successes of using Bayesian networks to discover causality for psychological questionnaire data, their techniques tend to suffer from causal non-identifiability with observational data. In this paper, we propose the use of a state-of-the-art Bayesian network that is proven to be fully identifiable for observational ordinal data. We develop a causal structure learning algorithm based on an asymptotically justified BIC score function, a hill-climbing search strategy, and the bootstrapping technique, which is able to not only identify a unique causal structure but also quantify the associated uncertainty. Using simulation studies, we demonstrate the power of the proposed learning algorithm by comparing it with alternative Bayesian network methods. For illustration, we consider a dataset from a psychological study of the functional relationships among the symptoms of obsessive-compulsive disorder and depression. Without any prior knowledge, the proposed algorithm reveals some plausible causal relationships. This paper is accompanied by a user-friendly open-source R package OrdCD on CRAN.more » « less
- 
            Summary The covariance structure of multivariate functional data can be highly complex, especially if the multivariate dimension is large, making extensions of statistical methods for standard multivariate data to the functional data setting challenging. For example, Gaussian graphical models have recently been extended to the setting of multivariate functional data by applying multivariate methods to the coefficients of truncated basis expansions. However, compared with multivariate data, a key difficulty is that the covariance operator is compact and thus not invertible. This paper addresses the general problem of covariance modelling for multivariate functional data, and functional Gaussian graphical models in particular. As a first step, a new notion of separability for the covariance operator of multivariate functional data is proposed, termed partial separability, leading to a novel Karhunen–Loève-type expansion for such data. Next, the partial separability structure is shown to be particularly useful in providing a well-defined functional Gaussian graphical model that can be identified with a sequence of finite-dimensional graphical models, each of identical fixed dimension. This motivates a simple and efficient estimation procedure through application of the joint graphical lasso. Empirical performance of the proposed method for graphical model estimation is assessed through simulation and analysis of functional brain connectivity during a motor task.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
