Bayesian networks have been widely used to generate causal hypotheses from multivariate data. Despite their popularity, the vast majority of existing causal discovery approaches make the strong assumption of a (partially) homogeneous sampling scheme. However, such an assumption can be seriously violated, causing significant biases when the underlying population is inherently heterogeneous. To this end, we propose a novel causal Bayesian network model, termed BN-LTE, that embeds heterogeneous samples onto a low-dimensional manifold and builds Bayesian networks conditional on the embedding. This new framework allows for more precise network inference by improving the estimation resolution from the population level to the observation level. Moreover, while causal Bayesian networks are in general not identifiable with purely observational, cross-sectional data due to Markov equivalence, with the blessing of causal effect heterogeneity, we prove that the proposed BN-LTE is uniquely identifiable under relatively mild assumptions. Through extensive experiments, we demonstrate the superior performance of BN-LTE in causal structure learning as well as in inferring observation-specific gene regulatory networks from observational data.
This content will become publicly available on May 24, 2024
Successful modeling of degradation data is of great importance for both accurate reliability assessment and effective maintenance decision‐making. Many existing degradation performance modeling approaches either assume a homogeneous population of units or characterize a heterogeneous population under restrictive assumptions, such as pre‐specifying the number of sub‐populations. This paper proposes a Bayesian heterogeneous degradation performance modeling framework to relax the conventional modeling assumptions. Specifically, a Bayesian non‐parametric model formulation and learning algorithm are proposed to characterize the historical degradation data of a heterogeneous population of units with an unknown number of homogeneous sub‐populations, allowing joint model estimation and identification of the number of sub‐populations. Based on the off‐line population‐level model, an on‐line individual‐level degradation model with sequential model updating is further developed to improve remaining useful life prediction of individual units with sparse data. A real case study using the heterogeneous degradation data of deteriorating roads is provided to illustrate the proposed approach and demonstrate its validity.
- NSF-PAR ID:
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Quality and Reliability Engineering International
- Medium: X Size: p. 2686-2705
- Sponsoring Org: National Science Foundation
More Like this
Passive surveillance systems are widely used to monitor disease occurrence over wide spatial areas due to their cost-effectiveness and integration into broadly distributed healthcare systems. However, such systems are generally associated with imperfect ascertainment of disease cases and with heterogeneous capture probabilities arising from factors such as differential access to care. Augmenting passive surveillance systems with other surveillance efforts provides a way to estimate the true number of incident cases. We develop a hierarchical modeling framework for analyzing data from multiple surveillance systems that allows for individual-level covariate-dependent heterogeneous capture probabilities, and borrows information across surveillance sites to improve estimation of the true number of incident cases. Inference is carried out via a two-stage Bayesian procedure. Simulation studies illustrated superior performance of the proposed approach with respect to bias, root mean square error, and coverage compared to a model that does not borrow information across sites. We applied the proposed model to data from three surveillance systems reporting pulmonary tuberculosis (PTB) cases in a major center of ongoing transmission in China. The analysis yielded bias-corrected estimates of PTB cases from the passive system and led to the identification of risk factors associated with PTB rates, as well as factors influencing the operating characteristics of the implemented surveillance systems.
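To make the idea of estimating the true number of cases from overlapping surveillance systems concrete, here is a minimal sketch of the classical two-source capture-recapture calculation (Chapman's estimator). It is not the hierarchical Bayesian model described in the abstract: it assumes two independent sources and a homogeneous capture probability, which are exactly the assumptions that model relaxes. The function name and the example counts are hypothetical.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman's two-source capture-recapture estimate of the total case count.

    n1 -- cases reported by the first surveillance system
    n2 -- cases reported by the second surveillance system
    m  -- cases reported by both systems (the overlap)

    Assumes independent sources and equal capture probability for every case;
    this is a simplification, not the covariate-dependent model in the abstract.
    """
    if min(n1, n2, m) < 0 or m > min(n1, n2):
        raise ValueError("need 0 <= m <= min(n1, n2) and non-negative counts")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts: 99 passive reports, 49 active reports, 9 found by both.
estimated_total = chapman_estimate(99, 49, 9)
```

With these made-up counts the estimate is (100 × 50) / 10 − 1 = 499 cases, well above the 139 distinct cases actually observed, which illustrates how overlap between systems informs the size of the unobserved portion.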
In functional data analysis, curves or surfaces are observed, up to measurement error, at a finite set of locations, for, say, a sample of n individuals. Often, the curves are homogeneous, except perhaps for individual-specific regions that provide heterogeneous behaviour (e.g. ‘damaged’ areas of irregular shape on an otherwise smooth surface). Motivated by applications with functional data of this nature, we propose a Bayesian mixture model, with the aim of dimension reduction, by representing the sample of n curves through a smaller set of canonical curves. We propose a novel prior on the space of probability measures for a random curve which extends the popular Dirichlet priors by allowing local clustering: non-homogeneous portions of a curve can be allocated to different clusters and the n individual curves can be represented as recombinations (hybrids) of a few canonical curves. More precisely, the prior proposed envisions a conceptual hidden factor with k-levels that acts locally on each curve. We discuss several models incorporating this prior and illustrate its performance with simulated and real data sets. We examine theoretical properties of the proposed finite hybrid Dirichlet mixtures, specifically, their behaviour as the number of the mixture components goes to ∞ and their connection with Dirichlet process mixtures.
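For readers unfamiliar with Dirichlet process priors, the following sketch draws mixture weights from a truncated stick-breaking construction of a standard Dirichlet process. It illustrates only the ordinary (global) clustering prior that the abstract says it extends, not the hybrid, locally clustering prior itself. The function name, the concentration parameter `alpha`, and the truncation level `k` are assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha: float, k: int, seed: int = 0):
    """Draw the first k weights of a Dirichlet process via stick-breaking.

    Each step breaks off a Beta(1, alpha) fraction of the stick that is left.
    Returns (weights, remaining), where remaining is the mass not yet assigned
    to any of the first k components.
    """
    rng = random.Random(seed)
    remaining = 1.0
    weights = []
    for _ in range(k):
        fraction = rng.betavariate(1.0, alpha)  # share of the remaining stick
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


weights, leftover = stick_breaking_weights(alpha=2.0, k=10)
```

As `k` grows, the leftover mass shrinks toward zero, which is one way to picture the limit of a finite mixture as the number of components goes to infinity.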
We present the first Bayesian method for tomographic decomposition of the plane-of-sky orientation of the magnetic field with the use of stellar polarimetry and distance. This standalone tomographic inversion method presents an important step forward in reconstructing the magnetized interstellar medium (ISM) in three dimensions within dusty regions. We develop a model in which the polarization signal from the magnetized and dusty ISM is described by thin layers at various distances, a working assumption which should be satisfied in small-angular circular apertures. Our modeling makes it possible to infer the mean polarization (amplitude and orientation) induced by individual dusty clouds and to account for the turbulence-induced scatter in a generic way. We present a likelihood function that explicitly accounts for uncertainties in polarization and parallax. We develop a framework for reconstructing the magnetized ISM through the maximization of the log-likelihood using a nested sampling method. We test our Bayesian inversion method on mock data, representative of the high Galactic latitude sky, taking into account realistic uncertainties from Gaia and as expected for the optical polarization survey PASIPHAE according to the currently planned observing strategy. We demonstrate that our method is effective at recovering the cloud properties as soon as the polarization induced by a cloud to its background stars is higher than ~0.1% for the adopted survey exposure time and level of systematic uncertainty. The larger the induced polarization is, the better the method’s performance, and the lower the number of required stars. Our method makes it possible to recover not only the mean polarization properties but also to characterize the intrinsic scatter, thus creating new ways to characterize ISM turbulence and the magnetic field strength. Finally, we apply our method to an existing data set of starlight polarization with known line-of-sight decomposition, demonstrating agreement with previous results and an improved quantification of uncertainties in cloud properties.
Understanding the drivers of speciation within islands is key to explaining the high levels of invertebrate diversification and endemism often observed within islands. Here, we propose an insular topoclimate model for Quaternary diversification (ITQD), and test the general prediction that, within a radially eroded conical island, glacial climate conditions facilitate the divergence of populations within species across valleys.
Gran Canaria, Canary Islands.
Laparocerus tessellatus beetle species complex (Coleoptera, Curculionidae).
Methods
We characterize individual‐level genomic relationships using single nucleotide polymorphisms produced by double‐digest restriction site associated DNA sequencing (ddRAD‐seq). A range of parameter values were explored in order to filter our data. We assess individual relatedness, species boundaries, demographic history and spatial patterns of connectivity.
The total number of ddRAD‐seq loci per sample ranges from 4,576 to 512, with 11.12% and 4.84% of missing data respectively, depending on the filtering parameter combination. We consistently infer four genetically distinct ancestral populations and two presumed cases of admixture, one of which is largely restricted to high altitudes. Bayes factor delimitation supports the hypothesis of four species, which is consistent with the four inferred ancestral gene pools. Landscape resistance analyses identified genomic relatedness among individuals in two out of the four inferred species to be best explained by annual precipitation during the last glacial maximum rather than geographic distance.
Our data reveal a complex speciation history involving population isolation and admixture, with broad support for the ITQD model here proposed. We suggest that further studies are needed to test the generality of our model, and enrich our understanding of the evolutionary process in island invertebrates. Our results demonstrate the power of ddRAD‐seq data to provide a detailed understanding of the temporal and spatial dynamics of insular biodiversity.