skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 2153704

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. ABSTRACT Covariate-dependent graph learning has gained increasing interest in the graphical modeling literature for the analysis of heterogeneous data. This task, however, poses challenges to modeling, computational efficiency, and interpretability. The parameter of interest can be naturally represented as a 3-dimensional array with elements that can be grouped according to 2 directions, corresponding to node level and covariate level, respectively. In this article, we propose a novel dual group spike-and-slab prior that enables multi-level selection at covariate-level and node-level, as well as individual (local) level sparsity. We introduce a nested strategy with specific choices to address distinct challenges posed by the various grouping directions. For posterior inference, we develop a full Gibbs sampler for all parameters, which mitigates the difficulties of parameter tuning often encountered in high-dimensional graphical models and facilitates routine implementation. Through simulation studies, we demonstrate that the proposed model outperforms existing methods in its accuracy of graph recovery. We show the practical utility of our model via an application to microbiome data where we seek to better understand the interactions among microbes as well as how these are affected by relevant covariates. 
    more » « less
  2. Abstract Copy number aberrations (CNAs) are ubiquitous in many types of cancer. Inferring CNAs from cancer genomic data could help shed light on the initiation, progression, and potential treatment of cancer. While such data have traditionally been available via “bulk sequencing,” the more recently introduced techniques for single-cell DNA sequencing (scDNAseq) provide the type of data that makes CNA inference possible at the single-cell resolution. We introduce a new birth-death evolutionary model of CNAs and a Bayesian method, NestedBD, for the inference of evolutionary trees (topologies and branch lengths with relative mutation rates) from single-cell data. We evaluated NestedBD’s performance using simulated data sets, benchmarking its accuracy against traditional phylogenetic tools as well as state-of-the-art methods. The results show that NestedBD infers more accurate topologies and branch lengths, and that the birth-death model can improve the accuracy of copy number estimation. And when applied to biological data sets, NestedBD infers plausible evolutionary histories of two colorectal cancer samples. NestedBD is available athttps://github.com/Androstane/NestedBD. 
    more » « less
    Free, publicly-accessible full text available December 1, 2025
  3. Assigning weights to a large pool of objects is a fundamental task in a wide variety of applications. In this article, we introduce the concept of structured high-dimensional probability simplexes, in which most components are zero or near zero and the remaining ones are close to each other. Such structure is well motivated by (i) high-dimensional weights that are common in modern applications, and (ii) ubiquitous examples in which equal weights -- despite their simplicity -- often achieve favorable or even state-of-the-art predictive performance. This particular structure, however, presents unique challenges partly because, unlike high-dimensional linear regression, the parameter space is a simplex and pattern switching between partial constancy and sparsity is unknown. To address these challenges, we propose a new class of double spike Dirichlet priors to shrink a probability simplex to one with the desired structure. When applied to ensemble learning, such priors lead to a Bayesian method for structured high-dimensional ensembles that is useful for forecast combination and improving random forests, while enabling uncertainty quantification. We design efficient Markov chain Monte Carlo algorithms for implementation. Posterior contraction rates are established to study large sample behaviors of the posterior distribution. We demonstrate the wide applicability and competitive performance of the proposed methods through simulations and two real data applications using the European Central Bank Survey of Professional Forecasters data set and a data set from the UC Irvine Machine Learning Repository (UCI). 
    more » « less