In metagenomic studies, testing the association between microbiome composition and clinical outcomes translates to testing the nullity of variance components. Motivated by a lung human immunodeficiency virus (HIV) microbiome project, we study longitudinal microbiome data by using variance component models with more than two variance components. Current testing strategies apply only to models with exactly two variance components and only when sample sizes are large. Therefore, they are not applicable to longitudinal microbiome studies. In this paper, we propose exact tests (score test, likelihood ratio test, and restricted likelihood ratio test) to (a) test the association of the overall microbiome composition in a longitudinal design and (b) detect the association of one specific microbiome cluster while adjusting for the effects from related clusters. Our approach combines the exact tests for a null hypothesis with a single variance component with a strategy of reducing multiple variance components to a single one. Simulation studies demonstrate that our method has a correct type I error rate and superior power compared to existing methods at small sample sizes and weak signals. Finally, we apply our method to a longitudinal pulmonary microbiome study of HIV‐infected patients and reveal two interesting genera
- PAR ID: 10082999
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Genetic Epidemiology
- Volume: 43
- Issue: 3
- ISSN: 0741-0395
- Page Range / eLocation ID: p. 250-262
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract: One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. Exact-RFS-2 is available in open source form on GitHub at https://github.com/yuxilin51/GreedyRFS.
- Abstract: Horizontal gene transfer (HGT) occurring within microbiomes is linked to complex environmental and ecological dynamics that are challenging to replicate in controlled settings. Consequently, most extant studies of microbiome HGT are either simplistic experimental settings with tenuous relevance to real microbiomes or correlative studies that assume that HGT potential is a function of the relative abundance of mobile genetic elements (MGEs), the vehicles of HGT. Here we introduce Kairos as a bioinformatic tool deployed in Nextflow for detecting HGT events “in situ,” i.e., within a microbiome, through analysis of time-series metagenomic sequencing data. The in-situ framework proposed here leverages available metagenomic data from a longitudinally sampled microbiome to assess whether the chronological occurrence of potential donors, recipients, and putatively transferred regions could plausibly have arisen due to HGT over a range of defined time periods. The centerpiece of the Kairos workflow is a novel competitive read alignment method that enables discernment of even very similar genomic sequences, such as those produced by MGE-associated recombination. A key advantage of Kairos is its reliance on assemblies rather than metagenome assembled genomes (MAGs), which avoids systematic exclusion of accessory genes associated with the binning process. In an example test case of real-world data, use of assemblies directly produced a 264-fold increase in the number of antibiotic resistance genes included in the analysis of HGT compared to analysis of MAGs with MetaCHIP. Further, in silico evaluation of contig taxonomy was performed to assess the accuracy of classification for both chromosomally- and MGE-derived sequences, indicating a high degree of accuracy even for conjugative plasmids up to the level of class or order. Thus, Kairos enables the analysis of very recent HGT events, making it suitable for studying rapid prokaryotic adaptation in environmental systems without disturbing the ornate ecological dynamics associated with microbiomes. Current versions of the Kairos workflow are available here: https://github.com/clb21565/kairos.
- Abstract: Gravitational waves (GWs) from merging compact objects encode direct information about the luminosity distance to the binary. When paired with a redshift measurement, this enables standard-siren cosmology: a Hubble diagram can be constructed to directly probe the Universe’s expansion. This can be done in the absence of electromagnetic measurements, as features in the mass distribution of GW sources provide self-calibrating redshift measurements without the need for a definite or probabilistic host galaxy association. This “spectral siren” technique has thus far only been applied with simple parametric representations of the mass distribution, and theoretical predictions for features in the mass distribution are commonly presumed to be fundamental to the measurement. However, the use of an inaccurate representation leads to biases in the cosmological inference, an acute problem given the current uncertainties in the true source population. Furthermore, it is commonly presumed that the form of the mass distribution must be known a priori to obtain unbiased measurements of cosmological parameters in this fashion. Here, we demonstrate that spectral sirens can accurately infer cosmological parameters without such prior assumptions. We apply a flexible, nonparametric model for the mass distribution of compact binaries to a simulated catalog of 1000 GW signals, consistent with expectations for the next LIGO–Virgo–KAGRA observing run. We find that, despite our model’s flexibility, both the source mass model and cosmological parameters are correctly reconstructed. We predict an 11.2% measurement of H0, keeping all other cosmological parameters fixed, and a 6.4% measurement of H(z = 0.9) when fitting for multiple cosmological parameters (1σ uncertainties). This astrophysically agnostic spectral siren technique will be essential to arrive at precise and unbiased cosmological constraints from GW source populations.
- Abstract
  Background: Computational cell type deconvolution enables the estimation of cell type abundance from bulk tissues and is important for understanding the tissue microenvironment, especially in tumor tissues. With the rapid development of deconvolution methods, many benchmarking studies have been published aiming for a comprehensive evaluation of these methods. Benchmarking studies rely on cell-type resolved single-cell RNA-seq data to create simulated pseudobulk datasets by adding individual cell types in controlled proportions.
  Results: In our work, we show that the standard application of this approach, which uses randomly selected single cells, regardless of the intrinsic difference between them, generates synthetic bulk expression values that lack appropriate biological variance. We demonstrate why and how the current bulk simulation pipeline with random cells is unrealistic and propose a heterogeneous simulation strategy as a solution. The heterogeneously simulated bulk samples match up with the variance observed in real bulk datasets and therefore provide concrete benefits for benchmarking in several ways. We demonstrate that conceptual classes of deconvolution methods differ dramatically in their robustness to heterogeneity, with reference-free methods performing particularly poorly. For regression-based methods, the heterogeneous simulation provides an explicit framework to disentangle the contributions of reference construction and regression methods to performance. Finally, we perform an extensive benchmark of diverse methods across eight different datasets and find BayesPrism and a hybrid MuSiC/CIBERSORTx approach to be the top performers.
  Conclusions: Our heterogeneous bulk simulation method and the entire benchmarking framework are implemented in a user-friendly package available at https://github.com/humengying0907/deconvBenchmarking and https://doi.org/10.5281/zenodo.8206516, enabling further developments in deconvolution methods.
- Summary: Transposable elements (TEs) are ubiquitous components of eukaryotic genomes and can create variation in genome organization and content. Most maize genomes are composed of TEs. We developed an approach to define shared and variable TE insertions across genome assemblies and applied this method to four maize genomes (B73, W22, Mo17 and PH207) with uniform structural annotations of TEs. Among these genomes we identified approximately 400 000 TEs that are polymorphic, encompassing 1.6 Gb of variable TE sequence. These polymorphic TEs include a combination of recent transposition events as well as deletions of older TEs. There are examples of polymorphic TEs within each of the superfamilies of TEs and they are found distributed across the genome, including in regions of recent shared ancestry among individuals. There are many examples of polymorphic TEs within or near maize genes. In addition, there are 2380 gene annotations in the B73 genome that are located within variable TEs, providing evidence for the role of TEs in contributing to the substantial differences in annotated gene content among these genotypes. TEs are highly variable in our survey of four temperate maize genomes, highlighting the major contribution of TEs in driving variation in genome organization and gene content.
  Open Research Badges: This article has earned an Open Data Badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available at https://github.com/SNAnderson/maizeTE_variation; https://mcstitzer.github.io/maize_TEs.