Title: Non‐linear phylogenetic regression using regularised kernels
Abstract Phylogenetic regression is a type of generalised least squares (GLS) method that incorporates a modelled covariance matrix based on the evolutionary relationships between species (i.e. phylogenetic relationships). While this method has found widespread use in hypothesis testing via phylogenetic comparative methods, such as phylogenetic ANOVA, its ability to account for non‐linear relationships has received little attention. To address this, here we implement a phylogenetic Kernel Ridge Regression (phyloKRR) method that utilises GLS in a high‐dimensional feature space, employing linear combinations of phylogenetically weighted data to account for non‐linearity. We analysed two biological datasets using the Radial Basis Function and linear kernel functions. The first dataset contained morphometric data, while the second dataset comprised discrete trait data and diversification rates as the response variable. Hyperparameter tuning of the model was achieved through cross‐validation rounds in the training set. In the tested biological datasets, phyloKRR reduced the error rate (as measured by RMSE) by around 20% compared to linear‐based regression when data did not exhibit linear relationships. In simulated datasets, the error rate decreased almost exponentially with the level of non‐linearity. These results show that introducing kernels into phylogenetic regression analysis presents a novel and promising tool for complementing phylogenetic comparative methods. We have integrated this method into a Python package named phyloKRR, which is freely available at: https://github.com/ulises-rosas/phylokrr.
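The idea summarised in the abstract can be illustrated with a minimal sketch. It assumes that "phylogenetically weighted data" means whitening the predictors and response with the Cholesky factor of the phylogenetic covariance matrix before fitting an ordinary kernel ridge regression with a Radial Basis Function kernel. This is not the phyloKRR package's API: every function name here (rbf_kernel, phylo_whiten, fit_krr, predict_krr, rmse) is hypothetical, the two-clade covariance matrix is a toy stand-in for one derived from a real tree, and the hyperparameters lam and gamma are fixed by hand rather than tuned by cross-validation.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def phylo_whiten(C, M):
    """Assumed weighting step: return L^{-1} M, where C = L L^T (Cholesky)."""
    L = np.linalg.cholesky(C)
    return np.linalg.solve(L, M)

def fit_krr(Xw, yw, lam, gamma):
    """Dual coefficients alpha = (K + lam * I)^{-1} y on whitened data."""
    K = rbf_kernel(Xw, Xw, gamma)
    return np.linalg.solve(K + lam * np.eye(len(yw)), yw)

def predict_krr(Xw_train, alpha, Xw_new, gamma):
    """Predictions are kernel-weighted combinations of the training points."""
    return rbf_kernel(Xw_new, Xw_train, gamma) @ alpha

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Toy data: 20 species in two clades of 10 (made-up covariance, not a real tree).
n = 20
C = np.kron(np.eye(2), 0.5 * np.ones((10, 10))) + 0.5 * np.eye(n)

rng = np.random.default_rng(0)
X = rng.normal(size=(n, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=n)  # non-linear signal

# Whiten predictors and response, then fit and predict in-sample.
Xw = phylo_whiten(C, X)
yw = phylo_whiten(C, y)
alpha = fit_krr(Xw, yw, lam=1e-2, gamma=1.0)
pred = predict_krr(Xw, alpha, Xw, gamma=1.0)

print("in-sample RMSE:", rmse(pred, yw))
```

In practice lam and gamma would be chosen by cross-validation within the training set, as the abstract describes, and the error would be reported on held-out species rather than in-sample.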
Award ID(s):
2225130
PAR ID:
10531246
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Methods in Ecology and Evolution
Volume:
15
Issue:
9
ISSN:
2041-210X
Format(s):
Medium: X
Size(s):
p. 1611-1623
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background: Computational cell type deconvolution enables the estimation of cell type abundance from bulk tissues and is important for understanding the tissue microenvironment, especially in tumor tissues. With the rapid development of deconvolution methods, many benchmarking studies have been published aiming for a comprehensive evaluation of these methods. Benchmarking studies rely on cell-type resolved single-cell RNA-seq data to create simulated pseudobulk datasets by adding individual cell types in controlled proportions. Results: In our work, we show that the standard application of this approach, which uses randomly selected single cells, regardless of the intrinsic difference between them, generates synthetic bulk expression values that lack appropriate biological variance. We demonstrate why and how the current bulk simulation pipeline with random cells is unrealistic and propose a heterogeneous simulation strategy as a solution. The heterogeneously simulated bulk samples match up with the variance observed in real bulk datasets and therefore provide concrete benefits for benchmarking in several ways. We demonstrate that conceptual classes of deconvolution methods differ dramatically in their robustness to heterogeneity, with reference-free methods performing particularly poorly. For regression-based methods, the heterogeneous simulation provides an explicit framework to disentangle the contributions of reference construction and regression methods to performance. Finally, we perform an extensive benchmark of diverse methods across eight different datasets and find BayesPrism and a hybrid MuSiC/CIBERSORTx approach to be the top performers. Conclusions: Our heterogeneous bulk simulation method and the entire benchmarking framework are implemented in a user-friendly package at https://github.com/humengying0907/deconvBenchmarking and https://doi.org/10.5281/zenodo.8206516, enabling further developments in deconvolution methods.
  2. Proteins and the complexes they form are central to nearly all cellular processes. Their flexibility, expressed through a continuum of states, provides a window into their biological functions. Cryogenic electron microscopy (cryo-EM) is an ideal tool to study these dynamic states as it captures specimens in noncrystalline conditions and enables high-resolution reconstructions. However, analyzing the heterogeneous distributions of conformations from cryo-EM data is challenging. We present RECOVAR, a method for analyzing these distributions based on principal component analysis (PCA) computed using a REgularized COVARiance estimator. RECOVAR is fast, robust, interpretable, expressive, and competitive with state-of-the-art neural network methods on heterogeneous cryo-EM datasets. The regularized covariance method efficiently computes a large number of high-resolution principal components that can encode rich heterogeneous distributions of conformations and does so robustly thanks to an automatic regularization scheme. The reconstruction method based on adaptive kernel regression resolves conformational states to a higher resolution than all other tested methods on extensive independent benchmarks while remaining highly interpretable. Additionally, we exploit favorable properties of the PCA embedding to estimate the conformational density accurately. This density allows for better interpretability of the latent space by identifying stable states and low free-energy motions. Finally, we present a scheme to navigate the high-dimensional latent space by automatically identifying these low free-energy trajectories. We make the code freely available at https://github.com/ma-gilles/recovar.
  3. Abstract The spectral characteristics of vertebrate ocular lenses affect the image of the world that is projected onto the retina, and thus help shape diverse visual capabilities. Here, we tested whether amphibian lens transmission is driven by adaptation to diurnal activity (bright light) and/or scansorial habits (complex visual environments). Spectral transmission through the lenses of 79 species of frogs and six species of salamanders was measured, and data for 29 additional frog species compiled from published literature. Phylogenetic comparative methods were used to test ecological explanations of variation in lens transmission and to test for selection across traits. Lenses of diurnal (day‐active) and scansorial (climbing) frogs transmitted significantly less shortwave light than those of non‐diurnal or non‐scansorial amphibians, and evolutionary modelling suggested that these differences have resulted from differential selection. The presence of shortwave‐transparent lenses was common among the sampled amphibians, which implies that many are sensitive to shortwave light to some degree even in the absence of visual pigments maximally sensitive in the UV. This suggests that shortwave light, including UV, could play an important role in amphibian behaviour and ecology. Shortwave‐absorbing lens pigments likely provide higher visual acuity to diurnally active frogs of multiple ecologies and to nocturnally active scansorial frogs. This new mechanistic understanding of amphibian visual systems suggests that shortwave‐filtering lenses are adaptive not only in daylight conditions but also in those scotopic conditions where high acuity is advantageous. Read the free Plain Language Summary for this article on the Journal blog.
  4. Abstract Evolutionary biologists characterize macroevolutionary trends of phenotypic change across the tree of life using phylogenetic comparative methods. However, within‐species variation can complicate such investigations. For this reason, procedures for incorporating nonstructured (random) intraspecific variation have been developed. Likewise, evolutionary biologists seek to understand microevolutionary patterns of phenotypic variation within species, such as sex‐specific differences or allometric trends. Additionally, there is a desire to compare such within‐species patterns across taxa, but current analytical approaches cannot be used to interrogate within‐species patterns while simultaneously accounting for phylogenetic non‐independence. Consequently, deciphering how intraspecific trends evolve remains a challenge. Here we introduce an extended phylogenetic generalized least squares (E‐PGLS) procedure which facilitates comparisons of within‐species patterns across species while simultaneously accounting for phylogenetic non‐independence. Our method uses an expanded phylogenetic covariance matrix, a hierarchical linear model, and permutation methods to obtain empirical sampling distributions and effect sizes for model effects that can evaluate differences in intraspecific trends across species for both univariate and multivariate data, while conditioning them on the phylogeny. The method has appropriate statistical properties for both balanced and imbalanced data. Additionally, the procedure obtains evolutionary covariance estimates that reflect those from existing approaches for nonstructured intraspecific variation. Importantly, E‐PGLS can detect differences in structured (i.e. microevolutionary) intraspecific patterns across species when such trends are present. Thus, E‐PGLS extends the reach of phylogenetic comparative methods into the intraspecific comparative realm, by providing the ability to compare within‐species trends across species while simultaneously accounting for shared evolutionary history.
  5. Abstract Spatial transcriptomics (ST) technologies enable high throughput gene expression characterization within thin tissue sections. However, comparing spatial observations across sections, samples, and technologies remains challenging. To address this challenge, we develop STalign to align ST datasets in a manner that accounts for partially matched tissue sections and other local non-linear distortions using diffeomorphic metric mapping. We apply STalign to align ST datasets within and across technologies as well as to align ST datasets to a 3D common coordinate framework. We show that STalign achieves high gene expression and cell-type correspondence across matched spatial locations that is significantly improved over landmark-based affine alignments. Applying STalign to align ST datasets of the mouse brain to the 3D common coordinate framework from the Allen Brain Atlas, we highlight how STalign can be used to lift over brain region annotations and enable the interrogation of compositional heterogeneity across anatomical structures. STalign is available as an open-source Python toolkit at https://github.com/JEFworks-Lab/STalign and as Supplementary Software, with additional documentation and tutorials available at https://jef.works/STalign.