

This content will become publicly available on September 1, 2024

Title: Sparsification of large ultrametric matrices: insights into the microbial Tree of Life
Ultrametric matrices appear in many domains of mathematics and science; nevertheless, they can be large and dense, making them difficult to store and manipulate, unlike large but sparse matrices. In this manuscript, we exploit the fact that ultrametric matrices can be represented as binary trees to sparsify them via an orthonormal base change based on Haar-like wavelets. We show that, with overwhelmingly high probability, only an asymptotically negligible fraction of the off-diagonal entries in random but large ultrametric matrices remain non-zero after the base change, and we develop an algorithm to sparsify such matrices directly from their tree representation. We also identify the subclass of matrices diagonalized by the Haar-like wavelets and supply a sufficient condition to approximate the spectrum of ultrametric matrices outside this subclass. Our methods give computational access to a covariance matrix model of the microbiologists’ Tree of Life, which was previously inaccessible due to its size, and motivate introducing a new wavelet-based (beta-diversity) metric to compare microbial environments. Unlike the established metrics, the new metric may be used to identify internal nodes (i.e. splits) in the Tree that link microbial composition and environmental factors in a statistically significant manner.
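The base change described in the abstract can be illustrated on a toy example. The following is a minimal sketch, not the authors' algorithm: it assumes a hypothetical five-leaf binary tree encoded as nested tuples, builds the dense tree covariance matrix explicitly (which the paper's method avoids), and uses one common construction of Haar-like wavelets in which each internal node contributes a vector that is constant on its left and right leaf sets, has zero mean, and has unit norm. All function names and the tree encoding are assumptions made for illustration.

```python
import numpy as np

def collect(node, depth, leaf_depths, splits):
    """Walk the tree and return the leaf indices below `node`.

    A leaf is a float (its branch length); an internal node is a
    tuple (left, right, branch_length). This encoding is hypothetical.
    """
    if not isinstance(node, tuple):
        leaf_depths.append(depth + node)
        return [len(leaf_depths) - 1]
    left, right, length = node
    d = depth + length                      # distance from root to this split
    L = collect(left, d, leaf_depths, splits)
    R = collect(right, d, leaf_depths, splits)
    splits.append((L, R, d))
    return L + R

def covariance_and_wavelets(tree):
    """Return the dense tree covariance matrix and the Haar-like basis (rows)."""
    leaf_depths, splits = [], []
    collect(tree, 0.0, leaf_depths, splits)
    n = len(leaf_depths)
    C = np.diag(leaf_depths)                # C[i, i] = root-to-leaf distance
    Phi = [np.full(n, 1.0 / np.sqrt(n))]    # constant vector completes the basis
    for L, R, d in splits:
        # Leaves separated by this split share the path from the root down to it.
        C[np.ix_(L, R)] = d
        C[np.ix_(R, L)] = d
        phi = np.zeros(n)
        phi[L] = 1.0 / len(L)
        phi[R] = -1.0 / len(R)
        phi *= np.sqrt(len(L) * len(R) / (len(L) + len(R)))  # unit norm
        Phi.append(phi)
    return C, np.array(Phi)

def offdiag_nonzeros(M, tol=1e-10):
    """Count off-diagonal entries whose magnitude exceeds `tol`."""
    off = M - np.diag(np.diag(M))
    return int(np.sum(np.abs(off) > tol))

# Toy tree: ((leaf, leaf), ((leaf, leaf), leaf)) with illustrative branch lengths.
tree = ((1.0, 1.0, 0.5), ((0.5, 0.5, 0.5), 1.0, 0.5), 0.2)
C, Phi = covariance_and_wavelets(tree)
S = Phi @ C @ Phi.T                         # matrix in the Haar-like basis

print("off-diagonal non-zeros before:", offdiag_nonzeros(C))
print("off-diagonal non-zeros after: ", offdiag_nonzeros(S))
```

In this sketch an off-diagonal entry of the transformed matrix can be non-zero only when the two wavelets belong to nodes where one is an ancestor of the other, which is why the transformed matrix has far fewer non-zero entries than the dense original.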
Award ID(s):
1836914
NSF-PAR ID:
10495519
Author(s) / Creator(s):
;
Publisher / Repository:
The Royal Society Publishing
Date Published:
Journal Name:
Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Volume:
479
Issue:
2277
ISSN:
1364-5021
Subject(s) / Keyword(s):
double principal coordinate analysis; Haar-like wavelets; sparsification; phylogenetic covariance matrix; ultrametric; UniFrac
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Large systematic revisionary projects incorporating data for hundreds or thousands of taxa require an integrative approach, with a strong biodiversity-informatics core for efficient data management to facilitate research on the group. Our original biodiversity informatics platform, 3i (Internet-accessible Interactive Identification), combined a customized MS Access database backend with ASP-based web interfaces to support revisionary syntheses of several large genera of leafhoppers (Hemiptera: Auchenorrhyncha: Cicadellidae). More recently, for our National Science Foundation sponsored project, “GoLife: Collaborative Research: Integrative genealogy, ecology and phenomics of deltocephaline leafhoppers (Hemiptera: Cicadellidae), and their microbial associates”, we selected the new open-source platform TaxonWorks as the cyberinfrastructure. In the scope of the project, the original “3i World Auchenorrhyncha Database” was imported into TaxonWorks. At the present time, TaxonWorks has many tools to automatically import nomenclature, citations, and specimen-based collection data. At the time of the initial migration of the 3i database, many of those tools were still under development, and the complexity of the data in the database required a custom migration script, which is still probably the most efficient solution for importing datasets with a long development history. At the moment, the World Auchenorrhyncha Database comprehensively covers nomenclature of the group and includes data on 70 valid families, 6,816 valid genera, 47,064 valid species as well as synonymy and subsequent combinations (Fig. 1). In addition, many taxon records include the original citation, bibliography, type information, etymology, etc. The bibliography of the group includes 37,579 sources, about 1/3 of which are associated with PDF files. 
Species have distribution records, either derived from individual specimens or as country and state level asserted distribution, as well as biological associations indicating host plants, predators, and parasitoids. Observation matrices in TaxonWorks are designed to handle morphological data associated with taxa or specimens. The matrices may be used to automatically generate interactive identification keys and taxon descriptions. They can also be downloaded to be imported, for example, into Lucid builder, or to perform phylogenetic analysis using an external application. At the moment there are 36 matrices associated with the project. The observation matrix from the GoLife project covers 798 taxa by 210 descriptors (most of which are qualitative multi-state morphological descriptors) (Fig. 2). Illustrations are provided for 9,886 taxa and organized in the specialized image matrix, and can be used as a pictorial key for determination of species and taxa of a higher rank. For the phylogenetic analysis, a dataset was constructed for 730 terminal taxa and >160,000 nucleotide positions obtained using anchored hybrid enrichment of genomic DNA for a sample of leafhoppers from the subfamily Deltocephalinae and outgroups. The probe kit targets leafhopper genes, as well as some bacterial genes (endosymbionts and plant pathogens transmitted by leafhoppers). The maximum likelihood analyses of concatenated nucleotide and amino acid sequences as well as coalescent gene tree analysis yielded well-resolved phylogenetic trees (Cao et al. 2022). Raw sequence data have been uploaded to the Sequence Read Archive on GenBank. Occurrence and morphological data, as well as diagnostic images, for voucher specimens have been incorporated into TaxonWorks. Data in TaxonWorks can be exported in raw format, accessed via an Application Programming Interface (API), or shared with external data aggregators such as Catalogue of Life, GBIF, and iDigBio. 
  2. Abstract

    Precomputed Radiance Transfer (PRT) remains an attractive solution for real‐time rendering of complex light transport effects such as glossy global illumination. After precomputation, we can relight the scene with new environment maps while changing viewpoint in real‐time. However, practical PRT methods are usually limited to low‐frequency spherical harmonic lighting. All‐frequency techniques using wavelets are promising but have so far had little practical impact. The curse of dimensionality and much higher data requirements have typically limited them to relighting with fixed view or only direct lighting with triple product integrals. In this paper, we demonstrate a hybrid neural‐wavelet PRT solution to high‐frequency indirect illumination, including glossy reflection, for relighting with changing view. Specifically, we seek to represent the light transport function in the Haar wavelet basis. For global illumination, we learn the wavelet transport using a small multi‐layer perceptron (MLP) applied to a feature field as a function of spatial location and wavelet index, with reflected direction and material parameters being other MLP inputs. We optimize/learn the feature field (compactly represented by a tensor decomposition) and MLP parameters from multiple images of the scene under different lighting and viewing conditions. We demonstrate real‐time (512 x 512 at 24 FPS, 800 x 600 at 13 FPS) precomputed rendering of challenging scenes involving view‐dependent reflections and even caustics.

  3. Large-scale microbiome studies investigating disease-inducing microbial roles base their findings on differences between microbial count data in contrasting environments (e.g., stool samples between cases and controls). These microbiome survey studies are often impeded by small sample sizes and database bias. Combining data from multiple survey studies often results in obvious batch effects, even when DNA preparation and sequencing methods are identical. Relatedly, predictive models trained on one microbial DNA dataset often do not generalize to outside datasets. In this study, we address these limitations by applying word embedding algorithms (GloVe) and PCA transformation to ASV data from the American Gut Project and generating translation matrices that can be applied to any 16S rRNA V4 region gut microbiome sequencing study. Because these approaches contextualize microbial occurrences in a larger dataset while reducing dimensionality of the feature space, they can improve generalization of predictive models that predict host phenotype from stool associated gut microbiota. The GMEmbeddings R package contains GloVe and PCA embedding transformation matrices at 50, 100 and 250 dimensions, each learned using ∼15,000 samples from the American Gut Project. It currently supports the alignment, matching, and matrix multiplication to allow users to transform their V4 16S rRNA data into these embedding spaces. We show how to correlate the properties in the new embedding space to KEGG functional pathways for biological interpretation of results. Lastly, we provide benchmarking on six gut microbiome datasets describing three phenotypes to demonstrate the ability of embedding-based microbiome classifiers to generalize to independent datasets. Future iterations of GMEmbeddings will include embedding transformation matrices for other biological systems. Available at: https://github.com/MaudeDavidLab/GMEmbeddings . 
  4.
    Felsenstein's classical model for Gaussian distributions on a phylogenetic tree is shown to be a toric variety in the space of concentration matrices. We present an exact semialgebraic characterization of this model, and we demonstrate how the toric structure leads to exact methods for maximum likelihood estimation. Our results also give new insights into the geometry of ultrametric matrices. 
  5. Chi Fru, Ernest ; Chik, Alex ; Colwell, Fredrick ; Dittrich, Maria ; Engel, Annette ; Keenan, Sarah ; Meckenstock, Rainer ; Omelon, Christopher ; Purkamo, Lotta ; Weisener, Chris (Ed.)

    Roots are common features in basaltic lava tube caves on the island of Hawai‘i. For the past 50 years, new species of cave-adapted invertebrates, including cixiid planthoppers, crickets, thread-legged bugs, and spiders, have been discovered from root patches in lava tubes on different volcanoes and across variable climatic conditions. Assessing vegetation on the surface above lava tube passages, as well as genetic characterization of roots from within lava tubes, suggest that most roots belong to the native pioneer tree, ‘ōhi‘a lehua (Metrosideros polymorpha). Planthoppers are the primary consumers of sap at the base of the subsurface food web. However, root physicochemistry and rhizobiome microbial diversity and functional potential have received little attention. This study focuses on characterizing the ‘ōhi‘a rhizobiome, accessed from free-hanging roots inside lava tubes. Using these results, we can begin to evaluate the development and evolution of plant-microbe-invertebrate relationships.

    We explored lava tubes formed in flows of differing elevations and ages, from about 140 to 3000 years old, on Mauna Loa, Kīlauea, and Hualālai volcanoes on Hawai‘i Island. Invertebrate diversity was evaluated from root galleries and non-root galleries, in situ fluid physicochemistry was measured, and root and bare rock fluids (e.g., water, sap) were collected to determine major ion concentrations, as well as non-purgeable organic carbon (NPOC) and total nitrogen (TN) content. To verify root identity, DNA was extracted, and three sets of primers were used. After screening for only Metrosideros spp., the V4 region of the 16S rRNA gene was sequenced and taxonomy was assigned.

    Root fluids were viscous and ranged in color from clear to yellow to reddish orange. Root fluids had 2X to 10X higher major ion concentrations compared to rock water. The average root NPOC and TN concentrations were 192 mg/L and 5.2 mg/L, respectively, compared to rock water that had concentrations of 6.8 mg/L and 1.8 mg/L, respectively. Fluids from almost 300 root samples had pH values that ranged from 2.2 to 5.6 (average pH 4.63) and were lower than rock water (average pH 6.39). Root fluid pH was comparable to soil pH from montane wet forests dominated by ‘ōhi‘a (Selmants et al. 2016), which can grow in infertile soil with pH values as low as 3.6. On Hawai‘i, rain water pH averages 5.2 at sea level and systematically decreases with elevation to pH 4.3 at 2500 m (Miller and Yoshinaga 2012), but root fluid pH did not correlate with elevation, temperature, relative humidity, inorganic and organic constituents, or age of flow. Root fluid acidity is likely due to concentrated organic compounds, sourced as root exudates, and this habitat is acidic for the associated invertebrates.

    From 62 root samples, over 66% were identified to the genus Metrosideros. A few other identifications of roots from lava tube systems where there had been extensive clear-cutting and ranching included monkey pod tree, coconut palm, Ficus spp., and silky oak.

    The 16S rRNA gene sequence surveys revealed that root bacterial communities were dominated by few groups, including Burkholderiaceae, as well as Acetobacteraceae, Sphingomonadaceae, Acidobacteriaceae, Gemmataceae, Xanthobacteraceae, and Chitinophagaceae. However, most of the reads could not be classified to a specific genus, which suggested that the rhizobiome harbors novel diversity. Diversity was higher from wetter climates. The root communities were distinct from those described previously from ‘ōhi‘a flowers and leaves (Junker and Keller 2015) and lava tube rocky surfaces (Hathaway et al. 2014) where microbial groups were specifically presumed capable of heterotrophy, methanotrophy, diazotrophy, and nitrification. Less can be inferred for the rhizobiome metabolism, although most taxa are likely aerobic heterotrophs. Within the Burkholderiaceae, there were high relative abundances of sequences affiliated with the genus Paraburkholderia, which includes known plant symbionts, as well as the acidophilic genera Acidocella and Acidisoma from the Acetobacteraceae, which were retrieved predominantly from caves in the oldest lava flows that also had the lowest root pH values. It is likely that the bacterial groups are capable of degrading exudates and providing nutritional substrates for invertebrate consumers that are not provided by root fluids (i.e., phloem) alone.

    As details about the biochemistry of ‘ōhi‘a have been missing, characterizing the rhizobiome from lava tubes will help to better understand potential plant-microbe-invertebrate interactions and ecological and evolutionary relationships through time. In particular, the microbial rhizobiome may produce compounds used by invertebrates nutritionally or that affect their behavior, and changes to the rhizobiome in response to environmental conditions may influence invertebrate interactions with the roots, which could be important to combat climate change effects or invasive species introductions.
