

Title: Characterizing Cell Shape Distributions Using k-Mode Kernel Mixtures
This paper addresses the problem of characterizing statistical distributions of cellular shape populations using shape samples from microscopy image data. This problem is challenging because of the nonlinearity and high dimensionality of shape manifolds. The paper develops an efficient, nonparametric approach using ideas from k-modal mixtures and kernel estimators. It uses elastic shape analysis of cell boundaries to estimate statistical modes and to cluster the given shapes around those modes. (Notably, it uses a combination of modal distributions and ANOVA to determine k automatically.) A population is then characterized as a k-modal mixture relative to this estimated clustering and a chosen kernel (e.g., a Gaussian or a flat kernel). One can compare and analyze populations using the Fisher-Rao metric between their estimated distributions. We demonstrate this approach for classifying shapes associated with migrations of Entamoeba histolytica under different experimental conditions. This framework captures salient shape patterns and separates shape data for different experimental settings, even when it is difficult to discern class differences visually.
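The pipeline summarized above (cluster shapes around modes, build a k-modal kernel mixture, compare populations with the Fisher-Rao metric) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that elastic shape distances from each sample to each of the k modes have already been computed and stored in an array, that the mixture weights are simply the cluster fractions, and that the Fisher-Rao distance is taken between discrete probability vectors using the arc-length convention arccos(sum sqrt(p_i q_i)). All function and parameter names (`mixture_weights`, `kernel_mixture_density`, `fisher_rao_distance`, `bandwidth`) are hypothetical.

```python
import numpy as np


def mixture_weights(dist_to_modes):
    """Assign each sample to its nearest mode and return the cluster fractions.

    dist_to_modes: (n_samples, k) array of precomputed distances to k modes
    (assumed here to come from an elastic shape metric).
    """
    dist_to_modes = np.asarray(dist_to_modes, dtype=float)
    labels = np.argmin(dist_to_modes, axis=1)
    counts = np.bincount(labels, minlength=dist_to_modes.shape[1])
    return counts / counts.sum()


def kernel_mixture_density(dist_to_modes, weights, bandwidth, kernel="gaussian"):
    """Evaluate an unnormalized k-modal kernel mixture at each sample."""
    u = np.asarray(dist_to_modes, dtype=float) / bandwidth
    if kernel == "gaussian":
        k_vals = np.exp(-0.5 * u**2)
    elif kernel == "flat":
        k_vals = (u <= 1.0).astype(float)
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    # Weighted sum of the kernel responses of the k modes.
    return k_vals @ np.asarray(weights, dtype=float)


def fisher_rao_distance(p, q):
    """Fisher-Rao distance between two discrete distributions.

    Uses the arc length on the unit sphere under the square-root map;
    some references scale this value by a factor of 2.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    inner = np.sum(np.sqrt(p * q))
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))
```

Under these assumptions, two populations clustered around a shared set of modes could be compared by passing their weight vectors to `fisher_rao_distance`; a distance of zero means identical mixture weights, and pi/2 means the weights have disjoint support.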
Award ID(s):
1955154
NSF-PAR ID:
10433354
Journal Name:
26th International Conference on Pattern Recognition (ICPR)
Page Range / eLocation ID:
2517 to 2523
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the problem of characterizing shape populations using highly frequent representative shapes. Framing such shapes as statistical modes – shapes that correspond to (significant) local maxima of the underlying pdfs – we develop a frequency-based, nonparametric approach for estimating sample modes. Using an elastic shape metric, we define ϵ-neighborhoods in the shape space and shortlist shapes that are central and have the most neighbors. A critical issue – How to automatically select the threshold ϵ? – is resolved using a combination of ANOVA and empirical mode distribution. The resulting modal set, in turn, helps characterize the shape population and performs better than the traditional cluster means. We demonstrate this framework using amoeba shapes from brightfield microscopy images and highlight its advantages over existing ideas. 
  2. The provenance of sandstones deposited in the late Paleozoic Tepuel-Genoa Basin is analyzed in this paper. Five sections were sampled in the Esquel, Sierra de Tepuel, Sierra de Tecka, El Molle, and Río Genoa areas for petrographic and geochemical studies. The sandstones in the Tepuel-Genoa Basin are dominated by feldspathic litharenites and litharenites, showing lithic fragments of volcanic and sedimentary rocks in the Valle Chico Formation and medium- to high-grade metamorphic rock clasts in the rest of the units. Detrital modes of seventy-five sandstone samples from the Valle Chico, Pampa de Tepuel, Mojón de Hierro, and Río Genoa formations were counted and analyzed. Seven modal components have discriminant value for identifying provenance areas (Qm, Qi, Lv, Lmm-h, Lm-Lp, Lm, Qpm). These modal components allow identification of three petrofacies: 1. Quartzose-lithic (Qm69Lv2Lm29), 2. Quartzose (Qm89Lv4Lm7) and 3. Volcanic-sedimentary (Qm60Lv38Lm1). The quartzose-lithic petrofacies is mainly composed of monocrystalline quartz, medium- and high-grade metamorphic clasts and polycrystalline quartz with cataclastic texture; this assemblage is interpreted as being derived from the crystalline rocks that form the Deseado Massif. The quartzose petrofacies is composed of monocrystalline quartz with scarce contributions of metamorphic clasts and traces of volcanic fragments; the provenance area is ascribed to sedimentary terrains, which most likely covered part of the Deseado Massif. The volcanic-sedimentary petrofacies is composed of volcanic (acidic and intermediate rocks) and sedimentary (sandstone and mudstone) clasts, with discrete amounts of quartz grains with idiomorphic shapes and embayments. This assemblage may correspond to material supply from the Devonian-Early Carboniferous accretionary complex developed in Chile or the unroofing of the western volcanic arc located in the central part of Patagonia.
    The validity of the three defined petrofacies was evaluated using Principal Component Analysis and triangular compositional diagrams; both methods show good separation and lack of overlap between the three petrofacies. Major (Si, Al, Fe, Na, K) and trace-REE elements (Zr, Th, Sc, Hf) were used to improve the petrographic information. The relation of SiO2 against K2O/Na2O indicates that the Pampa de Tepuel and the Mojón de Hierro formations correspond to a passive margin, while the Valle Chico and Río Genoa formations represent different types of active continental margins. The Th/Sc and Zr/Sc ratios and the Th-Hf-Co distributions indicate that the sandstones of the Tepuel Group were formed from rocks compatible with the average composition of the upper continental crust.
  3. Particle shape strongly influences the diffusion charging of aerosol particles exposed to bipolar/unipolar ions, and accurate modeling is needed to predict the charge distribution of non-spherical particles. A prior particle-ion collision kernel β_i model including Coulombic and image potential interactions for spherical particles is generalized for arbitrary shapes following a scaling approach that uses a continuum and free molecular particle length scale and Langevin dynamics (LD) simulations of non-spherical particle-ion collisions for attractive Coulomb-image potential interactions. This extended β_i model for collisions between unlike-charged particle-ion pairs (bipolar charging) and like-charged particle-ion pairs (unipolar charging) is validated by comparing against published experimental data of bipolar charge distributions for diverse shapes. Comparison to the bipolar charging data for spherical particles shows good agreement in air, argon, and nitrogen, while also demonstrating high accuracy in predicting charge states up to ±6. Comparisons to the data for fractal aggregates reveal that the LD-based β_i model predicts the measurements to within ±30% overall without any systematic bias. The mean charge on linear chain aggregates and the charge fractions on cylindrical particles are found to be in good agreement with the measurements (~±20% overall). The comparison with experimental results supports the use of LD-based diffusion charging models to predict the bipolar and unipolar charge distributions of arbitrarily shaped aerosol particles over a wide range of particle sizes, gas temperatures, and pressures. The presented β_i model is valid for perfectly conducting particles and in the absence of external electric fields; these simplifications need to be addressed in future work on particle charging.
  4. There is a growing need for flexible general frameworks that integrate individual-level data with external summary information for improved statistical inference. External information relevant for a risk prediction model may come in multiple forms, through regression coefficient estimates or predicted values of the outcome variable. Different external models may use different sets of predictors, and the algorithm they used to predict the outcome Y given these predictors may or may not be known. The underlying populations corresponding to each external model may be different from each other and from the internal study population. Motivated by a prostate cancer risk prediction problem where novel biomarkers are measured only in the internal study, this paper proposes an imputation-based methodology, where the goal is to fit a target regression model with all available predictors in the internal study while utilizing summary information from external models that may have used only a subset of the predictors. The method allows for heterogeneity of covariate effects across the external populations. The proposed approach generates synthetic outcome data in each external population and uses stacked multiple imputation to create a long dataset with complete covariate information. The final analysis of the stacked imputed data is conducted by weighted regression. This flexible and unified approach can improve statistical efficiency of the estimated coefficients in the internal study, improve predictions by utilizing even partial information available from models that use a subset of the full set of covariates used in the internal study, and provide statistical inference for the external population with potentially different covariate effects from the internal population.
  5. A modal decomposition is a useful tool that deconstructs the statistical dependence between two random variables by decomposing their joint distribution into orthogonal modes. Historically, modal decompositions have played important roles in statistics and information theory, e.g., in the study of maximal correlation. They are defined using the singular value decompositions of divergence transition matrices (DTMs) and conditional expectation operators corresponding to joint distributions. In this paper, we first characterize the set of all DTMs, and illustrate how the associated conditional expectation operators are the only weak contractions among a class of natural candidates. While modal decompositions have several modern machine learning applications, such as feature extraction from categorical data, the sample complexity of estimating them in such scenarios has not been analyzed. Hence, we also establish some non-asymptotic sample complexity results for the problem of estimating dominant modes of an unknown joint distribution from training data.
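As a rough illustration of the modal decomposition described in item 5, the sketch below builds a DTM from a finite joint probability table and takes its singular value decomposition. It assumes one common convention, in which entry (x, y) of the DTM equals P(x, y) / sqrt(P(x) P(y)); the orientation and naming used in the paper itself may differ, and the function name `modal_decomposition` is hypothetical.

```python
import numpy as np


def modal_decomposition(joint):
    """Return the SVD of the divergence transition matrix of a joint pmf.

    joint: 2-D array of nonnegative entries; it is normalized to sum to 1.
    Assumed convention: dtm[x, y] = P(x, y) / sqrt(P(x) * P(y)).
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    px = joint.sum(axis=1)  # marginal of the row variable
    py = joint.sum(axis=0)  # marginal of the column variable
    if np.any(px <= 0) or np.any(py <= 0):
        raise ValueError("marginals must be strictly positive")
    dtm = joint / np.sqrt(np.outer(px, py))
    # Singular values are returned in descending order.
    u, s, vt = np.linalg.svd(dtm)
    return u, s, vt


# Independent variables: only the trivial mode (singular value 1) survives.
_, s_indep, _ = modal_decomposition(np.outer([0.3, 0.7], [0.4, 0.6]))

# Perfectly dependent variables: the second singular value also equals 1.
_, s_dep, _ = modal_decomposition(np.diag([0.5, 0.5]))
```

With this convention the largest singular value is always 1, and the second largest equals the maximal correlation between the two variables, so it is 0 under independence and 1 under perfect dependence.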