
Title: On Estimation of Modal Decompositions
A modal decomposition is a useful tool that deconstructs the statistical dependence between two random variables by decomposing their joint distribution into orthogonal modes. Historically, modal decompositions have played important roles in statistics and information theory, e.g., in the study of maximal correlation. They are defined using the singular value decompositions of divergence transition matrices (DTMs) and conditional expectation operators corresponding to joint distributions. In this paper, we first characterize the set of all DTMs, and illustrate how the associated conditional expectation operators are the only weak contractions among a class of natural candidates. While modal decompositions have several modern machine learning applications, such as feature extraction from categorical data, the sample complexity of estimating them in such scenarios has not been analyzed. Hence, we also establish some non-asymptotic sample complexity results for the problem of estimating dominant modes of an unknown joint distribution from training data.
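The dominant-mode estimation problem described in the abstract admits a simple plug-in baseline: form the empirical joint PMF from paired samples, assemble the DTM, and take its SVD. The sketch below does exactly that under the standard construction B[y, x] = P_{X,Y}(x, y) / sqrt(P_X(x) P_Y(y)), where the leading singular value 1 corresponds to the trivial mode and the second singular value estimates the Hirschfeld-Gebelein-Rényi maximal correlation. This is a minimal illustrative sketch with names of my own choosing, not the estimator or the sample-complexity analysis developed in the paper.

```python
import numpy as np

def empirical_dtm(x, y, nx, ny, eps=1e-12):
    """Empirical divergence transition matrix (DTM) from paired samples.

    Assumes the standard construction B[y, x] = P(x, y) / sqrt(P(x) P(y)),
    built from the empirical joint PMF of categorical samples x in [nx], y in [ny].
    """
    joint = np.zeros((ny, nx))
    for xi, yi in zip(x, y):
        joint[yi, xi] += 1.0
    joint /= len(x)
    px = joint.sum(axis=0)  # empirical marginal of X
    py = joint.sum(axis=1)  # empirical marginal of Y
    return joint / (np.sqrt(np.outer(py, px)) + eps)

def dominant_modes(x, y, nx, ny, k=2):
    """Top-k non-trivial modes: singular values/vectors of the empirical DTM.

    The leading singular value is always 1 (the trivial mode); the second
    singular value estimates the maximal correlation of (X, Y).
    """
    B = empirical_dtm(x, y, nx, ny)
    U, s, Vt = np.linalg.svd(B)
    return s[1:k + 1], U[:, 1:k + 1], Vt[1:k + 1, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy dependent pair: Y is a noisy copy of X over a 5-symbol alphabet.
    n, nx, ny = 5000, 5, 5
    x = rng.integers(0, nx, size=n)
    flip = rng.random(n) < 0.3
    y = np.where(flip, rng.integers(0, ny, size=n), x)
    sigmas, _, _ = dominant_modes(x, y, nx, ny, k=2)
    print("estimated dominant singular values:", sigmas)
```

For this toy noisy-copy channel the population DTM is 0.7·I plus a rank-one term, so its second singular value is exactly 0.7; the printed estimates should land near that value.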
Authors:
Award ID(s):
1717610 1711027
Publication Date:
NSF-PAR ID:
10172744
Journal Name:
IEEE International Symposium on Information Theory
Volume:
1
Issue:
1
Page Range or eLocation-ID:
1-5
Sponsoring Org:
National Science Foundation
More Like this
  1. Many two-level nested simulation applications involve the conditional expectation of some response variable, where the expected response is the quantity of interest, and the expectation is with respect to the inner-level random variables, conditioned on the outer-level random variables. The latter typically represent random risk factors, and risk can be quantified by estimating the probability density function (pdf) or cumulative distribution function (cdf) of the conditional expectation. Much prior work has considered a naïve estimator that uses the empirical distribution of the sample averages across the inner-level replicates. This results in a biased estimator, because the distribution of the sample averages is over-dispersed relative to the distribution of the conditional expectation when the number of inner-level replicates is finite. Whereas most prior work has focused on allocating the numbers of outer- and inner-level replicates to balance the bias/variance tradeoff, we develop a bias-corrected pdf estimator. Our approach is based on the concept of density deconvolution, which is widely used to estimate densities from noisy observations but has not previously been considered for nested simulation problems. For a fixed computational budget, the bias-corrected deconvolution estimator allows more outer-level and fewer inner-level replicates to be used, which substantially improves the efficiency of the nested simulation. (A toy sketch of the over-dispersion that motivates this correction appears after this list.)
  2. A conditional sampling oracle for a probability distribution D returns samples from the conditional distribution of D restricted to a specified subset of the domain. A recent line of work (Chakraborty et al. 2013 and Canonne et al. 2014) has shown that access to such a conditional sampling oracle requires only a polylogarithmic or even constant number of samples to solve distribution testing problems like identity and uniformity. This significantly improves over the standard sampling model, where polynomially many samples are necessary. Inspired by these results, we introduce a computational model based on conditional sampling to develop sublinear algorithms with exponentially faster runtimes compared to standard sublinear algorithms. We focus on geometric optimization problems over points in high-dimensional Euclidean space. Access to these points is provided via a conditional sampling oracle that takes as input a succinct representation of a subset of the domain and outputs a uniformly random point in that subset. We consider two well-studied problems: k-means clustering and estimating the weight of the minimum spanning tree. In contrast to prior algorithms for the classic model, our algorithms have time, space, and sample complexity that is polynomial in the dimension and polylogarithmic in the number of points. Finally, we comment on the applicability of the model and compare it with existing ones like the streaming, parallel, and distributed computational models. (A bare-bones sketch of the conditional sampling access model appears after this list.)
  3. To overcome the curse of dimensionality in joint probability learning, recent work has proposed to recover the joint probability mass function (PMF) of an arbitrary number of random variables (RVs) from three-dimensional marginals, exploiting the uniqueness of tensor decomposition and the (unknown) dependence among the RVs. Nonetheless, accurately estimating three-dimensional marginals is still costly in terms of sample complexity, and tensor decomposition poses a computationally intensive optimization problem. This work puts forth a new framework that learns the joint PMF using pairwise marginals, which are relatively easy to acquire. The method is built upon nonnegative matrix factorization (NMF) theory and features a Gram–Schmidt-like economical algorithm that works provably well under realistic conditions. Theoretical analysis of a recently proposed expectation maximization (EM) algorithm for joint PMF recovery is also presented. In particular, the EM algorithm is shown to provably improve upon the proposed pairwise marginal-based approach. Synthetic and real-data experiments are employed to showcase the effectiveness of the proposed approach. (A sketch of the pairwise-marginal estimation step appears after this list.)
  4. We consider the problem of characterizing shape populations using highly frequent representative shapes. Framing such shapes as statistical modes, i.e., shapes that correspond to (significant) local maxima of the underlying pdfs, we develop a frequency-based, nonparametric approach for estimating sample modes. Using an elastic shape metric, we define ϵ-neighborhoods in the shape space and shortlist shapes that are central and have the most neighbors. A critical issue, how to automatically select the threshold ϵ, is resolved using a combination of ANOVA and the empirical mode distribution. The resulting modal set, in turn, helps characterize the shape population and performs better than traditional cluster means. We demonstrate this framework using amoeba shapes from brightfield microscopy images and highlight its advantages over existing ideas. (A neighbor-counting sketch of the ϵ-neighborhood step appears after this list.)
  5. We introduce a framework for statistical estimation that leverages knowledge of how samples are collected but makes no distributional assumptions on the data values. Specifically, we consider a population of elements [n] = {1, ..., n} with corresponding data values x1, ..., xn. We observe the values for a "sample" set A ⊂ [n] and wish to estimate some statistic of the values for a "target" set B ⊂ [n], where B could be the entire set. Crucially, we assume that the sets A and B are drawn according to some known distribution P over pairs of subsets of [n]. A given estimation algorithm is evaluated based on its "worst-case, expected error", where the expectation is with respect to the distribution P from which the sample set A and target set B are drawn, and the worst case is with respect to the data values x1, ..., xn. Within this framework, we give an efficient algorithm for estimating the target mean that returns a weighted combination of the sample values, where the weights are functions of the distribution P and of the sample and target sets A and B, and show that the worst-case expected error achieved by this algorithm is at most a multiplicative π/2 factor worse than that of the optimal such algorithm. The algorithm and proof leverage a surprising connection to the Grothendieck problem. We also extend these results to the linear regression setting, where each data point is not a scalar but a labeled vector (xi, yi). This framework, which makes no distributional assumptions on the data values but rather relies on knowledge of the data collection process via the distribution P, is a significant departure from the typical statistical estimation framework and introduces a uniform analysis for the many natural settings where membership in a sample may be correlated with data values, such as when individuals are recruited into a sample through their social networks as in "snowball/chain" sampling, or when samples have chronological structure as in "selective prediction". (A Monte Carlo sketch of the worst-case expected error criterion appears after this list.)
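Item 1 above turns on the fact that inner-level sample averages are over-dispersed relative to the conditional expectation they estimate. The toy sketch below (a hypothetical Gaussian model of my own, not the paper's deconvolution estimator) makes the inflation visible: with n_inner inner replicates, the variance of the sample averages exceeds that of the true conditional expectation by roughly the inner noise variance divided by n_inner.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy nested simulation (hypothetical model, for illustration only):
# outer risk factor Z, true conditional expectation E[Y | Z] = Z,
# inner response Y = Z + noise with standard deviation 2.
n_outer, n_inner, noise_sd = 10_000, 4, 2.0
z = rng.normal(size=n_outer)                               # outer-level replicates
y = z[:, None] + rng.normal(scale=noise_sd, size=(n_outer, n_inner))
y_bar = y.mean(axis=1)                                     # inner-level sample averages

# The naive estimator treats y_bar as draws of E[Y | Z]; its variance is inflated
# by noise_sd**2 / n_inner, which is the over-dispersion that deconvolution-style
# corrections aim to remove.
print("variance of the true conditional expectation:", np.var(z))
print("variance of the naive sample averages:       ", np.var(y_bar))
print("predicted inflation noise_sd^2 / n_inner:    ", noise_sd**2 / n_inner)
```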
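Item 2 relies on a conditional sampling oracle over a point set: the algorithm supplies a succinct description of a subset of the domain and receives a uniformly random stored point from that subset. The sketch below is a bare-bones version of that access model using axis-aligned boxes as the (hypothetical) subset representation; the paper's algorithms and actual subset encodings are not reproduced.

```python
import numpy as np

class ConditionalSamplingOracle:
    """Uniform sampling from the stored points restricted to a queried subset.

    Here a subset is described succinctly by an axis-aligned box (lo, hi);
    this is an illustrative choice, not the representation used in the paper.
    """

    def __init__(self, points, seed=0):
        self.points = np.asarray(points, dtype=float)
        self.rng = np.random.default_rng(seed)

    def sample(self, lo, hi):
        inside = np.all((self.points >= lo) & (self.points <= hi), axis=1)
        idx = np.flatnonzero(inside)
        if idx.size == 0:
            return None  # queried subset contains none of the stored points
        return self.points[self.rng.choice(idx)]

# Usage: draw a uniform point among those falling in the unit box [0, 1]^3.
rng = np.random.default_rng(2)
oracle = ConditionalSamplingOracle(rng.normal(size=(1000, 3)))
print(oracle.sample(lo=np.zeros(3), hi=np.ones(3)))
```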
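Item 3 starts from the observation that pairwise marginals are comparatively cheap to estimate, and then recovers the joint PMF from them via a coupled NMF procedure. The sketch below covers only that easy first step, estimating all empirical pairwise marginal matrices of categorical data; it does not attempt the paper's Gram-Schmidt-like recovery algorithm.

```python
import numpy as np
from itertools import combinations

def pairwise_marginals(data, alphabet_sizes):
    """Empirical pairwise marginals P_ij[a, b] = Pr(X_i = a, X_j = b).

    `data` is an (n_samples, n_vars) integer array; only pairwise counts are
    retained, which is the input an NMF-based joint-PMF recovery would consume.
    """
    data = np.asarray(data)
    n = data.shape[0]
    marginals = {}
    for i, j in combinations(range(data.shape[1]), 2):
        counts = np.zeros((alphabet_sizes[i], alphabet_sizes[j]))
        np.add.at(counts, (data[:, i], data[:, j]), 1.0)  # unbuffered 2-D histogram
        marginals[(i, j)] = counts / n
    return marginals

# Usage on toy categorical data: three variables over alphabets of size 3.
rng = np.random.default_rng(3)
samples = rng.integers(0, 3, size=(5000, 3))
P01 = pairwise_marginals(samples, [3, 3, 3])[(0, 1)]
print(P01, P01.sum())
```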
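Item 4 shortlists shapes that have the most neighbors within an ϵ-ball under an elastic shape metric. The sketch below reproduces only the neighbor-counting step on a precomputed pairwise-distance matrix (Euclidean distances on toy points stand in for the elastic metric); the ANOVA-based choice of ϵ and the significance analysis are not included.

```python
import numpy as np

def sample_modes(dist, eps, k=3):
    """Return indices of the k samples with the most eps-neighbors.

    `dist` is a symmetric pairwise-distance matrix; the neighbor count is a
    crude stand-in for local pdf height, which is how modal shapes are
    shortlisted in item 4 (threshold selection is not handled here).
    """
    dist = np.asarray(dist)
    counts = (dist <= eps).sum(axis=1) - 1   # exclude the point itself
    order = np.argsort(-counts)              # most neighbors first
    return order[:k], counts[order[:k]]

# Usage on toy 2-D points standing in for shapes; two clusters -> two modes.
rng = np.random.default_rng(4)
pts = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(sample_modes(dist, eps=0.5, k=2))
```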
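Item 5 scores an estimator by its expected error over the known sampling distribution P, taken in the worst case over the data values. The Monte Carlo sketch below only illustrates that evaluation criterion on a toy P and one fixed value assignment, using the plain sample mean as the estimator; the paper's weighted, Grothendieck-based algorithm is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 20, 20_000

# One fixed assignment of data values: no distributional assumption is placed
# on these (an adversary could pick them); here they are just an arbitrary vector.
x = np.linspace(-1.0, 1.0, n)

errors = []
for _ in range(trials):
    # Toy sampling distribution P: each element joins the sample A independently
    # with probability 0.3; the target B is the whole population [n].
    A = rng.random(n) < 0.3
    if not A.any():
        continue  # skip degenerate draws with an empty sample
    estimate = x[A].mean()   # naive estimator: unweighted sample mean
    target = x.mean()        # target statistic: mean over B = [n]
    errors.append((estimate - target) ** 2)

print("expected squared error under P for this x:", np.mean(errors))
```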