Density estimation is a building block for many other statistical methods, such as classification, nonparametric testing, and data compression. In this paper, we focus on a nonparametric approach to multivariate density estimation and study its asymptotic properties in both frequentist and Bayesian settings. The estimated density function is obtained by considering a sequence of approximating spaces to the space of densities. These spaces consist of piecewise constant density functions supported by binary partitions of increasing complexity. To obtain an estimate, the partition is learned by maximizing either the likelihood of the corresponding histogram on that partition or the marginal posterior probability of the partition under a suitable prior. We analyze the convergence rate of the maximum likelihood estimator and the posterior concentration rate of the Bayesian estimator, and conclude that for a relatively rich class of density functions the rate does not directly depend on the dimension. We also show that the Bayesian method can adapt to the unknown smoothness of the density function. The method is applied to several specific function classes, and explicit rates are obtained. These include spatially sparse functions, functions of bounded variation, and Hölder continuous functions. We also introduce an ensemble approach, obtained by aggregating multiple density estimates fit under carefully designed perturbations, and show that for density functions lying in a Hölder space (H^(1,β), 0 < β ≤ 1), the ensemble method can achieve the minimax convergence rate up to a logarithmic term, while the corresponding rate of the density estimator based on a single partition is suboptimal for this function class.
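To make the idea of a histogram supported by a binary partition concrete, the following is a minimal sketch and not the paper's algorithm. It assumes data in the unit cube [0, 1]^d, splits cells only at coordinate midpoints, and replaces the full maximization over partitions with a greedy search that picks, at each step, the split with the largest gain in log-likelihood. The function names `log_likelihood`, `greedy_partition`, and `density` are hypothetical.

```python
import numpy as np

def log_likelihood(counts, volumes, n):
    """Log-likelihood of the histogram density n_k / (n * |A_k|) on each cell A_k."""
    counts = np.asarray(counts, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    nonempty = counts > 0  # empty cells contribute nothing to the sum
    return float(np.sum(counts[nonempty] * np.log(counts[nonempty] / (n * volumes[nonempty]))))

def greedy_partition(x, n_cells):
    """Greedily grow a binary partition of [0, 1]^d using midpoint splits (assumption)."""
    n, d = x.shape
    # Each cell is (lower corner, upper corner, indices of the points it contains).
    cells = [(np.zeros(d), np.ones(d), np.arange(n))]
    while len(cells) < n_cells:
        best = None
        for i, (lo, hi, idx) in enumerate(cells):
            vol = np.prod(hi - lo)
            base = log_likelihood([len(idx)], [vol], n)
            for j in range(d):
                mid = 0.5 * (lo[j] + hi[j])
                left = idx[x[idx, j] < mid]
                right = idx[x[idx, j] >= mid]
                split = log_likelihood([len(left), len(right)], [vol / 2, vol / 2], n)
                gain = split - base
                if best is None or gain > best[0]:
                    best = (gain, i, j, mid, left, right)
        _, i, j, mid, left, right = best
        lo, hi, _ = cells.pop(i)
        hi_left = hi.copy()
        hi_left[j] = mid
        lo_right = lo.copy()
        lo_right[j] = mid
        cells.append((lo, hi_left, left))
        cells.append((lo_right, hi, right))
    return cells

def density(cells, n, point):
    """Evaluate the piecewise constant estimate at a single point."""
    for lo, hi, idx in cells:
        if np.all(point >= lo) and np.all(point <= hi):
            return len(idx) / (n * np.prod(hi - lo))
    return 0.0
```

Because each split refines the previous partition, the log-likelihood never decreases as cells are added, so the number of cells (here the hypothetical parameter `n_cells`) has to be controlled separately, which is the role played by the partition complexity or the prior in the abstract.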
                    
                            
                            Convergence Rates of a Class of Multivariate Density Estimation Methods Based on Adaptive Partitioning
                        
                    
    
        
    
- Award ID(s): 1952386
- PAR ID: 10427903
- Date Published:
- Journal Name: Journal of Machine Learning Research
- Volume: 24
- Issue: 50
- ISSN: 1532-4435
- Page Range / eLocation ID: 1-64
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            This paper presents a variational Bayesian inference neural network (BNN) approach to quantify uncertainties in matrix function estimation for the state-space linear parameter-varying (LPV) model identification problem using only input/output data. The proposed method simultaneously estimates states and posteriors of matrix functions given data. In particular, states are estimated by reaching a consensus between an estimator based on the past system trajectory and an estimator based on the recurrent state equations; posteriors are approximated by minimizing the Kullback–Leibler (KL) divergence between the parameterized posterior distribution and the true posterior of the LPV model parameters. Furthermore, techniques such as transfer learning are explored in this work to reduce computational cost and prevent convergence failure of Bayesian inference. The proposed data-driven method is validated using experimental data for identification of a control-oriented reactivity controlled compression ignition (RCCI) engine model.
- 
            We propose and prove the optimality of a Bayesian approach for estimating the latent positions in random dot product graphs, which we call posterior spectral embedding. Unlike the classical spectral-based adjacency or Laplacian spectral embedding, posterior spectral embedding is a fully likelihood-based graph estimation method that takes advantage of the Bernoulli likelihood information of the observed adjacency matrix. We develop a minimax lower bound for estimating the latent positions, and show that posterior spectral embedding achieves this lower bound in two senses: it results in a minimax-optimal posterior contraction rate, and it yields a point estimator achieving the minimax risk asymptotically. The convergence results are subsequently applied to clustering in stochastic block models with positive semidefinite block probability matrices, strengthening an existing result concerning the number of misclustered vertices. We also study a spectral-based Gaussian spectral embedding as a natural Bayesian analogue of adjacency spectral embedding, but the resulting posterior contraction rate is suboptimal by an extra logarithmic factor. The practical performance of the proposed methodology is illustrated through extensive synthetic examples and the analysis of Wikipedia graph data.
- 
            Agrawal, Shipra; Roth, Aaron (Ed.) Tree-based methods are popular nonparametric tools for capturing spatial heterogeneity and making predictions in multivariate problems. In unsupervised learning, trees and their ensembles have also been applied to a wide range of statistical inference tasks, such as multi-resolution sketching of distributional variations, localization of high-density regions, and design of efficient data compression schemes. In this paper, we study the spatial adaptation property of Bayesian tree-based methods in the unsupervised setting, with a focus on the density estimation problem. We characterize spatial heterogeneity of the underlying density function by using anisotropic Besov spaces, region-wise anisotropic Besov spaces, and two novel function classes as their extensions. For two types of prior distributions on trees commonly used in unsupervised learning, the optional Pólya tree (Wong and Ma, 2010) and the Dirichlet prior (Lu et al., 2013), we calculate posterior concentration rates when the density function exhibits different types of heterogeneity. Specifically, we show that the posterior concentration rate for trees is near minimax over the anisotropic Besov space. The rate is adaptive in the sense that achieving it does not require any prior knowledge of the parameters of the Besov space.
				
			 
					 
					
 
                                    