-
The paper considers the Popularity Adjusted Block Model (PABM) introduced by Sengupta and Chen (Journal of the Royal Statistical Society Series B, 2018, 80, 365–386). We argue that the main appeal of the PABM is the flexibility of the spectral properties of the graph, which makes the PABM an attractive choice for modelling networks that appear in the biological sciences. We extend the theory of the PABM to the case of an arbitrary number of communities, which may grow with the number of nodes in the network and is not assumed to be known. We produce estimators of the probability matrix and of the community structure and, in addition, provide non-asymptotic upper bounds for the estimation and clustering errors. We use the Sparse Subspace Clustering (SSC) approach to partition the network into communities, an approach that, to the best of our knowledge, has not previously been used for clustering network data. The theory is supplemented by a simulation study. In addition, we show the advantages of the PABM for modelling a butterfly similarity network and a human brain functional network.
-
In the present paper we study a sparse stochastic network endowed with a block structure. The popular Stochastic Block Model (SBM) and the Degree Corrected Block Model (DCBM) address sparsity by placing an upper bound on the maximum probability of connection between any pair of nodes. As a result, sparsity describes only the behavior of the network as a whole, without distinguishing between block-dependent sparsity patterns. To the best of our knowledge, the recently introduced Popularity Adjusted Block Model (PABM) is the only block model that allows for structural sparsity, in which some probabilities of connection are identically equal to zero while the rest remain above a certain threshold. The latter presents a more nuanced view of the network.
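As a rough illustration of structural sparsity, the sketch below builds a PABM connection-probability matrix for a toy network. It assumes the parameterization of Sengupta and Chen, in which the probability of an edge between nodes i and j is the product of node i's popularity in j's community and node j's popularity in i's community; that formula is not stated in the abstract above. The function name `pabm_probabilities` and all numeric values are hypothetical.

```python
def pabm_probabilities(popularity, communities):
    """Return the n x n matrix of connection probabilities under the PABM.

    Assumed parameterization (Sengupta and Chen):
        P[i][j] = popularity[i][c(j)] * popularity[j][c(i)]  for i != j,
    where c(i) is the community label of node i.
    The diagonal is left at zero (no self-loops), an assumption of this sketch.
    """
    n = len(communities)
    probs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                probs[i][j] = (
                    popularity[i][communities[j]] * popularity[j][communities[i]]
                )
    return probs


# Hypothetical toy network: 4 nodes, 2 communities.
communities = [0, 0, 1, 1]
popularity = [
    [0.9, 0.0],  # node 0 has zero popularity in community 1
    [0.8, 0.5],
    [0.4, 0.7],
    [0.6, 0.9],
]

P = pabm_probabilities(popularity, communities)

# Node 0 can never connect to community 1 (nodes 2 and 3),
# while its within-community probability stays well above zero.
print(P[0][2], P[0][3], P[0][1])
```

A single zero popularity entry removes all edges between one node and an entire community while leaving the other probabilities untouched, which a single global bound on the maximum connection probability cannot express.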
-
The present paper studies density deconvolution in the presence of small Berkson errors, in particular, when the variances of the errors tend to zero as the sample size grows. It is known that when Berkson errors are present, in some cases, an estimator of the unknown density can be obtained by simple averaging without using kernels. However, this may not be the case when the Berkson errors are asymptotically small. By treating the former case as a kernel estimator with zero bandwidth, we obtain the optimal expressions for the bandwidth. We show that the density of the Berkson errors acts as a regularizer, so that the kernel estimator is unnecessary when the variance of the Berkson errors lies above some threshold that depends on the shapes of the densities in the model and the number of observations.
-
Discovering and clustering subspaces in high-dimensional data is a fundamental problem of machine learning with a wide range of applications in data mining, computer vision, and pattern recognition. Earlier methods divided the problem into two separate stages: finding the similarity matrix and finding clusters. Similar to some recent works, we integrate these two steps using a joint optimization approach. We make the following contributions: (i) We estimate the reliability of the cluster assignment for each point before assigning the point to a subspace. We divide the data points into two groups, “certain” and “uncertain”, with the assignment of the latter group delayed until their subspace association certainty improves. (ii) We demonstrate that delayed association is better suited for clustering subspaces that have ambiguities, i.e. when subspaces intersect or data are contaminated with outliers/noise. (iii) We demonstrate experimentally that such delayed probabilistic association leads to a more accurate self-representation and final clusters. The proposed method has higher accuracy both for points that lie exclusively in one subspace and for those that lie on the intersection of subspaces. (iv) We show that delayed association leads to a substantial reduction in computational cost, since it allows for incremental spectral clustering.