Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Free, publicly-accessible full text available August 1, 2026
-
Topic modeling is a widely utilized tool in text analysis. We investigate the optimal rate for estimating a topic model. Specifically, we consider a scenario with n documents, a vocabulary of size p, and document lengths at the order N. When N≥c·p, referred to as the long-document case, the optimal rate is established in the literature at p/(Nn). However, when N=o(p), referred to as the short-document case, the optimal rate remains unknown. In this paper, we first provide new entry-wise large-deviation bounds for the empirical singular vectors of a topic model. We then apply these bounds to improve the error rate of a spectral algorithm, Topic-SCORE. Finally, by comparing the improved error rate with the minimax lower bound, we conclude that the optimal rate is still p/(Nn) in the short-document case.more » « less
-
Given a K-vertex simplex in a d-dimensional space, suppose we measure n points on the simplex with noise (hence, some of the observed points fall outside the sim- plex). Vertex hunting is the problem of estimating the K vertices of the simplex. A popular vertex hunting algorithm is successive projection algorithm (SPA). How- ever, SPA is observed to perform unsatisfactorily under strong noise or outliers. We propose pseudo-point SPA (pp-SPA). It uses a projection step and a denoise step to generate pseudo-points and feed them into SPA for vertex hunting. We derive error bounds for pp-SPA, leveraging on extreme value theory of (possibly) high-dimensional random vectors. The results suggest that pp-SPA has faster rates and better numerical performances than SPA. Our analysis includes an improved non-asymptotic bound for the original SPA, which is of independent interest.more » « less
An official website of the United States government

Full Text Available