
Title: Recent Advances in Text Analysis

Text analysis is an active research area in data science, with applications in artificial intelligence, biomedical research, and engineering. We review popular methods for text analysis, ranging from topic modeling to recent neural language models. In particular, we review Topic-SCORE, a statistical approach to topic modeling, and discuss how to use it to analyze the Multi-Attribute Data Set on Statisticians (MADStat), a data set on statistical publications that we collected and cleaned. Applying Topic-SCORE and other methods to MADStat leads to interesting findings. For example, we identify 11 representative topics in statistics. For each journal, the evolution of topic weights over time can be visualized, and these results are used to analyze trends in statistical research. In particular, we propose a new statistical model for ranking the citation impacts of the 11 topics, and we also build a cross-topic citation graph to illustrate how research results on different topics spread to one another. The results on MADStat provide a data-driven picture of statistical research from 1975 to 2015, from a text-analysis perspective.
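Topic-SCORE itself is not part of standard Python toolkits, so the minimal sketch below uses scikit-learn's LDA as a stand-in to illustrate the kind of analysis the abstract describes: estimate per-paper topic weights, then average them within each journal-year cell to trace topic trends. The column names (`abstract`, `journal`, `year`) and the choice of 11 topics are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch, not the Topic-SCORE algorithm: scikit-learn's LDA stands in
# for the topic model, and per-paper topic weights are averaged within each
# (journal, year) cell to trace how topic weights evolve over time.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topic_trends(papers: pd.DataFrame, n_topics: int = 11) -> pd.DataFrame:
    """papers is assumed to have columns 'abstract', 'journal', and 'year'."""
    counts = CountVectorizer(stop_words="english", max_features=5000).fit_transform(papers["abstract"])
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    weights = lda.fit_transform(counts)   # each row sums to 1: one paper's topic weights
    cols = [f"topic_{k}" for k in range(n_topics)]
    weights_df = pd.DataFrame(weights, columns=cols, index=papers.index)
    out = pd.concat([papers[["journal", "year"]], weights_df], axis=1)
    # Mean topic weight per journal-year cell gives the trend curves to plot.
    return out.groupby(["journal", "year"])[cols].mean()
```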

 
Award ID(s): 1943902, 2310668
PAR ID: 10531750
Publisher / Repository: Annual Reviews
Journal Name: Annual Review of Statistics and Its Application
Volume: 11
Issue: 1
ISSN: 2326-8298
Page Range / eLocation ID: 347 to 372
Sponsoring Org: National Science Foundation
More Like this
  1. Ruis, Andrew; Lee, Seung B. (Eds.)
    When text datasets are very large, manually coding them line by line becomes impractical. As a result, researchers sometimes use machine learning algorithms to code text data automatically. One of the most popular approaches is topic modeling. For a given text dataset, a topic model provides probability distributions of words for a set of "topics" in the data, which researchers then use to interpret the meaning of the topics. A topic model also gives each document in the dataset a score for each topic, which can be used as a non-binary code for the proportion of each topic present in the document. Unfortunately, it is often difficult to interpret what the topics mean in a defensible way, or to validate document topic-proportion scores as meaningful codes. In this study, we examine how keywords from codes developed by human experts were distributed across topics generated by topic modeling. The results show that (1) the top keywords of a single topic often contain words from multiple human-generated codes; and, conversely, (2) words from human-generated codes appear as high-probability keywords in multiple topics. These results explain why directly using topics from topic models as codes is problematic. However, they also imply that topic modeling makes it possible for researchers to discover codes from short word lists.
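    As a concrete illustration of the comparison described above, the sketch below fits a small scikit-learn LDA model and checks which human-code keyword lists each topic's top words overlap with. The toy corpus and the `human_codes` lists are invented for illustration; they are not the study's data or codes.

```python
# Toy illustration: compare topic-model keywords with human-generated codes
# (invented documents and codes, not the study's corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "the team met to share ideas and agree on a plan",
    "we sketch a prototype and iterate on the design",
    "students share feedback with the team and agree on goals",
    "each group built a prototype then began to iterate and sketch again",
    "the team could not agree so they met again to share concerns",
    "a quick sketch helped the group iterate on the prototype design",
]
human_codes = {  # hypothetical expert codes, each with a short keyword list
    "collaboration": {"team", "share", "agree"},
    "design process": {"prototype", "sketch", "iterate"},
}

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)
vocab = vectorizer.get_feature_names_out()

# For each topic, list its highest-probability words and the human codes they
# overlap with; a single topic touching several codes illustrates finding (1).
for k, topic in enumerate(lda.components_):
    top_words = {vocab[i] for i in topic.argsort()[-10:]}
    overlaps = {code: sorted(top_words & kws) for code, kws in human_codes.items() if top_words & kws}
    print(f"topic {k}: {overlaps}")

# Document-topic proportions, i.e., the non-binary codes assigned to each document.
print(lda.transform(counts).round(2))
```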
  2. Fields in the social sciences, such as education research, have started to expand the use of computer-based research methods to supplement traditional research approaches. Natural language processing techniques, such as topic modeling, may support qualitative data analysis by providing early categories that researchers may interpret and refine. This study contributes to this body of research and answers the following research questions: (RQ1) What is the relative coverage of the latent Dirichlet allocation (LDA) topic model and human coding in terms of the breadth of the topics/themes extracted from the text collection? (RQ2) What is the relative depth or level of detail among identified topics using LDA topic models and human coding approaches? A dataset of student reflections was analyzed qualitatively using both LDA topic modeling and human coding, and the results were compared. The findings suggest that topic models can provide coverage and depth of the themes present in a text collection comparable to human coding, but their topics still require manual interpretation. The breadth and depth of human coding output are heavily dependent on the expertise of the coders and the size of the collection; these factors are better handled by the topic modeling approach.
  3. Networked data involve complex information from multifaceted channels, including topology structure, node content, and node labels, where structure and content are often correlated but not always consistent. A typical scenario is citation relationships in scholarly publications, where a paper is cited by others not because they have the same content, but because they share one or more subject matters. To date, while many network embedding methods take node content into consideration, they all treat node content as a simple, flat word/attribute set, and nodes sharing connections are assumed to be dependent with respect to all words or attributes. In this paper, we argue that considering topic-level semantic interactions between nodes is crucial for learning discriminative node embedding vectors. In order to model pairwise topic relevance between linked text nodes, we propose topical network embedding, where interactions between nodes are built on their shared latent topics. Accordingly, we propose a unified optimization framework to simultaneously learn topic and node representations from the network's text contents and structure, respectively. Meanwhile, the structure modeling takes the learned topic representations as conditional context, under the principle that two nodes can infer each other contingent on their shared latent topics. Experiments on three real-world datasets demonstrate that our approach learns significantly better network representations, e.g., a 4.1% improvement over state-of-the-art methods in terms of Micro-F1 on the Cora dataset. (The source code of the proposed method is available at https://github.com/codeshareabc/TopicalNE.)
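    The authors' implementation is available at the GitHub link above; the short sketch below is not that method. It is only a toy contrast between "flat" word-level similarity and topic-level relevance for linked nodes, which is the distinction the abstract argues should drive embedding learning. The node texts and citation edges are invented.

```python
# Toy citation network: contrast flat bag-of-words similarity with topic-level
# relevance (similarity of LDA topic mixtures) for each linked pair of nodes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

node_texts = [
    "deep neural networks for image classification",
    "convolutional architectures improve image recognition accuracy",
    "topic models uncover latent themes in document collections",
    "latent dirichlet allocation for analysing large text corpora",
    "graph embeddings encode network structure in low dimensional vectors",
]
edges = [(0, 1), (2, 3), (3, 4)]  # hypothetical citation links

counts = CountVectorizer(stop_words="english").fit_transform(node_texts)
topic_mix = LatentDirichletAllocation(n_components=3, random_state=0).fit_transform(counts)

word_sim = cosine_similarity(counts)      # flat word/attribute-level similarity
topic_sim = cosine_similarity(topic_mix)  # topic-level semantic relevance

# The paper's argument: node embeddings should be driven by the topic-level
# interaction between linked nodes, not by raw word overlap alone.
for u, v in edges:
    print(f"edge ({u},{v}): word similarity {word_sim[u, v]:.2f}, topic similarity {topic_sim[u, v]:.2f}")
```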
  4.
    Communication of scientific findings is fundamental to scholarly discourse. In this article, we show that academic review articles, a quintessential form of interpretive scholarly output, perform curatorial work that substantially transforms the research communities they aim to summarize. Using a corpus of millions of journal articles, we analyze the consequences of review articles for the publications they cite, focusing on citation and co-citation as indicators of scholarly attention. Our analysis shows that, on the one hand, papers cited by formal review articles generally experience a dramatic loss in future citations. Typically, the review gets cited instead of the specific articles mentioned in the review. On the other hand, reviews curate, synthesize, and simplify the literature concerning a research topic. Most reviews identify distinct clusters of work and highlight exemplary bridges that integrate the topic as a whole. These bridging works, in addition to the review, become a shorthand characterization of the topic going forward and receive disproportionate attention. In this manner, formal reviews perform creative destruction so as to render increasingly expansive and redundant bodies of knowledge distinct and comprehensible. 
  5. Temporal text data, such as news articles or Twitter feeds, often comprises a mixture of long-lasting trends and transient topics. Effective topic modeling strategies should detect both types and clearly locate them in time. We first demonstrate that nonnegative CANDECOMP/PARAFAC decomposition (NCPD) can automatically identify topics of variable persistence. We then introduce sparseness-constrained NCPD (S-NCPD) and its online variant to control the duration of the detected topics more effectively and efficiently, and we provide theoretical analysis of the proposed algorithms. Through an extensive study on both semi-synthetic and real-world datasets, we find that S-NCPD and its online variant can identify both short- and long-lasting temporal topics in a quantifiable and controlled manner, which traditional topic modeling methods are unable to achieve. Additionally, the online variant of S-NCPD reduces the reconstruction error faster and yields more coherent topics than S-NCPD, thus achieving both computational efficiency and high topic quality. Our findings indicate that S-NCPD and its online variant are effective tools for detecting and controlling the duration of topics in temporal text data, providing valuable insights into both persistent and transient trends.
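    The sparseness-constrained and online variants described above are the paper's contributions and are not implemented here. The following is a minimal sketch of plain NCPD using tensorly on a small synthetic word x document x time count tensor, just to illustrate how the temporal factor separates a persistent topic from transient ones.

```python
# Minimal NCPD sketch (not S-NCPD): build a synthetic 3-way tensor with one
# persistent topic and two transient topics, then recover temporal profiles
# with a nonnegative CP decomposition.
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

rng = np.random.default_rng(0)
n_words, n_docs, n_times, n_topics = 60, 40, 24, 3

# Ground truth: each topic has a word profile, a document profile, and a
# temporal profile (one long-lasting trend, two short-lived topics).
word_profiles = rng.gamma(2.0, size=(n_words, n_topics))
doc_profiles = rng.gamma(2.0, size=(n_docs, n_topics))
time_profiles = np.zeros((n_times, n_topics))
time_profiles[:, 0] = 1.0        # persistent topic, active throughout
time_profiles[3:7, 1] = 1.0      # transient topic
time_profiles[15:18, 2] = 1.0    # another transient topic

tensor = tl.tensor(np.einsum("ik,jk,tk->ijt", word_profiles, doc_profiles, time_profiles))

# Plain nonnegative CP decomposition with rank equal to the number of topics.
weights, factors = non_negative_parafac(tensor, rank=n_topics, n_iter_max=200)
word_f, doc_f, time_f = factors

# Columns of the temporal factor show when each recovered topic is active;
# note that recovered topics come back in arbitrary order.
for k in range(n_topics):
    profile = time_f[:, k]
    active = np.where(profile > 0.5 * profile.max())[0]
    print(f"recovered topic {k}: active at time steps {active.tolist()}")
```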

     