NSF PAR Search | NSF Public Access Repository

Mixed membership estimation for social networks

https://doi.org/10.1016/j.jeconom.2022.12.003

Jin, Jiashun; Ke, Zheng Tracy; Luo, Shengming (February 2024, Journal of Econometrics)

Full Text Available
Signal-noise ratio of genetic associations and statistical power of SNP-set tests

https://doi.org/10.1214/22-AOAS1725

Zhang, Hong; Liu, Ming; Jin, Jiashun; Wu, Zheyang (September 2023, The Annals of Applied Statistics)

Full Text Available
Subject clustering by IF-PCA and several recent methods

https://doi.org/10.3389/fgene.2023.1166404

Chen, Dieyi; Jin, Jiashun; Ke, Zheng Tracy (May 2023, Frontiers in Genetics)

Subject clustering (i.e., the use of measured features to cluster subjects, such as patients or cells, into multiple groups) is a problem of significant interest. In recent years, many approaches have been proposed, among which unsupervised deep learning (UDL) has received much attention. Two interesting questions are 1) how to combine the strengths of UDL and other approaches and 2) how these approaches compare to each other. We combine the variational auto-encoder (VAE), a popular UDL approach, with the recent idea of influential feature-principal component analysis (IF-PCA) and propose IF-VAE as a new method for subject clustering. We study IF-VAE and compare it with several other methods (including IF-PCA, VAE, Seurat, and SC3) on 10 gene microarray data sets and eight single-cell RNA-seq data sets. We find that IF-VAE shows significant improvement over VAE, but still underperforms compared to IF-PCA. We also find that IF-PCA is quite competitive, slightly outperforming Seurat and SC3 over the eight single-cell data sets. IF-PCA is conceptually simple and permits delicate analysis. We demonstrate that IF-PCA is capable of achieving phase transition in a rare/weak model. Comparatively, Seurat and SC3 are more complex and theoretically difficult to analyze (for these reasons, their optimality remains unclear).
Full Text Available
Special invited paper: The SCORE normalization, especially for heterogeneous network and text data

https://doi.org/10.1002/sta4.545

Ke, Zheng Tracy; Jin, Jiashun (March 2023, Stat)

SCORE was introduced as a spectral approach to network community detection. Because many networks have severe degree heterogeneity, the ordinary spectral clustering (OSC) approach to community detection may perform unsatisfactorily. SCORE alleviates the effect of degree heterogeneity by introducing a new normalization idea in the spectral domain, which makes OSC more effective. SCORE is easy to use and computationally fast. It adapts easily to new directions and has seen increasing interest in practice. In this paper, we review the basics of SCORE, the adaptation of SCORE to network mixed membership estimation and topic modeling, and the application of SCORE to real data, including two datasets on the publications of statisticians. We also review the theoretical “ideology” underlying SCORE. We show that in the spectral domain, SCORE converts a simplicial cone to a simplex and provides a simple and direct link between the simplex and network memberships. SCORE attains an exponential rate and a sharp phase transition in community detection, and achieves optimal rates in mixed membership estimation and topic modeling.
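The normalization idea in the abstract above can be illustrated with a minimal sketch. It assumes the commonly described form of SCORE: take the K leading eigenvectors of the adjacency matrix, divide eigenvectors 2 through K entrywise by the first one, and run k-means on the resulting ratios. The function name `score_communities`, the farthest-point initialization, and the toy matrix are illustrative assumptions, not taken from the paper; the sketch also assumes a connected network (so the first eigenvector has no zero entries) and omits the truncation of large ratios.

```python
import numpy as np

def score_communities(A, K, n_iter=50):
    """Hypothetical SCORE sketch: cluster nodes on entrywise eigenvector ratios."""
    A = np.asarray(A, dtype=float)
    # K leading eigenvectors of the symmetric matrix, ordered by |eigenvalue|.
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(vals))[:K]
    xi = vecs[:, order]
    # SCORE normalization: divide eigenvectors 2..K by the first, entrywise.
    # The degree parameters cancel in these ratios (assumes no zero entries).
    R = xi[:, 1:] / xi[:, [0]]
    # Plain Lloyd k-means on the rows of R, with farthest-point initialization.
    centers = [R[0]]
    for _ in range(1, K):
        d = np.min([np.sum((R - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(R[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((R[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = R[labels == k].mean(axis=0)
    return labels

# Toy, noiseless example: two communities with heterogeneous degrees (assumed values).
theta = np.array([1.0, 0.5, 0.2, 0.8, 0.3, 0.6])   # degree parameters
truth = np.array([0, 0, 0, 1, 1, 1])               # true community labels
P = np.array([[1.0, 0.2], [0.2, 1.0]])             # community connectivity
Omega = np.outer(theta, theta) * P[truth][:, truth]
labels = score_communities(Omega, K=2)
```

In this noiseless example the ratios are constant within each community regardless of the degree parameters, which is the effect the normalization is meant to achieve; on a real adjacency matrix the ratios are only approximately constant.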
Co-citation and Co-authorship Networks of Statisticians

https://doi.org/10.1080/07350015.2021.1978469

Ji, Pengsheng; Jin, Jiashun; Ke, Zheng Tracy; Li, Wanshan (April 2022, Journal of Business & Economic Statistics)

Full Text Available
Rejoinder: “Co-citation and Co-authorship Networks of Statisticians”

https://doi.org/10.1080/07350015.2022.2055358

Ji, Pengsheng; Jin, Jiashun; Ke, Zheng Tracy; Li, Wanshan (April 2022, Journal of Business & Economic Statistics)

Full Text Available
Optimal Estimation of the Number of Network Communities

https://doi.org/10.1080/01621459.2022.2035736

Jin, Jiashun; Ke, Zheng Tracy; Luo, Shengming; Wang, Minzhe (March 2022, Journal of the American Statistical Association)

Full Text Available
A sharp NMF result with applications to network modeling

Jin, Jiashun (January 2022, Advances in Neural Information Processing Systems)

Full Text Available
Phase transition for detecting a small community in a large network

Jiashun Jin, Zheng Tracy (January 2022, ICLR 2023)

Full Text Available