-
Previous versions of sparse principal component analysis (PCA) have presumed that the eigen-basis (a $$p \times k$$ matrix) is approximately sparse. We propose a method that presumes the $$p \times k$$ matrix becomes approximately sparse after a $$k \times k$$ rotation. The simplest version of the algorithm initializes with the leading $$k$$ principal components. Then, the principal components are rotated with a $$k \times k$$ orthogonal rotation to make them approximately sparse. Finally, soft-thresholding is applied to the rotated principal components. This approach differs from prior approaches because it uses an orthogonal rotation to approximate a sparse basis. One consequence is that a sparse component need not be a leading eigenvector, but can instead be a mixture of them. In this way, we propose a new (rotated) basis for sparse PCA. In addition, our approach avoids "deflation" and the multiple tuning parameters it requires. Our sparse PCA framework is versatile; for example, it extends naturally to a two-way analysis of a data matrix for simultaneous dimensionality reduction of rows and columns. We provide evidence showing that, for the same level of sparsity, the proposed sparse PCA method is more stable and explains more variance than alternative methods. Through three applications---sparse coding of images, analysis of transcriptome sequencing data, and large-scale clustering of social networks---we demonstrate the modern usefulness of sparse PCA in exploring multivariate data.
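As a concrete illustration, here is a minimal numpy sketch of the three-step pipeline described above: leading principal components, then a $$k \times k$$ orthogonal rotation toward sparsity, then soft-thresholding. The choice of varimax as the sparsity-seeking rotation and the single fixed `threshold` parameter are assumptions of this sketch; the paper's exact rotation criterion and threshold selection may differ.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Kaiser's varimax: find an orthogonal R so that Phi @ R is near-sparse."""
    p, k = Phi.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lam = Phi @ R
        u, s, vt = np.linalg.svd(
            Phi.T @ (Lam**3 - (gamma / p) * Lam @ np.diag((Lam**2).sum(axis=0)))
        )
        R = u @ vt
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return Phi @ R

def sparse_pca(X, k, threshold):
    """Rotate-then-threshold sparse PCA: PCA -> orthogonal rotation -> soft-threshold."""
    # Step 1: leading k principal components of the centered data (a p x k basis).
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    V = Vt[:k].T
    # Step 2: k x k orthogonal rotation to make the basis approximately sparse.
    V_rot = varimax(V)
    # Step 3: soft-thresholding makes the rotated components exactly sparse.
    return np.sign(V_rot) * np.maximum(np.abs(V_rot) - threshold, 0.0)
```

For example, `sparse_pca(X, k=5, threshold=0.05)` returns a $$p \times 5$$ matrix whose columns are sparse mixtures of the leading principal components; because all $$k$$ components are rotated and thresholded jointly, no deflation step or per-component tuning parameter is needed.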
-
In the 1930s, psychologists began developing Multiple-Factor Analysis to decompose multivariate data into a small number of interpretable factors without any a priori knowledge about those factors. In this form of factor analysis, the Varimax factor rotation redraws the axes through the multi-dimensional factors to make them sparse and thus more interpretable. Charles Spearman and many others objected to factor rotations because the factors seem to be rotationally invariant. Despite the controversy, factor rotations have remained widely popular among data analysts. Reversing nearly a century of statistical thinking on the topic, we show that the rotation makes the factors easier to interpret because Varimax performs statistical inference; in particular, principal component analysis (PCA) with a Varimax rotation provides a unified spectral estimation strategy for a broad class of semi-parametric factor models, including the Stochastic Blockmodel and a natural variation of Latent Dirichlet Allocation. In addition, we show that Thurstone's widely employed sparsity diagnostics implicitly assess a key leptokurtic condition that makes the axes statistically identifiable in these models. PCA with Varimax is fast, stable, and practical. Combined with Thurstone's straightforward diagnostics, this vintage approach is suitable for a wide array of modern applications.
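To make the estimation strategy concrete, the following sketch applies PCA with a Varimax rotation to the adjacency matrix of a simulated Stochastic Blockmodel and reads block memberships off the rotated components. It reuses the `varimax` routine from the sparse PCA sketch above; the simulation sizes and edge probabilities are illustrative assumptions, not values from the paper.

```python
import numpy as np
# assumes varimax() from the sparse PCA sketch above is in scope

rng = np.random.default_rng(0)

# Simulate a 3-block Stochastic Blockmodel: within-block edges are more likely.
n, k = 300, 3
z = rng.integers(0, k, size=n)                  # latent block labels
B = np.full((k, k), 0.05) + 0.25 * np.eye(k)    # block connection probabilities
P = B[z][:, z]                                  # n x n edge probabilities
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                     # symmetric, no self-loops

# PCA step: leading k eigenvectors of the adjacency matrix.
vals, vecs = np.linalg.eigh(A)                  # eigenvalues in ascending order
U = vecs[:, -k:]

# Varimax step: rotate so each row loads mainly on one component.
U_rot = varimax(U)

# Under the blockmodel, rows of U_rot are approximately one-hot, so the
# largest entry (in absolute value) of each row recovers the block label.
z_hat = np.abs(U_rot).argmax(axis=1)
```

Up to a permutation of labels, `z_hat` should largely agree with the latent labels `z`, illustrating the sense in which the rotated components are statistically meaningful estimates rather than arbitrary axes.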