NSF PAR Search | NSF Public Access Repository

A network model that combines latent factors and sparse graphs

https://doi.org/10.1002/sam.11492

Suh, Namjoon; Huo, Xiaoming; Heim, Eric; Seversky, Lee (December 2020, Statistical Analysis and Data Mining: The ASA Data Science Journal)

Abstract: We propose a combined model for network data that integrates a latent factor model and a sparse graphical model. Neither a latent factor model nor a sparse graphical model alone may be sufficient to capture the structure of the data. The proposed model has a latent (i.e., factor analysis) component to represent the main trends (a.k.a. factors), and a sparse graphical component that captures the remaining ad hoc dependence. Model selection and parameter estimation are carried out simultaneously via a penalized likelihood approach. The convexity of the objective function allows us to develop an efficient algorithm, while the penalty terms push towards low-dimensional latent components and a sparse graphical structure. The effectiveness of our model is demonstrated via simulation studies, and the model is also applied to four real datasets: Zachary's Karate club data, Krebs' U.S. political book dataset (http://www.orgnet.com), a U.S. political blog dataset, and a citation network of statisticians, showing meaningful performance in practical situations.
Asymptotic Theory of $\ell_1$-Regularized PDE Identification from a Single Noisy Trajectory

https://doi.org/10.1137/21M1398884

He, Yuchen; Suh, Namjoon; Huo, Xiaoming; Kang, Sung Ha; Mei, Yajun (September 2022, SIAM/ASA Journal on Uncertainty Quantification)

Full Text Available
A promising new tool for fault diagnosis of railway wheelset bearings: SSO-based Kurtogram

https://doi.org/10.1016/j.isatra.2021.09.009

Yi, Cai; Li, Yiqun; Huo, Xiaoming; Tsui, Kwok-Leung (September 2022, ISA Transactions)

Full Text Available
The Directional Bias Helps Stochastic Gradient Descent to Generalize in Kernel Regression Models

https://doi.org/10.1109/ISIT50566.2022.9834388

Luo, Yiling; Huo, Xiaoming; Mei, Yajun (June 2022, 2022 IEEE International Symposium on Information Theory (ISIT))

Full Text Available
A Statistically and Numerically Efficient Independence Test Based on Random Projections and Distance Covariance

https://doi.org/10.3389/fams.2021.779841

Huang, Cheng; Huo, Xiaoming (January 2022, Frontiers in Applied Mathematics and Statistics)

Testing for independence plays a fundamental role in many statistical techniques. Among nonparametric approaches, distance-based methods (such as hypothesis tests for independence based on distance correlation) have many advantages over the alternatives. A known limitation of distance-based methods is their high computational cost: when the sample size is $n$, a distance-based method typically requires computing all pairwise distances, so its computational complexity can be $O(n^2)$. Recent advances have shown that in the univariate case a fast method with $O(n \log n)$ computational complexity and $O(n)$ memory requirement exists. In this paper, we introduce an independence test based on random projection and distance correlation, which achieves nearly the same power as the state-of-the-art distance-based approach, works in the multivariate case, and enjoys $O(nK \log n)$ computational complexity and $O(\max\{n, K\})$ memory requirement, where $K$ is the number of random projections. Note that a saving is achieved when $K < n / \log n$. We name our method Randomly Projected Distance Covariance (RPDC). The statistical theoretical analysis takes advantage of some random projection techniques that are rooted in contemporary machine learning. Numerical experiments demonstrate the efficiency of the proposed method relative to numerous competitors.
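The idea in the abstract above (project both samples onto random directions, then average univariate distance covariances) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dcov_sq_univariate` and `projected_dcov` are hypothetical, the univariate step uses the naive pairwise-distance computation rather than the fast $O(n \log n)$ algorithm the abstract refers to, and any normalizing constants or test calibration from the paper are omitted.

```python
import numpy as np

def dcov_sq_univariate(x, y):
    """Squared sample distance covariance of two 1-D samples.

    Naive O(n^2) version built from all pairwise distances; the fast
    univariate algorithm mentioned in the abstract is NOT implemented here.
    """
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each pairwise distance matrix.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    return float((A * B).mean())

def projected_dcov(X, Y, K=20, seed=0):
    """Average univariate distance covariance over K random projections.

    Hypothetical helper: directions are drawn uniformly on the unit sphere,
    which is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(K):
        u = rng.standard_normal(X.shape[1])
        v = rng.standard_normal(Y.shape[1])
        u /= np.linalg.norm(u)
        v /= np.linalg.norm(v)
        total += dcov_sq_univariate(X @ u, Y @ v)
    return total / K

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 3))
    Y_dep = X + 0.1 * rng.standard_normal((200, 3))   # depends on X
    Y_ind = rng.standard_normal((200, 3))             # independent of X
    print("dependent:  ", projected_dcov(X, Y_dep))
    print("independent:", projected_dcov(X, Y_ind))
```

In practice a statistic like this would be turned into a test by comparing it against a reference distribution, for example through permutations of one sample; how the paper calibrates its test is not stated in the abstract.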
Full Text Available
A unifying framework of high-dimensional sparse estimation with difference-of-convex (DC) regularization

Cao, Shanshan; Huo, Xiaoming; Pang, Jong-Shi (August 2021, Statistical Science)
Under the linear regression framework, we study the variable selection problem when the underlying model is assumed to have a small number of nonzero coefficients. Non-convex penalties in specific forms are well studied in the literature on sparse estimation. A recent work, Ahn, Pang, and Xin (2017), pointed out that nearly all existing non-convex penalties can be represented as difference-of-convex (DC) functions, that is, as the difference of two convex functions, although the difference itself may not be convex. There is a large existing literature on optimization problems whose objectives and/or constraints involve DC functions, and efficient numerical solutions have been proposed. Under the DC framework, directional-stationary (d-stationary) solutions are considered, and they are usually not unique. In this paper, we show that under some mild conditions, a certain subset of d-stationary solutions of an optimization problem (with a DC objective) has some ideal statistical properties: namely, asymptotic estimation consistency, asymptotic model selection consistency, and asymptotic efficiency. Our assumptions are either weaker than or comparable with the conditions adopted in other existing works. This work shows that DC is a suitable framework for a unified approach to existing works in which non-convex penalties are involved. Our work bridges the communities of optimization and statistics.
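To make the notion of a DC penalty concrete, here is a small sketch using the capped-$\ell_1$ penalty $\min(|t|, a)$, a standard non-convex penalty chosen here purely for illustration (the abstract does not say which penalties the paper treats). It can be written as $g(t) - h(t)$ with $g(t) = |t|$ and $h(t) = \max(|t| - a, 0)$, both convex. The function names and the threshold `a` are assumptions of this sketch.

```python
import numpy as np

def capped_l1(t, a=1.0):
    """Capped-l1 penalty min(|t|, a): non-convex in t."""
    return np.minimum(np.abs(t), a)

def g(t):
    """First convex piece: the ordinary l1 penalty |t|."""
    return np.abs(t)

def h(t, a=1.0):
    """Second convex piece: max(|t| - a, 0)."""
    return np.maximum(np.abs(t) - a, 0.0)

if __name__ == "__main__":
    t = np.linspace(-3.0, 3.0, 601)
    # The penalty equals the difference of the two convex pieces everywhere.
    print(np.allclose(capped_l1(t), g(t) - h(t)))
    # Midpoint check showing the penalty itself is not convex:
    # the value at the midpoint of 0 and 3 exceeds the average of the endpoint values.
    print(capped_l1(1.5) > 0.5 * (capped_l1(0.0) + capped_l1(3.0)))
```

Both checks hold by direct calculation: for $|t| \le a$ the second piece vanishes, and for $|t| > a$ the difference is the constant $a$.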
Full Text Available
Asymptotic Convergence Rates of the Length of the Longest Run(s) in an Inflating Bernoulli Net

https://doi.org/10.1109/TIT.2021.3097886

Ni, Kai; Cao, Shanshan; Huo, Xiaoming (July 2021, IEEE Transactions on Information Theory)
In image detection, one problem is to test whether a point set, though mainly consisting of uniformly scattered points, also contains a small fraction of points sampled from some (a priori unknown) curve, for example, a curve with $C^\alpha$-norm bounded by $\beta$. One approach is to analyze the data by counting membership in multiscale multianisotropic strips, which involves an algorithm that examines the length of the path connecting many consecutive “significant” nodes. In this paper, we develop the mathematical formalism of this algorithm and analyze the statistical properties of the length of the longest significant run. The rate of convergence is derived. Using percolation theory and random graph theory, we present a novel probabilistic model, named the pseudo-tree model. Based on the asymptotic results for the pseudo-tree model, we further study the length of the longest significant run in an “inflating” Bernoulli net. We find that the probability parameter $p$ of a significant node plays an important role: there is a threshold $p_c$ such that in the cases $p < p_c$ and $p > p_c$, very different asymptotic behaviors of the length of the significant runs are observed. We apply our results to the detection of an underlying curvilinear feature and prove that the test based on our proposed longest run theory is asymptotically powerful.
Full Text Available
Editorial: Mathematical Fundamentals of Machine Learning

https://doi.org/10.3389/fams.2021.674785

Glickenstein, David; Hamm, Keaton; Huo, Xiaoming; Mei, Yajun; Stoll, Martin (April 2021, Frontiers in Applied Mathematics and Statistics)
Full Text Available
What can cluster analysis offer in investing? - Measuring structural changes in the investment universe

https://doi.org/10.1016/j.iref.2020.09.004

Sim, Min Kyu; Deng, Shijie; Huo, Xiaoming (January 2021, International Review of Economics & Finance)
Full Text Available
