Summary We consider forecasting a single time series using a large number of predictors when the forecast function may be nonlinear. Assuming that the predictors affect the response through latent factors, we propose to first conduct factor analysis and then apply sufficient dimension reduction to the estimated factors to derive reduced data for subsequent forecasting. By using directional regression and the inverse third-moment method in the sufficient dimension reduction stage, the proposed methods can capture nonmonotone effects of the factors on the response. We also allow a diverging number of factors and impose only general regularity conditions on their distribution, the latter avoiding the undesired time reversibility of the factors. Together, these features make the proposed methods fundamentally more applicable than the sufficient forecasting method of Fan et al. (2017). The proposed methods are demonstrated in both simulation studies and an empirical study forecasting monthly macroeconomic data from 1959 to 2016. Our theory also contributes to the literature on sufficient dimension reduction: it includes an invariance result, a path to performing sufficient dimension reduction in the high-dimensional setting without assuming sparsity, and a corresponding order-determination procedure.
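The two-stage procedure described above can be sketched in a few lines. This is an illustrative sketch only: it substitutes plain sliced inverse regression (SIR) for the directional-regression and inverse third-moment estimators used in the paper, and all function names are ours.

```python
import numpy as np

def estimate_factors(X, K):
    """Stage 1: estimate K latent factors from X (T x p) by principal
    components of the centered data matrix (identified up to rotation)."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return np.sqrt(X.shape[0]) * U[:, :K]          # T x K estimated factors

def sir_on_factors(F, y, d=1, n_slices=5):
    """Stage 2: sliced inverse regression on the estimated factors.
    (The paper uses directional regression / the inverse third-moment
    method here, which also capture nonmonotone effects; SIR is the
    simplest member of this family and stands in for illustration.)"""
    T, K = F.shape
    L = np.linalg.cholesky(np.cov(F.T))
    Z = (F - F.mean(axis=0)) @ np.linalg.inv(L).T  # whitened factors
    M = np.zeros((K, K))
    for sl in np.array_split(np.argsort(y), n_slices):
        m = Z[sl].mean(axis=0)
        M += (len(sl) / T) * np.outer(m, m)        # candidate matrix E[E(Z|y) E(Z|y)^T]
    _, vecs = np.linalg.eigh(M)
    W = vecs[:, ::-1][:, :d]                       # leading d directions, whitened scale
    return np.linalg.inv(L).T @ W                  # map back to the factor scale
```

The reduced predictor for forecasting is then `(F - F.mean(axis=0)) @ beta`, a d-dimensional summary of the K estimated factors.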
On order determination by predictor augmentation
Abstract In many dimension reduction problems in statistics and machine learning, such as principal component analysis, canonical correlation analysis, independent component analysis, and sufficient dimension reduction, it is important to determine the dimension of the reduced predictor, which often amounts to estimating the rank of a matrix. This problem is called order determination. In this paper, we propose a novel and highly effective order-determination method based on the idea of predictor augmentation. We show that if we augment the predictor with an artificially generated random vector, then the parts of the eigenvectors of the matrix induced by the augmentation display a pattern that reveals the order to be determined. This information, combined with the information provided by the eigenvalues of the matrix, greatly enhances the accuracy of order determination.
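A toy version of the augmentation idea, specialized to determining the number of principal components, might look as follows. The combined eigenvector/eigenvalue criterion below mimics a ladle-style statistic and is an assumption of ours, not the paper's exact formulation; the function name is also ours.

```python
import numpy as np

def augmentation_order(X, r=10, kmax=None, seed=0):
    """Sketch of order determination by predictor augmentation for PCA.
    Append r pure-noise coordinates to X (n x p): eigenvectors of the
    augmented covariance that belong to the signal put almost no mass on
    the noise block, so the point where that mass starts to grow reveals
    the order."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kmax = kmax if kmax is not None else min(p, 10)
    Z = X.std() * rng.standard_normal((n, r))      # noise on a comparable scale
    vals, vecs = np.linalg.eigh(np.cov(np.hstack([X, Z]).T))
    vals, vecs = vals[::-1], vecs[:, ::-1]         # sort descending
    # f(k): mass of the first k eigenvectors in the noise block (rises after d)
    f = np.array([np.sum(vecs[p:, :k] ** 2) for k in range(kmax + 1)])
    f = f / (1.0 + f[-1])
    # g(k): scaled (k+1)-th eigenvalue (drops after d), as in a ladle plot
    g = vals[:kmax + 1] / vals.sum()
    return int(np.argmin(f + g))
```

The estimated order is the k that jointly minimizes the eigenvector and eigenvalue terms, combining the two sources of information the abstract describes.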
- Award ID(s): 1713078
- PAR ID: 10202268
- Journal Name: Biometrika
- ISSN: 0006-3444
- Sponsoring Org: National Science Foundation
More Like this
-
Modeling species distributions over space and time is one of the major research topics in both ecology and conservation biology. Joint Species Distribution Models (JSDMs) have recently been introduced as a tool to better model community data by inferring a residual covariance matrix between species after accounting for species' responses to the environment. However, these models are computationally demanding, even when latent factors, a common tool for dimension reduction, are used. To address this issue, Taylor-Rodriguez et al. (2017) proposed to use a Dirichlet process, a Bayesian nonparametric prior, to further reduce model dimension by clustering species in the residual covariance matrix. Here, we built on this approach to include prior knowledge of the potential number of clusters, and instead used a Pitman–Yor process to address some critical limitations of the Dirichlet process. We therefore propose a framework that incorporates prior knowledge into the residual covariance matrix, providing a tool to analyze clusters of species that share the same residual associations with respect to other species. We applied our methodology to a case study of plant communities in a protected area of the French Alps (the Bauges Regional Park) and demonstrated that our extensions improve dimension reduction and reveal additional information from the residual covariance matrix, notably showing how the estimated clusters are compatible with plant traits, endorsing their importance in shaping communities.
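The clustering prior at the heart of this extension can be illustrated with the Chinese-restaurant construction of the Pitman–Yor process; setting the discount parameter sigma to zero recovers the Dirichlet process. This is a generic sketch (function name ours), not the authors' JSDM implementation.

```python
import numpy as np

def pitman_yor_clusters(n, alpha, sigma, rng):
    """Sample cluster assignments for n items from a Pitman-Yor process
    via its Chinese-restaurant construction.  sigma = 0 recovers the
    Dirichlet process; sigma in (0, 1) gives power-law growth in the
    number of clusters, hence heavier-tailed cluster-size distributions."""
    counts = []                                   # items per existing cluster
    labels = np.empty(n, dtype=int)
    for i in range(n):
        k = len(counts)
        # P(existing cluster j) ∝ counts[j] - sigma; P(new cluster) ∝ alpha + sigma*k
        w = np.array([c - sigma for c in counts] + [alpha + sigma * k])
        j = int(rng.choice(k + 1, p=w / w.sum()))
        if j == k:
            counts.append(1)
        else:
            counts[j] += 1
        labels[i] = j
    return labels
```

With the same concentration alpha, a positive discount produces markedly more clusters than the Dirichlet process, which is the flexibility the abstract exploits when prior knowledge suggests a particular number of species clusters.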
-
Active learning with generalized sliced inverse regression for high-dimensional reliability analysis
It is computationally expensive to predict reliability using physical models at the design stage if many random input variables exist. This work introduces a dimension reduction technique based on generalized sliced inverse regression (GSIR) to mitigate the curse of dimensionality. The proposed high-dimensional reliability method enables active learning to integrate GSIR, Gaussian process (GP) modeling, and importance sampling (IS), resulting in accurate reliability prediction at reduced computational cost. The new method consists of three core steps: 1) identification of the importance sampling region, 2) dimension reduction by GSIR to produce a sufficient predictor, and 3) construction of a GP model for the true response with respect to the sufficient predictor in the reduced-dimension space. High accuracy and efficiency are achieved by active learning that iterates these three steps, adding new training points one by one in the region with a high chance of failure.
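The three-step loop can be sketched as follows, with two deliberate simplifications: a fixed linear projection stands in for the GSIR sufficient predictor, and the next training point is chosen by the standard |mu|/sigma (U-function) criterion from the active-learning reliability literature. All names and parameters are illustrative.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel between row sets a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def gp_posterior(S, y, Snew, ell=1.0, noise=1e-4):
    """GP posterior mean and variance at Snew, trained on (S, y)."""
    K = rbf(S, S, ell) + noise * np.eye(len(S))
    Ks = rbf(Snew, S, ell)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, np.maximum(var, 1e-12)

def active_reliability(g, W, candidates, n_init=8, n_add=20, seed=0):
    """Sketch of the loop: reduce candidate inputs to s = x @ W (a stand-in
    for the GSIR sufficient predictor), fit a GP to the limit state g in the
    reduced space, and repeatedly add the candidate with the smallest
    U = |mu|/sigma, i.e. the most ambiguous sign near the failure surface."""
    rng = np.random.default_rng(seed)
    S = candidates @ W
    idx = list(rng.choice(len(S), size=n_init, replace=False))
    ys = [g(candidates[i]) for i in idx]          # cache expensive evaluations
    for _ in range(n_add):
        mu, var = gp_posterior(S[idx], np.array(ys), S)
        U = np.abs(mu) / np.sqrt(var)
        U[idx] = np.inf                           # never re-pick evaluated points
        j = int(np.argmin(U))
        idx.append(j)
        ys.append(g(candidates[j]))
    mu, _ = gp_posterior(S[idx], np.array(ys), S)
    return float(np.mean(mu < 0))                 # failure fraction on the candidates
```

In the full method the candidate pool comes from importance sampling around the failure region and the projection is re-estimated by GSIR; here both are fixed so the active-learning skeleton stays visible.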
-
Abstract Reliability analysis is a core element in engineering design and can be performed with physical models (limit-state functions). Reliability analysis becomes computationally expensive when the dimensionality of the input random variables is high. This work develops a high-dimensional reliability analysis method through a new dimension reduction strategy, so that the contributions of unimportant input variables are also accommodated after dimension reduction. Dimension reduction is performed with the first iteration of the first-order reliability method (FORM), which identifies important and unimportant input variables. A higher-order reliability analysis is then performed in the reduced space of only the important input variables. The reliability obtained in the reduced space is then integrated with the contributions of the unimportant input variables, resulting in a final reliability prediction that accounts for both types of input variables. Consequently, the new reliability method is more accurate than the traditional method, which fixes unimportant input variables at their means. The accuracy is demonstrated by three examples.
-
Abstract Reliability analysis is usually a core element in engineering design, during which reliability is predicted with physical models (limit-state functions). Reliability analysis becomes computationally expensive when the dimensionality of the input random variables is high. This work develops a high-dimensional reliability analysis method with a new dimension reduction strategy, so that the contributions of both important and unimportant input variables are accommodated. Accounting for the contributions of unimportant input variables improves the accuracy of the reliability prediction, especially when many unimportant input variables are involved. Dimension reduction is performed with the first iteration of the first-order reliability method (FORM), which identifies important and unimportant input variables. A higher-order reliability analysis, such as second-order reliability analysis or a metamodeling method, is then performed in the reduced space of only the important input variables. The reliability obtained in the reduced space is then integrated with the contributions of the unimportant input variables, resulting in a final reliability prediction that accounts for both types of input variables. Consequently, the new reliability method is more accurate than the traditional method, which fixes unimportant input variables at their means. The accuracy is demonstrated by three examples.
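The FORM-based screening and the reduced-space index can be sketched as below. This is a generic FORM/HL-RF illustration (names ours), not the authors' code; the final step of reintegrating the unimportant variables' contributions is method-specific and omitted here.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def form_importance(g, n, h=1e-6):
    """First FORM iteration at the origin of standard-normal space:
    the unit gradient (the alpha vector) ranks variable importance,
    which is the screening step described in the abstract."""
    u0 = np.zeros(n)
    grad = np.array([(g(u0 + h * e) - g(u0 - h * e)) / (2 * h)
                     for e in np.eye(n)])
    return grad / np.linalg.norm(grad)

def form_beta(g, n, iters=30, h=1e-6):
    """HL-RF iterations for the reliability index beta = ||u*|| and the
    corresponding FORM failure probability Phi(-beta)."""
    u = np.zeros(n)
    for _ in range(iters):
        grad = np.array([(g(u + h * e) - g(u - h * e)) / (2 * h)
                         for e in np.eye(n)])
        u = (grad @ u - g(u)) / (grad @ grad) * grad
    beta = float(np.linalg.norm(u))
    return beta, norm_cdf(-beta)
```

After screening with `form_importance`, the higher-order analysis would be run on a reduced limit state that keeps only the high-|alpha| variables (e.g., fixing `u[2]` at zero in the example below changes beta only in the fourth decimal), and the omitted variables' contributions would then be folded back into the final prediction.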