Sufficient dimension reduction (SDR) is a very useful concept for exploratory analysis and data visualization in regression, especially when the number of covariates is large. Many SDR methods have been proposed for regression with a continuous response, where the central subspace (CS) is the target of estimation. Various conditions, such as the linearity condition and the constant covariance condition, are imposed so that these methods can estimate at least a portion of the CS. In this paper we study SDR for regression and discriminant analysis with a categorical response. Motivated by the exploratory analysis and data visualization aspects of SDR, we propose a new geometric framework that reformulates the SDR problem in terms of manifold optimization, and we introduce a new concept called the Maximum Separation Subspace (MASES). MASES naturally preserves the “sufficiency” in SDR without imposing additional conditions on the predictor distribution, and it directly inspires a semi-parametric estimator. Numerical studies show that MASES outperforms competing SDR methods in specific settings.
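To make the "maximum separation" idea concrete, here is a minimal, hypothetical Python sketch (not the authors' estimator; the function name, the random search, and the Kolmogorov-Smirnov criterion are our own choices). For a binary response it scans one-dimensional projections of the predictors and keeps the direction along which the two class-conditional distributions of the projected data are most separated; a real estimator would optimize over the manifold of projections rather than sampling at random.

```python
# Hypothetical illustration of a "maximum separation" criterion, not the
# MASES estimator itself: score each candidate direction b by how well it
# separates the class-conditional distributions of b'X, here via the
# two-sample Kolmogorov-Smirnov statistic.
import numpy as np
from scipy.stats import ks_2samp

def max_separation_direction(X, y, n_candidates=2000, seed=0):
    """Crude random search for a one-dimensional separating direction."""
    rng = np.random.default_rng(seed)
    best_dir, best_sep = None, -np.inf
    for _ in range(n_candidates):
        b = rng.standard_normal(X.shape[1])
        b /= np.linalg.norm(b)               # restrict to the unit sphere
        z = X @ b                            # projected predictor b'X
        sep = ks_2samp(z[y == 0], z[y == 1]).statistic
        if sep > best_sep:
            best_dir, best_sep = b, sep
    return best_dir, best_sep

# Toy example: the classes differ only along the first coordinate, so the
# recovered direction should be close to (1, 0, 0, 0, 0) up to sign.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=500)
X = rng.standard_normal((500, 5))
X[:, 0] += 2.0 * y
b_hat, sep = max_separation_direction(X, y)
print(np.round(b_hat, 2), round(sep, 3))
```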
Robust sufficient dimension reduction via α-distance covariance
We introduce a novel sufficient dimension-reduction (SDR) method that is robust against outliers, using the α-distance covariance (dCov) in dimension-reduction problems. Under very mild conditions on the predictors, the method estimates the central subspace effectively and in a model-free manner, without estimating the link function, via projection onto the Stiefel manifold. We establish the convergence property of the proposed estimator under some regularity conditions. We compare the performance of our method with existing SDR methods through simulation and real data analysis, and show that our algorithm improves computational efficiency and effectiveness.
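The key quantity is easy to sketch. Below is a minimal, hypothetical Python illustration (the function names, the Nelder-Mead optimizer, and the normalization trick are ours, not the paper's algorithm): it computes the sample α-distance covariance between a one-dimensional projection b'X and the response and maximizes it over unit vectors, the d = 1 slice of the Stiefel manifold. Raising pairwise distances to a power α < 1 downweights extreme pairs, which is the intuition behind the robustness to outliers.

```python
# Hypothetical sketch of alpha-distance-covariance-based SDR for a single
# direction (d = 1); not the paper's estimator or algorithm.
import numpy as np
from scipy.optimize import minimize

def alpha_dcov2(u, v, alpha=0.5):
    """Squared sample distance covariance of two 1-D samples, with
    pairwise distances raised to the power alpha (0 < alpha < 2)."""
    def centered(a):
        d = np.abs(a[:, None] - a[None, :]) ** alpha
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    return (centered(u) * centered(v)).mean()

def sdr_direction(X, y, alpha=0.5, seed=0):
    """Maximize alpha-dCov(b'X, y) over unit vectors b."""
    rng = np.random.default_rng(seed)

    def neg_dcov(b):
        b = b / np.linalg.norm(b)            # retract onto the unit sphere
        return -alpha_dcov2(X @ b, y, alpha)

    res = minimize(neg_dcov, rng.standard_normal(X.shape[1]),
                   method="Nelder-Mead")
    return res.x / np.linalg.norm(res.x)
```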
- PAR ID: 10534142
- Publisher / Repository: Taylor & Francis
- Date Published:
- Journal Name: Journal of Nonparametric Statistics
- Edition / Version: 1
- Volume: 0
- Issue: 0
- ISSN: 1048-5252
- Page Range / eLocation ID: 1-16
- Subject(s) / Keyword(s): α-distance covariance; central subspace; sufficient dimension reduction (SDR); manifold learning; robust statistics
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We consider forecasting a single time series using a large number of predictors in the presence of a possible nonlinear forecast function. Assuming that the predictors affect the response through latent factors, we propose to first conduct factor analysis and then apply sufficient dimension reduction to the estimated factors to derive the reduced data for subsequent forecasting. Using directional regression and the inverse third-moment method in the sufficient dimension reduction stage, the proposed methods can capture the nonmonotone effect of factors on the response. We also allow a diverging number of factors and impose only general regularity conditions on the distribution of the factors, avoiding the undesired time reversibility of the factors required by the latter. These features make the proposed methods fundamentally more applicable than the sufficient forecasting method of Fan et al. (2017). The proposed methods are demonstrated both in simulation studies and in an empirical study forecasting monthly macroeconomic data from 1959 to 2016. Our theory also contributes to the literature on sufficient dimension reduction, as it includes an invariance result, a path to performing sufficient dimension reduction in the high-dimensional setting without assuming sparsity, and the corresponding order-determination procedure.
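A minimal sketch of the two-stage pipeline, with sliced inverse regression (SIR) standing in for the directional regression and inverse third-moment steps (all names are ours): estimate the factors by principal components, then run SDR on those factors. Plain SIR can miss symmetric, nonmonotone effects, which is exactly why the paper uses the more powerful second- and third-moment methods.

```python
# Hypothetical two-stage sketch: factors via PCA, then SDR via SIR.
import numpy as np

def pca_factors(X, k):
    """Stage 1: estimate k latent factors by principal components."""
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def sir_directions(F, y, n_slices=10, d=1):
    """Stage 2: sliced inverse regression on the estimated factors."""
    Fc = F - F.mean(0)
    W = np.linalg.inv(np.linalg.cholesky(np.cov(F.T)))  # whitening matrix
    Fs = Fc @ W.T                                       # whitened factors
    slices = np.array_split(np.argsort(y), n_slices)    # slice on sorted y
    M = sum(len(s) / len(y) * np.outer(Fs[s].mean(0), Fs[s].mean(0))
            for s in slices)                            # slice-mean matrix
    _, vecs = np.linalg.eigh(M)
    return W.T @ vecs[:, ::-1][:, :d]   # leading directions, factor scale
```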
-
Fair representations are a powerful tool for establishing criteria like statistical parity, proxy non-discrimination, and equality of opportunity in learned models. Existing techniques for learning these representations are typically model-agnostic, as they preprocess the original data such that the output satisfies some fairness criterion, and can be used with arbitrary learning methods. In contrast, we demonstrate the promise of learning a model-aware fair representation, focusing on kernel-based models. We leverage the classical Sufficient Dimension Reduction (SDR) framework to construct representations as subspaces of the reproducing kernel Hilbert space (RKHS), whose member functions are guaranteed to satisfy fairness. Our method supports several fairness criteria, continuous and discrete data, and multiple protected attributes. We further show how to calibrate the accuracy tradeoff by characterizing it in terms of the principal angles between subspaces of the RKHS. Finally, we apply our approach to obtain the first Fair Gaussian Process (FGP) prior for fair Bayesian learning, and show that it is competitive with, and in some cases outperforms, state-of-the-art methods on real data.
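As a concrete illustration of the first criterion named above, here is a minimal, hypothetical sketch (the function name is ours) that measures the statistical parity gap of binary decisions across a binary protected attribute; a fair representation aims to keep this gap near zero for any model trained on top of it.

```python
# Hypothetical helper: statistical parity gap of binary decisions y_hat
# across the groups defined by a binary protected attribute a.
import numpy as np

def statistical_parity_gap(y_hat, a):
    """|P(y_hat = 1 | a = 1) - P(y_hat = 1 | a = 0)|."""
    y_hat, a = np.asarray(y_hat), np.asarray(a)
    return abs(y_hat[a == 1].mean() - y_hat[a == 0].mean())

# Decisions generated independently of a should give a gap near zero.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, 1000)
print(statistical_parity_gap(rng.integers(0, 2, 1000), a))
```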
-
We consider the semi-supervised dimension reduction problem: given a high-dimensional dataset with a small number of labeled data and a huge number of unlabeled data, the goal is to find the low-dimensional embedding that yields good classification results. Most previous algorithms for this task are linkage-based: they try to enforce must-link and cannot-link constraints in dimension reduction, leading to a nearest-neighbor classifier in the low-dimensional space. In this paper, we propose a new hyperplane-based semi-supervised dimension reduction method whose main objective is to learn low-dimensional features that can both approximate the original data and form a good separating hyperplane. We formulate this as a non-convex optimization problem and propose an efficient algorithm to solve it. The algorithm can scale to problems with millions of features and can easily incorporate non-negativity constraints in order to learn interpretable non-negative features. Experiments on real-world datasets demonstrate that our hyperplane-based dimension reduction method outperforms state-of-the-art linkage-based methods when very few labels are available.
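A minimal sketch of the stated objective, assuming a simple penalized formulation with plain gradient descent (the weights, step sizes, and optimizer are ours, not the paper's algorithm): learn features U and loadings V so that X is approximated by U V, while a hyperplane w separates the labeled rows of U under a hinge loss. Labels on the labeled rows are assumed coded as +1/-1.

```python
# Hypothetical sketch: jointly fit a low-rank approximation X ~ U @ V and
# a separating hyperplane w on the labeled rows of U.
import numpy as np

def semisup_dr(X, y, labeled, d=2, lam=1.0, lr=1e-3, n_iter=500, seed=0):
    """Descend on ||X - U V||^2 / 2 + lam * hinge(y, U w) over U, V, w."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    U = 0.1 * rng.standard_normal((n, d))
    V = 0.1 * rng.standard_normal((d, p))
    w = np.zeros(d)
    for _ in range(n_iter):
        R = U @ V - X                      # reconstruction residual
        gU, gV = R @ V.T, U.T @ R          # gradients of the first term
        margin = y[labeled] * (U[labeled] @ w)
        act = margin < 1                   # rows with an active hinge
        coef = lam * y[labeled] * act      # hinge subgradient coefficients
        gU[labeled] -= coef[:, None] * w   # hinge gradient w.r.t. U rows
        w += lr * (coef @ U[labeled])      # descent step on w
        U -= lr * gU
        V -= lr * gV
    return U, V, w
```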
-
We initiate the study of smoothed analysis for the sequential probability assignment problem with contexts. We study information-theoretically optimal minimax rates as well as a framework for algorithmic reduction involving the maximum likelihood estimator (MLE) oracle. Our approach establishes a general-purpose reduction from minimax rates for sequential probability assignment against smoothed adversaries to minimax rates for transductive learning. This leads to optimal (logarithmic) fast rates for parametric classes and classes with finite VC dimension. On the algorithmic front, we develop an algorithm that efficiently taps into the MLE oracle for general classes of functions. We show that under general conditions this algorithmic approach yields sublinear regret.
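A minimal sketch of the oracle-based idea, using scikit-learn's logistic regression as a stand-in MLE oracle for a parametric class (plain follow-the-leader, not the paper's algorithm, which queries the oracle more carefully to obtain its guarantees): at each round, fit maximum likelihood on the history and assign a probability to the next label.

```python
# Hypothetical follow-the-leader sketch of sequential probability
# assignment with an MLE oracle (here: logistic regression).
import numpy as np
from sklearn.linear_model import LogisticRegression

def sequential_assignment(contexts, labels):
    """Emit p_t = predicted P(y_t = 1 | x_t) before y_t is revealed."""
    preds = []
    for t, x in enumerate(contexts):
        if t < 2 or len(set(labels[:t])) < 2:
            preds.append(0.5)              # no usable history yet
            continue
        oracle = LogisticRegression().fit(contexts[:t], labels[:t])
        preds.append(oracle.predict_proba(x.reshape(1, -1))[0, 1])
    return np.array(preds)

# Cumulative log loss of the assigned probabilities on a toy stream.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = (X @ np.array([1.0, -1.0, 0.5]) + 0.3 * rng.standard_normal(200) > 0)
p = np.clip(sequential_assignment(X, y.astype(int)), 1e-6, 1 - 1e-6)
print(float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).sum()))
```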