Title: Principal weighted support vector machines for sufficient dimension reduction in binary classification
Sufficient dimension reduction is popular for reducing data dimensionality without stringent model assumptions. However, most existing methods may work poorly for binary classification. For example, sliced inverse regression (Li, 1991) can estimate at most one direction if the response is binary. In this paper we propose principal weighted support vector machines, a unified framework for linear and nonlinear sufficient dimension reduction in binary classification. Its asymptotic properties are studied, and an efficient computing algorithm is proposed. Numerical examples demonstrate its performance in binary classification.
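The linear version of the method lends itself to a short prototype. The sketch below is only an illustration of the general recipe, not the authors' implementation: fit a weighted hinge-loss SVM over a grid of class weights, then extract the principal directions of the fitted normal vectors. The weight grid, the standardization step, and the use of scikit-learn's LinearSVC are all our assumptions.

```python
# Minimal sketch of linear principal weighted SVMs for SDR (illustrative;
# tuning, kernels, and theory follow the paper, not this toy code).
import numpy as np
from sklearn.svm import LinearSVC

def pwsvm_directions(X, y, d=1, weights=np.linspace(0.1, 0.9, 9), C=1.0):
    """Estimate a d-dimensional SDR basis for binary y coded as -1/+1."""
    mu, sd = X.mean(0), X.std(0)
    Z = (X - mu) / sd                      # standardize predictors
    normals = []
    for pi in weights:
        # Class-dependent weights: (1 - pi) on y = +1, pi on y = -1.
        w = np.where(y > 0, 1.0 - pi, pi)
        svm = LinearSVC(C=C, loss="hinge", dual=True, max_iter=20000)
        svm.fit(Z, y, sample_weight=w)
        normals.append(svm.coef_.ravel())
    W = np.vstack(normals)
    # Top eigenvectors of the summed outer products span the estimate.
    _, eigvecs = np.linalg.eigh(W.T @ W)
    B = eigvecs[:, ::-1][:, :d]
    return B / sd[:, None]                 # map back to the original scale
```

Projecting observations onto the returned basis (X @ B) gives the reduced predictors for any downstream classifier.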
Award ID(s):
1632951
PAR ID:
10073273
Author(s) / Creator(s):
Date Published:
Journal Name:
Biometrika
Volume:
104
Issue:
1
ISSN:
0006-3444
Page Range / eLocation ID:
67-81
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider forecasting a single time series using a large number of predictors when the forecast function may be nonlinear. Assuming that the predictors affect the response through latent factors, we propose to first conduct factor analysis and then apply sufficient dimension reduction to the estimated factors to derive reduced data for subsequent forecasting. By using directional regression and the inverse third-moment method in the sufficient dimension reduction stage, the proposed methods can capture nonmonotone effects of the factors on the response. We also allow a diverging number of factors and impose only general regularity conditions on their distribution, avoiding the undesired time-reversibility condition on the factors. These features make the proposed methods fundamentally more applicable than the sufficient forecasting method of Fan et al. (2017). The proposed methods are demonstrated in both simulation studies and an empirical study forecasting monthly macroeconomic data from 1959 to 2016; the two-stage flow is sketched below. Our theory also contributes to the literature on sufficient dimension reduction: it includes an invariance result, a way to perform sufficient dimension reduction in the high-dimensional setting without assuming sparsity, and a corresponding order-determination procedure.
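As a rough sketch of the two-stage idea, with plain sliced inverse regression standing in for the directional regression and inverse third-moment methods the paper actually uses, and with our own normalization and slicing choices:

```python
import numpy as np

def estimate_factors(X, k):
    """Stage 1: PCA estimates of k latent factors from a T x p predictor panel."""
    U, _, _ = np.linalg.svd(X - X.mean(0), full_matrices=False)
    return np.sqrt(len(X)) * U[:, :k]      # factors scaled to roughly unit variance

def sdr_on_factors(F, y, d=1, n_slices=10):
    """Stage 2: inverse-moment SDR (plain SIR here) of the response on the factors."""
    n, k = F.shape
    L = np.linalg.cholesky(np.cov(F.T) + 1e-10 * np.eye(k))
    Z = (F - F.mean(0)) @ np.linalg.inv(L).T          # whitened factors
    M = np.zeros((k, k))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(0)                            # within-slice mean
        M += len(idx) / n * np.outer(m, m)
    _, vecs = np.linalg.eigh(M)
    return np.linalg.inv(L).T @ vecs[:, ::-1][:, :d]  # directions on factor scale
```

The reduced data F @ B, with B the returned directions, then feed whatever forecasting model comes next.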
  2. We consider the semi-supervised dimension reduction problem: given a high-dimensional dataset with a small number of labeled points and a large number of unlabeled points, the goal is to find a low-dimensional embedding that yields good classification results. Most previous algorithms for this task are linkage-based: they enforce must-link and cannot-link constraints during dimension reduction, leading to a nearest-neighbor classifier in the low-dimensional space. In this paper, we propose a new hyperplane-based semi-supervised dimension reduction method whose main objective is to learn low-dimensional features that both approximate the original data and form a good separating hyperplane; a toy version of the objective is sketched below. We formulate this as a non-convex optimization problem and propose an efficient algorithm to solve it. The algorithm scales to problems with millions of features and can easily incorporate non-negativity constraints in order to learn interpretable non-negative features. Experiments on real-world datasets demonstrate that our hyperplane-based dimension reduction method outperforms state-of-the-art linkage-based methods when very few labels are available.
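A toy version of that joint objective, under our own assumptions (a squared-hinge surrogate, plain gradient descent, arbitrary step sizes), might look like this:

```python
# Jointly fit a low-rank reconstruction X ~ U @ V.T and a linear
# classifier w on the labeled rows of U (illustrative sketch only).
import numpy as np

def hyperplane_ssdr(X, y_labeled, labeled_idx, d=2, lam=1.0,
                    lr=1e-3, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    U = 0.01 * rng.standard_normal((n, d))
    V = 0.01 * rng.standard_normal((p, d))
    w = np.zeros(d)
    for _ in range(n_iter):
        R = U @ V.T - X                      # reconstruction residual
        gU, gV = 2 * R @ V, 2 * R.T @ U      # gradients of ||X - U V^T||_F^2
        Ul = U[labeled_idx]
        # Squared hinge on labeled embeddings: sum max(0, 1 - y u^T w)^2
        active = np.maximum(1 - y_labeled * (Ul @ w), 0)
        gW = -2 * lam * ((active * y_labeled) @ Ul)
        gU[labeled_idx] += -2 * lam * np.outer(active * y_labeled, w)
        U -= lr * gU
        V -= lr * gV
        w -= lr * gW
    return U, V, w                           # embedding, loadings, hyperplane
```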
  3. It is computationally expensive to predict reliability using physical models at the design stage when many random input variables exist. This work introduces a dimension reduction technique based on generalized sliced inverse regression (GSIR) to mitigate the curse of dimensionality. The proposed high-dimensional reliability method uses active learning to integrate GSIR, Gaussian process (GP) modeling, and importance sampling (IS), resulting in accurate reliability prediction at reduced computational cost. The method consists of three core steps: 1) identification of the importance sampling region; 2) dimension reduction by GSIR to produce a sufficient predictor; and 3) construction of a GP model for the true response with respect to the sufficient predictor in the reduced-dimension space. High accuracy and efficiency are achieved by active learning, which iterates these three steps, adding new training points one at a time in the region with a high chance of failure; a schematic of the loop is sketched below.
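A schematic of that loop, with heavy substitutions on our part: plain SIR stands in for GSIR, a generic candidate pool stands in for a true importance-sampling scheme, and g (the limit-state function) and sample_inputs (the sampler) are hypothetical callbacks.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def sir_direction(X, y, n_slices=5):
    """Leading SDR direction (plain SIR stands in for GSIR here)."""
    Z = X - X.mean(0)
    M = np.zeros((X.shape[1], X.shape[1]))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(0)
        M += len(idx) / len(y) * np.outer(m, m)
    return np.linalg.eigh(M)[1][:, -1]

def active_reliability(g, sample_inputs, n_init=20, n_add=30, thresh=0.0):
    """Estimate P[g(X) < thresh] with a GP on a 1-D sufficient predictor."""
    X = sample_inputs(n_init)
    y = np.array([g(x) for x in X])
    for _ in range(n_add):
        b = sir_direction(X, y)                        # step 2: reduction
        gp = GaussianProcessRegressor(normalize_y=True)
        gp.fit(X @ b[:, None], y)                      # step 3: GP surrogate
        cand = sample_inputs(500)                      # step 1: candidate pool
        mu, sd = gp.predict(cand @ b[:, None], return_std=True)
        # Enrich training set where the pass/fail call is least certain.
        i = np.argmin(np.abs(mu - thresh) / (sd + 1e-12))
        X = np.vstack([X, cand[i:i + 1]])
        y = np.append(y, g(cand[i]))
    b = sir_direction(X, y)
    gp = GaussianProcessRegressor(normalize_y=True).fit(X @ b[:, None], y)
    mc = sample_inputs(100_000)
    return np.mean(gp.predict(mc @ b[:, None]) < thresh)
```

A faithful version would also reweight the final estimate by the importance-sampling density ratio rather than using a plain Monte Carlo average.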
  4.
    Advances in next-generation sequencing, transcriptomics, proteomics and other high-throughput technologies have enabled simultaneous measurement of multiple types of genomic data for cancer samples. Analyzed together, these data may reveal biological insights that no single data type can. This study proposes a novel use of a supervised dimension reduction method, sliced inverse regression, for multi-omics data analysis to improve prediction over single-data-type analysis. The study further proposes an integrative sliced inverse regression method (integrative SIR) for simultaneous analysis of multiple omics data types from cancer samples, including miRNA, mRNA and proteomics, to achieve integrative dimension reduction and further improve prediction performance; a toy version of the flow is sketched below. Numerical results show that integrative analysis of multi-omics data is beneficial compared with single-data-source analysis and, more importantly, that supervised dimension reduction methods have advantages over unsupervised ones for classification and prediction in integrative data analysis.
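Purely to illustrate the flow (this is not the paper's integrative SIR), one can block-scale and concatenate the omics matrices and run SIR with the class labels defining the slices; the scaling scheme and ridge regularizer below are our choices.

```python
import numpy as np

def integrative_sir(blocks, y, d=2, ridge=1e-6):
    """SIR on concatenated, block-scaled omics matrices; y = class labels."""
    # Crude integration: centre each block (e.g. miRNA, mRNA, protein)
    # and scale it to unit Frobenius norm before concatenating.
    parts = [(B - B.mean(0)) / np.linalg.norm(B - B.mean(0)) for B in blocks]
    X = np.hstack(parts)
    Sigma = np.cov(X.T) + ridge * np.eye(X.shape[1])  # regularized covariance
    M = np.zeros_like(Sigma)
    for c in np.unique(y):                 # binary y gives two natural slices
        m = X[y == c].mean(0) - X.mean(0)
        M += np.mean(y == c) * np.outer(m, m)
    # Solve the generalized eigenproblem M v = lam * Sigma v by whitening.
    w, V = np.linalg.eigh(Sigma)
    S_inv_half = V @ np.diag(w ** -0.5) @ V.T
    _, vecs = np.linalg.eigh(S_inv_half @ M @ S_inv_half)
    B = S_inv_half @ vecs[:, ::-1][:, :d]  # directions on the original scale
    return X @ B                           # reduced features for a classifier
```

Swapping the SIR step for PCA here reproduces the kind of unsupervised baseline the study compares against.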
  5. In this article, we develop CausalEGM, a deep learning framework for nonlinear dimension reduction and generative modeling of the dependency among covariate features affecting treatment and response. CausalEGM can be used to estimate causal effects in both binary and continuous treatment settings. By learning a bidirectional transformation between the high-dimensional covariate space and a low-dimensional latent space, and then modeling the dependencies of different subsets of the latent variables on the treatment and response, CausalEGM can extract the latent covariate features that affect both treatment and response. Conditioning on these features mitigates the confounding effect of the high-dimensional covariates on the estimation of the causal relation between treatment and response; a stripped-down sketch of the architecture appears below. In a series of experiments, the proposed method is shown to achieve superior performance over existing methods in both binary and continuous treatment settings. The improvement is substantial when the sample size is large and the covariates are high dimensional. Finally, we establish excess risk bounds and consistency results for our method, and discuss how our approach relates to and improves upon other dimension reduction approaches in causal inference.
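The latent-partitioning idea can be caricatured in a few lines of PyTorch. This is a structural sketch under our own simplifications, not the released CausalEGM code: a plain autoencoder replaces the paper's encoding generative model, the heads are linear, and the latent sizes are made up. The key point is the split of the latent code, with z0 feeding both the treatment and response heads so that conditioning on z0 adjusts for confounding.

```python
import torch
import torch.nn as nn

class TinyCausalEGM(nn.Module):
    """Toy sketch: encode covariates v into a partitioned latent code."""
    def __init__(self, v_dim, z_dims=(2, 2, 2, 2)):
        super().__init__()
        self.z_dims = list(z_dims)              # sizes of (z0, z1, z2, z3)
        self.enc = nn.Sequential(nn.Linear(v_dim, 64), nn.ReLU(),
                                 nn.Linear(64, sum(z_dims)))
        self.dec = nn.Sequential(nn.Linear(sum(z_dims), 64), nn.ReLU(),
                                 nn.Linear(64, v_dim))
        z0, z1, z2, _ = z_dims
        self.treat = nn.Linear(z0 + z1, 1)      # treatment depends on (z0, z1)
        self.resp = nn.Linear(z0 + z2 + 1, 1)   # response depends on (z0, z2, x)

    def forward(self, v, x):
        z = self.enc(v)
        z0, z1, z2, _ = torch.split(z, self.z_dims, dim=1)
        v_hat = self.dec(z)                     # reconstruction of covariates
        x_hat = self.treat(torch.cat([z0, z1], dim=1))
        y_hat = self.resp(torch.cat([z0, z2, x], dim=1))
        return v_hat, x_hat, y_hat
```

Training would minimize reconstruction, treatment, and response losses jointly; averaging the response head over the fitted (z0, z2) at a fixed treatment value then traces out a dose-response curve.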