Title: The Maximum Separation Subspace in Sufficient Dimension Reduction with Categorical Response
Sufficient dimension reduction (SDR) is a very useful concept for exploratory analysis and data visualization in regression, especially when the number of covariates is large. Many SDR methods have been proposed for regression with a continuous response, where the central subspace (CS) is the target of estimation. Various conditions, such as the linearity condition and the constant covariance condition, are imposed so that these methods can estimate at least a portion of the CS. In this paper we study SDR for regression and discriminant analysis with categorical response. Motivated by the exploratory analysis and data visualization aspects of SDR, we propose a new geometric framework to reformulate the SDR problem in terms of manifold optimization and introduce a new concept called Maximum Separation Subspace (MASES). The MASES naturally preserves the “sufficiency” in SDR without imposing additional conditions on the predictor distribution, and directly inspires a semi-parametric estimator. Numerical studies show MASES exhibits superior performance as compared with competing SDR methods in specific settings.
Award ID(s):
1908969 1613154 1617691
PAR ID:
10192086
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of machine learning research
Volume:
21
Issue:
29
ISSN:
1533-7928
Page Range / eLocation ID:
1-36
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We introduce a novel sufficient dimension-reduction (SDR) method that is robust against outliers, using α-distance covariance (dCov) in dimension-reduction problems. Under very mild conditions on the predictors, the central subspace is estimated effectively and in a model-free way, without estimating the link function, based on projection onto the Stiefel manifold. We establish the convergence property of the proposed estimation under some regularity conditions. We compare the performance of our method with existing SDR methods by simulation and real data analysis and show that our algorithm improves computational efficiency and effectiveness.
  2. Benjamin, Paaßen; Carrie, Demmans Epp (Ed.)
    K-12 Computer Science (CS) education has seen remarkable growth recently, driven by the increasing focus on CS and Computational Thinking (CT) integration. Despite the abundance of professional development (PD) programs designed to prepare future CS teachers with the required knowledge and skills, there is a lack of research on how teachers' perceptions of and attitudes toward CS and CT evolve before and after participating in these programs. To address this gap, our exploratory study examines the dynamics of pre- and in-service teachers' experiences, attitudes, and perceptions towards CS and CT through their participation in a K-12 CS education micro-credential program. In this study, we employed topic modeling to identify topics that emerged from teachers' written pre- and post-CS autobiographies, conducted statistical analysis to explore how these topics evolve over time, and applied regression analysis to investigate the factors influencing these dynamics. We observed a shift in teachers' initial feelings of fear, intimidation, and stress towards confidence, fun, and feeling competent in basic CS, reflecting a positive transformation. Regression analysis revealed that features such as experienced teacher status and CT conceptual understanding correlate with participants' evolving views. These observed relationships highlight the micro-credential's role in not only enhancing technical competency but also fostering an adaptive, integrative pedagogical mindset, providing new insights for course design.
  3. Approximating the action of a matrix function $$f(\vec{A})$$ on a vector $$\vec{b}$$ is an increasingly important primitive in machine learning, data science, and statistics, with applications such as sampling high dimensional Gaussians, Gaussian process regression and Bayesian inference, principal component analysis, and approximating Hessian spectral densities. Over the past decade, a number of algorithms enjoying strong theoretical guarantees have been proposed for this task. Many of the most successful belong to a family of algorithms called Krylov subspace methods. Remarkably, a classic Krylov subspace method, called the Lanczos method for matrix functions (Lanczos-FA), frequently outperforms newer methods in practice. Our main result is a theoretical justification for this finding: we show that, for a natural class of rational functions, Lanczos-FA matches the error of the best possible Krylov subspace method up to a multiplicative approximation factor. The approximation factor depends on the degree of $$f(x)$$'s denominator and the condition number of $$\vec{A}$$, but not on the number of iterations $$k$$. Our result provides a strong justification for the excellent performance of Lanczos-FA, especially on functions that are well approximated by rationals, such as the matrix square root.
  4. This work-in-progress research paper examines student perceptions after completing an exploratory learning lesson before instruction on an introductory programming concept. During exploratory learning activities, students explore a novel concept prior to instruction, the reverse of typical instruct-then-practice methods. Exploratory learning before instruction can help students activate prior knowledge, become aware of their knowledge gaps, and discern important problem features to improve conceptual understanding. Students in a first-year engineering course (N=402) learned about Python error messages in one of two conditions. In the explore-first condition, students completed a collaborative activity prior to instruction. In the instruct-first condition, students received instruction prior to the activity. Following the activity and instruction, students completed a survey to assess their perceptions of the activities. Survey items (e.g., cognitive load, self-efficacy, belonging, knowledge gaps) were chosen as potential factors that could explain learning outcomes between the two conditions. In prior work, we found higher posttest scores in the instruct-first condition than in the explore-first condition, contrary to the majority of previous studies. Cognitive load and knowledge gaps were higher in the explore-first condition than in the instruct-first condition. Self-efficacy and competence were lower in the explore-first condition. No other significant differences were found. Exploring before instruction might disrupt learning and perceived efficacy and competence if the activity is too challenging, or if the instruction does not fully resolve gaps in students’ knowledge.
  5. Visualizations of data provide a proven method for analysts to explore and make data-driven discoveries. However, current visualization tools provide only limited support for hypothesis-driven analyses, and often lack capabilities that would allow users to visually test the fit of their conceptual models against the data. This imbalance could bias users to overly rely on exploratory visual analysis as the principal mode of inquiry, which can be detrimental to discovery. To address this gap, we propose a new paradigm for ‘concept-driven’ visual analysis. In this style of analysis, analysts share their conceptual models and hypotheses with the system. The system then uses those inputs to drive the generation of visualizations, while providing plots and interactions to explore places where models and data disagree. We discuss key characteristics and design considerations for concept-driven visualizations, and report preliminary findings from a formative study. 