

Title: Using autoKC and Interactions in Logistic Knowledge Tracing
A longstanding goal of learner modeling and educational data mining is to improve the domain model of knowledge that is used to make inferences about learning and performance. In this report we present a tool for finding domain models that is built into an existing modeling framework, logistic knowledge tracing (LKT). LKT allows the flexible specification of learner models in logistic regression by letting the modeler select whichever features of the data are relevant to prediction. Each of these features (such as the count of prior opportunities) is a function computed for a component of the data (such as a student or knowledge component). In this context, we have developed the “autoKC” component, which clusters knowledge components and allows the modeler to compute features for the clustered components. For an autoKC, the input component (the initial KC or item assignment) is clustered before the feature is computed, and the feature is a function of that cluster. Another recent addition to LKT, which allows us to specify interactions between the logistic regression predictor terms, is combined with autoKC in this report. Interactions let us move beyond assuming that the cluster information has only additive effects, so that we can model situations where a second factor of the data moderates a first factor.
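The abstract describes features (such as prior-opportunity counts) computed per component, an autoKC cluster computed from the initial KC assignment, and interaction terms. The following is a minimal Python sketch of those ideas, not the LKT R package's API. It assumes a log of (student, KC, correct) rows and a hypothetical `kc_to_cluster` mapping standing in for the autoKC clustering step; the function names and coefficients are illustrative only. For each row it counts prior attempts on the KC and on the KC's cluster, forms their product as an interaction term, and passes a linear predictor through the logistic function.

```python
import math
from collections import defaultdict


def logistic(x):
    """Map a linear predictor (log-odds) to a probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-x))


def lkt_style_features(rows, kc_to_cluster):
    """Compute prior-opportunity features for each (student, kc, correct) row.

    kc_to_cluster is a hypothetical mapping that stands in for the autoKC
    clustering step. For each row we record, before counting the current
    attempt: prior attempts on the KC, prior attempts on any KC in the same
    cluster, and their product as an interaction term.
    """
    kc_counts = defaultdict(int)       # (student, kc) -> prior attempts
    cluster_counts = defaultdict(int)  # (student, cluster) -> prior attempts
    features = []
    for student, kc, _correct in rows:  # correctness unused by count features
        cluster = kc_to_cluster[kc]
        n_kc = kc_counts[(student, kc)]
        n_cluster = cluster_counts[(student, cluster)]
        features.append((n_kc, n_cluster, n_kc * n_cluster))
        kc_counts[(student, kc)] += 1
        cluster_counts[(student, cluster)] += 1
    return features


def predict(features, intercept, b_kc, b_cluster, b_interaction):
    """Logistic model: additive KC and cluster terms plus their interaction."""
    return [
        logistic(intercept + b_kc * n_kc + b_cluster * n_cl + b_interaction * n_int)
        for n_kc, n_cl, n_int in features
    ]


if __name__ == "__main__":
    rows = [("s1", "add", 1), ("s1", "sub", 0), ("s1", "add", 1), ("s1", "sub", 1)]
    feats = lkt_style_features(rows, {"add": "arith", "sub": "arith"})
    print(feats)  # [(0, 0, 0), (0, 1, 0), (1, 2, 2), (1, 3, 3)]
    # Illustrative coefficients, not fitted values.
    print(predict(feats, -0.5, 0.3, 0.1, -0.05))
```

A nonzero `b_interaction` means the effect of practice on a KC depends on how much related (same-cluster) practice the student has had, which is the kind of moderation the abstract describes. In a real analysis the coefficients would be estimated by logistic regression rather than set by hand.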
Award ID(s): 1934745
PAR ID: 10353230
Journal Name: Proceedings of The Third Workshop of the Learner Data Institute, The 15th International Conference on Educational Data Mining (EDM 2022)
Sponsoring Org: National Science Foundation
More Like this
  1. Paaßen, Benjamin; Demmans Epp, Carrie (Eds.)
    Logistic Knowledge Tracing (LKT) is a framework for combining various predictive features into student models that are adaptive, interpretable, explainable, and accurate. While the name logistic knowledge tracing was coined for our R package that implements this methodology for making student models, logistic knowledge tracing originates with much older models such as Item Response Theory (IRT), the Additive Factors Model (AFM), and Performance Factors Analysis (PFA), which exemplify a type of model where student performance is represented by the sum of multiple components, each with some sort of feature computed for the component. Features may range from the simple presence or absence of the component to complex functions of the prior history of the component. The LKT package provides a simple interface to this methodology, allowing old models to be specified or new models to be created by mixing and matching components with features. We will provide concrete examples of how the LKT framework can provide interpretable results on real-world datasets while being highly accurate.
  2. We describe a data mining pipeline to convert data from educational systems into knowledge component (KC) models. In contrast to other approaches, our approach employs and compares multiple model search methodologies (e.g., sparse factor analysis, covariance clustering) within a single pipeline. In this preliminary work, we describe our approach's results on two datasets when using two model search methodologies for inferring item or KC relations (i.e., implied transfer). The first method uses item covariances, which are clustered to determine related KCs, and the second method uses sparse factor analysis to derive the relationship matrix for clustering. We evaluate these methods on data from experimentally controlled practice of statistics items as well as data from the Andes physics system. We explain our plans to upgrade our pipeline to include additional methods of finding item relationships and creating domain models. We discuss advantages of improving the domain model that go beyond model fit, including the fact that models with clustered item KCs result in performance predictions transferring between KCs, enabling the learning system to be more adaptive and better able to track student knowledge.
  3. Automatic pain intensity assessment from physiological signals has become an appealing approach, but it remains a largely unexplored research topic. Most studies have used machine learning approaches built on carefully designed features based on the domain knowledge available in the literature on the time series of physiological signals. However, a deep learning framework can automate the feature engineering step, enabling the model to deal directly with the raw input signals for real-time pain monitoring. We investigated a personalized Bidirectional Long Short-Term Memory Recurrent Neural Network (BiLSTM RNN) and an ensemble of BiLSTM RNN and Extreme Gradient Boosting Decision Trees (XGB) for four-category pain intensity classification. We recorded Electrodermal Activity (EDA) signals from 29 subjects during the cold pressor test. We decomposed the EDA signals into tonic and phasic components and used them to augment the original signals. The BiLSTM-XGB model outperformed the BiLSTM model alone, achieving an average F1-score of 0.81 and an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.93 over four pain states: no pain, low pain, medium pain, and high pain. We also explored a concatenation of the deep-learning feature representations and a set of fourteen knowledge-based features extracted from EDA signals. The XGB model trained on this fused feature set performed better than when it was trained on either component feature set individually. This study showed that deep learning could let us go beyond expert knowledge and benefit from the generated deep representations of physiological signals for pain assessment.
  4. Arecaceae (palms) are an important resource for indigenous communities as well as fauna populations across Amazonia. Understanding the spatial patterns and the environmental factors that determine the habitats of palms is of considerable interest to rainforest ecologists. Here, we utilize remotely sensed imagery in conjunction with topography and soil attribute data and employ a generalized cluster identification algorithm, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), to study the underlying patterns of palms in two areas of Guyana, South America. The results of the HDBSCAN assessment were cross-validated with several point pattern analysis methods commonly used by ecologists (the quadrat test for complete spatial randomness, the Morisita Index, Ripley’s L-function, and the pair correlation function). A spatial logistic regression model was generated to understand the multivariate environmental influences driving the placement of cluster and outlier palms. Our results showed that palms are strongly clustered in the areas of interest and that HDBSCAN’s clustering output correlates well with traditional analytical methods. The environmental factors influencing palm clusters or outliers, as determined by logistic regression, exhibit qualitative similarities to those identified in conventional ground-based palm surveys. These findings are promising for prospective research aiming to integrate remote flora identification techniques with traditional data collection studies.
  5. In this paper, we describe our solution to predict student STEM career choices during the 2017 ASSISTments Datamining Competition. We built a machine learning system that automatically reformats the data set, generates new features and prunes redundant ones, and performs model and feature selection. We designed the system to automatically find a model that optimizes prediction performance, yet the final model is a simple logistic regression that allows researchers to discover important features and study their effects on STEM career choices. We also compared our method to other methods, which revealed that the key to good prediction is proper feature enrichment in the beginning stage of the data analysis, while feature selection in a later stage allows a simpler final model. 
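Item 1 describes models such as Performance Factors Analysis (PFA), in which performance is the sum of per-component terms, each with a feature computed for the component. A minimal Python sketch of the PFA form, logit(p) = sum over the item's KCs k of (beta_k + gamma_k * s_k + rho_k * f_k), where s_k and f_k are the student's prior successes and failures on KC k. The class and function names and the parameter values are illustrative assumptions; this is not the LKT package's interface.

```python
import math
from dataclasses import dataclass


@dataclass
class KCParams:
    beta: float   # KC easiness (intercept)
    gamma: float  # weight on the student's prior successes with the KC
    rho: float    # weight on the student's prior failures with the KC


def pfa_probability(kcs, successes, failures, params):
    """PFA-style prediction for one student attempting one item.

    kcs: KCs tagged on the item; successes/failures: the student's prior
    counts per KC (missing KCs count as 0); params: per-KC coefficients.
    The per-KC terms are summed on the log-odds scale, then passed through
    the logistic function.
    """
    logit = sum(
        params[k].beta
        + params[k].gamma * successes.get(k, 0)
        + params[k].rho * failures.get(k, 0)
        for k in kcs
    )
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Illustrative parameter values, not estimates from any dataset.
    params = {"add": KCParams(beta=0.0, gamma=0.5, rho=-0.2)}
    print(pfa_probability(["add"], {"add": 2}, {"add": 1}, params))  # logit 0.8
```

Swapping the success and failure counts for a single prior-opportunity count gives an AFM-like term instead, which is the "mixing and matching components with features" idea the abstract describes.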
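Item 2 clusters item covariances to determine related KCs. A minimal Python sketch of that idea, assuming a students-by-items table of 0/1 scores. It uses Pearson correlation and a simple threshold-based single-linkage grouping; this is an illustrative choice and not necessarily the clustering algorithm used in that work. The function names and the `threshold` parameter are hypothetical. The resulting labels are the kind of item-to-cluster assignment that a clustered-KC model could use.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length score vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def cluster_items(scores, threshold=0.5):
    """Group items whose score correlations reach `threshold`.

    scores: list of per-student rows of 0/1 item outcomes.
    Grouping is single linkage: connected components of the graph whose
    edges join item pairs with correlation >= threshold.
    Returns one integer cluster label per item.
    """
    items = list(zip(*scores))  # transpose to per-item columns
    n = len(items)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over strongly correlated items
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and pearson(items[i], items[j]) >= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


if __name__ == "__main__":
    scores = [
        [1, 1, 0, 0],
        [1, 1, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 0],
    ]
    print(cluster_items(scores))  # [0, 0, 1, 1]
```

Single linkage is easy to follow but can chain loosely related items together; a method such as sparse factor analysis, also named in that abstract, would derive the relationship matrix differently.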