A longstanding goal of learner modeling and educational data mining is to improve the domain model of knowledge that is used to make inferences about learning and performance. In this report we present a tool for finding domain models that is built into an existing modeling framework, logistic knowledge tracing (LKT). LKT allows the flexible specification of learner models in logistic regression by allowing the modeler to select whatever features of the data are relevant to prediction. Each of these features (such as the count of prior opportunities) is a function computed for a component of data (such as a student or knowledge component). In this context, we have developed the “autoKC” component, which clusters knowledge components and allows the modeler to compute features for the clustered components. For an autoKC, the input component (initial KC or item assignment) is clustered prior to computing the feature, and the feature is a function of that cluster. Another recent addition to LKT, which allows us to specify interactions between the logistic regression predictor terms, is combined with autoKC for this report. Interactions allow us to move beyond assuming that the cluster information has additive effects, so that we can model situations where a second factor of the data moderates a first factor.
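The additive logistic model described above can be sketched in a few lines. This is an illustrative toy, not the LKT package's actual API; every name here (`predict_correct`, `theta_student`, `delta`, `moderator`) is hypothetical. The `delta * moderator` term illustrates the kind of interaction the abstract mentions, where a second factor moderates the effect of practice:

```python
import math

def predict_correct(theta_student, beta_kc, gamma_kc, opportunities,
                    moderator=0.0, delta=0.0):
    """LKT-style additive logistic model (toy sketch).

    theta_student : student ability intercept
    beta_kc       : KC easiness intercept
    gamma_kc      : KC learning-rate slope on prior opportunities
    moderator     : a second factor (e.g. a cluster-level feature)
    delta         : interaction coefficient letting the moderator
                    scale the effect of practice (the non-additive case)
    """
    logit = (theta_student + beta_kc
             + (gamma_kc + delta * moderator) * opportunities)
    return 1.0 / (1.0 + math.exp(-logit))
```

With `delta = 0` the model is purely additive; a nonzero `delta` lets the cluster feature change how fast practice improves predicted performance.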
Automatic Domain Model Creation and Improvement
We describe a data mining pipeline to convert data from educational systems into knowledge component (KC) models. In contrast to other approaches, our approach employs and compares multiple model search methodologies (e.g., sparse factor analysis, covariance clustering) within a single pipeline. In this preliminary work, we describe our approach's results on two datasets when using two model search methodologies for inferring relations between items or KCs (i.e., implied transfer). The first method clusters item covariances to determine related KCs, and the second uses sparse factor analysis to derive the relationship matrix for clustering. We evaluate these methods on data from experimentally controlled practice of statistics items as well as data from the Andes physics system. We explain our plans to upgrade our pipeline to include additional methods of finding item relationships and creating domain models. We discuss advantages of improving the domain model that go beyond model fit, including the fact that models with clustered item KCs allow performance predictions to transfer between KCs, enabling the learning system to be more adaptive and better able to track student knowledge.
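The first method above (clustering item covariances to infer related KCs) might be sketched roughly as follows. This is a simplified illustration using a hypothetical thresholded connected-components rule, not the pipeline's actual clustering procedure:

```python
import numpy as np

def cluster_items_by_covariance(responses, threshold=0.1):
    """Group items into KC clusters by linking any two items whose
    response covariance exceeds `threshold`, then taking connected
    components (union-find).

    responses : (n_students, n_items) array of 0/1 correctness.
    Returns one cluster label per item.
    """
    cov = np.cov(np.asarray(responses, dtype=float), rowvar=False)
    n = cov.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cov[i, j] > threshold:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    remap = {}
    return [remap.setdefault(r, len(remap)) for r in roots]
```

Items that covary strongly end up in one cluster, so a correct answer on one item updates the predicted knowledge for all items in that cluster (implied transfer).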
- Award ID(s):
- 1934745
- PAR ID:
- 10291622
- Date Published:
- Journal Name:
- Proceedings of The 14th International Conference on Educational Data Mining
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Knowledge components (KCs) have many applications. In computing education, identifying which KCs a student's work demonstrates has been challenging. This paper introduces an entirely data-driven approach for (i) discovering KCs and (ii) demonstrating KCs, using students' actual code submissions. Our system is based on two expected properties of KCs: (i) they generate learning curves following the power law of practice, and (ii) they are predictive of response correctness. We train a neural architecture (named KC-Finder) that classifies the correctness of student code submissions and captures problem-KC relationships. Our evaluation on data from 351 students in an introductory Java course shows that the learned KCs can generate reasonable learning curves and predict code submission correctness. At the same time, some KCs can be interpreted to identify programming skills. We compare the learning curves described by our model to four baselines, showing that (i) identifying KCs with naive methods is a difficult task and (ii) our learning curves exhibit a substantially better curve fit. Our work represents a first step in solving the data-driven KC discovery problem in computing education.
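The power law of practice mentioned above, E(t) = a · t^(−b) for error rate after t opportunities, can be fit by ordinary least squares in log-log space. A minimal sketch with a hypothetical function name, not code from KC-Finder:

```python
import numpy as np

def fit_power_law(opportunities, error_rates):
    """Fit the power law of practice E(t) = a * t**(-b).

    Taking logs gives log E = log a - b * log t, a line in log-log
    space, so a one-degree polyfit recovers (a, b)."""
    x = np.log(np.asarray(opportunities, dtype=float))
    y = np.log(np.asarray(error_rates, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    return float(np.exp(intercept)), float(-slope)
```

A candidate KC whose empirical error rates yield a good fit (positive b, low residual) produces the smooth declining learning curve the authors use as evidence of a real skill.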
-
Abstract Studies of spatial point patterns (SPPs) are often used to examine the role that density-dependence (DD) and environmental filtering (EF) play in community assembly and species coexistence in forest communities. However, SPP analyses often struggle to distinguish the opposing effects that DD and EF may have on the distribution of tree species. We tested percolation threshold analysis on simulated tree communities as a method to distinguish the importance of thinning from DD and from EF on SPPs. We then compared the performance of percolation threshold analysis and a Gibbs point process model in detecting environmental associations as well as clustering patterns or overdispersion. Finally, we applied percolation threshold analysis and the Gibbs point process model to observed SPPs of 12 dominant tree species in a Puerto Rican forest to detect evidence of DD and EF. Percolation threshold analysis using simulated SPPs detected a decrease in clustering due to DD and an increase in clustering from EF. In contrast, the Gibbs point process model clearly detected the effects of EF but only identified DD thinning in two of the four types of simulated SPPs. Percolation threshold analysis on the 12 observed tree species' SPPs found that the SPPs of two species were consistent with thinning from DD processes only, four species had SPPs consistent with EF only, and the SPPs of five reflected a combination of both processes. Gibbs models of observed SPPs of living trees detected significant environmental associations for 11 species and clustering consistent with DD processes for seven species. Percolation threshold analysis is a robust method for detecting community assembly processes in simulated SPPs. By applying percolation threshold analysis to natural communities, we found that tree SPPs were consistent with thinning from both DD and EF. Percolation threshold analysis was better suited to detect DD thinning than Gibbs models for clustered simulated communities. Percolation threshold analysis improves our understanding of forest community assembly processes by quantifying the relative importance of DD and EF in forest communities.
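The percolation idea can be illustrated by tracking the largest linked cluster of points as the linking radius grows; the radius at which this fraction jumps approximates the percolation threshold. A toy sketch with hypothetical names, not the authors' analysis code:

```python
from collections import Counter

import numpy as np

def largest_cluster_fraction(points, radius):
    """Fraction of points in the largest cluster when all pairs of
    points within `radius` of each other are linked (union-find).
    Scanning `radius` upward and locating the jump in this fraction
    approximates the percolation threshold of the point pattern."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(pts[i] - pts[j]) <= radius:
                parent[find(i)] = find(j)

    sizes = Counter(find(i) for i in range(n))
    return max(sizes.values()) / n
```

Clustered patterns percolate at small radii while thinned or overdispersed patterns require larger radii, which is the signal the analysis exploits.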
-
The problem of knowledge graph (KG) reasoning has been widely explored by traditional rule-based systems and, more recently, by knowledge graph embedding methods. While logical rules can capture deterministic behavior in a KG, they are brittle, and mining rules that infer facts beyond the known KG is challenging. Probabilistic embedding methods are effective in capturing global soft statistical tendencies, and reasoning with them is computationally efficient. While embedding representations learned from rich training data are expressive, incompleteness and sparsity in real-world KGs can limit their effectiveness. We aim to leverage the complementary properties of both methods to develop a hybrid model that learns high-quality rules and embeddings simultaneously. Our method uses a cross-feedback paradigm wherein an embedding model guides the search of a rule mining system, which mines rules and infers new facts. These new facts are sampled and used to refine the embedding model. Experiments on multiple benchmark datasets show the effectiveness of our method over other competitive standalone and hybrid baselines. We also show its efficacy in a sparse KG setting.
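To make the embedding side concrete: one common translation-based scorer is TransE, where a triple (head, relation, tail) is plausible when head + relation ≈ tail in vector space. The abstract does not name a specific scorer, so this is only an illustrative sketch with fixed example vectors rather than learned embeddings:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style triple plausibility score: the negative distance
    between (head + relation) and tail, so higher (less negative)
    scores mean the triple is more plausible."""
    h, r, t = (np.asarray(v, dtype=float) for v in (h, r, t))
    return -float(np.linalg.norm(h + r - t))
```

In a cross-feedback loop like the one described, such scores could rank candidate facts proposed by the rule miner, and sampled high-scoring facts could in turn extend the training set.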
-
High-throughput phenotyping enables the efficient collection of plant trait data at scale. One example involves using imaging systems over key phases of a crop growing season. Although the resulting images provide rich data for statistical analyses of plant phenotypes, image processing for trait extraction is a prerequisite. Current methods for trait extraction are mainly based on supervised learning with human-labeled data or semi-supervised learning with a mixture of human-labeled and unlabeled data. Unfortunately, preparing a sufficiently large training set is both time- and labor-intensive. We describe a self-supervised pipeline (KAT4IA) that uses K-means clustering on greenhouse images to construct training data for extracting and analyzing plant traits from an image-based field phenotyping system. The KAT4IA pipeline includes these main steps: self-supervised training set construction, plant segmentation from images of field-grown plants, automatic separation of target plants, calculation of plant traits, and functional curve fitting of the extracted traits. To deal with the challenge of separating target plants from noisy backgrounds in field images, we describe a novel approach using row-cuts and column-cuts on images segmented by transform-domain neural network learning, which utilizes plant pixels identified from greenhouse images to train a segmentation model for field images. This approach is efficient and does not require human intervention. Our results show that KAT4IA is able to accurately extract plant pixels and estimate plant heights.
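The K-means step that builds self-supervised labels can be illustrated on one-dimensional pixel "greenness" values, splitting them into plant and background clusters. A toy sketch with hypothetical names, not the KAT4IA implementation:

```python
import numpy as np

def kmeans_two_clusters(values, iters=20):
    """Tiny 1-D K-means with k=2: split pixel 'greenness' values into
    two clusters and return a boolean mask marking the higher-mean
    cluster (the presumed plant pixels)."""
    v = np.asarray(values, dtype=float)
    # Initialize the two centroids at the extremes of the data.
    c = np.array([v.min(), v.max()])
    for _ in range(iters):
        # Assign each value to its nearest centroid, then re-center.
        assign = np.abs(v[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                c[k] = v[assign == k].mean()
    return assign == c.argmax()
```

The resulting mask can label pixels automatically, producing training data for a downstream segmentation model without human annotation, which is the self-supervision idea the abstract describes.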