Educational data mining research has demonstrated that the large volume of learning data collected by modern e-learning systems could be used to recognize student behavior patterns and group students into cohorts with similar behavior. However, few attempts have been made to connect and compare behavioral patterns with known dimensions of individual differences. To what extent is learner behavior defined by known individual differences? Which of them could be a better predictor of learner engagement and performance? Could we use behavior patterns to build a data-driven model of individual differences that could be more useful for predicting critical outcomes of the learning process than traditional models? Our paper attempts to answer these questions using a large volume of learner data collected in an online practice system. We apply a sequential pattern mining approach to build individual models of learner practice behavior and reveal latent student subgroups that exhibit considerably different practice behavior. Using these models, we explored the connections between learner behavior and both the incoming and outgoing parameters of the learning process. Among incoming parameters, we examined traditionally collected individual differences such as self-esteem, gender, and knowledge monitoring skills. We also attempted to bridge the gap between cluster-based behavior pattern models and traditional scale-based models of individual differences by quantifying learner behavior on a latent data-driven scale. Our research shows that this data-driven model of individual differences performs significantly better than traditional models of individual differences in predicting important parameters of the learning process, such as performance and engagement.
Prediction of Student Academic Performance Using a Hybrid 2D CNN Model
Opportunities to apply data mining techniques to analyze educational data and improve learning are increasing. A multitude of data are being produced by institutional technology, e-learning resources, and online and virtual courses. These data could be used by educators to analyze and understand the learning behaviors of students. The obtained data are raw data that must be analyzed, requiring educational data mining to predict useful information about students, such as academic performance, among other things. Many researchers have used traditional machine learning to predict the academic performance of students, and very little research has been conducted on the architecture of convolutional neural networks (CNNs) in the context of the pedagogical domain. We built a hybrid 2D CNN model by combining two different 2D CNN models to predict academic performance. Our sample comprised 1D data, so we transformed it to 2D image data to test the performance of our hybrid model. We compared the performance of our model with that of different traditional baseline models. Our model outperformed baseline models, such as k-nearest neighbor, naïve Bayes, decision trees, and logistic regression, in terms of accuracy.
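The abstract does not say how the 1D records were turned into 2D images. As a minimal sketch of one plausible approach (an assumption, not the authors' method), a feature vector can be laid out row by row in the smallest square grid that holds it, with zero padding for the unused cells. The function name `vector_to_grid` and the sample values are hypothetical.

```python
import math


def vector_to_grid(features, pad_value=0.0):
    """Reshape a 1D feature vector into the smallest square 2D grid that holds it.

    Assumption for illustration: row-major layout with constant padding.
    The source abstract does not specify the actual transformation.
    """
    if not features:
        raise ValueError("features must be non-empty")
    # Smallest integer side such that side * side >= len(features).
    side = math.isqrt(len(features) - 1) + 1
    padded = list(features) + [pad_value] * (side * side - len(features))
    # Slice the padded vector into rows of length `side`.
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical student record with 7 numeric attributes (invented values).
record = [0.8, 0.1, 0.5, 1.0, 0.0, 0.3, 0.9]
grid = vector_to_grid(record)
for row in grid:
    print(row)
```

A grid built this way can be fed to a 2D convolutional layer as a single-channel image. The ordering of features within the grid affects which attributes end up adjacent, so a real pipeline would need to choose that ordering deliberately.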
- Award ID(s): 2047625
- PAR ID: 10346237
- Date Published:
- Journal Name: Electronics
- Volume: 11
- Issue: 7
- ISSN: 2079-9292
- Page Range / eLocation ID: 1005
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The growth of instructional technology, e-learning resources, and online courses has created opportunities for data mining and learning analytics in the pedagogical domain. A large amount of data is obtained from this domain that can be analyzed and interpreted so that educators can understand students’ attention. In a classroom where students have their own computers in front of them, it is important for instructors to understand whether students are paying attention. We collected on- and off-task data to analyze the attention behaviors of students. Educational data mining extracts hidden information from educational records, and we are using it to classify student attention patterns. A hybrid method combines techniques such as classification, regression, or feature extraction. In our work, we combined two feature extraction techniques: principal component analysis and linear discriminant analysis. Extracted features are used by a linear and kernel support vector machine (SVM) to classify attention patterns. Classification results are compared with linear and kernel SVM. Our hybrid method achieved the best results in terms of accuracy, precision, recall, F1, and kappa. We also correlated attention with learning, where learning corresponds to tests and a final course grade. To determine the correlation between grades and attention, Pearson’s correlation coefficient and p-value were used.
-
One of the essential problems in educational data mining is to predict students' performance on future learning materials, such as problems, assignments, and quizzes. Pioneering algorithms for predicting student performance mostly rely on two sources of information: students' past performance and the learning materials' domain knowledge model. The domain knowledge model, traditionally curated by domain experts, maps learning materials to the concepts, topics, or knowledge components that are presented in them. However, creating a domain model by manually labeling the learning material can be a difficult and time-consuming task. In this paper, we propose a tensor factorization model for student performance prediction that does not rely on a predefined domain model. Our proposed algorithm models student knowledge as a soft membership of latent concepts. It also represents the knowledge acquisition process with an added rank-based constraint in the tensor factorization objective function. Our experiments show that the proposed model outperforms state-of-the-art algorithms in predicting student performance in two real-world datasets, and is robust to hyper-parameters.
-
Combining high-speed video cameras and optical measurement techniques with digital sensors controlled by a data acquisition system can provide an effective means of exploring boiling process thermophysics and heat transfer mechanisms. Imaging can provide qualitative and quantitative information that complements data provided by temperature, pressure, and other sensors. This paper summarizes the results of an exploration of machine learning strategies to optimally combine and analyze boiling process images and digital sensor information from experiments. We specifically sought a convolutional neural network (CNN) to analyze the vaporization of deposited water droplets on superheated surfaces that may have varying degrees of nucleate boiling effects. Two specialized CNN models were developed in this study that can simultaneously analyze both image and digital data. One of our CNN model designs (case B) was trained to take an image of the vaporization process and nonthermal digital data as input and predict thermal heat transfer performance. This model predicts performance remarkably well given its nonthermal inputs, matching independent heat flux test data to a root-mean-square percent error (RMSPE) of 10.3%. This model appears to learn how the variations of nucleate boiling, vapor recoil activity, and local dryout over the surface vary with surface temperature and/or heat flux from changes in boiling system images. We also describe a CNN model (case C) that takes digital nonthermal data, digital thermal data, and image information and provides a high-fidelity prediction of vaporization heat transfer performance. This model predicted performance very well, better than our conventional fit to data (case A) and on par with best fits to quality nucleate boiling heat transfer data in the literature. This type of trained model fit independent heat flux test data to an RMSPE of 5.8%.
Our results indicate that training this type of model, which predicts performance from input image information and digital operating condition thermal data, makes the resulting predictive model more accurate and robust. The successful use of the hybrid CNN models described here suggests that there is a strong correlation between two-phase morphology variations and changes in heat transfer performance. The hybrid CNN modeling approach developed in this research appears to be a promising strategy for analyzing experimental data for physical systems that are best investigated experimentally with combined use of imaging and digital sensor instrumentation. Possible use of this type of modeling in other systems is also discussed.
-
Predicting the transpiration stream concentration factor (TSCF) and other concentration factors is essential in understanding the plant uptake of organic contaminants. Traditional mechanistic and numerical modeling methods often fail to reliably predict the TSCF. This study developed a hybrid deep model to predict TSCF by integrating convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. This hybrid CNN-LSTM model used eight physicochemical properties of organic contaminants to predict TSCF. The training procedure for this hybrid model was successful. The results indicated that the training and test losses for predicting TSCF were of the same order and close to zero. This study showed that the hybrid CNN-LSTM model can outperform mechanistic models and achieve higher performance than classical machine learning models. Feature importance analysis using extreme gradient boosting highlighted the role and importance of lipophilicity in predicting uptake and translocation of organic contaminants.