skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Automated Assessment of Student Self-explanation During Source Code Comprehension
This paper presents a novel method to automatically assess self-explanations generated by students during code comprehension activities. The self-explanations are produced in the context of an online learning environment that asks students to freely explain Java code examples line-by-line. We explored a number of models consisting of textual features in conjunction with machine learning algorithms such as Support Vector Regression (SVR), Decision Trees (DT), and Random Forests (RF). Support Vector Regression (SVR) performed best having a correlation score with human judgments of 0.7088. The best model used a combination of features such as semantic measures obtained using a Sentence BERT pre-trained model and from previously developed semantic algorithms used in a state-of-the-art intelligent tutoring system.  more » « less
Award ID(s):
1822816 1822752
PAR ID:
10344832
Author(s) / Creator(s):
Editor(s):
Roman Bartak and Fazel Keshtkar and Michael Franklin
Date Published:
Journal Name:
The International FLAIRS Conference Proceedings
Volume:
35
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A multimodal dataset is presented for the cognitive fatigue assessment of physiological minimally invasive sensory data of Electrocardiography (ECG) and Electrodermal Activity (EDA) and self-reporting scores of cognitive fatigue during HRI. Data were collected from 16 non-STEM participants, up to three visits each, during which the subjects interacted with a robot to prepare a meal and get ready for work. For some of the visits, a well-established cognitive test was used to induce cognitive fatigue. The developed cognitive fatigue assessment framework filtered noise from the raw signals, extracted relevant features, and applied machine learning regression algorithms, such as Support Vector Regression (SVR), Gradient Boosting Machine (GBM), and Random Forest Regressor (RFR) for estimating the Cognitive Fatigue (CF) level. 
    more » « less
  2. Le, Khanh N.Q. (Ed.)
    In current clinical settings, typically pain is measured by a patient’s self-reported information. This subjective pain assessment results in suboptimal treatment plans, over-prescription of opioids, and drug-seeking behavior among patients. In the present study, we explored automatic objective pain intensity estimation machine learning models using inputs from physiological sensors. This study uses BioVid Heat Pain Dataset. We extracted features from Electrodermal Activity (EDA), Electrocardiogram (ECG), Electromyogram (EMG) signals collected from study participants subjected to heat pain. We built different machine learning models, including Linear Regression, Support Vector Regression (SVR), Neural Networks and Extreme Gradient Boosting for continuous value pain intensity estimation. Then we identified the physiological sensor, feature set and machine learning model that give the best predictive performance. We found that EDA is the most information-rich sensor for continuous pain intensity prediction. A set of only 3 features from EDA signals using SVR model gave an average performance of 0.93 mean absolute error (MAE) and 1.16 root means square error (RMSE) for the subject-independent model and of 0.92 MAE and 1.13 RMSE for subject-dependent. The MAE achieved with signal-feature-model combination is less than 1 unit on 0 to 4 continues pain scale, which is smaller than the MAE achieved by the methods reported in the literature. These results demonstrate that it is possible to estimate pain intensity of a patient using a computationally inexpensive machine learning model with 3 statistical features from EDA signal which can be collected from a wrist biosensor. This method paves a way to developing a wearable pain measurement device. 
    more » « less
  3. The accurate forecast of algal blooms can provide helpful information for water resource management. However, the complex relationship between environmental variables and blooms makes the forecast challenging. In this study, we build a pipeline incorporating four commonly used machine learning models, Support Vector Regression (SVR), Random Forest Regression (RFR), Wavelet Analysis (WA)-Back Propagation Neural Network (BPNN) and WA-Long Short-Term Memory (LSTM), to predict chlorophyll-a in coastal waters. Two areas with distinct environmental features, the Neuse River Estuary, NC, USA—where machine learning models are applied for short-term algal bloom forecast at single stations for the first time—and the Scripps Pier, CA, USA, are selected. Applying the pipeline, we can easily switch from the NRE forecast to the Scripps Pier forecast with minimum model tuning. The pipeline successfully predicts the occurrence of algal blooms in both regions, with more robustness using WA-LSTM and WA-BPNN than SVR and RFR. The pipeline allows us to find the best results by trying different numbers of neuron hidden layers. The pipeline is easily adaptable to other coastal areas. Experience with the two study regions demonstrated that enrichment of the dataset by including dominant physical processes is necessary to improve chlorophyll prediction when applying it to other aquatic systems. 
    more » « less
  4. Protein content variation in milk can impact the quality and consistency of dairy products, necessitating access to in-line real time monitoring. Here, we present a chemometric approach for the qualitative and quantitative monitoring of β-lactoglobulin and α-lactalbumin, using mid-infrared spectroscopy (MIR). In this study, we employed Hotelling T2 and Q-residual for outlier detection, automated preprocessing using nippy, conducted wavenumber selection with genetic algorithms, and evaluated four chemometric models, including partial least squares, support vector regression (SVR), ridge, and logistic regression to accurately predict the concentrations of β-lactoglobulin and α-lactalbumin in milk. For the quantitative analysis of these two whey proteins, SVR performed the best to interpret protein concentration from 197 MIR spectra originating from 42 Cornell University samples of preserved pasteurized modified milk. The R2 values obtained for β-lactoglobulin and α-lactalbumin using leave one out cross-validation (LOOCV) are 92.8% and 92.7%, respectively, which is the highest correlation reported to date. Our approach introduced a combination of preprocessing automation, genetic algorithm-based wavenumber selection, and used Optuna to optimize the framework for tuning hyperparameters of the chemometric models, resulting in the best chemometric analysis of MIR data to quantitate β-lactoglobulin and α-lactalbumin to date. 
    more » « less
  5. Manufacturers have faced an increasing need for the development of predictive models that predict mechanical failures and the remaining useful life (RUL) of manufacturing systems or components. Classical model-based or physics-based prognostics often require an in-depth physical understanding of the system of interest to develop closed-form mathematical models. However, prior knowledge of system behavior is not always available, especially for complex manufacturing systems and processes. To complement model-based prognostics, data-driven methods have been increasingly applied to machinery prognostics and maintenance management, transforming legacy manufacturing systems into smart manufacturing systems with artificial intelligence. While previous research has demonstrated the effectiveness of data-driven methods, most of these prognostic methods are based on classical machine learning techniques, such as artificial neural networks (ANNs) and support vector regression (SVR). With the rapid advancement in artificial intelligence, various machine learning algorithms have been developed and widely applied in many engineering fields. The objective of this research is to introduce a random forests (RFs)-based prognostic method for tool wear prediction as well as compare the performance of RFs with feed-forward back propagation (FFBP) ANNs and SVR. Specifically, the performance of FFBP ANNs, SVR, and RFs are compared using an experimental data collected from 315 milling tests. Experimental results have shown that RFs can generate more accurate predictions than FFBP ANNs with a single hidden layer and SVR. 
    more » « less