Title: Analyzing Students’ Written Arguments by Combining Qualitative and Computational Approaches
Education researchers have proposed that qualitative and emerging computational machine learning (ML) approaches can be productively combined to advance analyses of student-generated artifacts for evidence of engagement in scientific practices. We applied such a combined approach to written arguments excerpted from university students’ biology laboratory reports. These texts are lengthy and contain multiple different features that could be attended to in analysis. We present two outcomes of this combined analysis that illustrate possible affordances of combined workflows: 1) Comparing ML and human-generated scores allowed us to identify and reanalyze mismatches, increasing our overall confidence in the coding; and 2) ML-identified word clusters allowed us to interpret the overlap in meaning between the original coding scheme and the ML-predicted scores, providing insight into which features of students’ writing can be used to differentiate rote from more meaningful engagement in scientific argumentation.
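The first outcome above, comparing ML and human-generated scores to find mismatches for reanalysis, can be illustrated with a minimal sketch. It is not the authors' workflow: the report IDs, the binary scores, and the helper names `cohen_kappa` and `find_mismatches` are all hypothetical, and Cohen's kappa is assumed here as one common agreement statistic, not one the abstract names.

```python
from collections import Counter


def cohen_kappa(human, machine):
    """Cohen's kappa for two equal-length lists of categorical scores."""
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(human)
    # Observed agreement: fraction of items given the same score by both.
    observed = sum(h == m for h, m in zip(human, machine)) / n
    # Chance agreement, from each rater's marginal score frequencies.
    h_counts, m_counts = Counter(human), Counter(machine)
    expected = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def find_mismatches(ids, human, machine):
    """Return (id, human_score, machine_score) for every disagreement."""
    return [(i, h, m) for i, h, m in zip(ids, human, machine) if h != m]


# Hypothetical data: 1 = meaningful engagement, 0 = rote engagement.
ids = ["r01", "r02", "r03", "r04", "r05", "r06"]
human = [1, 0, 1, 1, 0, 0]
machine = [1, 0, 0, 1, 0, 1]

kappa = cohen_kappa(human, machine)
mismatches = find_mismatches(ids, human, machine)

print(f"kappa = {kappa:.2f}")
for report_id, h, m in mismatches:
    print(f"{report_id}: human={h}, machine={m} -> flag for reanalysis")
```

In a workflow like this, the flagged items would go back to human coders for a second look, so that each disagreement is resolved as either a model error or a coding error.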
Award ID(s):
1931978
PAR ID:
10348043
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 15th International Conference on Computer-Supported Learning (CSCL)
Page Range / eLocation ID:
163-170
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Involving students in scientific modeling practice is one of the most effective approaches to achieving the next generation science education learning goals. Given the complexity and multirepresentational features of scientific models, scoring student-developed models is time- and cost-intensive, remaining one of the most challenging assessment practices for science education. More importantly, teachers who rely on timely feedback to plan and adjust instruction are reluctant to use modeling tasks because they cannot provide timely feedback to learners. This study utilized machine learning (ML), the most advanced artificial intelligence (AI), to develop an approach to automatically score student-drawn models and their written descriptions of those models. We developed six modeling assessment tasks for middle school students that integrate disciplinary core ideas and crosscutting concepts with the modeling practice. For each task, we asked students to draw a model and write a description of that model, which gave students with diverse backgrounds an opportunity to represent their understanding in multiple ways. We then collected student responses to the six tasks and had human experts score a subset of those responses. We used the human-scored student responses to develop ML algorithmic models (AMs) and to train the computer. Validation using new data suggests that the machine-assigned scores achieved robust agreement with human consensus scores. Qualitative analysis of student-drawn models further revealed five characteristics that might impact machine scoring accuracy: alternative expression, confusing label, inconsistent size, inconsistent position, and redundant information. We argue that these five characteristics should be considered when developing machine-scorable modeling tasks.
  2. There has been growing evidence that flipped teaching (FT) can increase student engagement. The traditional lecture-based teaching (TT) method was compared with FT and FT combined with retrieval practice (FTR) in a 400-level Exercise Physiology course over eight semesters. In the FT format, lecture content was assigned for students to prepare before class along with an online quiz. During class, the assigned content and quiz questions were reviewed, and a team-based learning (TBL) activity was conducted. Students found FT implementation three times a week (FT3) to be overwhelming, which led to reconfiguration of the FT design to minimize the quiz and TBL sessions to one per week. Subsequently, FT was combined with retrieval exercises (FTR), which involved recalling information, thus promoting retention. The students in the FTR format were given weekly quizzes in class, where no notes were allowed, which affected their quiz grade negatively compared with FT (P < 0.0001). Again, no resources were permitted during FTR’s TBL sessions. When exam scores were compared with TT, student performance was significantly greater (P < 0.001) with the FT and FTR methods, suggesting these methods are superior to TT. While both male and female students benefited from FT and FTR methods compared with TT (P = 0.0008), male students benefited the most (P = 0.0001). Similarly, when the exam scores were organized into upper and lower halves, both groups benefited from FT and FTR (P < 0.0001) approaches. In conclusion, both FT and FTR methods benefit students more compared with TT, and male students are impacted the most.
  3. Beiko, Robert G (Ed.)
    ABSTRACT Inflammatory bowel disease (IBD) is characterized by complex etiology and a disrupted colonic ecosystem. We provide a framework for the analysis of multi-omic data, which we apply to study the gut ecosystem in IBD. Specifically, we train and validate models using data on the metagenome, metatranscriptome, virome, and metabolome from the Human Microbiome Project 2 IBD multi-omic database, with 1,785 repeated samples from 130 individuals (103 cases and 27 controls). After splitting the participants into training and testing groups, we used mixed-effects least absolute shrinkage and selection operator regression to select features for each omic. These features, with demographic covariates, were used to generate separate single-omic prediction scores. All four single-omic scores were then combined into a final regression to assess the relative importance of the individual omics and the predictive benefits when considered together. We identified several species, pathways, and metabolites known to be associated with IBD risk, and we explored the connections between data sets. Individually, metabolomic and viromic scores were more predictive than metagenomics or metatranscriptomics, and when all four scores were combined, we predicted disease diagnosis with a Nagelkerke’s R² of 0.46 and an area under the curve of 0.80 (95% confidence interval: 0.63, 0.98). Our work supports that some single-omic models for complex traits are more predictive than others, that incorporating multiple omic data sets may improve prediction, and that each omic data type provides a combination of unique and redundant information. This modeling framework can be extended to other complex traits and multi-omic data sets. IMPORTANCE Complex traits are characterized by many biological and environmental factors, such that multi-omic data sets are well-positioned to help us understand their underlying etiologies.
We applied a prediction framework across multiple omics (metagenomics, metatranscriptomics, metabolomics, and viromics) from the gut ecosystem to predict inflammatory bowel disease (IBD) diagnosis. The predicted scores from our models highlighted key features and allowed us to compare the relative utility of each omic data set in single-omic versus multi-omic models. Our results emphasized the importance of metabolomics and viromics over metagenomics and metatranscriptomics for predicting IBD status. The greater predictive capability of metabolomics and viromics is likely because these omics serve as markers of lifestyle factors such as diet. This study provides a modeling framework for multi-omic data, and our results show the utility of combining multiple omic data types to disentangle complex disease etiologies and biological signatures. 
  4. This paper demonstrated how to apply Machine Learning (ML) techniques to analyze student interaction data collected in an online mathematics game. Using a data-driven approach, we examined 1) how different ML algorithms influenced the precision of middle-school students’ (N = 359) performance (i.e. posttest math knowledge scores) prediction and 2) what types of in-game features (i.e. student in-game behaviors, math anxiety, mathematical strategies) were associated with student math knowledge scores. The results indicated that the Random Forest algorithm showed the best performance (i.e. the accuracy of models, error measures) in predicting posttest math knowledge scores among the seven algorithms employed. Out of 37 features included in the model, the validity of the students’ first mathematical transformation was the most predictive of their posttest math knowledge scores. Implications for game learning analytics and supporting students’ algebraic learning are discussed based on the findings. 
  5. Palagi, Patricia M (Ed.)
    As genomics technologies advance, there is a growing demand for computational biologists trained for genomics analysis but instructors face significant hurdles in providing formal training in computer programming, statistics, and genomics to biology students. Fully online learners represent a significant and growing community that can contribute to meet this need, but they are frequently excluded from valuable research opportunities which mostly do not offer the flexibility they need. To address these opportunity gaps, we developed an asynchronous course-based undergraduate research experience (CURE) for computational genomics specifically for fully online biology students. We generated custom learning materials and leveraged remotely accessible computational tools to address 2 novel research questions over 2 iterations of the genomics CURE, one testing bioinformatics approaches and one mining cancer genomics data. Here, we present how the instructional team distributed analysis needed to address these questions between students over a 7.5-week CURE and provided concurrent training in biology and statistics, computer programming, and professional development. Scores from identical learning assessments administered before and after completion of each CURE showed significant learning gains across biology and coding course objectives. Open-response progress reports were submitted weekly and identified self-reported adaptive coping strategies for challenges encountered throughout the course. Progress reports identified problems that could be resolved through collaboration with instructors and peers via messaging platforms and virtual meetings. We implemented asynchronous communication using the Slack messaging platform and an asynchronous journal club where students discussed relevant publications using the Perusall social annotation platform. 
The online genomics CURE resulted in unanticipated positive outcomes, including students voluntarily discussing plans to continue research after the course. These outcomes underscore the effectiveness of this genomics CURE for scientific training, recruitment and student-mentor relationships, and student successes. Asynchronous genomics CUREs can contribute to a more skilled, diverse, and inclusive workforce for the advancement of biomedical science. 