Involving students in scientific modeling practice is one of the most effective approaches to achieving the next generation science education learning goals. Given the complexity and multirepresentational features of scientific models, scoring student-developed models is time- and cost-intensive and remains one of the most challenging assessment practices in science education. More importantly, teachers who rely on timely feedback to plan and adjust instruction are reluctant to use modeling tasks because they cannot provide timely feedback to learners. This study utilized machine learning (ML), the most advanced artificial intelligence (AI), to develop an approach to automatically score student-drawn models and students' written descriptions of those models. We developed six modeling assessment tasks for middle school students that integrate disciplinary core ideas and crosscutting concepts with the modeling practice. For each task, we asked students to draw a model and write a description of that model, which gave students with diverse backgrounds an opportunity to represent their understanding in multiple ways. We then collected student responses to the six tasks and had human experts score a subset of those responses. We used the human-scored student responses to develop ML algorithmic models (AMs) and to train the computer. Validation using new data suggests that the machine-assigned scores achieved robust agreement with human consensus scores. Qualitative analysis of student-drawn models further revealed five characteristics that might impact machine scoring accuracy: alternative expression, confusing label, inconsistent size, inconsistent position, and redundant information. We argue that these five characteristics should be considered when developing machine-scorable modeling tasks.
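The abstract above reports agreement between machine-assigned and human scores without naming the statistic used. As a minimal sketch, assuming Cohen's kappa is an acceptable agreement measure (an assumption, not a detail from the study), the calculation could look like the following. The function name `cohens_kappa` and the score lists are hypothetical and invented for illustration.

```python
from collections import Counter


def cohens_kappa(human, machine):
    """Cohen's kappa between two equal-length lists of categorical scores."""
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and equal in length")
    n = len(human)
    # Observed agreement: share of responses that received identical scores.
    observed = sum(h == m for h, m in zip(human, machine)) / n
    # Chance agreement, computed from each rater's marginal score distribution.
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    expected = sum(human_counts[k] * machine_counts[k] for k in human_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores, not data from the study.
human_scores = [0, 1, 2, 2, 1, 0, 2, 1]
machine_scores = [0, 1, 2, 1, 1, 0, 2, 2]
kappa = cohens_kappa(human_scores, machine_scores)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which matters when some score levels are much more common than others.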
Analyzing Students’ Written Arguments by Combining Qualitative and Computational Approaches
Education researchers have proposed that qualitative and emerging computational machine learning (ML) approaches can be productively combined to advance analyses of student-generated artifacts for evidence of engagement in scientific practices. We applied such a combined approach to written arguments excerpted from university students’ biology laboratory reports. These texts are lengthy and contain many features that could be attended to in analysis. We present two outcomes of this combined analysis that illustrate possible affordances of combined workflows: 1) comparing ML- and human-generated scores allowed us to identify and reanalyze mismatches, increasing our overall confidence in the coding; and 2) ML-identified word clusters allowed us to interpret the overlap in meaning between the original coding scheme and the ML-predicted scores, providing insight into which features of students’ writing can be used to differentiate rote from more meaningful engagement in scientific argumentation.
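The first outcome, comparing ML and human scores to surface mismatches for reanalysis, can be pictured with a short sketch. This is not the authors' workflow; the function `find_mismatches`, the excerpts, and the code labels are all hypothetical and chosen only to illustrate the idea.

```python
def find_mismatches(responses, human_codes, ml_codes):
    """Return (index, text, human_code, ml_code) wherever the two codes differ."""
    if not (len(responses) == len(human_codes) == len(ml_codes)):
        raise ValueError("all three lists must have the same length")
    return [
        (i, text, h, m)
        for i, (text, h, m) in enumerate(zip(responses, human_codes, ml_codes))
        if h != m
    ]


# Hypothetical excerpts and codes, invented for illustration only.
responses = [
    "Our data support the hypothesis because growth doubled.",
    "The results were as expected.",
    "Enzyme activity fell at high pH, which contradicts our prediction.",
]
human_codes = ["meaningful", "rote", "meaningful"]
ml_codes = ["meaningful", "rote", "rote"]

mismatches = find_mismatches(responses, human_codes, ml_codes)
for index, text, human, ml in mismatches:
    print(f"response {index}: human={human}, ml={ml} -> {text}")
```

Each flagged response would then go back to a human coder for a second look, which is the step the abstract credits with increasing confidence in the coding.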
- Award ID(s): 1931978
- PAR ID: 10348043
- Date Published:
- Journal Name: Proceedings of the 15th International Conference on Computer-Supported Learning (CSCL)
- Page Range / eLocation ID: 163-170
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
There is growing evidence that flipped teaching (FT) can increase student engagement. The traditional lecture-based teaching (TT) method was compared with FT and with FT combined with retrieval practice (FTR) in a 400-level Exercise Physiology course over eight semesters. In the FT format, lecture content was assigned for students to prepare before class along with an online quiz. During class, the assigned content and quiz questions were reviewed, and a team-based learning (TBL) activity was conducted. Students found FT implementation three times a week (FT3) to be overwhelming, which led to a reconfiguration of the FT design that reduced the quiz and TBL sessions to one per week. Subsequently, FT was combined with retrieval exercises (FTR), which involved recalling information and thus promoted retention. The students in the FTR format were given weekly quizzes in class, where no notes were allowed, which affected their quiz grade negatively compared with FT (P < 0.0001). Likewise, no resources were permitted during FTR’s TBL sessions. When exam scores were compared with TT, student performance was significantly greater (P < 0.001) with the FT and FTR methods, suggesting these methods are superior to TT. While both male and female students benefited from the FT and FTR methods compared with TT (P = 0.0008), male students benefited the most (P = 0.0001). Similarly, when the exam scores were organized into upper and lower halves, both groups benefited from the FT and FTR approaches (P < 0.0001). In conclusion, both FT and FTR methods benefit students more than TT, and male students benefit the most.
-
Beiko, Robert G (Ed.)
ABSTRACT Inflammatory bowel disease (IBD) is characterized by complex etiology and a disrupted colonic ecosystem. We provide a framework for the analysis of multi-omic data, which we apply to study the gut ecosystem in IBD. Specifically, we train and validate models using data on the metagenome, metatranscriptome, virome, and metabolome from the Human Microbiome Project 2 IBD multi-omic database, with 1,785 repeated samples from 130 individuals (103 cases and 27 controls). After splitting the participants into training and testing groups, we used mixed-effects least absolute shrinkage and selection operator regression to select features for each omic. These features, with demographic covariates, were used to generate separate single-omic prediction scores. All four single-omic scores were then combined into a final regression to assess the relative importance of the individual omics and the predictive benefits when considered together. We identified several species, pathways, and metabolites known to be associated with IBD risk, and we explored the connections between data sets. Individually, metabolomic and viromic scores were more predictive than metagenomic or metatranscriptomic scores, and when all four scores were combined, we predicted disease diagnosis with a Nagelkerke’s R² of 0.46 and an area under the curve of 0.80 (95% confidence interval: 0.63, 0.98). Our work supports that some single-omic models for complex traits are more predictive than others, that incorporating multiple omic data sets may improve prediction, and that each omic data type provides a combination of unique and redundant information. This modeling framework can be extended to other complex traits and multi-omic data sets.
IMPORTANCE Complex traits are characterized by many biological and environmental factors, such that multi-omic data sets are well-positioned to help us understand their underlying etiologies. We applied a prediction framework across multiple omics (metagenomics, metatranscriptomics, metabolomics, and viromics) from the gut ecosystem to predict inflammatory bowel disease (IBD) diagnosis. The predicted scores from our models highlighted key features and allowed us to compare the relative utility of each omic data set in single-omic versus multi-omic models. Our results emphasized the importance of metabolomics and viromics over metagenomics and metatranscriptomics for predicting IBD status. The greater predictive capability of metabolomics and viromics is likely because these omics serve as markers of lifestyle factors such as diet. This study provides a modeling framework for multi-omic data, and our results show the utility of combining multiple omic data types to disentangle complex disease etiologies and biological signatures.
-
Science educators are integrating more and more computational thinking (CT) activities into their curricula. Proponents of CT offer two motivations: familiarizing students with a realistic depiction of the computational nature of modern scientific practices and encouraging more students from underrepresented backgrounds to pursue careers in science, technology, engineering, and mathematics. However, some studies show that increasing exposure to computing may not necessarily translate to the hypothesized gains in participation by female students and students of color. Therefore, paying close attention to students' engagement in computationally intense science activities is important to finding more impactful ways to promote equitable science education. In this paper, we present an in‐depth analysis of the interactions among a small, racially diverse group of high school students during a chemistry unit with tightly integrated CT activities. We find a salient interaction between the students' engagement with the CT activities and their social identification with publicly recognizable categories such as “enjoys coding” or “finds computing boring.” We show that CT activities in science education can lead to numerous rich interactions that could, if leveraged correctly, allow educators to facilitate more inclusive science classrooms. However, we also show that such opportunities would be missed unless teachers are attentive to them. We discuss the implications of our findings on future work to integrate CT across science curricula and teacher education.
-
This paper demonstrated how to apply Machine Learning (ML) techniques to analyze student interaction data collected in an online mathematics game. Using a data-driven approach, we examined 1) how different ML algorithms influenced the precision of middle-school students’ (N = 359) performance (i.e. posttest math knowledge scores) prediction and 2) what types of in-game features (i.e. student in-game behaviors, math anxiety, mathematical strategies) were associated with student math knowledge scores. The results indicated that the Random Forest algorithm showed the best performance (i.e. the accuracy of models, error measures) in predicting posttest math knowledge scores among the seven algorithms employed. Out of 37 features included in the model, the validity of the students’ first mathematical transformation was the most predictive of their posttest math knowledge scores. Implications for game learning analytics and supporting students’ algebraic learning are discussed based on the findings.

