Title: Paper or Silicon: Assessing Student Understanding in a Computer-based Testing Environment Using PrairieLearn
Computer-based testing is a powerful tool for scaling exams in large lecture classes. The decision to adopt computer-based testing is typically framed as a tradeoff in time: some of the time saved by auto-grading must be reallocated to developing problem pools, but the net savings are significant. This paper examines the tradeoff in terms of accuracy in measuring student understanding. While some exam formats (e.g., multiple choice) port readily to a computer-based environment, adequately porting others (e.g., drawings such as free-body diagrams, or worked problems) can be challenging. A key component of this challenge is asking, "What is the exam actually able to measure?" In this paper the authors provide a quantitative and qualitative analysis of measurements of student understanding via computer-based testing in a sophomore-level Solid Mechanics course. At Michigan State University, Solid Mechanics is taught using the SMART (Supported Mastery Assessment through Repeated Testing) methodology. In a typical semester, students are given 5 exams that test their understanding of the material. Each exam is graded using the SMART rubric, which awards full points for a correct answer, partial credit for solutions with only non-conceptual errors, and zero points for a solution containing a conceptual error. Every exam is divided into four sections: concept, simple, average, and challenge. Each exam has at least one retake opportunity, for a total of 10 written tests. In the current study, students representing 10% of the class took half of each exam in PrairieLearn, a computer-based auto-grading platform. During these exams, students received instant feedback on submitted answers (correct or incorrect) and were given an opportunity to identify their mistakes and resubmit their work. Students were provided with scratch paper to set up problems and work out solutions. After each exam, the paper-based work was compared with the computer-submitted answers.
This paper examines what types of mistakes (conceptual and non-conceptual) students were able to correct when feedback was provided; the answer depends on the type and difficulty of the problem. The analysis also examines whether students taking the computer-based test performed at the same level as their peers who took the paper-based exams. Additionally, student feedback is presented and discussed.
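The SMART rubric described above reduces to a simple scoring function. The sketch below is illustrative only: the 50% partial-credit fraction and the point values are assumptions, since the abstract says only that "some percentage" is awarded for non-conceptual errors.

```python
# Illustrative sketch of the SMART rubric described in the abstract.
# The 50% partial-credit fraction is an assumption; the abstract only
# says "some percentage" is awarded for non-conceptual errors.

PARTIAL_CREDIT = 0.5  # assumed fraction for non-conceptual errors

def smart_score(max_points, outcome):
    """Score one problem under the SMART rubric.

    outcome: 'correct', 'non_conceptual_error', or 'conceptual_error'
    """
    if outcome == "correct":
        return max_points
    if outcome == "non_conceptual_error":
        return max_points * PARTIAL_CREDIT
    if outcome == "conceptual_error":
        return 0.0
    raise ValueError(f"unknown outcome: {outcome}")

# Each exam has four sections: concept, simple, average, challenge.
# Point values here are hypothetical.
exam = {"concept":   (10, "correct"),
        "simple":    (20, "non_conceptual_error"),
        "average":   (30, "correct"),
        "challenge": (40, "conceptual_error")}

total = sum(smart_score(pts, out) for pts, out in exam.values())
print(total)  # 50.0
```

The key property of the rubric is that a conceptual error zeroes out a problem regardless of how much of the solution is otherwise correct, while arithmetic or bookkeeping slips still earn partial credit.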
Award ID(s):
2013286
PAR ID:
10575571
Publisher / Repository:
ASEE
Location:
https://peer.asee.org/paper-or-silicon-assessing-student-understanding-in-a-computer-based-testing-environment-using-prairielearn
Sponsoring Org:
National Science Foundation
More Like this
  1.
    We explore how course policies affect students' studying and learning when a second-chance exam is offered. High-stakes, one-off exams remain a de facto standard for assessing student knowledge in STEM, despite compelling evidence that other assessment paradigms such as mastery learning can improve student learning. Unfortunately, mastery learning can be costly to implement. We explore the use of optional second-chance testing to sustainably reap the benefits of mastery-based learning at scale. Prior work has shown that course policies affect students' studying and learning but has not compared these effects within the same course context. We conducted a quasi-experimental study in a single course to compare the effect of two grading policies for second-chance exams and the effect of increasing the size of the range of dates for students taking asynchronous exams. The first grading policy, called 90-cap, allowed students to optionally take a second-chance exam that would fully replace their score on a first-chance exam, except that the second-chance exam would be capped at 90% credit. The second grading policy, called 90-10, combined students' first- and second-chance exam scores as a weighted average (90% max score + 10% min score). The 90-10 policy significantly increased the likelihood that marginally competent students would take the second-chance exam. Further, our data suggest that students learned more under the 90-10 policy, providing improved student learning outcomes at no cost to the instructor. Most students took exams on the last day an exam was available, regardless of how many days the exam was available.
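The two grading policies reduce to one-line formulas. A minimal sketch, assuming (as the "optionally take" phrasing suggests) that the 90-cap replacement is only applied when it benefits the student:

```python
def ninety_cap(first, second=None):
    """90-cap: the second-chance score replaces the first but is
    capped at 90% credit. Assumption: the replacement applies only
    when it helps the student (the abstract does not specify)."""
    if second is None:            # student skipped the retake
        return first
    return max(first, min(second, 90.0))

def ninety_ten(first, second=None):
    """90-10: weighted average of the two scores,
    90% weight on the max and 10% weight on the min."""
    if second is None:
        return first
    return 0.9 * max(first, second) + 0.1 * min(first, second)

# A student scoring 70 on the first chance and 95 on the retake:
print(ninety_cap(70, 95))   # capped at 90.0
print(ninety_ten(70, 95))   # 0.9*95 + 0.1*70, about 92.5
```

Note the incentive difference: under 90-cap a strong retake is truncated at 90%, while under 90-10 it is rewarded almost fully, which is consistent with the finding that 90-10 drew more marginally competent students into the retake.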
  2. Carvalho, Paulo F. (Ed.)
    Evidence-based teaching practices are associated with improved student academic performance. However, these practices encompass a wide range of activities, and determining which type, intensity, or duration of activity is effective at improving student exam performance has been elusive. To address this shortcoming, we used a previously validated classroom observation tool, the Practical Observation Rubric to Assess Active Learning (PORTAAL), to measure the presence, intensity, and duration of evidence-based teaching practices in a retrospective study of upper- and lower-division biology courses. We determined the cognitive challenge of exams by categorizing all exam questions obtained from the courses using Bloom's Taxonomy of Cognitive Domains. We used structural equation modeling to correlate the PORTAAL practices with exam performance while controlling for the cognitive challenge of exams, students' GPA at the start of the term, and students' demographic factors. Small-group activities, randomly calling on students or groups to answer questions, explaining alternative answers, and total time students were thinking, working with others, or answering questions had positive correlations with exam performance. On exams at higher Bloom's levels, students explaining the reasoning underlying their answers, students working alone, and receiving positive feedback from the instructor also correlated with increased exam performance. Our study is the first to demonstrate a correlation between the intensity or duration of evidence-based PORTAAL practices and student exam performance while controlling for the Bloom's level of exams, as well as looking more specifically at which practices correlate with performance on exams at low and high Bloom's levels. This level of detail will provide valuable insights for faculty as they prioritize changes to their teaching.
As we found that multiple PORTAAL practices had a positive association with exam performance, it may be encouraging for instructors to realize that there are many ways to benefit students’ learning by incorporating these evidence-based teaching practices. 
  3. This project aims to enhance students' learning in foundational engineering courses through oral exams, based on research conducted at the University of California San Diego. The adaptive, dialogic nature of oral exams gives instructors an opportunity to better understand students' thought processes, thus holding promise for improving both assessments of conceptual mastery and students' learning attitudes and strategies. However, the issues of oral exam reliability, validity, and scalability have not been fully addressed. As with any assessment format, careful design is needed to maximize the benefits of oral exams to student learning and minimize the potential concerns. Compared to traditional written exams, oral exams have a unique design space involving a large range of parameters, including the type of oral assessment questions, grading criteria, how oral exams are administered, how questions are communicated and presented to students, how feedback is provided, and other logistics such as the weight of oral exams in the overall course grade and the frequency of oral assessment. To address scalability in high-enrollment classes, a key element of the project is the involvement of the entire instructional team (instructors and teaching assistants). Thus the project will create a new training program to prepare faculty and teaching assistants to administer oral exams, including considerations of issues such as bias and students with disabilities. The purpose of this study is to create a framework to integrate oral exams into core undergraduate engineering courses, complementing existing assessment strategies by (1) creating guidelines to optimize oral exam design parameters for the best student learning outcomes and (2) creating a new training program to prepare faculty and teaching assistants to administer oral exams. The project will implement an iterative design strategy using an evidence-based approach to evaluation.
The effectiveness of the oral exams will be evaluated by tracking student improvements on conceptual questions across consecutive oral exams in a single course, as well as across other courses. Since its start in January 2021, the project is well underway. In this poster, we will present a summary of the results from year 1: (1) exploration of the oral exam design parameters and their impact on students' engagement and perception of oral exams; (2) the effectiveness of the newly developed instructor and teaching-assistant training programs; (3) the development of evaluation instruments to gauge the project's success; and (4) instructor and teaching-assistant experiences and perceptions.
  4.
    We describe the deployment of an imperfect NLP-based automatic short answer grading system on an exam in a large-enrollment introductory college course. We characterize this deployment as both high stakes (the questions were on a mid-term exam worth 10% of students' final grade) and high transparency (the question was graded interactively during the computer-based exam, and correct solutions were shown to students for comparison with their own answers). We study two techniques designed to mitigate the potential student dissatisfaction resulting from students being incorrectly denied credit by the imperfect AI grader. We find (1) that providing multiple attempts can eliminate first-attempt false negatives at the cost of additional false positives, and (2) that students not granted credit by the algorithm cannot reliably determine whether their answer was mis-scored.
  5. Kunal Talwar (Ed.)
    This paper studies grading algorithms for randomized exams. In a randomized exam, each student is asked a small number of random questions from a large question bank. The predominant grading rule is simple averaging, i.e., calculating grades by averaging scores on the questions each student is asked, which is fair ex-ante, over the randomized questions, but not fair ex-post, on the realized questions. The fair grading problem is to estimate the average grade of each student on the full question bank. The maximum-likelihood estimator for the Bradley-Terry-Luce model on the bipartite student-question graph is shown to be consistent with high probability when the number of questions asked to each student is at least the cubed-logarithm of the number of students. In an empirical study on exam data and in simulations, our algorithm based on the maximum-likelihood estimator significantly outperforms simple averaging in prediction accuracy and ex-post fairness even with a small class and exam size. 
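The fair-grading idea in the last abstract can be illustrated with a toy maximum-likelihood fit. This is a minimal sketch, not the paper's implementation: it assumes a Rasch-style logistic model P(correct) = sigmoid(ability − difficulty) on the bipartite student-question graph, fit by plain stochastic gradient ascent, and then predicts each student's expected grade over the full question bank rather than simply averaging the questions they happened to be asked.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_mle(responses, n_students, n_questions, lr=0.1, epochs=500):
    """responses: list of (student, question, correct in {0,1}) triples.
    Fits ability/difficulty by gradient ascent on the log-likelihood of
    P(correct) = sigmoid(ability[s] - difficulty[q])."""
    ability = [0.0] * n_students
    difficulty = [0.0] * n_questions
    for _ in range(epochs):
        for s, q, y in responses:
            p = sigmoid(ability[s] - difficulty[q])
            g = y - p                 # gradient of the log-likelihood
            ability[s] += lr * g
            difficulty[q] -= lr * g
    return ability, difficulty

def fair_grades(responses, n_students, n_questions):
    """Estimated average grade of each student over the FULL bank."""
    ability, difficulty = fit_mle(responses, n_students, n_questions)
    return [sum(sigmoid(a - d) for d in difficulty) / n_questions
            for a in ability]

# Toy data: student 0 answered both questions correctly,
# student 1 answered both incorrectly.
grades = fair_grades([(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)],
                     n_students=2, n_questions=2)
# grades[0] is close to 1 and grades[1] close to 0
```

With randomized question assignment, the benefit appears when two students have equal simple averages but faced questions of different estimated difficulty: the model credits the student who answered harder questions, which is the ex-post fairness the paper targets. (A practical implementation would add regularization so that questions everyone misses do not drive difficulty estimates to infinity.)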