Title: Strategies for Deploying Unreliable AI Graders in High-Transparency High-Stakes Exams
We describe the deployment of an imperfect NLP-based automatic short answer grading system on an exam in a large-enrollment introductory college course. We characterize this deployment as both high stakes (the questions were on a mid-term exam worth 10% of students’ final grade) and high transparency (the question was graded interactively during the computer-based exam, and students were shown correct solutions that they could compare to their answer). We study two techniques designed to mitigate the potential dissatisfaction of students who are incorrectly denied credit by the imperfect AI grader. We find (1) that providing multiple attempts can eliminate first-attempt false negatives at the cost of additional false positives, and (2) that students not granted credit by the algorithm cannot reliably determine whether their answer was mis-scored.
Award ID(s):
1915257
NSF-PAR ID:
10200291
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
International Conference on Artificial Intelligence in Education
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In teaching mechanics, we use multiple representations of vectors to develop concepts and analysis techniques. These representations include pictorials, diagrams, symbols, numbers and narrative language. Through years of study as students, researchers, and teachers, we develop a fluency rooted in a deep conceptual understanding of what each representation communicates. Many novice learners, however, struggle to gain such understanding and rely on superficial mimicry of the problem solving procedures we demonstrate in examples. The term representational competence refers to the ability to interpret, switch between, and use multiple representations of a concept as appropriate for learning, communication and analysis. In engineering statics, an understanding of what each vector representation communicates and how to use different representations in problem solving is important to the development of both conceptual and procedural knowledge. Science education literature identifies representational competence as a marker of true conceptual understanding. This paper presents development work for a new assessment instrument designed to measure representational competence with vectors in an engineering mechanics context. We developed the assessment over two successive terms in statics courses at a community college, a medium-sized regional university, and a large state university. We started with twelve multiple-choice questions that survey the vector representations commonly employed in statics. Each question requires the student to interpret and/or use two or more different representations of vectors and requires no calculation beyond single digit integer arithmetic. Distractor answer choices include common student mistakes and misconceptions drawn from the literature and from our teaching experience. 
We piloted these twelve questions as a timed section of the first exam in fall 2018 statics courses at both Whatcom Community College (WCC) and Western Washington University. Analysis of students’ unprompted use of vector representations on the open-ended problem-solving section of the same exam provides evidence of the assessment’s validity as a measurement instrument for representational competence. We found a positive correlation between students’ accurate and effective use of representations and their score on the multiple choice test. We gathered additional validity evidence by reviewing student responses on an exam wrapper reflection. We used item difficulty and item discrimination scores (point-biserial correlation) to eliminate two questions and revised the remaining questions to improve clarity and discriminatory power. We administered the revised version in two contexts: (1) again as part of the first exam in the winter 2019 Statics course at WCC, and (2) as an extra credit opportunity for statics students at Utah State University. This paper includes sample questions from the assessment to illustrate the approach. The full assessment is available to interested instructors and researchers through an online tool. 
  3. Carvalho, Paulo F. (Ed.)
    Evidence-based teaching practices are associated with improved student academic performance. However, these practices encompass a wide range of activities and determining which type, intensity or duration of activity is effective at improving student exam performance has been elusive. To address this shortcoming, we used a previously validated classroom observation tool, Practical Observation Rubric to Assess Active Learning (PORTAAL) to measure the presence, intensity, and duration of evidence-based teaching practices in a retrospective study of upper and lower division biology courses. We determined the cognitive challenge of exams by categorizing all exam questions obtained from the courses using Bloom’s Taxonomy of Cognitive Domains. We used structural equation modeling to correlate the PORTAAL practices with exam performance while controlling for cognitive challenge of exams, students’ GPA at start of the term, and students’ demographic factors. Small group activities, randomly calling on students or groups to answer questions, explaining alternative answers, and total time students were thinking, working with others or answering questions had positive correlations with exam performance. On exams at higher Bloom’s levels, students explaining the reasoning underlying their answers, students working alone, and receiving positive feedback from the instructor also correlated with increased exam performance. Our study is the first to demonstrate a correlation between the intensity or duration of evidence-based PORTAAL practices and student exam performance while controlling for Bloom’s level of exams, as well as looking more specifically at which practices correlate with performance on exams at low and high Bloom’s levels. This level of detail will provide valuable insights for faculty as they prioritize changes to their teaching. 
As we found that multiple PORTAAL practices had a positive association with exam performance, it may be encouraging for instructors to realize that there are many ways to benefit students’ learning by incorporating these evidence-based teaching practices. 
  4. In March 2020, the global COVID-19 pandemic forced universities across the United States to immediately stop face-to-face activities and transition to virtual instruction. While this transition was not easy for anyone, the shift to online learning was especially difficult for STEM courses, particularly engineering, which has a strong practical/laboratory component. Additionally, underrepresented students (URMs) in engineering experienced a range of difficulties during this transition. The purpose of this paper is to highlight underrepresented engineering students’ experiences as a result of COVID-19. In particular, we aim to highlight stories shared by participants who indicated a desire to share their experience with their instructor. In order to better understand these experiences, research participants were asked to share a story, using the novel data collection platform SenseMaker, based on the following prompt: Imagine you are chatting with a friend or family member about the evolving COVID-19 crisis. Tell them about something you have experienced recently as an engineering student. Conducting a SenseMaker study involves four iterative steps: 1) Initiation is the process of designing signifiers, testing, and deploying the instrument; 2) Story Collection is the process of collecting data through narratives; 3) Sense-making is the process of exploring and analyzing patterns of the collection of narratives; and 4) Response is the process of amplifying positive stories and dampening negative stories to nudge the system to an adjacent possible (Van der Merwe et al. 2019). Unlike traditional surveys or other qualitative data collection methods, SenseMaker encourages participants to think more critically about the stories they share by inviting them to make sense of their story using a series of triads and dyads. 
After completing their narrative, participants were asked a series of triadic, dyadic, and sentiment-based multiple-choice questions (MCQ) relevant to their story. One MCQ in particular, which participants were required to answer, was “If you could do so without fear of judgment or retaliation, who would you share this story with?” with the following options: 1) Family 2) Instructor 3) Peers 4) Prefer not to answer 5) Other. A third of the participants indicated that they would share their story with their instructor. Therefore, we further explored this particular question. Additionally, this paper aims to highlight the subset of students whose primary motivation for their actions was based on Necessity. High-level qualitative findings from the data show that students valued Grit and Perseverance, recent experiences influenced their Sense of Purpose, and their decisions were mostly based on Intuition. Chi-squared tests showed no significant differences by race in the desire to share with their instructor; however, there were significant differences when factoring in gender, suggesting that gender has a large impact on the complexity of navigating school during this time. Lastly, ~50% of participants reported feeling negative or extremely negative about their experiences, ~30% reported feeling neutral, and ~20% reported feeling positive or extremely positive about their experiences. In the study, a total of 500 micro-narratives from underrepresented engineering students were collected from June to July 2020. Undergraduate and graduate students were recruited for participation through the researchers’ personal networks, social media, and organizations like NSBE. Participants had the option to indicate who is able to read their stories: 1) Everyone, 2) Researchers Only, or 3) No one. This work presents qualitative stories of those who granted permission for everyone to read.
  5. Engineering Projects in Community Service (EPICS) High utilizes human-centered design processes to teach high school students how to develop solutions to real-world problems within their communities. The goals of EPICS High are to use principles from both engineering and social entrepreneurship to engage high and middle school students as problem-solvers and to spark interest in STEM careers. Recently, the Cisco corporate advised fund at the Silicon Valley Community Foundation granted Arizona State University funds to expand EPICS High to underrepresented students and to study the student outcomes from participation in this innovative program. In this exploratory study we combined qualitative methods (in-person observations and informal interviews) with pre and post surveys of high school students to answer the questions: What skills do students gain, and how does their mindset about engineering entrepreneurship develop, through participation in EPICS High? Research took place in Title I schools (meaning they have a high number of students from low-income families) as well as non-Title I schools. Our preliminary results show that students made gains in the following areas: their attitudes toward engineering; ability to improve upon existing ideas; incorporating stakeholders; overcoming obstacles; social responsibility; and appreciation of multiple perspectives when solving engineering problems. While males have better baseline scores for most measures, females tend to have the most growth in many of these areas. We conclude that these initial measures show positive outcomes for students participating in EPICS High, and provide questions for further research.