<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Rubric Development for Technical Reports in Chemical Engineering Unit Operations Laboratory Courses</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2023 June</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10437367</idno>
					<idno type="doi"></idno>
					<title level='j'>ASEE Annual Conference proceedings</title>
<idno>1524-4644</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>J. R. Brown</author><author>S. G. Wettstein</author><author>D. J. Hacker</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The purpose of this work was to test the inter-rater reliability (IRR) of a rubric used to grade technical reports in a senior-level chemical engineering laboratory course that has multiple instructors that grade deliverables. The rubric consisted of fifteen constructs that provided students detailed guidance on instructor expectations with respect to the report sections, formatting and technical writing aspects such as audience, context and purpose. Four student reports from previous years were scored using the rubric, and IRR was assessed using a two-way mixed, consistency, average-measures intraclass correlation (ICC) for each construct. Then, the instructors met as a group to discuss their scoring and reasoning. Multiple revisions were made to the rubric based on instructor feedback and constructs rated by ICC as poor. When fair or poor constructs were combined, the ICCs improved. In addition, the overall score construct continued to be rated as excellent, indicating that while different instructors may have variation at the individual construct level, they evaluate the overall quality of the report consistently. A key learning from this process was the importance of the instructor discussion around their reasoning for the scores and the importance of an ‘instructor orientation’ involving discussion and practice using the rubrics in the case of multiple instructors or a change in instructors. The developed rubric has the potential for broad applicability to engineering laboratory courses with technical writing components and could be adapted for alternative styles of technical writing genre.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Technical communication skills are highly valued in industry <ref type="bibr">[1,</ref><ref type="bibr">2]</ref> and tied to higher levels of career advancement <ref type="bibr">[3]</ref>. Regrettably, they are often lacking in engineering graduates. In chemical engineering programs without a technical writing course requirement, technical communication instruction is often incorporated into the curriculum through laboratory-based or capstone design courses <ref type="bibr">[4]</ref><ref type="bibr">[5]</ref><ref type="bibr">[6]</ref><ref type="bibr">[7]</ref>. In such courses, designing meaningful communication assignments and assessments is often relegated to engineering faculty, who may or may not have the requisite skills and knowledge to do so.</p><p>Student learning of communication skills is tied to the quality of feedback <ref type="bibr">[4]</ref>; however, engineering faculty typically have not received any formal training on how to give effective feedback on technical writing <ref type="bibr">[2,</ref><ref type="bibr">4]</ref>. Engineering faculty may tend to focus on spelling and grammar, while effective feedback is higher level and addresses issues with organization, the use of arguments, or support of evidence <ref type="bibr">[4,</ref><ref type="bibr">8]</ref>. Good feedback is essentially more coaching than correcting <ref type="bibr">[2]</ref>, and collaboration with communication experts for training is one approach to developing more efficient and purposeful grading rubrics. 
The goal of rubrics is to reflect the skills targeted in the assignment in order to effectively evaluate technical communication <ref type="bibr">[2]</ref>.</p><p>In previous work, the lead instructors for a two-part series of senior-level chemical engineering unit operations laboratory courses worked with the Writing Center on campus to develop assignments and activities targeted at specific technical communication skills <ref type="bibr">[9]</ref>. Through this collaboration, preliminary rubrics were developed to assess communication skills tied to learning outcomes. These rubrics were constructed to reflect the key information the students were expected to convey as well as the course objectives. Additionally, considerable thought went into what would cause the students to not meet expectations and lose points for each of the constructs in the rubrics. Well-designed rubrics can help faculty set clear expectations for students, provide feedback, and assess technical writing skills <ref type="bibr">[10]</ref>. Additionally, it is important for rubrics to be reliable across instructors in team-taught courses or when instructors change.</p><p>This study aimed to evaluate the inter-rater reliability (IRR) of the technical report rubric developed in collaboration with the Writing Center across instructors teaching laboratory courses within the chemical engineering curriculum. We present the results of this evaluation as well as lessons learned from the scoring discussion. Additionally, we provide recommendations for incorporating an 'instructor orientation' prior to using rubrics to ensure effective use of the rubric across multiple instructors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Methods</head><p>The rubric underwent two rounds of validation. The first was in the 2022 spring semester and the second followed in the 2022 fall semester. For the spring rubric validation, six instructors graded four reports and used a rubric that consisted of 15 constructs ranging in value from 4 to 12 points each (Table <ref type="table">1</ref>) for a total of 100 points. Each construct on the rubric had five levels meant to correspond to "A", "B", "C", "D" and "F" level work. The description of "A" level from the rubric is listed in Table <ref type="table">1</ref>.</p><p>In the case of the fall rubric, three constructs were removed and two new constructs were added (Table <ref type="table">1</ref>), Safety ("A" level description: "Contains all key safety concerns, hazards, and how each issue will be handled considering high probability and most likely outcomes") and Overall Report ("A" level description: "Work earning this score is ready to be passed on to a real client. In every way, it meets audience needs. Document/presentation is formatted and organized to guide to major points. Clear and interesting visuals and prose contribute to professional-level quality. 
Overall report considerations include: Audience - Demonstrates a thorough understanding of the audience and purpose that is responsive to the assigned task and who the report was to be written for; Tech Writing - Well-ordered, segued, logical flow of material; accurate spelling, punctuation, grammar and sentence structure; clear and concise; correct amount of detail; own words used; good transitions between sentences, paragraphs, and sections; Format - Used 11 pt font minimum and an acceptable font; equation editor was used for equations and for variables in main text; no large white spaces; formatting was uniform throughout document; correct sig figs; equations inline; Figure and Table Formatting - Concise, but descriptive captions that include symbols if needed are present, correct location, and are referred to in the text; figures have axis labels, no title, no outer border, no legend, black font and lines, no extra decimal places; tables are in a simple layout format"). For the fall rubric, the constructs ranged in value from 4 to 24 points for a report total of 100 points. Four new reports were graded by four instructors.</p><p>Inter-rater reliability (IRR) was assessed using a two-way mixed, consistency, average-measures intraclass correlation (ICC) for each of the constructs on which students were rated. The ICC is a descriptive statistic used to assess the level of consistency in ratings from two or more raters on the same construct across participants. The ICC works well with multiple raters who have used ordinal data for their ratings. The ICC can range from 0, random agreement, to +1.0, perfect agreement, with higher ratings indicating higher consistency among raters. For example, a rating of 0.80 would indicate that 80% of the variance among raters was due to true consistency among raters, and 20% was due to unexplained variability or error. 
In general, ICCs less than 0.40 are poor, ICCs between 0.40 and 0.59 are fair, ICCs between 0.60 and 0.74 are good, and ICCs between 0.75 and 1.0 are excellent. Some caution must be used in interpreting the ICCs reported here mainly because only four reports were rated during each validation. However, the ICCs do provide a general sense of how consistent the raters were within a category.</p><p>Table 1 excerpt (Page Maximum criterion): Report is less than 8 pages with at least 1 inch margins and 1.5 line spacing, which includes figures, tables, etc. (Page limit does not include pages of the appendix or references)</p><p>All reports were from a senior-level chemical engineering laboratory course, and it should be emphasized that the spring and fall validations were done with different sets of student reports. That is, in the spring, four reports were selected from a laboratory experiment on heat exchangers, and for the fall, four reports from a different heat transfer modelling experiment were selected. For the spring validation, the six graders consisted of five graders who had been instructors of a laboratory course previously while one was a new teaching assistant for the class and had not graded technical reports previously. For the fall validation, three of the graders remained the same as in the spring validation and had all previously graded technical reports while one was a new instructor for the laboratory course and had not.</p></div>
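<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the statistic described in the Methods, the following minimal sketch computes a two-way, consistency, average-measures ICC from a reports-by-raters table of scores using the standard mean-square form, (MS_rows - MS_error) / MS_rows. It is not the software used in this study. The function names (icc_consistency_average, icc_band) and the example scores are hypothetical, and rounding the ICC to two decimals before applying the interpretation bands is an assumption made here because the bands are stated to two decimals.</p>
<p><![CDATA[
```python
from statistics import mean


def icc_consistency_average(scores):
    """Two-way, consistency, average-measures ICC.

    scores: list of rows, one row per report, one column per rater.
    Returns (MS_rows - MS_error) / MS_rows from a two-way ANOVA layout.
    """
    n = len(scores)       # number of reports
    k = len(scores[0])    # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least two reports and two raters")
    if any(len(row) != k for row in scores):
        raise ValueError("every report needs a score from every rater")

    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Partition the total sum of squares into reports, raters, and residual.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if ms_rows == 0:
        raise ValueError("reports have identical mean scores; ICC is undefined")
    return (ms_rows - ms_error) / ms_rows


def icc_band(icc):
    """Map an ICC to the poor/fair/good/excellent bands quoted in the text.

    Assumption: the value is rounded to two decimals first. Estimates can
    come out negative in practice; those fall into the 'poor' band here.
    """
    value = round(icc, 2)
    if value >= 0.75:
        return "excellent"
    if value >= 0.60:
        return "good"
    if value >= 0.40:
        return "fair"
    return "poor"


# Hypothetical scores for one construct: 4 reports (rows) by 3 raters (columns).
EXAMPLE_SCORES = [
    [88, 90, 85],
    [75, 78, 70],
    [92, 91, 95],
    [60, 66, 62],
]
EXAMPLE_ICC = icc_consistency_average(EXAMPLE_SCORES)
EXAMPLE_BAND = icc_band(EXAMPLE_ICC)
```
]]></p>
<p>Because this is the consistency form, a rater who is uniformly harsher or more lenient by a constant offset does not lower the ICC; only disagreement in how reports are ordered and spaced relative to one another does.</p></div>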
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results and Discussion</head><p>In the spring of 2022, the six graders evaluated four student reports on a heat exchanger experiment done in a senior-level laboratory class using the 15 constructs listed in Table <ref type="table">1</ref>. The point values for each of the 15 constructs were added together, which resulted in an overall report score that can be seen in Figure <ref type="figure">1</ref>. Each colored line represents a different grader (see legend in table) and each corner labeled 1-4 represents a different report. Figure <ref type="figure">1</ref> shows discrepancies in the overall report scores: most graders were consistent, with the exception of the gray (high) and navy (low) graders for reports 2 and 4, and the teal (low) grader for report 3. When looking at the normalized point scores for each rubric construct for reports 2 and 4 (Figure <ref type="figure">2</ref>), the amount of variability between instructors can be seen. The navy grader from Figure <ref type="figure">1</ref> (shown by navy diamond in Figure <ref type="figure">2</ref>) had lower scores compared to the average across most rubric items.</p><p>The navy grader was a teaching assistant who had not graded reports previously and, upon group discussion, it became apparent that it is important to discuss expectations and to review report grades with new graders to ensure consistency. Typically, as training for this laboratory course, an experienced instructor will grade two submitted student reports also graded by teaching assistants and other new instructors. They then meet to discuss the scoring and where there are discrepancies. If the scores are off by a substantial amount, more of the reports are scored in a similar fashion until agreement has been reached. 
Additionally, for the actual course, grades are entered by the new grader into the grading software but not published until the experienced instructor reviews the scoring to ensure it is within error of other graders. For the spring validation, the navy grader did not have this training prior to scoring the reports, which emphasizes a need to incorporate a validation session about expectations prior to using a rubric.</p><p>Figure 2 (caption, partial): The rubric row numbers on the x-axis correspond to the 15 rubric constructs (see Table <ref type="table">1</ref>) and the symbol colors correspond to grader colors listed in Figure <ref type="figure">1</ref>.</p><p>As mentioned, following scoring of the reports, the instructors met to discuss the rubric and the scoring. For example, the teal grader identified statistical errors in report 3 that the other graders overlooked, which was the reason for their lower grade. In addition, the gray grader had graded technical reports for the class in previous years but was not aware that fractional points could be given for a particular level in the new rubric. Consequently, they erred on the positive side when assigning point levels for a construct and gave the higher points, leading to the higher scores. Finally, as previously mentioned, the navy grader, a graduate-level teaching assistant, had not graded technical reports using a rubric before and had different expectations for the report.</p><p>&#8226; The graders liked how the constructs were descriptive and allowed them to make assessments as they read through the report. For example, the constructs move from cover page to abstract to introduction, etc. The rubric was efficient to grade with because of this.</p><p>&#8226; The rubric was detailed and directed, which could benefit struggling students who need more guidance on what is expected.</p><p>&#8226; A holistic aspect was missing.</p><p>&#8226; Point values did not reflect the corresponding letter grade that was listed on the rubric. For example, for a rubric construct worth a maximum of 4 points, "A" work was labeled as 4 points while "B" work was labeled as 3 points and "C" work 2 points. However, if a student earned the points labeled as "C" work in all the constructs, they would receive a total score of 54 out of 100, which is an "F" grade rather than a "C" grade. Instructors felt the points labelled on the levels should be re-evaluated to more accurately reflect the letter grades.</p><p>&#8226; There were some parts of the rubric that double-counted points, so discussion revolved around how to modify the constructs to minimize this. For example, if the student did not have a good discussion section, they may have lost points in that construct as well as in the technical writing construct.</p><p>When ICCs were computed on the four reports graded by the six instructors, the results showed that overall, the raters provided fairly consistent ratings (Table <ref type="table">2</ref>). Seven of the constructs were rated as excellent, two as good, two as fair, and three as poor. In addition, the rating on Page Maximum was perfect, which is a "yes/no" category that gives full points if the report is under 8 pages or 0 points if it is over. In order to take into account feedback from the discussion and address the constructs with poor ICCs, revisions were made to the rubric. The Technical Writing, Format, and Figures/Tables constructs were rated as 'poor' by the ICCs (Table <ref type="table">2</ref>), and these were the constructs about which graders had raised double-counting concerns. These three constructs were combined into a single holistic construct adapted from Sheffield et al. <ref type="bibr">[11]</ref>. 
In addition, point values were adjusted to represent the letter grade earned and the resulting constructs and maximum point values are listed in Table <ref type="table">1</ref>.</p><p>In the fall of 2022, a new round of validation was conducted using the revised rubric. Four reports for a different lab experiment were scored by three of the previous rubric validators (shown by the same colors in Figures <ref type="figure">1</ref> and <ref type="figure">3</ref>) and a new grader (indicated by the dashed line in Figure <ref type="figure">3</ref>).</p><p>The scores showed less variability between three of the graders (Figure <ref type="figure">3</ref>), but the gray grader continued to score high, particularly on reports 1, 3, and 4. The teal grader also gave a high score for report 4 compared to pink and orange. For reports 1 and 3, the score of the gray grader was outside one standard deviation of the average report score, which could be a concern. However, the teal score for report 4 was within the standard deviation for all graders (91.2% &#177; 2.3%; Table <ref type="table">3</ref>) and was therefore not considered an issue. The average standard deviation of scores decreased from 3.1 using the original rubric to 2.3 using the revised rubric. When looking at discrepancies within each construct, for report 1 (Figure <ref type="figure">4a</ref>) the gray grader gave more points in five of the constructs and fewer in two of the constructs. The largest discrepancy was in construct 8, "Discussion", where they rated the paper an 8 compared to the 6.5, 5, and 6 given by the other graders. For report 3 (Figure <ref type="figure">4b</ref>), the gray grader showed a less drastic difference, but was on the high end of scoring for the majority of constructs. Since there are 13 constructs within the rubric, even an extra half point per construct will result in a 6.5% difference in final grade score. 
In this case, the gray grader's score was approximately 7% higher than the average of the three other graders.</p><p>Figure <ref type="figure">4</ref>: Scores from each of the four graders for student reports a) report 1 and b) report 3 from the fall rubric validation. The rubric row numbers on the x-axis correspond to the 13 rubric categories as listed in Table <ref type="table">1</ref> and the symbol colors correspond to the grader colors in Figure <ref type="figure">3</ref>.</p><p>For the IRR on the fall data, overall, the raters exhibited good consistency in their ratings of the four student reports: 9 of the 13 constructs had ICCs that were excellent (Table <ref type="table">4</ref>), with Page Maximum showing near-perfect consistency. The Overall Report construct was the combination of the three previously rated poor constructs (Tech Writing, Format, and Figure/Table Format) from the spring validation, and was rated as "excellent," which confirms that there was less variability when the report is graded more holistically or potentially with fewer constructs. Additionally, the ICC of the total scores was excellent at 0.94. One construct, Safety, had an ICC that was rated as good and 3 of the 13 constructs had poor consistency: Conclusions, Results, and Abstract. Interestingly, all of these were rated as excellent in the first rubric validation and had no changes made. The three constructs had variability in the ratings and, in going from one report to the next, the raters tended to go in opposite directions, with some raters going higher in their ratings while others went lower. In order to determine if removing a single grader would improve the ICC of the constructs, the ICC was calculated by systematically eliminating a grader. The eliminated grader was then returned to the analysis and another grader eliminated until all four graders had been systematically removed. 
The last four columns of Table <ref type="table">4</ref> list the results, with the ICCs that were improved upon eliminating that grader being highlighted in green. Removing the pink and teal graders tended to result in lower ICCs, which is not desirable. It should be noted that the pink and teal graders have taught a senior laboratory class over ten times each, assisted in developing the rubric with the Writing Center, and have a great deal of experience grading lab reports. Therefore, when they were removed from the ICC analyses, the overall consistency was lost and the ICC was diminished. Removing the gray and orange graders increased the ICCs of five and six constructs, respectively. The increased consistency found when the gray and orange graders were removed from the analysis indicates that these two graders may have had a detrimental effect on consistency. The gray grader had been an instructor on the senior lab course previously, while the orange grader was a new instructor. With the removal of the orange grader, the Methods construct went from good to excellent and after removing the gray grader, the Abstract construct went from poor to excellent. Although improved from an ICC of 0.00, the Results and Conclusions/recommendations constructs remained poor after removing the gray grader.</p><p>It is important to note that even with three of the constructs being rated as "poor," the ICC of the total scores was 0.94, which is excellent. This indicates that even though there may be less consistency within the individual constructs, the overall report scores are consistent. Based on discussion with graders, there are likely several reasons for this. During the spring rubric validation discussion, graders noted that they tended to ensure the final report score was close to their overall thoughts on the report. That is, if the report they scored received a C grade but, overall, the instructor felt it was low B work, they revised their scores within the rubric. 
Additionally, some instructors have higher expectations for certain constructs, such as results or safety, and lower expectations for the abstract or introduction or vice versa. This was due to aspects like the instructor's area of research expertise, their teaching experience, and their employment experience. One instructor with a statistics background due to their experience in industry placed strong importance on the student's accurate use and interpretation of statistics.</p><p>Another instructor with a theoretical research background was more concerned with the student's ability to relate their analysis and interpretation to theoretical concepts. With this perspective, lower ICC values in individual constructs are not unexpected, and indeed it is known that a rubric with fewer constructs is generally going to have better inter-rater reliability <ref type="bibr">[12]</ref>. It seems that experienced graders have a good understanding of what constitutes quality technical communication and, when grading in a holistic way, are more likely to award consistent scores.</p><p>In order to test the hypothesis that experienced graders may be better calibrated in terms of the rubric grading, an additional five reports from different labs (a fuel cell and ionic diffusion experiment) were graded by the teal, pink, and gray graders. These five report scores were combined with the eight they had graded previously in the fall and spring validations and an IRR was completed (Table <ref type="table">5</ref>). Note that since the "Overall" construct was not in the spring rubric, an ICC was not determined for it on the complete dataset. It was found that five constructs, Objectives, Method, Appendices, Page Maximum, and Total Score, had excellent ratings, the Cover Page and References constructs were good, and five constructs fell in the fair to poor category. It is worth noting that only the Results construct resulted in a poor rating, which is an improvement over the previous validations. 
Looking more closely at the constructs that were rated fair to poor, there are clear areas of overlap between these constructs. For example, in the case of the Results construct and Discussion construct, many students combine these sections within the technical report. Additionally, some students will include the recommendations in the conclusions section and not the discussion (or vice versa). Similarly, the Intro/background/theory and the Objectives constructs have overlap as well. Students are encouraged, but not required, to have a separate Objectives section and sometimes include them in the introduction. Different instructors, as discussed previously, may value certain aspects over others or do "double counting" of errors, leading to variation of scoring in these overlapping constructs. In order to test the hypothesis that overlapping constructs are impacting reliability, an IRR was done on the data in which the scores of the Discussion, Results, and Conclusions constructs were combined. This resulted in an ICC of 0.597 (0.60 when rounded to two decimal places), which has a rating of good for IRR (Figure <ref type="figure">5</ref>). By combining the Intro/background/theory and Objectives constructs, the ICC was 0.715, resulting in a rating of good (Figure <ref type="figure">5</ref>). If these changes were made, only one construct, Abstract, had a rating below good (Table <ref type="table">6</ref>).</p><p>Figure <ref type="figure">5</ref>: The overlapping constructs that were combined and the resulting ICCs.</p><p>Training or instructor orientations, however, are necessary to align expectations and make sure that feedback is not contradictory or that instructors do not have vastly different quality standards. A key take-away is also to ensure that the overall scores are balanced with the understanding that there may be some inconsistency within the individual constructs. 
Since the rubric has been implemented in this course, the instructors noticed a decrease in student evaluation comments critical of disparity in grading. Historically, these comments appeared even when instructor average scores were similar and consistent with class averages.</p></div>
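<div xmlns="http://www.tei-c.org/ns/1.0"><p>The grader-elimination analysis reported with Table 4 (remove one grader, recompute the ICC, return the grader, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure or software actually used: the helper names (icc_consistency_average, leave_one_rater_out), the rater labels, and the scores are all hypothetical, and the ICC is the same two-way, consistency, average-measures form, (MS_rows - MS_error) / MS_rows.</p>
<p><![CDATA[
```python
from statistics import mean


def icc_consistency_average(scores):
    """Two-way, consistency, average-measures ICC for a reports-by-raters table."""
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two reports and two raters")
    grand = mean(x for row in scores for x in row)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in scores)
    ss_cols = n * sum(
        (mean(row[j] for row in scores) - grand) ** 2 for j in range(k)
    )
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    if ms_rows == 0:
        raise ValueError("reports have identical mean scores; ICC is undefined")
    return (ms_rows - ms_error) / ms_rows


def leave_one_rater_out(scores, raters):
    """Recompute the ICC with each rater removed in turn (then restored)."""
    if len(raters) != len(scores[0]):
        raise ValueError("one label is needed per rater column")
    if len(raters) < 3:
        raise ValueError("need at least three raters so that two remain")
    results = {}
    for drop, name in enumerate(raters):
        # Build a copy of the table without the dropped rater's column.
        reduced = [[x for j, x in enumerate(row) if j != drop] for row in scores]
        results[name] = icc_consistency_average(reduced)
    return results


# Hypothetical scores for one construct: 4 reports (rows) by 3 raters (columns).
# Rater B is a constant 2 points above rater A; rater C orders reports differently.
RATERS = ["A", "B", "C"]
SCORES = [
    [80, 82, 84],
    [90, 92, 80],
    [70, 72, 78],
    [85, 87, 80],
]

FULL_ICC = icc_consistency_average(SCORES)
WITHOUT = leave_one_rater_out(SCORES, RATERS)
# Raters whose removal raises the ICC (the cells highlighted in such a table).
IMPROVED_BY_REMOVAL = [name for name in RATERS if WITHOUT[name] > FULL_ICC]
```
]]></p>
<p>In this made-up data only the removal of rater C raises the ICC, while removing A or B lowers it, which mirrors the pattern discussed above where removing the most experienced graders reduced consistency. With only a handful of reports per construct, such differences should be read as indicative rather than conclusive.</p></div>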
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusions</head><p>In conclusion, while a greater number of constructs may result in lower inter-rater reliability for individual constructs, they provide students and instructors with more detailed descriptions of expectations, making them a valuable learning tool for students and a way for instructors to be more deliberate in their grading process. This study revealed that consolidating constructs that may result in points being deducted across multiple categories for the same error into combined categories and a holistic category improved grader consistency. The same detailed descriptions of the expectations can still be included for the combined constructs in order to retain the rubric's value as a learning tool.</p><p>Additionally, providing instructors 'training' or 'orientation' on rubric interpretation, effective feedback, and rubric use before the course begins is recommended. This can ensure that all instructors are grading equivalently, and that overall scores have good reliability. This would consist of discussion on how to interpret each of the rubric constructs, discussion/instruction on how to give effective feedback, and practice using the rubrics prior to the course. In laboratory courses with multiple instructors, it is also recommended to periodically check that score averages are consistent, and that feedback is effective and not contradictory. In addition, a sample report, i.e., a 'grading key', could be provided to instructors to compare to student work, and graders could be asked to rank the constructs in order of importance in order to make grading biases visible. 
Curricula with multiple courses with technical communication components could also benefit from instructors taking time to meet for alignment of standards.</p><p>Finally, we recommend preparing students to understand that technical communication is a complex process without definitive answers and that feedback may differ depending on who is grading. Students in laboratory courses with multiple instructors have the benefit of receiving feedback from multiple perspectives, allowing them to practice navigating shifting expectations in a relatively low-stakes environment. Explicitly discussing this with students will help them understand that improving technical communication skills is a lifelong process and will prepare them for navigating shifting expectations in their careers.</p></div></body>
		</text>
</TEI>
