Peer evaluations are critical for assessing teams, but are susceptible to bias and other factors that undermine their reliability. At the same time, collaborative tools that teams commonly use to perform their work are increasingly capable of logging activity that can signal useful information about individual contributions and teamwork. To investigate current and potential uses for activity traces in peer evaluation tools, we interviewed (N=11) and surveyed (N=242) students and interviewed (N=10) instructors at a single university. We found that nearly all of the students surveyed considered specific contributions to the team outcomes when evaluating their teammates, but also reported relying on memory and subjective experiences to make the assessment. Instructors desired objective sources of data to address challenges with administering and interpreting peer evaluations, and had already begun incorporating activity traces from collaborative tools into their evaluations of teams. However, both students and instructors expressed concern about using activity traces due to the diverse ecosystem of tools and platforms used by teams and the limited view into the context of the contributions. Based on our findings, we contribute recommendations and a speculative design for a data-centric peer evaluation tool.
Combining GitHub, Chat, and Peer Evaluation Data to Assess Individual Contributions to Team Software Development Projects
Assessing team software development projects is notoriously difficult and typically based on subjective metrics. To help make assessments more rigorous, we conducted an empirical study to explore relationships between subjective metrics based on peer and instructor assessments, and objective metrics based on GitHub and chat data. We studied 23 undergraduate software teams (n = 117 students) from two undergraduate computing courses at two North American research universities. We collected data on teams’ (a) commits and issues from their GitHub code repositories, (b) chat messages from their Slack and Microsoft Teams channels, (c) peer evaluation ratings from the CATME peer evaluation system, and (d) individual assignment grades from the courses. We derived metrics from (a) and (b) to measure both individual team members’ contributions to the team and the equality of team members’ contributions. We then performed Pearson correlation analyses to identify correlations among the metrics, peer evaluation ratings, and individual grades. We found significant positive correlations between team members’ GitHub contributions, chat contributions, and peer evaluation ratings. In addition, the equality of teams’ GitHub contributions was positively correlated with teams’ average peer evaluation ratings and negatively correlated with the variance in those ratings. However, no such positive correlations were detected between the equality of teams’ chat contributions and their peer evaluation ratings. Our study extends previous research results by providing evidence that (a) team members’ chat contributions, like their GitHub contributions, are positively correlated with their peer evaluation ratings; (b) team members’ chat contributions are positively correlated with their GitHub contributions; and (c) the equality of teams’ GitHub contributions is positively correlated with their peer evaluation ratings.
These results lend further support to the idea that combining objective and subjective metrics can make the assessment of team software projects more comprehensive and rigorous.
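The two kinds of objective metric the abstract describes (per-member contribution correlated with peer ratings, and a team-level equality measure) can be sketched roughly as follows. The data are invented, `pearson` is a plain Pearson correlation, and the equality measure here is 1 minus the Gini coefficient, one common choice; the paper's actual metric definitions may differ.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length samples.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

def equality(contributions):
    # 1 - Gini coefficient: 1.0 means perfectly equal contributions.
    xs = sorted(contributions)
    n = len(xs)
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    gini = (2 * cum) / (n * sum(xs)) - (n + 1) / n
    return 1 - gini

# Hypothetical per-member data for one four-person team:
commits = [42, 35, 38, 5]            # GitHub commits per member
peer_ratings = [4.5, 4.2, 4.4, 2.1]  # CATME-style peer ratings

print(pearson(commits, peer_ratings))  # strongly positive for this toy team
print(equality(commits))               # well below 1: contributions are unequal
```

The same two functions apply unchanged to chat-message counts, which is what makes comparing the GitHub-based and chat-based correlations straightforward.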
- PAR ID: 10466917
- Publisher / Repository: ACM Digital Library
- Date Published:
- Journal Name: ACM Transactions on Computing Education
- Volume: 23
- Issue: 3
- ISSN: 1946-6226
- Page Range / eLocation ID: 1 to 23
- Subject(s) / Keyword(s): Slack; peer evaluation; COVID-19; software engineering education; collaborative software development; online chat communication; Microsoft Teams; CATME; assessment; GitHub
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
When professors assign group work, they assume that peer ratings are a valid source of information, but few studies have evaluated rater consensus in such ratings. We analyzed peer ratings from project teams in a second-year university course to examine consensus. Our first goal was to examine whether members of a team generally agreed on the competence of each team member. Our second goal was to test whether a target’s personality traits predicted how well they were rated. Our third goal was to evaluate whether the self-rating of each student correlated with their peer rating. Data were analyzed from 130 students distributed across 21 teams (mean team size = 6.2). The sample was diverse in gender and ethnicity. Social relations model analyses showed that on average 32% of the variance in peer ratings was due to “consensus,” meaning some targets consistently received higher skill ratings than other targets did. Another 20% of the variance was due to “assimilation,” meaning some raters consistently gave higher ratings than other raters did. Thus, peer ratings reflected consensus (target effects), but also assimilation (rater effects) and noise. Among the six HEXACO traits that we examined, only conscientiousness predicted higher peer ratings, suggesting it may be beneficial to assign one highly conscientious person to every team. Lastly, there was an average correlation of .35 between target effects and self-ratings, indicating moderate self-other agreement, which suggests that students were only weakly biased in their self-ratings.
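A full social relations model is more involved than this, but the target/rater split the abstract describes can be illustrated with a round-robin ratings matrix: target ("consensus") effects are deviations of each target's mean received rating from the grand mean, and rater ("assimilation") effects are deviations of each rater's mean given rating. The matrix below is invented for illustration.

```python
from statistics import mean

# Hypothetical round-robin peer ratings: ratings[i][j] = rater i's rating of target j.
# Diagonal is None because members do not rate themselves here.
ratings = [
    [None, 4.0, 3.0, 5.0],
    [4.5, None, 3.5, 5.0],
    [4.0, 3.5, None, 4.5],
    [3.0, 2.5, 2.0, None],
]
n = len(ratings)
grand = mean(r for row in ratings for r in row if r is not None)

# Rater (assimilation) effect: how leniently rater i rates on average.
rater = [mean(r for r in row if r is not None) - grand for row in ratings]
# Target (consensus) effect: how highly target j is rated on average.
target = [mean(ratings[i][j] for i in range(n) if i != j) - grand
          for j in range(n)]

print("rater effects:", [round(e, 2) for e in rater])
print("target effects:", [round(e, 2) for e in target])
```

Both effect vectors sum to zero by construction; in the study's terms, variance spread across `target` reflects consensus, while variance spread across `rater` reflects assimilation.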
Background and Context: GitHub has been recently used in Software Engineering (SE) classes to facilitate collaboration in student team projects as well as help teachers to evaluate the contributions of their students more objectively. Objective: We explore the benefits and drawbacks of using GitHub as a means for team collaboration and performance evaluation in large SE classes. Method: Our research method takes the form of a case study conducted in a senior-level SE class with 91 students. Our study also includes entry and exit surveys, an exit interview, and a qualitative analysis of students’ commit behavior. Findings: Different teams adapt GitHub to their workflow differently. Furthermore, despite the steep learning curve, using GitHub did not appear to affect the quality of students’ submissions. However, using GitHub metrics as a proxy for evaluating team performance can be risky. Implications: We provide several recommendations for integrating Web-based configuration management tools in SE classes.
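The caveat above about GitHub metrics being a risky proxy is easy to demonstrate with commit counts alone, since commit volume says nothing about the size or substance of each change. A toy sketch with invented commit data:

```python
from collections import Counter

# Hypothetical commit log for one team repo: (author, lines_changed) pairs.
commits = [
    ("ana", 12), ("ana", 8), ("ana", 5), ("ana", 3),  # many small commits
    ("ben", 450),                                     # one large feature commit
    ("cho", 30), ("cho", 25),
]

by_count = Counter(author for author, _ in commits)
by_lines = Counter()
for author, lines in commits:
    by_lines[author] += lines

total_lines = sum(by_lines.values())
print("share by commit count:", {a: round(c / len(commits), 2) for a, c in by_count.items()})
print("share by lines changed:", {a: round(l / total_lines, 2) for a, l in by_lines.items()})
# The two rankings disagree: "ana" leads by commit count, "ben" by lines changed.
```

Neither metric is "right" on its own, which is why triangulating GitHub data against chat activity and peer ratings, as the other studies on this page do, is more defensible than grading from any single trace.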
Abstract. The Circumplex Team Scan (CTS) assesses the degree to which a team’s interaction/communication norms reflect each segment (16th) of the interpersonal circle/circumplex. We developed and evaluated an abbreviated 16-item CTS-16 that uses one CTS item to measure each segment. Undergraduates (n = 446) completing engineering course projects in 139 teams completed the CTS-16. CTS-16 items showed a good fit to confirmatory structural models (e.g., that expect greater positive covariation between items theoretically closer on the circumplex). Individuals’ ratings sufficiently reflected team-level norms to justify averaging team members’ ratings. However, individual items’ marginal reliabilities suggest using the CTS-16 to assess general circumplex-wide patterns rather than specific segments. CTS-16 ratings correlated with respondents’ and their teammates’ ratings of team climate (inclusion, justice, psychological safety). Teams with more extraverted (introverted) members were perceived as having more confident/engaged (timid/hesitant) cultures. Members predisposed to social alienation perceived their team’s culture as relatively disrespectful/unengaged, but their teammates did not corroborate those perceptions. The results overall support the validity and utility of the CTS-16 and of an interpersonal circumplex model of team culture more generally.
This innovative-practice work-in-progress paper explores student leadership development over multiple semesters in team-structured project-based courses. While student growth is expected in a single semester, the study asks whether multiple semesters of participation lead to continued leadership growth, and if so, over how many semesters of participation growth continues. The study examined peer evaluation ratings in general leadership (coordination of teams’ work) and technical leadership (serving as a technical/content area leader) in a single semester of Georgia Tech’s Vertically Integrated Projects (VIP) Program, a multidisciplinary, multi-semester, team-structured, project-based, and credit-bearing program in which student teams support faculty research. Analysis examined means and distributions on two peer evaluation questions (N = 1,073 and N = 1,047) by student academic rank and number of semesters of participation in the program. Findings indicate that within their teams, students’ leadership increased through the third semester, with students making their greatest leadership contributions in the third semester and beyond; and students of lower academic rank provided as much leadership (including technical leadership) as older students who had comparable experience on the team. Both the VIP model and the operationalization of leadership represent innovative practices, because the VIP model yields measurable gains in student leadership, and the measurement of student leadership is based on peer evaluations instead of self-assessments. The educational model and research in this paper are aligned with the FIE values of encouraging mentorship and professional growth, appreciating multidisciplinary approaches, valuing new approaches, and generating new knowledge. The paper addresses limitations and next steps for the study.