

Title: Challenges and Opportunities for Data-Centric Peer Evaluation Tools for Teamwork
Peer evaluations are critical for assessing teams, but are susceptible to bias and other factors that undermine their reliability. At the same time, collaborative tools that teams commonly use to perform their work are increasingly capable of logging activity that can signal useful information about individual contributions and teamwork. To investigate current and potential uses for activity traces in peer evaluation tools, we interviewed (N=11) and surveyed (N=242) students and interviewed (N=10) instructors at a single university. We found that nearly all of the students surveyed considered specific contributions to team outcomes when evaluating their teammates, but also reported relying on memory and subjective experiences to make the assessment. Instructors desired objective sources of data to address challenges with administering and interpreting peer evaluations, and had already begun incorporating activity traces from collaborative tools into their evaluations of teams. However, both students and instructors expressed concern about using activity traces, due to the diverse ecosystem of tools and platforms used by teams and the limited view such traces provide into the context of contributions. Based on our findings, we contribute recommendations and a speculative design for a data-centric peer evaluation tool.
Award ID(s):
2016908
PAR ID:
10350669
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the ACM on Human-Computer Interaction
Volume:
5
Issue:
CSCW2
ISSN:
2573-0142
Page Range / eLocation ID:
1 to 20
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Peer evaluations are a well-established tool for evaluating individual and team performance in collaborative contexts, but are susceptible to social and cognitive biases. Current peer evaluation tools have also yet to take advantage of the unique opportunities that online collaborative technologies provide for addressing these biases. In this work, we explore the potential of one such opportunity for peer evaluations: data traces automatically generated by collaborative tools, which we refer to as "activity traces". We conducted a between-subjects experiment with 101 students and MTurk workers, investigating the effects of reviewing activity traces on peer evaluations of team members in an online collaborative task. Our findings show that the use of activity traces led participants to make more and greater revisions to their evaluations compared to a control condition. These revisions also increased the consistency and the perceived accuracy of the evaluations that participants received. Our findings demonstrate the value of activity traces as an approach for performing more reliable and objective peer evaluations of teamwork. Based on our findings, as well as qualitative analysis of free-form responses in our study, we identify and discuss key considerations and design recommendations for incorporating activity traces into real-world peer evaluation systems.
  2. Assessing team software development projects is notoriously difficult and typically based on subjective metrics. To help make assessments more rigorous, we conducted an empirical study to explore relationships between subjective metrics based on peer and instructor assessments and objective metrics based on GitHub and chat data. We studied 23 undergraduate software teams (n = 117 students) from two undergraduate computing courses at two North American research universities. We collected data on teams' (a) commits and issues from their GitHub code repositories, (b) chat messages from their Slack and Microsoft Teams channels, (c) peer evaluation ratings from the CATME peer evaluation system, and (d) individual assignment grades from the courses. We derived metrics from (a) and (b) to measure both individual team members' contributions to the team and the equality of team members' contributions. We then performed Pearson analyses to identify correlations among the metrics, peer evaluation ratings, and individual grades. We found significant positive correlations between team members' GitHub contributions, chat contributions, and peer evaluation ratings. In addition, the equality of teams' GitHub contributions was positively correlated with teams' average peer evaluation ratings and negatively correlated with the variance in those ratings. However, no such positive correlations were detected between the equality of teams' chat contributions and their peer evaluation ratings. Our study extends previous research by providing evidence that (a) team members' chat contributions, like their GitHub contributions, are positively correlated with their peer evaluation ratings; (b) team members' chat contributions are positively correlated with their GitHub contributions; and (c) the equality of teams' GitHub contributions is positively correlated with their peer evaluation ratings. These results lend further support to the idea that combining objective and subjective metrics can make the assessment of team software projects more comprehensive and rigorous.
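    The abstract above describes the analysis only at a high level; the following is a minimal sketch (not the authors' code) of how per-member contribution counts could be correlated with peer ratings and how the equality of a team's contributions could be summarized at the team level. The column names, example values, and the use of 1 − Gini as the equality measure are illustrative assumptions.

```python
# Minimal sketch: correlating per-member contribution counts from GitHub/chat
# with peer evaluation ratings, and relating the equality of a team's
# contributions to its average rating. Data and metric choices are illustrative.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

def gini(x: np.ndarray) -> float:
    """Gini coefficient of non-negative contribution counts (0 = perfectly equal)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    lorenz = np.cumsum(x) / x.sum()          # cumulative share of contributions
    return (n + 1 - 2 * lorenz.sum()) / n

# Hypothetical per-member table: one row per student.
members = pd.DataFrame({
    "team":        ["A", "A", "A", "B", "B", "B"],
    "commits":     [34, 28, 5, 20, 22, 19],
    "chat_msgs":   [210, 180, 40, 150, 160, 140],
    "peer_rating": [4.6, 4.4, 2.9, 4.2, 4.5, 4.1],
})

# Individual level: do contribution counts track peer ratings?
for metric in ("commits", "chat_msgs"):
    r, p = pearsonr(members[metric], members["peer_rating"])
    print(f"{metric} vs. peer rating: r={r:.2f}, p={p:.3f}")

# Team level: equality of GitHub contributions vs. mean/variance of ratings.
teams = members.groupby("team").agg(
    equality=("commits", lambda c: 1 - gini(c.to_numpy())),
    mean_rating=("peer_rating", "mean"),
    var_rating=("peer_rating", "var"),
)
print(teams)
```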
  3. Abstract

    Background

    Teamwork has become a central element of engineering education. However, the race‐ and gender‐based marginalization prevalent in society is also prevalent in engineering student teams. These problematic dynamics limit learning opportunities, isolate historically marginalized students, and ultimately push students away from engineering, further reinforcing the demographic imbalances in the profession.

    Purpose

    While there are strategies to improve the experiences of marginalized students within teams, there are few tools for detecting marginalizing behaviors as they occur. The purpose of this work is to examine how peer evaluations collected as a normal part of an engineering course can be used as a window into team dynamics to reveal marginalization as it occurs.

    Method

    We used a semester of peer evaluation data from a large engineering course in which a team project is the central assignment and peer evaluation occurs four times during the course. We designed an algorithm to identify teams where marginalization may be occurring. We then performed qualitative analyses of these teams using a sociolinguistic approach.
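    The abstract does not specify how the identification algorithm works; the sketch below is a purely hypothetical illustration of one simple rule in that spirit, flagging teams in which a member's received peer scores fall well below the team mean in multiple evaluation rounds. All column names and thresholds are assumptions, not the authors' method.

```python
# Hypothetical flagging rule only; the paper's actual algorithm is not described here.
import pandas as pd

def flag_teams(ratings: pd.DataFrame, z_cutoff: float = -1.5, min_rounds: int = 2) -> set:
    """ratings columns (assumed): team, round, ratee, score (score received from one peer)."""
    # Average the scores each student received, per team and evaluation round.
    received = ratings.groupby(["team", "round", "ratee"])["score"].mean().reset_index()
    # Standardize within each team-round, then count how often a member is a low outlier.
    grp = received.groupby(["team", "round"])["score"]
    received["z"] = (received["score"] - grp.transform("mean")) / grp.transform("std")
    low = received[received["z"] <= z_cutoff]
    rounds_low = low.groupby(["team", "ratee"])["round"].nunique()
    # Flag teams where some member is a low outlier in at least min_rounds rounds.
    return set(rounds_low[rounds_low >= min_rounds].index.get_level_values("team"))
```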

    Results

    Results show that the algorithm helps identify teams where marginalization occurs. Qualitative analyses of four illustrative cases demonstrated the stealthy appearance and evolution of marginalization, providing strong evidence that indicators of marginalization are hidden within the language of peer evaluations. Based on the wider dataset, we present a taxonomy (eight categories) of linguistic marginalization appearing in peer comments.

    Conclusion

    Both peer evaluation scores and the language used in peer evaluations can reveal team inequities and may serve as a near‐real‐time mechanism to interrupt marginalization within engineering teams.

     
  4. Team formation tools assume instructors should configure the criteria for creating teams, precluding students from participating in a process that affects their learning experience. We propose LIFT, a novel learner-centered workflow in which students propose, vote for, and weigh the criteria used as inputs to the team formation algorithm. We conducted an experiment (N=289) comparing LIFT to the usual instructor-led process, and interviewed participants to evaluate their perceptions of LIFT and its outcomes. Learners proposed novel criteria not included in existing algorithmic tools, such as organizational style. They avoided criteria like gender and GPA that instructors frequently select, and preferred criteria promoting efficient collaboration. LIFT led to team outcomes comparable to those achieved by the instructor-led approach, and teams valued having control of the team formation process. We provide instructors and designers with a workflow, and with evidence that supports giving learners control of the algorithmic process used to group them into teams.
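    To make the workflow concrete, here is an illustrative sketch (not the LIFT implementation) of how learner-weighted criteria could feed a team formation algorithm: each voted criterion gets a weight and a goal ("similar" or "diverse"), and candidate teams are scored by how well the spread of members' responses matches each goal. The criterion names, weights, and scoring rule are assumptions for illustration.

```python
# Illustrative scoring of candidate teams under learner-chosen, weighted criteria.
from statistics import pstdev
from typing import Dict, List

# Criteria the class voted on: weight = relative importance; goal = how to group.
criteria = {
    "availability":         {"weight": 0.40, "goal": "similar"},
    "organizational_style": {"weight": 0.35, "goal": "similar"},
    "topic_interest":       {"weight": 0.25, "goal": "diverse"},
}

def team_score(team: List[Dict[str, float]], criteria: Dict[str, dict]) -> float:
    """Higher is better: reward low spread for 'similar' criteria, high spread for 'diverse'."""
    score = 0.0
    for name, cfg in criteria.items():
        spread = pstdev(member[name] for member in team)  # spread of members' survey responses
        score += cfg["weight"] * (-spread if cfg["goal"] == "similar" else spread)
    return score

# Example: two candidate teams, with student responses on a 1-5 scale.
team_a = [{"availability": 4, "organizational_style": 4, "topic_interest": 1},
          {"availability": 4, "organizational_style": 5, "topic_interest": 5}]
team_b = [{"availability": 1, "organizational_style": 5, "topic_interest": 3},
          {"availability": 5, "organizational_style": 1, "topic_interest": 3}]
print(team_score(team_a, criteria), team_score(team_b, criteria))
```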
  5. Jeff Nichols (Ed.)
    Instructors using algorithmic team formation tools must decide which criteria (e.g., skills, demographics) to use to group students into teams based on their teamwork goals, and they have many possible sources from which to draw these configurations (e.g., the literature, other faculty, their students). However, such tools offer considerable flexibility, and selecting ineffective configurations can lead to teams that do not collaborate successfully. Because these tools are relatively new, there is currently little knowledge of how instructors choose which of these sources to use, how they relate different criteria to their goals for the planned teamwork, or how they determine whether their configuration or the generated teams are successful. To close this gap, we conducted a survey (N=77) and interview (N=21) study of instructors using CATME Team-Maker and other criteria-based processes to investigate instructors' goals and decisions when using team formation tools. The results showed that instructors prioritized students learning to work with diverse teammates and performed "sanity checks" on their formation approach's output to ensure that the generated teams would support this goal, focusing especially on criteria like gender and race. However, they sometimes struggled to relate their educational goals to specific settings in the tool. In general, they also did not solicit any input from students when configuring the tool, despite acknowledging that this information might be useful. More learner-centered approaches that open the "black box" of the algorithm to students could therefore be a promising way to better support instructors configuring algorithmic tools while also fostering student agency and learning about teamwork.