This content will become publicly available on September 30, 2024

Title: Combining GitHub, Chat, and Peer Evaluation Data to Assess Individual Contributions to Team Software Development Projects
Assessing team software development projects is notoriously difficult and typically based on subjective metrics. To help make assessments more rigorous, we conducted an empirical study to explore relationships between subjective metrics based on peer and instructor assessments, and objective metrics based on GitHub and chat data. We studied 23 undergraduate software teams (n = 117 students) from two undergraduate computing courses at two North American research universities. We collected data on teams’ (a) commits and issues from their GitHub code repositories, (b) chat messages from their Slack and Microsoft Teams channels, (c) peer evaluation ratings from the CATME peer evaluation system, and (d) individual assignment grades from the courses. We derived metrics from (a) and (b) to measure both individual team members’ contributions to the team and the equality of team members’ contributions. We then performed Pearson analyses to identify correlations among the metrics, peer evaluation ratings, and individual grades. We found significant positive correlations between team members’ GitHub contributions, chat contributions, and peer evaluation ratings. In addition, the equality of teams’ GitHub contributions was positively correlated with teams’ average peer evaluation ratings and negatively correlated with the variance in those ratings. However, no such positive correlations were detected between the equality of teams’ chat contributions and their peer evaluation ratings. Our study extends previous research results by providing evidence that (a) team members’ chat contributions, like their GitHub contributions, are positively correlated with their peer evaluation ratings; (b) team members’ chat contributions are positively correlated with their GitHub contributions; and (c) the equality of teams’ GitHub contributions is positively correlated with their peer evaluation ratings.
These results lend further support to the idea that combining objective and subjective metrics can make the assessment of team software projects more comprehensive and rigorous.
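As a rough illustration of the kind of analysis the abstract describes, the sketch below computes a Pearson correlation between per-student contribution counts and peer ratings, plus a team-level equality score. The abstract does not say how equality was measured, so the Gini-based score here (1 minus the Gini coefficient) is an assumption, and the function names and sample numbers are hypothetical rather than taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(values)
    weighted = sum((i + 1) * v for i, v in enumerate(ordered))
    return (2 * weighted) / (n * total) - (n + 1) / n


def equality(values):
    """Assumed equality score: 1 - Gini, so higher means more even."""
    return 1.0 - gini(values)


if __name__ == "__main__":
    # Hypothetical data for one five-person team (not from the study).
    commits = [42, 35, 18, 50, 7]
    peer_ratings = [4.5, 4.2, 3.6, 4.8, 2.9]
    print("r(commits, ratings) =", round(pearson_r(commits, peer_ratings), 3))
    print("commit equality     =", round(equality(commits), 3))
```

A real analysis would also report significance (for example with `scipy.stats.pearsonr`, which returns a p-value) and would need to decide how to count commits, issues, and chat messages, choices this sketch leaves open.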
Award ID(s):
1915196
NSF-PAR ID:
10466917
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM Digital Library
Date Published:
Journal Name:
ACM Transactions on Computing Education
Volume:
23
Issue:
3
ISSN:
1946-6226
Page Range / eLocation ID:
1 to 23
Subject(s) / Keyword(s):
Slack, peer evaluation, Covid-19, Software engineering education, collaborative software development, online chat communication, Microsoft Teams, CATME, assessment, GitHub
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This innovative-practice work-in-progress paper explores student leadership development over multiple semesters in team-structured project-based courses. While student growth is expected in a single semester, the study asks if multiple semesters of participation lead to continued leadership growth, and if so, over how many semesters of participation growth continues. The study examined peer evaluation ratings in general leadership (coordination of teams’ work) and technical leadership (serving as a technical/content area leader) in a single semester of Georgia Tech’s Vertically Integrated Projects (VIP) Program, a multidisciplinary, multi-semester, team-structured, project-based, and credit-bearing program in which student teams support faculty research. Analysis examined means and distributions on two peer evaluation questions (N = 1,073 and N = 1,047) by student academic rank and number of semesters of participation in the program. Findings indicate that within their teams, students’ leadership increased through the third semester, with students making their greatest leadership contributions in the third semester and beyond; and students of lower academic rank provided as much leadership (including technical leadership) as older students who had comparable experience on the team. Both the VIP model and the operationalization of leadership represent innovative practices, because the VIP model yields measurable gains in student leadership, and the measurement of student leadership is based on peer-evaluations instead of self-assessments. The educational model and research in this paper are aligned with the FIE values of encouraging mentorship and professional growth, appreciating multidisciplinary approaches, valuing new approaches, and generating new knowledge. The paper addresses limitations and next steps for the study.
  2. A Diversity Index (DI) was developed to quantify eight minority categories (“Women”, “Non Male/Female”, “Afro-American”, “Hispanic”, “Asian/other ethnicity”, “LGBQT”, “Disabilities”, and “First Generation”) in contrast to the standard “White American male”. This index is compared with a Minority Index (MI) based only on the ratio of “Non White American male” members to the total number of group members, which represents diversity poorly when teams are composed largely of minority representatives. In addition, the Diversity Index includes a tuning parameter to adjust for the impact of multiple diversities on the same individual. The Diversity Index has been calculated for four junior courses on Reactive Process Engineering and four senior capstone courses on Process Control and Process Design during the last three years (2019-21). Each course included at least two semester-long projects for 4-6 member teams. The Diversity Index was used to assess the performance of 69 self-selected teams, performing 37 technical projects and 101 outreach projects in total. Assessments included relations with grades, peer-grading, team experience, and scope of activities. The analysis provides a quantitative approach to the impact of diversity on team performance. The reliability of some data is still difficult to validate. This study has relied mainly on the instructor's interactions with students. In order to protect the students’ personal information, the proposed Diversity Index outputs a quantitative value without exposing the diversity source, thus promoting more honest, secure and respectful participation. A new step is in progress to offer a “diversity rewarded” option to motivate students to select team members providing for larger inclusion and diversity.
  3. In Open Source Software (OSS) projects, pre-built tools dominate DevOps-oriented pipelines. In practice, a multitude of configuration management, cloud-based continuous integration, and automated deployment tools exist, and often more than one for each task. Tools are adopted (and given up) by OSS projects regularly. Prior work has shown that some tool adoptions are preceded by discussions, and that tool adoptions can result in benefits to the project. But important questions remain: how do teams decide to adopt a tool? What is discussed before the adoption and for how long? And what team characteristics are determinants of the adoption? In this paper, we employ a large-scale empirical study in order to characterize the team discussions and to discern the team-level determinants of tool adoption into OSS projects' development pipelines. Guided by theories of team and individual motivations and dynamics, we perform exploratory data analyses, do deep-dive case studies, and develop regression models to learn the determinants of adoption and discussion length, and the direction of their effect on the adoption. From data of commit and comment traces of large-scale GitHub projects, our models find that prior exposure to a tool and member involvement are positively associated with the tool adoption, while longer discussions and the number of newer team members associate negatively. These results can provide guidance beyond the technical appropriateness for the timeliness of tool adoptions in diverse programmer teams. Our data and code are available at https://github.com/lkyin/tool_adoptions.
  4. Background and Context: GitHub has recently been used in Software Engineering (SE) classes to facilitate collaboration in student team projects as well as to help teachers evaluate the contributions of their students more objectively. Objective: We explore the benefits and drawbacks of using GitHub as a means for team collaboration and performance evaluation in large SE classes. Method: Our research method takes the form of a case study conducted in a senior-level SE class with 91 students. Our study also includes entry and exit surveys, an exit interview, and a qualitative analysis of students’ commit behavior. Findings: Different teams adapt GitHub to their workflow differently. Furthermore, despite the steep learning curve, using GitHub should not affect the quality of students’ submissions. However, using GitHub metrics as a proxy for evaluating team performance can be risky. Implications: We provide several recommendations for integrating Web-based configuration management tools in SE classes.
  5. Objective

    We explore the relationships between objective communication patterns displayed during virtual team meetings and established, qualitative measures of team member effectiveness.

    Background

    A key component of teamwork is communication. Automated measures of objective communication patterns are becoming more feasible and offer the ability to measure and monitor communication in a scalable, consistent and continuous manner. However, their validity in reflecting meaningful measures of teamwork processes is not well established, especially in real-world settings.

    Method

    We studied real-world virtual student teams working on semester-long projects. We captured virtual team meetings using the Zoom video conferencing platform throughout the semester and periodic surveys comprising peer ratings of team member effectiveness. Leveraging audio transcripts, we examined relationships between objective measures of speaking time, silence gap duration and vocal turn-taking and peer ratings of team member effectiveness.

    Results

    Speaking time, speaking turn count, degree centrality and (marginally) speaking turn duration, but not silence gap duration, were positively related to individual-level team member effectiveness. Time in dyadic interactions and interaction count, but not interaction length, were positively related to dyad-level team member effectiveness.

    Conclusion

    Our study highlights the relevance of objective measures of speaking time and vocal turn-taking to team member effectiveness in virtual project-based teams, supporting the validity of these objective measures and their use in future research.

    Application

    Our approach offers a scalable, easy-to-use method for measuring communication patterns and team member effectiveness in virtual teams and opens the opportunity to study these patterns in a more continuous and dynamic manner.

     