

Title: Student Teamwork on Programming Projects: What can GitHub logs show us?
Teamwork, often mediated by version control systems such as Git and Apache Subversion (SVN), is central to professional programming. As a consequence, many colleges are incorporating both collaboration and online development environments into their curricula, even in introductory courses. In this research, we collected GitHub logs from two programming projects in two offerings of a CS2 Java programming course for computer science majors. In each year, students worked in pairs on both projects (one optional, the other mandatory). We used the students’ GitHub history to classify the student teams into three groups (collaborative, cooperative, or solo-submit) based on the division of labor. We then calculated several teamwork metrics, including the total and average number of commits in different parts of the projects, and used these metrics to predict each team's teamwork style. Our findings show that students’ teamwork style can be identified automatically from their submission logs. This work helps us better understand novices’ habits when using version control systems. Identifying these habits can reveal harmful working styles and might lead to the development of automatic scaffolds for teamwork and peer support in the future.
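As an illustration only, the classification step could be approximated with a simple heuristic over commit logs. This sketch assumes a hypothetical log format of one (author, files touched) pair per commit and a hypothetical file-overlap threshold; the abstract does not state the paper's actual criteria or prediction model. A team with a single committing author is labeled solo-submit, partners who touch mostly the same files are labeled collaborative, and partners who split the files between them are labeled cooperative.

```python
from collections import Counter, defaultdict

# Hypothetical log format: one (author, set_of_files_touched) tuple per commit.
Commit = tuple[str, set[str]]


def commit_metrics(commits: list[Commit]) -> tuple[int, float]:
    """Return the total number of commits and the mean number of commits per author."""
    per_author = Counter(author for author, _ in commits)
    total = sum(per_author.values())
    mean = total / len(per_author) if per_author else 0.0
    return total, mean


def classify_team(commits: list[Commit], overlap_threshold: float = 0.5) -> str:
    """Label a team as solo-submit, collaborative, or cooperative.

    overlap_threshold is a hypothetical parameter, not a value from the paper.
    """
    files_by_author: dict[str, set[str]] = defaultdict(set)
    for author, files in commits:
        files_by_author[author].update(files)

    # Only one partner ever committed: treat the team as solo-submit.
    if len(files_by_author) < 2:
        return "solo-submit"

    file_sets = list(files_by_author.values())
    shared = set.intersection(*file_sets)  # files every partner touched
    touched = set.union(*file_sets)        # files any partner touched
    overlap = len(shared) / len(touched) if touched else 0.0

    # Partners working on the same files -> collaborative;
    # partners dividing the files between them -> cooperative.
    return "collaborative" if overlap >= overlap_threshold else "cooperative"


log = [
    ("alice", {"Board.java", "Game.java"}),
    ("bob", {"Player.java"}),
    ("alice", {"Game.java"}),
]
print(commit_metrics(log))  # (3, 1.5)
print(classify_team(log))   # cooperative
```

Counts such as those returned by commit_metrics, computed per project part, could then serve as input features for a classifier, in the spirit of the metrics the abstract describes.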
Award ID(s):
1821475
NSF-PAR ID:
10392590
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Rafferty, Anna N.; Whitehill, Jacob; Cavalli-Sforza, Violetta; Romero, Cristobal
Date Published:
Journal Name:
Proceedings of The 13th International Conference on Educational Data Mining (EDM 2020)
Page Range / eLocation ID:
409 - 416
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cybersecurity continues to be a critical aspect of every area of computing, especially operating system (OS) development. The OS resides in the layer directly above the hardware in the computing hierarchy. Even if the layers above the OS are well hardened, a security flaw in the OS will compromise the resources in those higher layers. Although several learning resources and courses are available for OS security, they are taught in advanced undergraduate (UG) or graduate-level computer security classes. In this work, we develop cybersecurity educational modules that instructors can adopt in their OS courses to emphasize OS security while teaching OS concepts. The goal is to engage students in learning the security aspects of an OS alongside its core concepts, giving them a good understanding of different security concepts and how they are implemented in the OS. The modules will be made available to instructors for adoption in their courses. They are designed for a UG-level OS course, and students working on them should be familiar with C programming and the OS concepts taught in the class. The modules are intended to be completed within a semester, so we organize them into three mini-projects, each of which can be completed within a few weeks. We chose xv6 as the platform for developing the modules because of its popularity as an educational OS. For the security concepts, we referred to a recent edition of a popular OS textbook, which covers authentication, authorization, cryptography, and distributed system security. We kept our educational modules mostly aligned with these topics, except for distributed system security. We also included a module for implementing a defense mechanism against buffer-overflow attacks, a well-known software vulnerability.
We created three mini-projects for these modules, each accompanied by documentation and a GitHub repository. Each project comes in two versions: a student assignment version, available in the repository, and a solution version for instructors. The first project implements a user authentication system in xv6. Students implement features such as a password structure with encryption and programs such as useradd, passwd, whoami, and login. The implementation guidelines are provided in the documentation, along with skeleton code. The authorization project implements a Unix-style access control system, in which students modify and create various structures and functions within the xv6 kernel. The last project builds a defense mechanism against buffer-overflow attacks using Address Space Layout Randomization (ASLR). Students are expected to implement a random number generator and modify the executable file loader in xv6. Each project submission is expected to demonstrate behavior comparable to the corresponding mechanisms in production-grade operating systems such as Linux.
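The two pieces the ASLR project names, a random number generator and a randomized load address, can be sketched as follows, written in Python rather than xv6's C for brevity. This assumes Marsaglia's xorshift32 generator (a swapped-in choice; the project does not specify one), 4 KiB pages, and a hypothetical list of (virtual address, size) segments. Real xv6 loading also requires the executable to work when placed at a non-zero offset, which this sketch does not address.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages, as on x86 xv6


class XorShift32:
    """Marsaglia's xorshift32 PRNG: small and fast, but not cryptographically secure."""

    MASK = 0xFFFFFFFF

    def __init__(self, seed: int) -> None:
        if seed & self.MASK == 0:
            raise ValueError("xorshift32 seed must be non-zero")
        self.state = seed & self.MASK

    def next(self) -> int:
        x = self.state
        x ^= (x << 13) & self.MASK  # mask keeps the state at 32 bits
        x ^= x >> 17
        x ^= (x << 5) & self.MASK
        self.state = x
        return x


def random_base(rng: XorShift32, max_pages: int) -> int:
    """Pick a page-aligned load offset in [0, max_pages * PAGE_SIZE)."""
    return (rng.next() % max_pages) * PAGE_SIZE


def relocate(segments: list[tuple[int, int]], base: int) -> list[tuple[int, int]]:
    """Shift each hypothetical (vaddr, size) segment by the randomized base."""
    return [(vaddr + base, size) for vaddr, size in segments]


rng = XorShift32(seed=12345)  # in a kernel, seed from a varying source such as a timer
base = random_base(rng, max_pages=256)
print(hex(base), relocate([(0x0, 0x1000), (0x1000, 0x2000)], base))
```

Because the offset is a whole number of pages, the relocated segments keep their page alignment. A non-cryptographic generator keeps the teaching code small; a production ASLR implementation would draw on a stronger entropy source.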
  2. There is growing evidence of the effectiveness of project-based learning (PBL) in preparing students to solve complex problems. In PBL implementations in engineering, students are treated as professional engineers facing projects centered on real-world problems, including the complexity and uncertainty that influence such problems. Not only does this help students analyze and solve an authentic real-world task, promoting critical thinking, but students also learn from each other, developing valuable communication and teamwork skills. Faculty play an important part by assuming non-conventional roles (e.g., client, senior professional engineer, consultant) to help students throughout this instructional and learning approach. Typically, students in PBL work on projects over extended periods of time that culminate in realistic products or presentations. To be successful, students need to learn how to frame a problem, identify stakeholders and their requirements, design and select concepts, test them, and so on. Two different implementations of PBL projects in a fluid mechanics course are presented in this paper. This required, junior-level course has been taught since 2014 by the same instructor. The first PBL project presented is a complete design of pumped pipeline systems for a hypothetical plant. In the second project, engineering students partnered with pre-service teachers to design and teach an elementary school lesson on fluid mechanics concepts. With the PBL implementations, students are expected to: 1) engage in a deeper learning process where concepts can be re-emphasized and students can realize their applicability; 2) develop and practice teamwork skills; 3) learn and practice how to communicate effectively with peers and with those from other fields; and 4) increase their confidence working on open-ended situations and problems. The goal of this paper is to present the experiences of the authors with both PBL implementations.
It explains how the projects were scaffolded through the entire semester, including how the sequence of course content was modified, how team dynamics were monitored, the faculty roles, and the end products and presentations. Students' experiences are also presented. To evaluate and compare students’ learning and satisfaction with the team experience between the two PBL implementations, a shortened version of the NCEES FE exam and the Comprehensive Assessment of Team Member Effectiveness (CATME) survey were utilized. Students completed the FE exam during the first week and again during the last week of the semester to assess their growth in fluid mechanics knowledge. The CATME survey was completed mid-semester to help faculty identify and address problems within team dynamics, and at the end of the semester to evaluate individual students’ teamwork performance. The results showed no major differences in learned fluid mechanics content; however, the data yielded interesting preliminary observations regarding teamwork satisfaction. Student perceptions of the PBL implementations, gathered through reflective assignments (e.g., short answer reflections, focus groups), are discussed in the paper. Finally, the paper describes some of the challenges and lessons learned from implementing both projects multiple times and provides access to some of the PBL course materials and assignments.
  3. Motivation: Software engineering for High Performance Computing (HPC) environments in general [1] and for big data in particular [5] faces a set of unique challenges, including the high complexity of middleware and of computing environments. Tools that make it easier for scientists to utilize HPC are, therefore, of paramount importance. We provide an experience report on using one such highly effective middleware, pbdR [9], which allows scientists to use the R programming language without, at least nominally, having to master many layers of HPC infrastructure, such as OpenMPI [4] and ScaLAPACK [2]. Objective: To evaluate the extent to which middleware helps improve scientist productivity, we use pbdR to solve a real problem that we, as scientists, are investigating. Our big data comes from the commits on GitHub and other project hosting sites, and we are trying to cluster developers based on the text of these commit messages. Context: We need to be able to identify the developer of every commit and the commits of every developer. Developer identifiers in the commits, such as login, email, and name, are often spelled in multiple ways, since that information may come from different version control systems (Git, Mercurial, SVN, ...) and may depend on which computer is used (e.g., what is specified in the .gitconfig file in the home folder). Method: We train a Doc2Vec [7] model in which existing credentials are used as document identifiers, and then use the resulting 200-dimensional vectors for the 2.3M identifiers to cluster these identifiers so that each cluster represents a specific individual. The distance matrix occupies 32TB and is, therefore, a good target for HPC in general and pbdR in particular. pbdR allows data to be distributed over computing nodes and even implements K-means and mixture-model clustering techniques in the pmclust package.
Results: We used strategic prototyping [3] to evaluate the capabilities of pbdR and discovered that a) the use of middleware required extensive understanding of its inner workings, thus negating many of the expected benefits; b) the implemented algorithms were not suitable for the particular combination of n, p, and k (sample size, data dimension, and number of clusters); and c) the development environment, based on batch jobs, substantially increased development time. Conclusions: In addition to the findings of Basili et al., we find that the quality of the implementation of HPC infrastructure and its development environment has a tremendous effect on development productivity.
  4. Although there are tools to help developers understand the matching behavior between a regular expression and a string, regular-expression-related faults are still common. Studying developers’ behavior through the change history of regular expressions can identify common edit patterns, which can inform the creation of mutation and repair operators to assist with testing and fixing regular expressions. In this work, we explore how regular expressions evolve over time, focusing on the characteristics of regular expression edits, the syntactic and semantic differences of the edits, and the feature changes of edits. Our exploration uses two datasets. First, we look at GitHub projects that have a regular expression in their current version and look back through the commit logs to collect the regular expressions’ edit history. Second, we collect regular expressions composed by study participants during problem-solving tasks. Our results show that 1) 95% of the regular expressions from GitHub are not edited, 2) most edited regular expressions have a syntactic distance of 4-6 characters from their predecessors, 3) over 50% of the edits in GitHub tend to expand the scope of the regular expression, and 4) the number of features used indicates that regular expression language usage increases over time. This work has implications for supporting regular expression repair and mutation to ensure test suite quality.
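The abstract reports the syntactic distance, in characters, between an edited regular expression and its predecessor. Assuming this is a character-level edit distance (the abstract does not name the exact metric), Levenshtein distance is one standard way to compute it; this sketch uses the usual two-row dynamic program, and the example edit is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


# Hypothetical edit: replacing the \d shorthand with an explicit character class.
print(levenshtein(r"\d+", r"[0-9]+"))  # 5
```

Applied to each consecutive pair of versions in a regular expression's commit history, this yields one distance per edit.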
  5. Two different implementations of PBL projects in a fluid mechanics course are presented in this paper. This required junior-level course has been taught since 2014 by the same instructor. The first PBL project presented is a complete design of pumped pipeline systems for a hypothetical plant. In the second project, engineering students partnered with pre-service teachers to design and teach an elementary school lesson on fluid mechanics concepts. The goal of this paper is to present the experiences of the authors with both PBL implementations. It explains how the projects were scaffolded through the entire semester, including how the sequence of course content was modified, how team dynamics were monitored, the faculty roles, and the end products and presentations. To evaluate and compare students’ learning and satisfaction with the team experience between the two PBL implementations, a shortened version of the NCEES FE exam and the Comprehensive Assessment of Team Member Effectiveness (CATME) survey were utilized. Students completed the FE exam during the first week and again during the last week of the semester to assess their growth in fluid mechanics knowledge. The CATME survey was completed mid-semester to help faculty identify and address problems within team dynamics, and at the end of the semester to evaluate individual students’ teamwork performance. The results showed that the type of PBL approach used in the course did not affect fluid mechanics content knowledge; however, the data suggest that the cross-disciplinary PBL model led to higher levels of teamwork satisfaction. Student perceptions of the PBL implementations, gathered through reflective assignments, are discussed in the paper. Finally, some of the PBL course materials and assignments are provided.