
Title: Collaborative Writing on GitHub: A Case Study of a Book Project
Social coding platforms such as GitHub are increasingly becoming a digital workspace for the production of non-software digital artifacts. Since GitHub offers unique features that differ from traditional ways of collaborative writing, it is interesting to investigate how GitHub features are used for writing. In this paper, we present the preliminary findings of a mixed-methods case study of collaboration practices in a GitHub book project. We found that the use of GitHub depended on task interdependence and audience participation. GitHub's direct push method was used to coordinate both loosely- and tightly-coupled work, with the latter requiring collaborators to follow socially-accepted conventions. The pull-based method was adopted once the project was released to the public. While face-to-face and online meetings were prominent in the early phases, GitHub's issues became instrumental for communication and project management in later phases. Our findings have implications for the design of collaborative writing tools.
Journal Name: Companion of the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing
Page Range or eLocation-ID: 305 to 308
Sponsoring Org: National Science Foundation
More Like this
  1. Theoretical and Empirical Modeling of Identity and Sentiments in Collaborative Groups (THEMIS.COG) was an interdisciplinary research collaboration of computer scientists and social scientists from the University of Waterloo (Canada), Potsdam University of Applied Sciences (Germany), and Dartmouth College (USA). This white paper summarizes the results of our research at the end of the grant term. Funded by the Trans-Atlantic Platform’s Digging Into Data initiative, the project aimed at theoretical and empirical modeling of identity and sentiments in collaborative groups. Understanding the social forces behind self-organized collaboration is important because technological and social innovations are increasingly generated through informal, distributed processes of collaboration, rather than in formal organizational hierarchies or through market forces. Our work used a data-driven approach to explore the social psychological mechanisms that motivate such collaborations and determine their success or failure. We focused on the example of GitHub, currently the world’s largest digital platform for open, collaborative software development. In contrast to most purely inductive contemporary approaches leveraging computational techniques for social science, THEMIS.COG followed a deductive, theory-driven approach. We capitalized on affect control theory, a mathematically formalized theory of symbolic interaction originated by sociologist David R. Heise and further advanced in previous work by some of the THEMIS.COG collaborators, among others. Affect control theory states that people control their social behaviours by intuitively attempting to verify culturally shared feelings about identities, social roles, and behaviour settings. From this principle, implemented in computational simulation models, precise predictions about group dynamics can be derived.
It was the goal of THEMIS.COG to adapt and apply this approach to study the GitHub collaboration ecosystem through a symbolic interactionist lens. The project contributed substantially to the novel endeavor of theory development in social science based on large amounts of naturally occurring digital data.
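The deflection mechanism at the heart of affect control theory can be illustrated with a minimal numerical sketch. The EPA profiles below are hypothetical values for illustration, not entries from Heise's published sentiment dictionaries:

```python
# Minimal sketch of affect control theory's deflection measure.
# Each concept carries a 3-dimensional EPA profile:
# evaluation (good-bad), potency (powerful-weak), activity (lively-quiet).

def deflection(fundamentals, transients):
    """Sum of squared differences between culturally shared (fundamental)
    sentiments and situation-specific (transient) impressions."""
    return sum((f - t) ** 2 for f, t in zip(fundamentals, transients))

# Hypothetical EPA profiles for the identity "maintainer":
fundamental = [1.8, 1.5, 0.4]   # how a maintainer "should" feel culturally
transient   = [0.9, 1.2, 1.1]   # impression after a hostile code review

d = deflection(fundamental, transient)
# Higher deflection predicts restorative behaviour to re-verify the identity.
print(round(d, 2))  # → 1.39
```

Simulation models built on this principle minimize expected deflection to predict which behaviours actors will choose next, which is how precise group-dynamics predictions are derived.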
  2. In June 2020, at the annual conference of the American Society for Engineering Education (ASEE), which was held entirely online due to the impacts of COVID-19 (SARS-CoV-2), engineering education researchers and social justice scholars diagnosed the spread of two diseases in the United States: COVID-19 and racism. During a virtual workshop (T614A) titled, “Using Power, Privilege, and Intersectionality as Lenses to Understand our Experiences and Begin to Disrupt and Dismantle Oppressive Structures Within Academia,” Drs. Nadia Kellam, Vanessa Svihla, Donna Riley, Alice Pawley, Kelly Cross, Susannah Davis, and Jay Pembridge presented what we might call a pathological analysis of institutionalized racism and various other “isms.” In order to address the intersecting impacts of this double pandemic, they prescribed counter practices and protocols of anti-racism, and strategies against other oppressive “isms” in academia. At the beginning of the virtual workshop, the presenters were pleasantly surprised to see that they had around a hundred attendees. Did the online format of the ASEE conference afford broader exposure of the workshop? Did the recent uprising of Black Lives Matter (BLM) protests across the country, and internationally, generate broader interest in their topic? Whatever the case, at a time when an in-person conference could not be convened without compromising public health safety, ASEE’s virtual conference platform, furnished by Pathable and supplemented by Zoom, made possible the broader social impacts of Dr. Svihla’s land acknowledgement of the unceded Indigenous lands from which she was presenting. Svihla attempted to go beyond a hollow gesture by including a hyperlink in her slides to a COVID-19 relief fund for the Navajo Nation, and encouraged attendees to make a donation as they copied and pasted the link in the Zoom Chat. Dr.
Cross’s statement that you are either a racist or an anti-racist at this point also promised broader social impacts in the context of the virtual workshop. You could feel the intensity of the BLM social movements and the broader political climate in the tone of the presenters’ voices. The mobilizing masses on the streets resonated with the cutting edge of social justice research and education at the ASEE virtual conference. COVID-19 has both exacerbated and made more obvious the unevenness and inequities in our educational practices, processes, and infrastructures. This paper is an extension of a broader collaborative research project that accounts for how an exceptional group of engineering educators have taken this opportunity to socially broaden their curricula to include not just public health matters, but also contemporary political and social movements. Engineering educators for change and advocates for social justice quickly recognized the affordances of diverse forms of digital technologies, and the possibilities of broadening their impact through educational practices and infrastructures of inclusion, openness, and accessibility. They are makers of what Gary Downey calls “scalable scholarship”: projects in support of marginalized epistemologies that can be scaled up from ideation to practice in ways that unsettle and displace the dominant epistemological paradigm of engineering education.[1] This paper is a work in progress.
It marks the beginning of a much lengthier project that documents the key positionality of engineering educators for change, and how they are socially situated in places where they can connect social movements with industrial transitions, and participate in the production of “undone sciences” that address “a structured absence that emerges from relations of inequality.”[2] In this paper, we offer a brief glimpse into ethnographic data we collected virtually through interviews, participant observation, and digital archiving from March 2019 to August 2019, during the initial impacts of COVID-19 in the United States. The collaborative research that undergirds this paper is ongoing, and what is presented here is a rough and early articulation of ideas and research findings that have begun to emerge through our engagement with engineering educators for change. This paper begins by introducing an image concept that will guide our analysis of how, in this historical moment, forms of social and racial justice are finding their way into the practices of engineering educators through slight changes in pedagogical techniques in response to the debilitating impacts of the pandemic. Conceptually, we are interested in how small and subtle changes in learning conditions can socially broaden the impact of engineering educators for change. After introducing the image concept that guides this work, we will briefly discuss methodology and offer background information about the project. Next, we discuss literature that revolves around the question, what is engineering education for? Finally, we introduce the notion of situating engineering education and give readers a brief glimpse into our ethnographic data. The conclusion will indicate future directions for writing, research, and intervention.
  3. Background: Researcher-practitioner partnerships (RPPs) have gained increasing prominence within education, since they are crucial for identifying partners’ problems of practice and seeking solutions for improving district (or school) problems. The CS Pathways RPP project brought together researchers and practitioners, including middle school teachers and administrators from three urban school districts, to build teachers’ capacity to implement an inclusive computer science and digital literacy (CSDL) curriculum for all students in their middle schools. Objective: This study explored the teachers’ self-efficacy development in teaching a middle school CSDL curriculum under the project’s RPP framework. The ultimate goal was to gain insights into how the project’s RPP framework and its professional development (PD) program supported teachers’ self-efficacy development, in particular the challenges and successes of the partnership. Method: Teacher participants attended the first-year PD program and were surveyed and/or interviewed about their self-efficacy in teaching the CSDL curriculum, spanning topics ranging from digital literacy skills to app creation ability and curriculum implementation. Both survey and interview data were collected and analyzed using mixed methods 1) to examine the reach of the RPP PD program in terms of teachers’ self-efficacy; 2) to produce insightful understandings of the PD program’s impact on the project’s goal of building teachers’ self-efficacy. Results and Discussion: We reported the teachers’ self-efficacy profiles based on the survey data. A post-survey indicated that a majority of the teachers had high self-efficacy in teaching the CSDL curriculum addressed by the RPP PD program.
Our analysis identified five critical benefits the project’s RPP PD program provided, namely collaborative efforts on resource and infrastructure building, content and pedagogical knowledge growth, collaboration and communication, and building teacher identity. All five features have shown direct impacts on teachers’ self-efficacy. The study also reported teachers’ perceptions of the challenges they faced and potential areas for improvement. These findings indicate some important features of an effective PD program, informing the primary design of an RPP CS PD program.
  4. Obeid, Iyad; Selesnick, Ivan (Eds.)
    Electroencephalography (EEG) is a popular clinical monitoring tool used for diagnosing brain-related disorders such as epilepsy [1]. As monitoring EEGs in a critical-care setting is an expensive and tedious task, there is great interest in developing real-time EEG monitoring tools to improve patient care quality and efficiency [2]. However, clinicians require automatic seizure detection tools that provide decisions with at least 75% sensitivity and less than 1 false alarm (FA) per 24 hours [3]. Some commercial tools have recently claimed to reach such performance levels, including the Olympic Brainz Monitor [4] and Persyst 14 [5]. In this abstract, we describe our efforts to transform a high-performance offline seizure detection system [3] into a low-latency real-time or online seizure detection system. An overview of the system is shown in Figure 1. The main difference between an online and an offline system is that an online system must always be causal and have minimum latency, which is often defined by domain experts. The offline system, shown in Figure 2, uses two phases of deep learning models with postprocessing [3]. The channel-based long short term memory (LSTM) model (Phase 1 or P1) processes linear frequency cepstral coefficients (LFCC) [6] features from each EEG channel separately. We use the hypotheses generated by the P1 model and create additional features that carry information about the detected events and their confidence. The P2 model uses these additional features and the LFCC features to learn the temporal and spatial aspects of the EEG signals using a hybrid convolutional neural network (CNN) and LSTM model. Finally, Phase 3 aggregates the results from both P1 and P2 before applying a final postprocessing step. The online system implements Phase 1 by taking advantage of the Linux piping mechanism, multithreading techniques, and multi-core processors.
To convert Phase 1 into an online system, we divide the system into five major modules: signal preprocessor, feature extractor, event decoder, postprocessor, and visualizer. The system reads 0.1-second frames from each EEG channel and sends them to the feature extractor and the visualizer. The feature extractor generates LFCC features in real time from the streaming EEG signal. Next, the system computes seizure and background probabilities using a channel-based LSTM model and applies a postprocessor to aggregate the detected events across channels. The system then displays the EEG signal and the decisions simultaneously using a visualization module. The online system uses C++, Python, TensorFlow, and PyQtGraph in its implementation. The online system accepts streamed EEG data sampled at 250 Hz as input. The system begins processing the EEG signal by applying a TCP montage [8]. Depending on the type of the montage, the EEG signal can have either 22 or 20 channels. To enable the online operation, we send 0.1-second (25 samples) length frames from each channel of the streamed EEG signal to the feature extractor and the visualizer. Feature extraction is performed sequentially on each channel. The signal preprocessor writes the sample frames into two streams to facilitate these modules. In the first stream, the feature extractor receives the signals using stdin. In parallel, as a second stream, the visualizer shares a user-defined file with the signal preprocessor. This user-defined file holds raw signal information as a buffer for the visualizer. The signal preprocessor writes into the file while the visualizer reads from it. Reading and writing into the same file poses a challenge. The visualizer can start reading while the signal preprocessor is writing into it. To resolve this issue, we utilize a file locking mechanism in the signal preprocessor and visualizer. 
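The lock-write-release cycle for the shared signal file can be sketched with POSIX advisory locks. This is a simplified illustration only; the abstract does not specify which locking primitive or retry interval the actual system uses:

```python
import fcntl
import time

def locked_write(path, data, retry_sec=0.01):
    """Append data to the shared signal file under an exclusive lock,
    retrying after a short wait whenever another process holds the lock."""
    while True:
        with open(path, "ab") as f:
            try:
                # Non-blocking exclusive lock: fail fast instead of stalling.
                fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                time.sleep(retry_sec)   # lock held elsewhere; wait and retry
                continue
            f.write(data)               # critical section: modify the file
            fcntl.flock(f, fcntl.LOCK_UN)
            return
```

The visualizer would use the same lock-acquire/read/release pattern on its side, so only one process touches the file at a time.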
Each of the processes temporarily locks the file, performs its operation, releases the lock, and tries to obtain the lock after a waiting period. The file locking mechanism ensures that only one process can access the file by prohibiting other processes from reading or writing while one process is modifying the file [9]. The feature extractor uses circular buffers to save 0.3 seconds or 75 samples from each channel for extracting 0.2-second or 50-sample long center-aligned windows. The module generates 8 absolute LFCC features where the zeroth cepstral coefficient is replaced by a temporal domain energy term. For extracting the rest of the features, three pipelines are used. The differential energy feature is calculated in a 0.9-second absolute feature window with a frame size of 0.1 seconds. The difference between the maximum and minimum temporal energy terms is calculated in this range. Then, the first derivative or the delta features are calculated using another 0.9-second window. Finally, the second derivative or delta-delta features are calculated using a 0.3-second window [6]. The differential energy for the delta-delta features is not included. In total, we extract 26 features from the raw sample windows which add 1.1 seconds of delay to the system. We used the Temple University Hospital Seizure Database (TUSZ) v1.2.1 for developing the online system [10]. The statistics for this dataset are shown in Table 1. A channel-based LSTM model was trained using the features derived from the train set using the online feature extractor module. A window-based normalization technique was applied to those features. In the offline model, we scale features by normalizing using the maximum absolute value of a channel [11] before applying a sliding window approach. Since the online system has access to a limited amount of data, we normalize based on the observed window. The model uses the feature vectors with a frame size of 1 second and a window size of 7 seconds. 
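The contrast between the offline and online normalization schemes can be sketched as follows; the feature values and window size here are illustrative, not taken from the system:

```python
import numpy as np

def maxabs_scale(x):
    """Scale each channel (row) by its maximum absolute value,
    as in scikit-learn's max-abs scaling [11]."""
    m = np.abs(x).max(axis=1, keepdims=True)
    return x / np.where(m == 0, 1, m)   # guard against all-zero channels

# Hypothetical (channels x frames) feature block:
recording = np.array([[0.5, -2.0,  1.0, 4.0],
                      [4.0,  2.0, -1.0, 0.5]])

# Offline: scaling statistics come from the entire recording.
offline = maxabs_scale(recording)

# Online: only the currently observed window (here the last 2 frames)
# is available, so the same scaling uses window-local statistics.
window = recording[:, -2:]
online = maxabs_scale(window)
```

The operation is identical in both cases; what changes is how much data the statistics are computed over, which is why the online features can differ from their offline counterparts.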
We evaluated the model using the offline P1 postprocessor to determine the efficacy of the delayed features and the window-based normalization technique. As shown by the results of experiments 1 and 4 in Table 2, these changes give us a comparable performance to the offline model. The online event decoder module utilizes this trained model for computing probabilities for the seizure and background classes. These posteriors are then postprocessed to remove spurious detections. The online postprocessor receives and saves 8 seconds of class posteriors in a buffer for further processing. It applies multiple heuristic filters (e.g., probability threshold) to make an overall decision by combining events across the channels. These filters evaluate the average confidence, the duration of a seizure, and the channels where the seizures were observed. The postprocessor delivers the label and confidence to the visualizer. The visualizer starts to display the signal as soon as it gets access to the signal file, as shown in Figure 1 using the “Signal File” and “Visualizer” blocks. Once the visualizer receives the label and confidence for the latest epoch from the postprocessor, it overlays the decision and color codes that epoch. The visualizer uses red for seizure with the label SEIZ and green for the background class with the label BCKG. Once the streaming finishes, the system saves three files: a signal file in which the sample frames are saved in the order they were streamed, a time segmented event (TSE) file with the overall decisions and confidences, and a hypotheses (HYP) file that saves the label and confidence for each epoch. The user can plot the signal and decisions using the signal and HYP files with only the visualizer by enabling appropriate options. For comparing the performance of different stages of development, we used the test set of TUSZ v1.2.1 database. It contains 1015 EEG records of varying duration. 
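The heuristic filtering idea can be sketched as a threshold-plus-minimum-duration rule. The threshold, duration, and epoch length below are illustrative assumptions, not the system's tuned values:

```python
def postprocess(posteriors, prob_thresh=0.5, min_dur_sec=2.0, epoch_sec=1.0):
    """Label each epoch SEIZ or BCKG: a run of epochs whose seizure
    posterior clears the threshold must last at least min_dur_sec
    before it is accepted as an event; shorter runs are treated as
    spurious detections and dropped."""
    min_epochs = int(min_dur_sec / epoch_sec)
    labels = ["BCKG"] * len(posteriors)
    run_start = None
    for i, p in enumerate(posteriors + [0.0]):  # sentinel flushes the last run
        if p >= prob_thresh:
            if run_start is None:
                run_start = i                   # a candidate event begins
        elif run_start is not None:
            if i - run_start >= min_epochs:     # long enough: accept the event
                labels[run_start:i] = ["SEIZ"] * (i - run_start)
            run_start = None
    return labels

# Three consecutive high-confidence epochs survive; the isolated one is dropped.
print(postprocess([0.2, 0.7, 0.8, 0.9, 0.3, 0.6]))
```

The real postprocessor additionally combines events across channels and weighs average confidence, but the accept/reject logic per event follows this shape.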
The any-overlap performance [12] of the overall system shown in Figure 2 is 40.29% sensitivity with 5.77 FAs per 24 hours. For comparison, the previous state-of-the-art model developed on this database performed at 30.71% sensitivity with 6.77 FAs per 24 hours [3]. The individual performances of the deep learning phases are as follows: Phase 1’s (P1) performance is 39.46% sensitivity and 11.62 FAs per 24 hours, and Phase 2 detects seizures with 41.16% sensitivity and 11.69 FAs per 24 hours. We trained an LSTM model with the delayed features and the window-based normalization technique for developing the online system. Using the offline decoder and postprocessor, the model performed at 36.23% sensitivity with 9.52 FAs per 24 hours. The trained model was then evaluated with the online modules. The current performance of the overall online system is 45.80% sensitivity with 28.14 FAs per 24 hours. Table 2 summarizes the performances of these systems. The performance of the online system deviates from the offline P1 model because the online postprocessor fails to combine the events as the seizure probability fluctuates during an event. The modules in the online system add a total of 11.1 seconds of delay for processing each second of the data, as shown in Figure 3. In practice, we also count the time for loading the model and starting the visualizer block. When we consider these facts, the system consumes 15 seconds to display the first hypothesis. The system detects seizure onsets with an average latency of 15 seconds. Implementing an automatic seizure detection model in real time is not trivial. We used a variety of techniques such as the file locking mechanism, multithreading, circular buffers, real-time event decoding, and signal-decision plotting to realize the system. A video demonstrating the system is available at: The final conference submission will include a more detailed analysis of the online performance of each module. 
ACKNOWLEDGMENTS
Research reported in this publication was most recently supported by the National Science Foundation Partnership for Innovation award number IIP-1827565 and the Pennsylvania Commonwealth Universal Research Enhancement Program (PA CURE). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the official views of any of these organizations.
REFERENCES
[1] A. Craik, Y. He, and J. L. Contreras-Vidal, “Deep learning for electroencephalogram (EEG) classification tasks: a review,” J. Neural Eng., vol. 16, no. 3, p. 031001, 2019.
[2] A. C. Bridi, T. Q. Louro, and R. C. L. Da Silva, “Clinical Alarms in intensive care: implications of alarm fatigue for the safety of patients,” Rev. Lat. Am. Enfermagem, vol. 22, no. 6, p. 1034, 2014.
[3] M. Golmohammadi, V. Shah, I. Obeid, and J. Picone, “Deep Learning Approaches for Automatic Seizure Detection from Scalp Electroencephalograms,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York, New York, USA: Springer, 2020, pp. 233–274.
[4] “CFM Olympic Brainz Monitor.” [Online]. Available: [Accessed: 17-Jul-2020].
[5] M. L. Scheuer, S. B. Wilson, A. Antony, G. Ghearing, A. Urban, and A. I. Bagic, “Seizure Detection: Interreader Agreement and Detection Algorithm Assessments Using a Large Dataset,” J. Clin. Neurophysiol., 2020.
[6] A. Harati, M. Golmohammadi, S. Lopez, I. Obeid, and J. Picone, “Improved EEG Event Classification Using Differential Energy,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium, 2015, pp. 1–4.
[7] V. Shah, C. Campbell, I. Obeid, and J. Picone, “Improved Spatio-Temporal Modeling in Automated Seizure Detection using Channel-Dependent Posteriors,” Neurocomputing, 2021.
[8] W. Tatum, A. Husain, S. Benbadis, and P. Kaplan, Handbook of EEG Interpretation. New York City, New York, USA: Demos Medical Publishing, 2007.
[9] D. P. Bovet and M. Cesati, Understanding the Linux Kernel, 3rd ed. O’Reilly Media, Inc., 2005.
[10] V. Shah et al., “The Temple University Hospital Seizure Detection Corpus,” Front. Neuroinform., vol. 12, pp. 1–6, 2018.
[11] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, 2011.
[12] J. Gotman, D. Flanagan, J. Zhang, and B. Rosenblatt, “Automatic seizure detection in the newborn: Methods and initial evaluation,” Electroencephalogr. Clin. Neurophysiol., vol. 103, no. 3, pp. 356–362, 1997.
  5. With the emergence of social coding platforms, collaboration has become a key and dynamic aspect of the success of software projects. In such platforms, developers have to collaborate and deal with issues of collaboration in open-source software development. Although collaboration is challenging, collaborative development produces better software systems than any developer could produce alone. Several approaches have investigated collaboration challenges, for instance, by proposing or evaluating models and tools to support collaborative work. Despite the undeniable importance of the existing efforts in this direction, there are few works on collaboration from the perspective of developers. In this work, we aim to investigate the perceptions of open-source software developers on collaboration, such as motivations, techniques, and tools to support global, productive, and collaborative development. Following an ad hoc literature review and an exploratory interview study with 12 open-source software developers from GitHub, our approach also relies on an extensive survey with 121 developers to confirm or refute the interview results. We found different collaborative contributions, such as managing change requests. Besides, we observed that most collaborators prefer to collaborate with the core team instead of their peers. We also found that most collaboration happens in software development (60%) and maintenance (47%) tasks. Furthermore, despite personal preferences to work independently, developers still consider collaborating with others in specific task categories, for instance, software development. Finally, developers also expressed the importance of social coding platforms, such as GitHub, in supporting maintainers and contributors in making decisions and developing tasks of the projects. Therefore, these findings may help project leaders optimize the collaborations among developers and reduce entry barriers.
Moreover, these findings may support the project collaborators in understanding the collaboration process and engaging others in the project.