Title: Developing a Method for Identifying Instances of Group Generative Interactions in Enterprise Social Media
Companies have a particular interest in group generative interactions: the conception of novel ideas and solutions through group exchanges. These interactions are a root cause of innovation and are therefore important to companies’ survival. Enterprise Social Media (ESM) offer a unique opportunity to study generative group interactions because activities on these platforms are transparent. In this research-in-progress paper, we conduct a preliminary analysis to develop a method for identifying instances of ESM-based generative group interactions, focusing on distinguishing generative from non-generative group interactions. To do this, we used the text from all group interactions on an ESM platform of a multinational organization. We implemented machine learning models to learn and classify the text as generative or non-generative. As a result, we extracted the most important term features from the best-performing model. These features will help us understand the nature of the discussions that occur in these interactions in future studies.
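The abstract does not name the models or the feature-importance measure used. As a rough illustration of what "top term features" can mean, the sketch below ranks terms by a smoothed log-odds ratio between two labelled sets of messages. The scoring rule, the function names `tokenize` and `top_terms`, and the toy messages are all assumptions made for this example; they are not the authors' method or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(generative_docs, other_docs, k=5):
    """Return the k terms most associated with the generative class.

    Terms are scored by an add-one smoothed log-odds ratio, a stand-in
    for whatever importance measure a trained classifier would provide.
    """
    gen = Counter(t for d in generative_docs for t in tokenize(d))
    oth = Counter(t for d in other_docs for t in tokenize(d))
    vocab = set(gen) | set(oth)
    gen_total = sum(gen.values()) + len(vocab)
    oth_total = sum(oth.values()) + len(vocab)
    scores = {
        t: math.log((gen[t] + 1) / gen_total) - math.log((oth[t] + 1) / oth_total)
        for t in vocab
    }
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy messages, invented for illustration only.
generative = [
    "what if we try a new idea for the design",
    "new idea: we could combine both solutions",
]
non_generative = [
    "meeting moved to friday",
    "please submit the report by friday",
]

print(top_terms(generative, non_generative, k=3))  # ['idea', 'new', 'we']
```

On real ESM text, a ranking like this would be read next to the classifier's held-out performance, since terms from a poorly performing model say little about the interactions themselves.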
Award ID(s):
1749018
NSF-PAR ID:
10202747
Journal Name:
Eighteenth Annual Pre-ICIS Workshop on HCI Research in MIS
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Teamwork is at the heart of most organizations today. Given increased pressure on organizations to be flexible and adaptable, teams are organizing in novel ways and using novel technologies to become more agile. One technology increasingly used by distributed teams is Enterprise Social Media (ESM): web-based applications that organizations use to enable communication and collaboration among distributed employees. ESM feature unique affordances that facilitate collaboration, including interactions that are generative: group conversations that entail the creation of innovative concepts and resolutions. These types of interactions are an important attraction for companies deciding to implement ESM. ESM offer HCI researchers a unique opportunity to study such generative interactions, as all contributions to an ESM platform are visible and therefore available for analysis. Our goal in this preliminary study is to understand the nature of group generative interactions through their linguistic indicators. In this study, we utilize data from an ESM platform used by a multinational organization. Using a 1% subsample of all logged group interactions, we apply machine learning to classify text as generative or non-generative and extract the linguistic antecedents of the classified generative content. Our results show a promising method for investigating the linguistic indicators of generative content and provide a proof of concept for investigating group interactions in unobtrusive ways. Additionally, our results could provide an analytics tool for managers to measure the extent to which text-based tools, such as ESM, effectively nudge employees towards generative behaviors.
  2. Obeid, Iyad ; Picone, Joseph ; Selesnick, Ivan (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing a large open-source database of high-resolution digital pathology images known as the Temple University Digital Pathology Corpus (TUDP) [1]. Our long-term goal is to release one million images. We expect to release the first 100,000-image corpus by December 2020. The data is being acquired at the Department of Pathology at Temple University Hospital (TUH) using a Leica Biosystems Aperio AT2 scanner [2] and consists entirely of clinical pathology images. More information about the data and the project can be found in Shawki et al. [3]. We currently have a National Science Foundation (NSF) planning grant [4] to explore how best the community can leverage this resource. One goal of this poster presentation is to stimulate community-wide discussions about this project and determine how this valuable resource can best meet the needs of the public. The computing infrastructure required to support this database is extensive [5] and includes two HIPAA-secure computer networks, dual petabyte file servers, and Aperio’s eSlide Manager (eSM) software [6]. We have currently digitized over 50,000 slides from 2,846 patients and 2,942 clinical cases. There is an average of 12.4 slides per patient and 10.5 slides per case with one report per case.
The data is organized by tissue type as shown below:

Filenames:
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_0a001_00123456_lvl0001_s000.svs
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_00123456.docx

Explanation:
tudp: root directory of the corpus
v1.0.0: version number of the release
svs: the image data type
gastro: the type of tissue
000001: six-digit sequence number used to control directory complexity
00123456: 8-digit patient MRN
2015_03_05: the date the specimen was captured
0s15_12345: the clinical case name
0s15_12345_0a001_00123456_lvl0001_s000.svs: the actual image filename consisting of a repeat of the case name, a site code (e.g., 0a001), the type and depth of the cut (e.g., lvl0001) and a token number (e.g., s000)
0s15_12345_00123456.docx: the filename for the corresponding case report

We currently recognize fifteen tissue types in the first installment of the corpus. The raw image data is stored in Aperio’s “.svs” format, which is a multi-layered compressed JPEG format [3,7]. Pathology reports containing a summary of how a pathologist interpreted the slide are also provided in a flat text file format. A more complete summary of the demographics of this pilot corpus will be presented at the conference. Another goal of this poster presentation is to share our experiences with the larger community since many of these details have not been adequately documented in scientific publications. There are quite a few obstacles in collecting this data that have slowed down the process and need to be discussed publicly. Our backlog of slides dates back to 1997, meaning there are a lot that need to be sifted through and discarded for peeling or cracking. Additionally, during scanning a slide can get stuck, stalling a scan session for hours, resulting in a significant loss of productivity.
Over the past two years, we have accumulated significant experience with how to scan a diverse inventory of slides using the Aperio AT2 high-volume scanner. We have been working closely with the vendor to resolve many problems associated with the use of this scanner for research purposes. This scanning project began in January of 2018 when the scanner was first installed. The scanning process was slow at first since there was a learning curve with how the scanner worked and how to obtain samples from the hospital. From its start date until May of 2019, ~20,000 slides were scanned. In the past 6 months, from May to November, we have tripled that number and now hold ~60,000 slides in our database. This dramatic increase in productivity was due to additional undergraduate staff members and an emphasis on efficient workflow. The Aperio AT2 scans 400 slides a day, requiring at least eight hours of scan time. The efficiency of these scans can vary greatly. When our team first started, approximately 5% of slides failed the scanning process due to focal point errors. We have been able to reduce that to 1% through a variety of means: (1) best practices regarding daily and monthly recalibrations, (2) tweaking the software such as the tissue finder parameter settings, and (3) experience with how to clean and prep slides so they scan properly. Nevertheless, this is not a completely automated process, making it very difficult to reach our production targets. With a staff of three undergraduate workers spending a total of 30 hours per week, we find it difficult to scan more than 2,000 slides per week using a single scanner (400 slides per night x 5 nights per week). The main limitation in achieving this level of production is the lack of a completely automated scanning process: it takes a couple of hours to sort, clean and load slides. We have streamlined all other aspects of the workflow required to database the scanned slides so that there are no additional bottlenecks.
To bridge the gap between hospital operations and research, we are using Aperio’s eSM software. Our goal is to provide pathologists access to high quality digital images of their patients’ slides. eSM is a secure website that holds the images with their metadata labels, patient report, and path to where the image is located on our file server. Although eSM includes significant infrastructure to import slides into the database using barcodes, TUH does not currently support barcode use. Therefore, we manage the data using a mixture of Python scripts and manual import functions available in eSM. The database and associated tools are based on proprietary formats developed by Aperio, making this another important point of community-wide discussion on how best to disseminate such information. Our near-term goal for the TUDP Corpus is to release 100,000 slides by December 2020. We hope to continue data collection over the next decade until we reach one million slides. We are creating two pilot corpora using the first 50,000 slides we have collected. The first corpus consists of 500 slides with a marker stain and another 500 without it. This set was designed to let people debug their basic deep learning processing flow on these high-resolution images. We discuss our preliminary experiments on this corpus and the challenges in processing these high-resolution images using deep learning in [3]. We are able to achieve a mean sensitivity of 99.0% for slides with pen marks, and 98.9% for slides without marks, using a multistage deep learning algorithm. While this dataset was very useful in initial debugging, we are in the midst of creating a new, more challenging pilot corpus using actual tissue samples annotated by experts. The task will be to detect ductal carcinoma in situ (DCIS) or invasive breast cancer tissue. There will be approximately 1,000 images per class in this corpus.
Based on the number of features annotated, we can train on a two class problem of DCIS or benign, or increase the difficulty by increasing the classes to include DCIS, benign, stroma, pink tissue, non-neoplastic etc. Those interested in the corpus or in participating in community-wide discussions should join our listserv, nedc_tuh_dpath@googlegroups.com, to be kept informed of the latest developments in this project. You can learn more from our project website: https://www.isip.piconepress.com/projects/nsf_dpath. 
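The directory and filename convention described in this abstract can be split mechanically into its fields. The sketch below is one possible reading of that convention, built only from the single example path given above. The names `SlidePath` and `parse_slide_path` are hypothetical, and the assumption that the filename always repeats the case name and MRN from the directory path has not been checked against the released corpus.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class SlidePath:
    version: str    # e.g. "v1.0.0"
    data_type: str  # e.g. "svs"
    tissue: str     # e.g. "gastro"
    sequence: str   # six-digit sequence number
    mrn: str        # 8-digit patient MRN
    date: str       # specimen capture date, e.g. "2015_03_05"
    case: str       # clinical case name, e.g. "0s15_12345"
    site: str       # site code, e.g. "0a001"
    cut: str        # type and depth of the cut, e.g. "lvl0001"
    token: str      # token number, e.g. "s000"


def parse_slide_path(path):
    """Split an image path into its fields, raising ValueError on a mismatch."""
    parts = PurePosixPath(path).parts
    if len(parts) != 9 or parts[0] != "tudp":
        raise ValueError(f"unexpected path layout: {path}")
    _, version, data_type, tissue, sequence, mrn, date, case, filename = parts
    stem = PurePosixPath(filename).stem
    if not stem.startswith(case + "_"):
        raise ValueError(f"filename does not repeat the case name: {filename}")
    # What remains after the case name is assumed to be site, MRN, cut, token.
    site, file_mrn, cut, token = stem[len(case) + 1:].split("_")
    if file_mrn != mrn:
        raise ValueError(f"MRN in filename differs from directory: {filename}")
    return SlidePath(version, data_type, tissue, sequence, mrn, date,
                     case, site, cut, token)


example = ("tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/"
           "0s15_12345_0a001_00123456_lvl0001_s000.svs")
print(parse_slide_path(example))
```

The case report path (ending in `.docx`) has a shorter filename and would need its own branch; it is left out here to keep the sketch small.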
  3.
    An Exploration of the Effects of Enterprise Social Media Use for Classroom Teams. Leticia Cherchiglia (Michigan State University, leticia@msu.edu), Wietske Van Osch (HEC Montreal & Michigan State University, wietske.van-osch@hec.ca), Yuyang Liang (Michigan State University, liangyuy@msu.edu), and Elisavet Averkiadi (Michigan State University, averkiad@msu.edu). Proceedings of the Nineteenth Annual Pre-ICIS Workshop on HCI Research in MIS, Virtual Conference, December 12, 2020. Abstract: This paper explores the adoption of Microsoft Teams, a group-based Enterprise Social Media (ESM) tool, in the context of a hybrid Information Technology Management undergraduate course at a large midwestern university. With the primary goal of providing insights into the use and design of tools for group-based educational settings, we constructed a model to reflect our expectations that core ESM affordances would enhance students’ perceptions of Microsoft Teams’ functionality and efficiency, which in turn would increase both students’ perceptions of group productivity and students’ actual usage of Microsoft Teams for communication purposes. In our model we used three core ESM affordances from Treem and Leonardi (2013), namely editability (i.e., information can be created and/or edited after creation, usually in a collaborative fashion), persistence (i.e., information is stored permanently), and visibility (i.e., information is visible to other users). Analysis of quantitative (surveys, server-side; N=62) and qualitative (interviews; N=7) data led to intriguing results. It seems that although students considered the editability, persistence, and visibility affordances within Microsoft Teams to be convenient functions of this ESM, problems when working collaboratively (such as connectivity, formatting, and searching glitches) might have prevented them from considering this ESM fast and user-friendly (i.e., efficient).
Moreover, although perceived functionality and efficiency were positively connected to group productivity, hidden/non-intuitive communication features within this ESM might help explain the surprising negative connection between efficiency and usage of this ESM for the purpose of group communication. Another explanation is that, given the plethora of competing tools specifically designed to afford seamless/optimal team communication, students preferred to use more familiar tools or tools perceived as more efficient for group communication than Microsoft Teams, which is consistent with findings in organizational settings (Van Osch, Steinfield, and Balogh, 2015). Beyond theoretical contributions related to the impact that ESM affordances have on users’ interaction perceptions, and the impact of users’ interaction perceptions on team and system outcomes, from a strategic and practical point of view, our findings revealed several challenges for the use of Microsoft Teams (and perhaps ESM at large) in educational settings: 1) As the demand for online education grows, collaborative tools such as Microsoft Teams should strive to provide seamless experiences for multiple-user access to files and messages; 2) Microsoft Teams should improve its visual design in order to increase ease of use, user familiarity, and intuitiveness; 3) Microsoft Teams appears to have a steep learning curve, partially related to the fact that some features are hidden or take extra steps/clicks to be accessed, thus undermining their use; 4) Team communication is a complex topic which should be further studied because, given the choice, students will fall back on familiar tools, thereby undermining the full potential for team collaboration through the ESM.
We expect that this paper can provide insights for educators faced with the choice of an ESM tool best suited for group-based classroom settings, as well as designers interested in adapting ESMs to educational contexts, which is a promising avenue for market expansion.
  4. Abstract

    This descriptive study focuses on using voice activity detection (VAD) algorithms to extract student speech data in order to better understand the collaboration of small group work and the impact of teaching assistant (TA) interventions in undergraduate engineering discussion sections. Audio data were recorded from individual students wearing head‐mounted noise‐cancelling microphones. Video data of each student group were manually coded for collaborative behaviours (eg, group task relatedness, group verbal interaction and group talk content) of students and TA–student interactions. The analysis includes information about the turn taking, overall speech duration patterns and amounts of overlapping speech observed both when TAs were intervening with groups and when they were not. We found that TAs very rarely provided explicit support regarding collaboration. Key speech metrics, such as amount of turn overlap and maximum turn duration, revealed important information about the nature of student small group discussions and TA interventions. TA interactions during small group collaboration are complex and require nuanced treatments when considering the design of supportive tools.
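The abstract does not say how the speech metrics were computed. As an illustration only, the sketch below derives two of the named metrics, amount of turn overlap and maximum turn duration, from per-speaker lists of `(start, end)` times in seconds, such as a VAD algorithm might output. The function names and the sample turns are assumptions for this example, and each speaker's own turns are assumed not to overlap one another.

```python
def max_turn_duration(turns):
    """Longest single turn, in seconds, for one speaker."""
    return max(end - start for start, end in turns)


def overlap_seconds(turns_a, turns_b):
    """Total time, in seconds, during which both speakers are talking."""
    total = 0.0
    for a_start, a_end in turns_a:
        for b_start, b_end in turns_b:
            # The shared interval runs from the later start to the earlier end;
            # a negative length means the two turns do not intersect.
            total += max(0.0, min(a_end, b_end) - max(a_start, b_start))
    return total


# Hypothetical speech segments for two students, in seconds.
student_a = [(0.0, 2.0), (5.0, 6.0)]
student_b = [(1.5, 3.0), (5.5, 7.5)]

print(overlap_seconds(student_a, student_b))  # 1.0
print(max_turn_duration(student_b))           # 2.0
```

The pairwise loop is quadratic in the number of turns, which is acceptable for a single discussion section; a sweep over sorted turns would scale better for long recordings.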

     
  5. Abstract

    Motivation

    Expanding our knowledge of small molecules beyond what is known in nature or designed in wet laboratories promises to significantly advance cheminformatics, drug discovery, biotechnology and material science. In silico molecular design remains challenging, primarily due to the complexity of the chemical space and the non-trivial relationship between chemical structures and biological properties. Deep generative models that learn directly from data are intriguing, but they have yet to demonstrate interpretability in the learned representation, which would let us learn more about the relationship between the chemical and biological space. In this article, we advance research on disentangled representation learning for small molecule generation. We build on recent work by us and others on deep graph generative frameworks, which capture atomic interactions via a graph-based representation of a small molecule. The methodological novelty is how we leverage the concept of disentanglement in the graph variational autoencoder framework both to generate biologically relevant small molecules and to enhance model interpretability.

    Results

    Extensive qualitative and quantitative experimental evaluation in comparison with state-of-the-art models demonstrates the superiority of our disentanglement framework. We believe this work is an important step towards addressing key challenges in small molecule generation with deep generative frameworks.

    Availability and implementation

    Training and generated data are made available at https://ieee-dataport.org/documents/dataset-disentangled-representation-learning-interpretable-molecule-generation. All code is made available at https://anonymous.4open.science/r/D-MolVAE-2799/.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     