

Title: Analysis Without Data: Teaching Students to Tackle the VAST Challenge
The VAST Challenges have been shown to be an effective tool in visual analytics education, encouraging student learning while enforcing good visualization design and development practices. However, research has observed that students often struggle to identify a good "starting point" when tackling the VAST Challenge. Consequently, students who could not identify a good starting point failed to find the correct solution to the challenge. In this paper, we propose a preliminary guideline for helping students approach the VAST Challenge and identify initial analysis directions. We recruited two students to analyze the VAST 2017 Challenge using a hypothesis-driven approach, where they were required to pre-register their hypotheses prior to inspecting and analyzing the full dataset. From their experience, we developed a prescriptive guideline for other students to tackle VAST Challenges. In a preliminary study, we found that the students were able to use the guideline to generate well-formed hypotheses that could lead them towards solving the challenge. Additionally, the students reported that with the guideline, they felt like they had concrete steps that they could follow, thereby alleviating the burden of identifying a good starting point in their analysis process.
Award ID(s):
1939945
NSF-PAR ID:
10394301
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE Workshop on Visualization Guidelines in Research, Design, and Education
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Several studies have shown that underrepresented minorities (URM) (African Americans, Native Americans, Pacific Islanders, and Latinos) are more likely to drop out of engineering doctorate programs before graduation compared to international and majority students. In addition, transitioning into doctoral programs without a good understanding of what they entail can make the PhD experience difficult. To address this issue, a team of researchers from four US universities developed a project called the “Rising Doctoral Institute (RDI)”. One of the research goals of this project is to better understand how factors in the academic system interact dynamically to influence (i.e., support or hinder) incoming URM students’ access, success, persistence, and retention in engineering doctoral programs. To accomplish this goal, we will use a comprehensive analysis approach known as System Dynamic Model (SDM). This work-in-progress article represents the starting point for developing this model, and its overall goal is to conduct a systematic literature review to identify the factors in the academic system that impact URM students’ experience in doctoral engineering programs. We followed a process suggested by Okoli and Schabram [1], which consists of four major steps. The first step is presenting the purpose of the literature review, protocol, and training. The second step consists of selecting the literature and applying a practical screen. The next step is the quality appraisal and data extraction. The final step is analyzing the findings and writing the review. By identifying the factors and the relations between them, we could help ensure a more diverse and equitable STEM education.
Although some external factors can affect students’ access, success, persistence, and retention in engineering PhD programs, this study is limited to exploring the factors and interactions within the academic system that can potentially impact the successful experience of underrepresented minorities in PhD programs in engineering, such as Advisor-Advisee Relationship, Student’s Experience, Academic Support, and Faculty-Students Interaction.
  2. Obeid, I. (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples [1], as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning” [2]. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high quality annotations of breast tissue. It is well known that state of the art algorithms in machine learning require vast amounts of data. Fields such as speech recognition [3], image recognition [4] and text processing [5] are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not have access to such data resources must rely on techniques in which existing models can be adapted to new datasets [6]. A preliminary version of this breast corpus release was tested in a pilot study using a baseline machine learning system, ResNet18 [7], that leverages several open-source Python tools. The pilot corpus was divided into three sets: train, development, and evaluation. Portions of these slides were manually annotated [1] using the nine labels in Table 1 [8] to identify five to ten examples of pathological features on each slide. 
Not every pathological feature is annotated, meaning excluded areas can include focuses particular to these labels that are not used for training. A summary of the number of patches within each label is given in Table 2. To maintain a balanced training set, 1,000 patches of each label were used to train the machine learning model. Throughout all sets, only annotated patches were involved in model development. The performance of this model in identifying all the patches in the evaluation set can be seen in the confusion matrix of classification accuracy in Table 3. The highest performing labels were background, 97% correct identification, and artifact, 76% correct identification. A correlation exists between labels with more than 6,000 development patches and accurate performance on the evaluation set. Additionally, these results indicated a need to further refine the annotation of invasive ductal carcinoma (“indc”), inflammation (“infl”), nonneoplastic features (“nneo”), normal (“norm”) and suspicious (“susp”). This pilot experiment motivated changes to the corpus that will be discussed in detail in this poster presentation. To increase the accuracy of the machine learning model, we modified how we addressed underperforming labels. One common source of error arose with how non-background labels were converted into patches. Large areas of background within other labels were isolated within a patch resulting in connective tissue misrepresenting a non-background label. In response, the annotation overlay margins were revised to exclude benign connective tissue in non-background labels. Corresponding patient reports and supporting immunohistochemical stains further guided annotation reviews. The microscopic diagnoses given by the primary pathologist in these reports detail the pathological findings within each tissue site, but not within each specific slide. 
The microscopic diagnoses informed revisions specifically targeting annotated regions classified as cancerous, ensuring that the labels “indc” and “dcis” were used only in situations where a micropathologist diagnosed it as such. Further differentiation of cancerous and precancerous labels, as well as the location of their focus on a slide, could be accomplished with supplemental immunohistochemically (IHC) stained slides. When distinguishing whether a focus is a nonneoplastic feature versus a cancerous growth, pathologists employ antigen targeting stains to the tissue in question to confirm the diagnosis. For example, a nonneoplastic feature of usual ductal hyperplasia will display diffuse staining for cytokeratin 5 (CK5) and no diffuse staining for estrogen receptor (ER), while a cancerous growth of ductal carcinoma in situ will have negative or focally positive staining for CK5 and diffuse staining for ER [9]. Many tissue samples contain cancerous and non-cancerous features with morphological overlaps that cause variability between annotators. The informative fields IHC slides provide could play an integral role in machine model pathology diagnostics. Following the revisions made on all the annotations, a second experiment was run using ResNet18. Compared to the pilot study, an increase of model prediction accuracy was seen for the labels indc, infl, nneo, norm, and null. This increase is correlated with an increase in annotated area and annotation accuracy. Model performance in identifying the suspicious label decreased by 25% due to the decrease of 57% in the total annotated area described by this label. A summary of the model performance is given in Table 4, which shows the new prediction accuracy and the absolute change in error rate compared to Table 3. The breast tissue subset we are developing includes 3,505 annotated breast pathology slides from 296 patients. The average size of a scanned SVS file is 363 MB. The annotations are stored in an XML format. 
A CSV version of the annotation file is also available, which provides a flat, or simple, annotation that is easy for machine learning researchers to access and interface to their systems. Each patient is identified by an anonymized medical reference number. Within each patient’s directory, one or more sessions are identified, also anonymized to the first of the month in which the sample was taken. These sessions are broken into groupings of tissue taken on that date (in this case, breast tissue). A deidentified patient report stored as a flat text file is also available. Within these slides there are a total of 16,971 annotated regions, with an average of 4.84 annotations per slide. Among those annotations, 8,035 are non-cancerous (normal, background, null, and artifact), 6,222 are carcinogenic signs (inflammation, nonneoplastic, and suspicious), and 2,714 are cancerous labels (ductal carcinoma in situ and invasive ductal carcinoma). The individual patients are split up into three sets: train, development, and evaluation. Of the 74 cancerous patients, 20 were allotted to each of the development and evaluation sets, while the remaining 34 were allotted to train. The remaining 222 patients were split up to preserve the overall distribution of labels within the corpus. This was done in hope of creating control sets for comparable studies. Overall, the development and evaluation sets each have 80 patients, while the training set has 136 patients. In a related component of this project, slides from the Fox Chase Cancer Center (FCCC) Biosample Repository (https://www.foxchase.org/research/facilities/genetic-research-facilities/biosample-repository-facility) are being digitized in addition to slides provided by Temple University Hospital. This data includes 18 different types of tissue, including approximately 38.5% urinary tissue and 16.5% gynecological tissue.
These slides and the metadata provided with them are already anonymized and include diagnoses in a spreadsheet with sample and patient ID. We plan to release over 13,000 unannotated slides from the FCCC Corpus simultaneously with v1.0.0 of TUDP. Details of this release will also be discussed in this poster. Few digitally annotated databases of pathology samples like TUDP exist due to the extensive data collection and processing required. The breast corpus subset should be released by November 2021. By December 2021 we should also release the unannotated FCCC data. We are currently annotating urinary tract data as well. We expect to release about 5,600 processed TUH slides in this subset. We have an additional 53,000 unprocessed TUH slides digitized. Corpora of this size will stimulate the development of a new generation of deep learning technology. In clinical settings where resources are limited, an assistive diagnoses model could support pathologists’ workload and even help prioritize suspected cancerous cases. ACKNOWLEDGMENTS This material is supported by the National Science Foundation under grants nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. REFERENCES [1] N. Shawki et al., “The Temple University Digital Pathology Corpus,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York City, New York, USA: Springer, 2020, pp. 67–104. https://www.springer.com/gp/book/9783030368432. [2] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning.” Major Research Instrumentation (MRI), Division of Computer and Network Systems, Award No. 1726188, January 1, 2018 – December 31, 2021. https://www.isip.piconepress.com/projects/nsf_dpath/.
[3] A. Gulati et al., “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2020, pp. 5036–5040. https://doi.org/10.21437/interspeech.2020-3015. [4] C.-J. Wu et al., “Machine Learning at Facebook: Understanding Inference at the Edge,” in Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 331–344. https://ieeexplore.ieee.org/document/8675201. [5] I. Caswell and B. Liang, “Recent Advances in Google Translate,” Google AI Blog: The latest from Google Research, 2020. [Online]. Available: https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html. [Accessed: 01-Aug-2021]. [6] V. Khalkhali, N. Shawki, V. Shah, M. Golmohammadi, I. Obeid, and J. Picone, “Low Latency Real-Time Seizure Detection Using Transfer Deep Learning,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2021, pp. 1–7. https://www.isip.piconepress.com/publications/conference_proceedings/2021/ieee_spmb/eeg_transfer_learning/. [7] J. Picone, T. Farkas, I. Obeid, and Y. Persidsky, “MRI: High Performance Digital Pathology Using Big Data and Machine Learning,” Philadelphia, Pennsylvania, USA, 2020. https://www.isip.piconepress.com/publications/reports/2020/nsf/mri_dpath/. [8] I. Hunt, S. Husain, J. Simons, I. Obeid, and J. Picone, “Recent Advances in the Temple University Digital Pathology Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2019, pp. 1–4. https://ieeexplore.ieee.org/document/9037859. [9] A. P. Martinez, C. Cohen, K. Z. Hanley, and X. (Bill) Li, “Estrogen Receptor and Cytokeratin 5 Are Reliable Markers to Separate Usual Ductal Hyperplasia From Atypical Ductal Hyperplasia and Low-Grade Ductal Carcinoma In Situ,” Arch. Pathol. Lab. Med., vol. 140, no. 7, pp. 686–689, Apr. 2016. https://doi.org/10.5858/arpa.2015-0238-OA.
  3. Mobile devices are becoming a more common part of the education experience. Students can access their devices at any time to perform assignments or review material. Mobile apps can have the added advantage of being able to automatically grade student work and provide instantaneous feedback. However, numerous challenges remain in implementing effective mobile educational apps. One challenge is the small screen size of smartphones, which was a concern for a spatial visualization training app where students sketch isometric and orthographic drawings. This app was originally developed for iPads, but the wide prevalence of smartphones led to porting the software to iPhone and Android phones. The sketching assignments on a smartphone screen required more frequent zooming and panning, and one of the hypotheses of this study was that the educational effectiveness on smartphones was the same as on the larger screen sizes using iPad tablets. The spatial visualization mobile sketching app was implemented in a college freshman engineering graphics course to teach students how to sketch orthographic and isometric assignments. The app provides automatic grading and hint feedback to help students when they are stuck. Students in this pilot were assigned sketching problems as homework using their personal devices. Students were administered the PSVT:R, a reliable, well-validated spatial visualization instrument, as a pre- and post-test to assess learning gains. The trial analysis focuses on students who entered the course with limited spatial visualization experience, identified by a score of ≤70% on the PSVT:R, since students entering college with low PSVT:R scores are at higher risk of dropping out of STEM majors. Among these low-performing students, those who used the app showed significant progress: 71% raised their test scores above 70%, bringing them out of the at-risk range for dropping out of engineering.
While the PSVT:R test has been well validated, there are benefits to developing alternative methods of assessing spatial visualization skills. We developed an assembly pre- and post-test based upon a timed Lego™ exercise. At the start of the quarter, students were timed to see how long it would take them to build small Lego sets using only visual instructions. Students were timed again on a different Lego set after completion of the spatial visualization app. One benefit of the test was that it illustrated to the engineering students a skill that could be perceived as more relevant to their careers, and thus possibly increased their motivation for spatial visualization training. In addition, it may be possible to adapt the assembly test to elementary school grade levels where the PSVT:R test would not be suitable. Preliminary results show that the average Lego build times decreased significantly after using the mobile app, indicating an improvement in students’ spatial reasoning skills. A comparison will also be done between normalized completion times on the assembly test and the PSVT:R tests in order to see how the assembly test compares to the “gold standard”. In addition to the PSVT:R instrument, a survey was conducted to evaluate student usage and their impressions of the app. Students found the app engaging, easy to use, and something they would do whenever they had “a free moment”. 95% of the students would recommend the app to a friend who is struggling with spatial visualization skills. This paper will describe the implementation of the mobile spatial visualization sketching app in a large college classroom, and highlight the app’s impact in increasing self-efficacy in spatial visualization and sketching.
  4. Responding to the need to teach remotely due to COVID-19, we used readily available computational approaches (and developed associated tutorials (https://mdh-cures-community.squarespace.com/virtual-cures-and-ures)) to teach virtual Course-Based Undergraduate Research Experience (CURE) laboratories that fulfil generally accepted main components of CUREs or Undergraduate Research Experiences (UREs): Scientific Background, Hypothesis Development, Proposal, Experiments, Teamwork, Data Analysis, Conclusions, and Presentation [1]. We then developed and taught remotely, in three phases, protein-centric CURE activities that are adaptable to virtually any protein, emphasizing contributions of noncovalent interactions to structure, binding and catalysis (an ASBMB learning framework [2] foundational concept). The courses had five learning goals (unchanged in the virtual format), focused on i) use of primary literature and bioinformatics, ii) the roles of non-covalent interactions, iii) keeping accurate laboratory notebooks, iv) hypothesis development and research proposal writing, and v) presenting the project and drawing evidence-based conclusions. The first phase, Developing a Research Proposal, contains three modules, and develops hallmarks of a good student-developed hypothesis using available literature (PubMed [3]) and preliminary observations obtained using bioinformatics: Module 1: Using Primary Literature and Data Bases (Protein Data Base [4], Blast [5] and Clustal Omega [6]), Module 2: Molecular Visualization (PyMol [7] and Chimera [8]), culminating in a research proposal (Module 3). Provided rubrics guide student expectations. In the second phase, Preparing the Proteins, students prepared necessary proteins and mutants using Module 4: Creating and Validating Models, which leads users through creating mutants with PyMol, homology modeling with Phyre2 [9] or Missense [10], energy minimization using RefineD [11] or ModRefiner [12], and structure validation using MolProbity [13].
In the third phase, Computational Experimental Approaches to Explore the Questions developed from the Hypothesis, students selected appropriate tools to perform their experiments, chosen from computational techniques suitable for a CURE laboratory class taught remotely. Questions, paired with computational approaches, were selected from Modules 5: Exploring Titratable Groups in a Protein using H++ [14], 6: Exploring Small Molecule Ligand Binding (with SwissDock [15]), 7: Exploring Protein-Protein Interaction (with HawkDock [16]), 8: Detecting and Exploring Potential Binding Sites on a Protein (with POCASA [17] and SwissDock), and 9: Structure-Activity Relationships of Ligand Binding & Drug Design (with SwissDock, Open Eye [18] or the Molecular Operating Environment (MOE) [19]). All involve freely available computational approaches on publicly accessible web-based servers around the world (with the exception of MOE). Original literature/Journal club activities on approaches helped students suggest tie-ins to wet lab experiments they could conduct in the future to complement their computational approaches. This approach allowed us to continue using high-impact CURE teaching, without changing our course learning goals. Quantitative data (including replicates) was collected and analyzed during regular class periods. Students developed evidence-based conclusions and related them to their research questions and hypotheses. Projects culminated in a presentation where faculty feedback was facilitated with the Virtual Presentation platform from QUBES [20]. These computational approaches are readily adaptable for topics accessible for first to senior year classes and individual research projects (UREs). We used them in both partial and full semester CUREs in various institutional settings. We believe this format can benefit faculty and students from a wide variety of teaching institutions under conditions where remote teaching is necessary.
  5. CONTEXT The need to better prepare students for the engineering workplace is a long-standing and on-going concern among engineering educators. With the aim of addressing gaps in preparation, the number of new work- and practice-based programs is growing. Identifying the first and most significant challenges recent graduates face in the workplace can contribute new insights into how students could be better prepared for the school-to-work transition. PURPOSE In order to better understand the transition from school to work, this paper presents findings from the first year of a five-year longitudinal study exploring the experiences and career trajectories of early career engineers. The specific question addressed in this paper is: What was the biggest challenge civil engineers experienced during their first year in the workplace? METHODS Eighteen early career civil engineers participated in semi-structured interviews in May of 2019. Participants were recruited from national and local listservs in the United States. None worked in the same office, although two worked for the same company in different offices. They were asked a range of questions related to their experiences transitioning into their careers. For this paper, responses pertaining to the biggest challenge question were analysed through open coding to determine if any themes could be identified in participants’ responses. OUTCOMES Participants were asked about the biggest challenge they had encountered since starting their job. Their responses covered a very wide range of issues. There were three themes of note that appeared in at least four different participants’ responses. They were: 1) interdependence, 2) new practices and material, and 3) negative interactions. 1 and 2 were cited by both men and women; 3 was only cited by women. 
CONCLUSIONS In addition to providing insights into job readiness that engineering educators can address, the findings speak to several aspects of organizational socialization. Most participants’ biggest challenges (in the form of interdependence and new practices and materials) were related to “learning & adaptation.” Challenges related to “relationship building” and “work group socialization tactics” (in the form of negative interactions) were only the biggest challenges for women, not men. However, negative interactions also extended beyond factors accounted for in current models of organizational socialization, and should be accounted for in revised models. KEYWORDS Early career, job readiness, organizational socialization 