
Title: Interactive Demonstrations and Hands-On Use of the net.science Cyberinfrastructure for Network Science: Chairs’ Welcome and Tutorial Summary
Networks are readily identifiable in many aspects of society: cellular telephone networks and social networks are two common examples. Networks are studied within many academic disciplines. Consequently, a large body of (open-source) software is being produced to perform computations on networks. A cyberinfrastructure for network science, called net.science, is being built to provide a computational platform and resource for both producers and consumers of networks and software tools. This tutorial is a hands-on demonstration of some of net.science’s features.
Award ID(s):
1916805
NSF-PAR ID:
10300369
Journal Name:
13th ACM Web Science Conference 2021
Page Range or eLocation-ID:
137 to 137
Sponsoring Org:
National Science Foundation
More Like this
  1. Network representations of socio-physical systems are ubiquitous, examples being social (media) networks and infrastructure networks like power transmission and water systems. The many software tools that analyze and visualize networks, and carry out simulations on them, require different graph formats. Consequently, it is important to develop software for converting graphs that are represented in a given source format into a required representation in a destination format. For network-based computations, graph conversion is a key capability that facilitates interoperability among software tools. This paper describes such a system, called GraphTrans, to convert graphs among different formats. This system is part of a new cyberinfrastructure for network science called net.science. We present the GraphTrans system design and implementation, results from a performance evaluation, and a case study to demonstrate its utility.
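The abstract does not detail GraphTrans’s internals, but the general source-to-destination conversion pattern such a tool automates can be sketched with the open-source networkx library. This is a minimal illustration of the idea, not GraphTrans itself:

    # Sketch of the general graph-format conversion pattern that a tool like
    # GraphTrans automates; illustrative only, using the networkx library.
    import networkx as nx

    # Readers and writers for a few common formats, keyed by format name.
    READERS = {"edgelist": nx.read_edgelist, "graphml": nx.read_graphml, "gml": nx.read_gml}
    WRITERS = {"edgelist": nx.write_edgelist, "graphml": nx.write_graphml, "gml": nx.write_gml}

    def convert(src_path, src_fmt, dst_path, dst_fmt):
        """Read a graph in the source format and write it in the destination format."""
        graph = READERS[src_fmt](src_path)   # parse the source representation
        WRITERS[dst_fmt](graph, dst_path)    # serialize to the destination representation

    # Example: convert a plain edge list to GraphML.
    # convert("toy.edges", "edgelist", "toy.graphml", "graphml")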
  2. Obeid, Iyad; Picone, Joseph; Selesnick, Ivan (Eds.)
    The Neural Engineering Data Consortium (NEDC) is developing a large open-source database of high-resolution digital pathology images known as the Temple University Digital Pathology Corpus (TUDP) [1]. Our long-term goal is to release one million images. We expect to release the first 100,000-image corpus by December 2020. The data is being acquired at the Department of Pathology at Temple University Hospital (TUH) using a Leica Biosystems Aperio AT2 scanner [2] and consists entirely of clinical pathology images. More information about the data and the project can be found in Shawki et al. [3]. We currently have a National Science Foundation (NSF) planning grant [4] to explore how the community can best leverage this resource. One goal of this poster presentation is to stimulate community-wide discussions about this project and determine how this valuable resource can best meet the needs of the public.

The computing infrastructure required to support this database is extensive [5] and includes two HIPAA-secure computer networks, dual petabyte file servers, and Aperio’s eSlide Manager (eSM) software [6]. We have currently digitized over 50,000 slides from 2,846 patients and 2,942 clinical cases. There is an average of 12.4 slides per patient and 10.5 slides per case, with one report per case. The data is organized by tissue type as shown below:

Filenames:
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_0a001_00123456_lvl0001_s000.svs
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_00123456.docx

Explanation:
tudp: root directory of the corpus
v1.0.0: version number of the release
svs: the image data type
gastro: the type of tissue
000001: six-digit sequence number used to control directory complexity
00123456: eight-digit patient MRN
2015_03_05: the date the specimen was captured
0s15_12345: the clinical case name
0s15_12345_0a001_00123456_lvl0001_s000.svs: the actual image filename, consisting of a repeat of the case name, a site code (e.g., 0a001), the type and depth of the cut (e.g., lvl0001), and a token number (e.g., s000)
0s15_12345_00123456.docx: the filename for the corresponding case report

We currently recognize fifteen tissue types in the first installment of the corpus. The raw image data is stored in Aperio’s “.svs” format, which is a multi-layered compressed JPEG format [3,7]. Pathology reports containing a summary of how a pathologist interpreted the slide are also provided in a flat text file format. A more complete summary of the demographics of this pilot corpus will be presented at the conference.

Another goal of this poster presentation is to share our experiences with the larger community, since many of these details have not been adequately documented in scientific publications. There are quite a few obstacles in collecting this data that have slowed down the process and need to be discussed publicly. Our backlog of slides dates back to 1997, meaning there are many that need to be sifted through and discarded for peeling or cracking. Additionally, during scanning a slide can get stuck, stalling a scan session for hours and resulting in a significant loss of productivity. Over the past two years, we have accumulated significant experience with how to scan a diverse inventory of slides using the Aperio AT2 high-volume scanner. We have been working closely with the vendor to resolve many problems associated with the use of this scanner for research purposes.
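Because the naming convention above is fixed, a short illustrative parser makes the layout concrete. This sketch and its field labels are ours, not an NEDC tool; it simply splits an image path into the components listed in the explanation above:

    # Illustrative parser for the TUDP image path convention described above.
    # The dictionary keys are our own informal labels; only the layout itself
    # comes from the corpus documentation.
    def parse_tudp_image_path(path):
        parts = path.strip("/").split("/")
        root, version, data_type, tissue, seq, mrn, date, case, fname = parts[-9:]
        # e.g., 0s15_12345_0a001_00123456_lvl0001_s000.svs: the first two
        # underscore-separated tokens repeat the case name.
        fields = fname[:-len(".svs")].split("_")
        return {
            "root": root, "version": version, "data_type": data_type,
            "tissue": tissue, "sequence": seq, "mrn": mrn, "date": date,
            "case": case,
            "site": fields[2],    # e.g., 0a001
            "cut": fields[4],     # e.g., lvl0001
            "token": fields[5],   # e.g., s000
        }

    example = ("tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/"
               "0s15_12345/0s15_12345_0a001_00123456_lvl0001_s000.svs")
    print(parse_tudp_image_path(example))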
This scanning project began in January of 2018 when the scanner was first installed. The scanning process was slow at first, since there was a learning curve with how the scanner worked and how to obtain samples from the hospital. From its start date until May of 2019, roughly 20,000 slides were scanned. In the six months from May to November, we tripled that number and now hold ~60,000 slides in our database. This dramatic increase in productivity was due to additional undergraduate staff members and an emphasis on efficient workflow.

The Aperio AT2 scans 400 slides a day, requiring at least eight hours of scan time. The efficiency of these scans can vary greatly. When our team first started, approximately 5% of slides failed the scanning process due to focal point errors. We have been able to reduce that to 1% through a variety of means: (1) best practices regarding daily and monthly recalibrations, (2) tweaking software settings such as the tissue finder parameters, and (3) experience with how to clean and prep slides so they scan properly. Nevertheless, this is not a completely automated process, making it very difficult to reach our production targets. With a staff of three undergraduate workers spending a total of 30 hours per week, we find it difficult to scan more than 2,000 slides per week using a single scanner (400 slides per night x 5 nights per week). The main limitation in achieving this level of production is the lack of a completely automated scanning process; it takes a couple of hours to sort, clean, and load slides. We have streamlined all other aspects of the workflow required to database the scanned slides so that there are no additional bottlenecks.

To bridge the gap between hospital operations and research, we are using Aperio’s eSM software. Our goal is to provide pathologists access to high-quality digital images of their patients’ slides. eSM is a secure website that holds the images with their metadata labels, the patient report, and the path to where the image is located on our file server. Although eSM includes significant infrastructure to import slides into the database using barcodes, TUH does not currently support barcode use. Therefore, we manage the data using a mixture of Python scripts and the manual import functions available in eSM. The database and associated tools are based on proprietary formats developed by Aperio, making this another important point of community-wide discussion on how best to disseminate such information.

Our near-term goal for the TUDP Corpus is to release 100,000 slides by December 2020. We hope to continue data collection over the next decade until we reach one million slides. We are creating two pilot corpora using the first 50,000 slides we have collected. The first corpus consists of 500 slides with a marker stain and another 500 without it. This set was designed to let people debug their basic deep learning processing flow on these high-resolution images. We discuss our preliminary experiments on this corpus and the challenges in processing these high-resolution images using deep learning in [3]. We are able to achieve a mean sensitivity of 99.0% for slides with pen marks, and 98.9% for slides without marks, using a multistage deep learning algorithm. While this dataset was very useful in initial debugging, we are in the midst of creating a new, more challenging pilot corpus using actual tissue samples annotated by experts. The task will be to detect ductal carcinoma in situ (DCIS) or invasive breast cancer tissue.
There will be approximately 1,000 images per class in this corpus. Based on the number of features annotated, we can train on a two-class problem of DCIS or benign, or increase the difficulty by increasing the classes to include DCIS, benign, stroma, pink tissue, non-neoplastic, etc. Those interested in the corpus or in participating in community-wide discussions should join our listserv, nedc_tuh_dpath@googlegroups.com, to be kept informed of the latest developments in this project. You can learn more from our project website: https://www.isip.piconepress.com/projects/nsf_dpath.
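As a quick back-of-envelope check of the scanning throughput figures quoted in this abstract (the rates are those stated above; the script itself is ours):

    # Sanity check of the scanning throughput figures quoted in the abstract.
    slides_per_night = 400        # Aperio AT2 capacity per ~8-hour session
    nights_per_week = 5
    failure_rate = 0.01           # post-recalibration focal-point failure rate

    weekly_raw = slides_per_night * nights_per_week      # 2,000 slides/week
    weekly_usable = weekly_raw * (1 - failure_rate)      # ~1,980 usable scans
    print(weekly_raw, round(weekly_usable))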
  3. HPC networks and campus networks are beginning to leverage various levels of network programmability, ranging from programmable network configuration (e.g., NETCONF/YANG, SNMP, OF-CONFIG) to software-based controllers (e.g., OpenFlow controllers) to dynamic function placement via network function virtualization (NFV). While programmable networks offer new capabilities, they also make the network more difficult to debug. When applications experience unexpected network behavior, there is no established method to investigate the cause in a programmable network, and many conventional troubleshooting and debugging tools (e.g., ping and traceroute) can turn out to be completely useless. This absence of troubleshooting tools that support programmability is a serious challenge for researchers trying to understand the root cause of their networking problems. This paper explores the challenges of debugging an all-campus science DMZ network that leverages SDN-based network paths for high-performance flows. We propose Flow Tracer, a lightweight, data-plane-based debugging tool for SDN-enabled networks that allows end users to dynamically discover how the network is handling their packets. In particular, we focus on solving the problem of identifying an SDN path by using actual packets from the flow being analyzed, as opposed to existing expensive approaches where either probe packets are injected into the network or actual packets are duplicated for tracing purposes. Our simulation experiments show that Flow Tracer has negligible impact on the performance of monitored flows. Moreover, our tool can be extended to obtain further information about actual switch behavior, topology, and other flow information without privileged access to the SDN control plane.
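The abstract does not describe Flow Tracer’s implementation, so the following is only a toy model of the underlying idea: discovering the path of an actual flow by following each switch’s forwarding rules hop by hop. The flow tuple, tables, and topology below are invented; the real tool operates on live SDN switches in the data plane:

    # Toy model of data-plane path discovery: given each switch's forwarding
    # table, follow a real flow's (src, dst, port) tuple hop by hop. This only
    # illustrates the idea behind a tool like Flow Tracer.
    FORWARDING = {
        # switch: {flow tuple -> next-hop switch (None = delivered to host)}
        "s1": {("10.0.0.1", "10.0.0.9", 5001): "s2"},
        "s2": {("10.0.0.1", "10.0.0.9", 5001): "s4"},
        "s4": {("10.0.0.1", "10.0.0.9", 5001): None},
    }

    def trace_flow(ingress, flow):
        """Return the ordered list of switches that handle this flow."""
        path, hop = [], ingress
        while hop is not None:
            path.append(hop)
            hop = FORWARDING[hop].get(flow)  # next hop per this switch's rules
        return path

    print(trace_flow("s1", ("10.0.0.1", "10.0.0.9", 5001)))  # ['s1', 's2', 's4']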
  4. The application areas for plastic optical fibers, such as in-building or aircraft networks, usually have tight power budgets and require multiple passive components. In addition, advanced modulation formats are being considered for transmission over plastic optical fibers (POFs) to increase spectral efficiency. In this scenario, there is a clear need for a flexible and dynamic system-level simulation framework for POFs that includes models of light propagation in POFs and of the components needed to evaluate the entire system performance. Until recently, commercial simulation software either was designed specifically for single-mode glass fibers or modeled individual guided modes in multimode fibers with considerable detail, which is not adequate for large-core POFs, where there are millions of propagation modes, strong mode coupling, and high variability. These are some of the many challenges involved in the modeling and simulation of POF-based systems. Here, we describe how we are addressing these challenges with models based on an intensity-vs-angle representation of the multimode signal rather than one that attempts to model all the modes in the fiber. Furthermore, we present modeling approaches for the individual components that comprise the POF-based system and show how the models have been incorporated into system-level simulations, including the commercial software packages Simulink™ and ModeSYS™.
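The intensity-vs-angle representation can be pictured with a toy propagation loop: the signal is a power distribution over a discretized angle axis, and each metre of fibre applies angle-dependent attenuation plus a small diffusion-like mode-coupling step. All coefficients below are invented for illustration, not measured POF parameters or the authors’ model:

    # Toy illustration of the intensity-vs-angle signal representation.
    import numpy as np

    n_angles = 90
    theta = np.linspace(0.0, 30.0, n_angles)       # propagation angle (degrees)
    power = np.exp(-((theta - 5.0) / 3.0) ** 2)    # launch power distribution

    alpha = 0.18 + 0.002 * theta**2                # higher loss at steep angles (dB/m)
    c = 0.05                                       # fraction coupled to neighbours per metre

    for _ in range(50):                            # propagate 50 m, one metre per step
        power = power * 10 ** (-alpha / 10)        # angle-dependent attenuation
        power = np.convolve(power, [c, 1 - 2 * c, c], mode="same")  # mode coupling

    print(f"received power fraction: {power.sum():.3e}")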
  5. 1. Description of the objectives and motivation for the contribution to ECE education

The demand for wireless data transmission capacity is increasing rapidly, and this growth is expected to continue due to the ongoing prevalence of cellular phones and new and emerging bandwidth-intensive applications that encompass high-definition video, unmanned aerial systems (UAS), intelligent transportation systems (ITS) including autonomous vehicles, and others. Meanwhile, vital military and public safety applications also depend on access to the radio frequency spectrum. To meet these demands, the US federal government is beginning to move from the proven but inefficient model of exclusive frequency assignments to a more-efficient, shared-spectrum approach in some bands of the radio frequency spectrum. A STEM workforce that understands the radio frequency spectrum and the applications that use it is needed to further increase spectrum efficiency and the cost-effectiveness of wireless systems over the next several decades, to meet anticipated and unanticipated increases in wireless data capacity.

2. Relevant background, including literature search examples if appropriate

Cisco Systems’ annual survey indicates continued strong growth in demand for wireless data capacity. Meanwhile, undergraduate electrical and computer engineering courses in communication systems, electromagnetics, and networks tend to emphasize mathematical and theoretical fundamentals and higher-layer protocols, with less focus on fundamental concepts that are more specific to radio frequency wireless systems, including the physical and media access control layers of wireless communication systems and networks. An efficient way is needed to introduce basic RF system and spectrum concepts to undergraduate engineering students in courses such as those mentioned above who are unable to take, or had not planned to take, a full course in radio frequency / microwave engineering or wireless systems and networks. We have developed a series of interactive online modules that introduce concepts fundamental to wireless communications, the radio frequency spectrum, and spectrum sharing, and that seek to present these concepts in context. The modules include interactive, JavaScript-based simulation exercises intended to reinforce the concepts presented in the modules through narrated slide presentations, text, and external links. Additional modules in development will introduce advanced undergraduate and graduate students and STEM professionals to the configuration and programming of adaptive frequency-agile radios and spectrum management systems that can operate efficiently in congested radio frequency environments. Simulation exercises developed for the advanced modules allow both manual and automatic control of simulated radio links in timed, game-like simulations, and some exercises will enable students to select from among multiple pre-coded controller strategies and optionally edit the code before running the timed simulation. Additionally, we have developed infrastructure for running remote laboratory experiments that can also be embedded within the online modules, including a web-based user interface, an experiment management framework, and software defined radio (SDR) application software that runs in a wireless testbed initially developed for research.
Although these experiments rely on limited hardware resources and introduce additional logistical considerations, they provide additional realism that may further challenge and motivate students.

3. Description of any assessment methods used to evaluate the effectiveness of the contribution

Each set of modules is preceded and followed by a survey. Each individual module is preceded by a quiz and followed by another quiz, with pre- and post-quiz questions drawn from the same pool. The pre-surveys allow students to opt in or out of having their survey and quiz results used anonymously in research.

4. Statement of results

The initial modules have been and are being used by three groups of students: (1) students in an undergraduate Introduction to Communication Systems course; (2) an interdisciplinary group of engineering students, including computer science students, who are participating in a related undergraduate research project; and (3) students in a graduate-level communications course that includes both electrical and computer engineers. Analysis of results from the first group of students showed statistically significant increases from pre-quiz to post-quiz for each of four modules on fundamental wireless communication concepts. Results for the other students have not yet been analyzed, but also appear to show substantial pre-quiz to post-quiz increases in mean scores.
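A minimal sketch of the matched-pairs pre/post analysis this kind of assessment implies is shown below. The scores are invented placeholders, not data from the study, and the abstract does not specify which statistical test the authors used:

    # Sketch of a matched-pairs pre/post quiz analysis of the kind described
    # above. Scores are invented placeholders, not the study's data.
    from scipy import stats

    pre  = [55, 60, 48, 70, 62, 58, 65, 50]   # hypothetical pre-quiz scores (%)
    post = [72, 75, 66, 85, 70, 74, 80, 69]   # hypothetical post-quiz scores (%)

    t_stat, p_value = stats.ttest_rel(post, pre)   # paired t-test, same students
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # small p -> significant gain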