Title: PIOT-Hub - A collaborative cloud tool for generation of physical input–output tables using mechanistic engineering models
Mapping material flows in an economy is crucial to identifying strategies for resource management toward lowering the waste and environmental impacts of society, a key objective of research in industrial ecology. However, constructing models for mapping material flows at a sectoral level, such as in physical input–output tables (PIOTs) at highly disaggregated levels, is tedious and relies on a large amount of empirical data. To overcome this challenge, a novel collaborative cloud platform, PIOT-Hub, is developed in this work. This platform utilizes a Python-based simulation system for extracting material flow data from mechanistic models, thus semi-automating the generation of PIOTs. The simulation system implements a bottom-up approach of utilizing scaled engineering models to generate physical supply tables (PSTs) and physical use tables (PUTs), which are converted to PIOTs (described in Vunnava & Singh, 2021). Users can upload mechanistic models for sectors to PIOT-Hub to develop PIOTs for any region. Both the models and the resulting PSTs/PUTs/PIOTs can be shared with other users of the collaborative platform. The automation and sharing features provided by PIOT-Hub will help to significantly reduce the time required to develop PIOTs and improve the reproducibility and continuity of PIOT generation, thus allowing the study of the changing nature of material flows in a regional economy. In this paper, we describe the MFDES simulation system and the PIOT-Hub architecture and functionality through a demo example that creates a PIOT for agro-based sectors in Illinois. Future work includes scaling up the cloud infrastructure for large-scale PIOT generation and enhancing the tool's compatibility with additional sectors of the economy.
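The PST/PUT-to-PIOT step can be pictured with a standard supply-use construct from input-output analysis. The sketch below applies the industry technology assumption to a toy two-sector example; the matrices, numbers, and the choice of construct are illustrative assumptions and are not taken from MFDES or from Vunnava & Singh (2021), where the actual transformation is described.

```python
import numpy as np

# Hypothetical two-sector example, in tons per year.
# PST rows = industries, columns = commodities (what each industry supplies);
# PUT rows = commodities, columns = industries (what each industry uses).
PST = np.array([[90.0, 10.0],
                [ 5.0, 95.0]])
PUT = np.array([[20.0, 30.0],
                [15.0, 25.0]])

g = PST.sum(axis=1)   # total physical output of each industry
q = PST.sum(axis=0)   # total physical output of each commodity

B = PUT / g           # commodity use per ton of industry output
D = PST / q           # market shares: each industry's share of a commodity's output
A = B @ D             # commodity-by-commodity coefficients (industry technology assumption)
PIOT = A * q          # commodity-by-commodity physical flow table (tons/yr)

print(np.round(PIOT, 2))
```

Here B holds commodity inputs per ton of industry output and D the market shares of each industry in each commodity's output, so B @ D scaled by commodity output yields a commodity-by-commodity physical flow table.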
Award ID(s): 1805741
NSF-PAR ID: 10344530
Author(s) / Creator(s): ; ; ;
Editor(s): Lenzen, Manfred
Date Published:
Journal Name: Journal of Industrial Ecology
Volume: 26
Issue: 1
ISSN: 1088-1980
Page Range / eLocation ID: 107-120
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1.
    A sustainable transition to a low-carbon and zero-waste economy requires a macroscopic evaluation of the opportunities and impacts of adopting emerging technologies in a region. However, a full assessment of current physical flows and wastes is a tedious task, leading to a lack of comprehensive assessment before emerging technologies are scaled up and adopted. Combining mechanistic models developed for engineering and biological systems with the macroeconomic framework of input-output models, we propose a novel integrated approach to fully map the physical economy that automates the process of mapping industrial flows and wastes in a region. The approach is demonstrated by mapping the agro-based physical economy of the state of Illinois, USA, using mechanistic models for 10 sectors that have a high impact on waste generation. Each model mechanistically simulates the material transformation processes in the economic sector and provides the material flow information for mapping. The model of the physical economy, developed in the form of a Physical Input-Output Table (PIOT), captures the interindustry physical interactions and waste flows in the region, thus providing insights into opportunities to implement circular economy strategies, i.e., large-scale adoption of recycling technologies. In Illinois, adoption of technologies for industrial wastewater and hog manure recycling will have the highest impact, reducing hog industry waste by >62%, soybean hull waste by >99%, and dry corn milling (corn ethanol production) waste by >96%. A small percentage reduction in fertilizer manufacturing waste was also observed. The physical economy model revealed that the urea sector had the highest material use (5.52E+08 tons) and green bean farming the lowest (1.30E+05 tons) for the year modeled (2018). The mechanistic modeling also captured elemental flows across the physical economy, with the urea sector using 8.25E+07 tons of carbon per operation-year (the highest) and bean farming using 3.90E+04 tons of elemental carbon per operation-year (the lowest). The approach proposed here establishes a connection between the engineering and physical economy modeling communities for standardizing the mapping of the physical economy, providing insights for a successful transition to a low-carbon, zero-waste circular economy.
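    Because each sector is represented by a mechanistic model, elemental flows such as carbon can be tallied by combining a sector's reported material throughput with the elemental composition of the materials it handles. The sketch below shows that bookkeeping step in Python; the sector names, materials, carbon fractions, and tonnages are hypothetical placeholders rather than the study's data.

```python
# Hypothetical tally of elemental carbon flows from mechanistic sector models.
# All material names, carbon mass fractions, and tonnages are assumed values.
carbon_fraction = {        # kg C per kg of material (assumed)
    "corn_grain": 0.45,
    "natural_gas": 0.75,
    "urea": 0.20,
}

# Per-sector material inputs reported by each mechanistic model (tons per operation-year).
sector_inputs = {
    "corn_farming": {"urea": 1.0e5},
    "urea_manufacturing": {"natural_gas": 2.0e6},
    "corn_ethanol": {"corn_grain": 5.0e6},
}

def elemental_carbon(inputs, fractions):
    """Total elemental carbon embodied in a sector's material inputs (tons C)."""
    return sum(tons * fractions.get(material, 0.0) for material, tons in inputs.items())

for sector, inputs in sector_inputs.items():
    print(f"{sector}: {elemental_carbon(inputs, carbon_fraction):.3e} tons C per operation-year")
```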
  2.
    The Deep Learning Epilepsy Detection Challenge: design, implementation, and test of a new crowd-sourced AI challenge ecosystem. Isabell Kiral*, Subhrajit Roy*, Todd Mummert*, Alan Braz*, Jason Tsay, Jianbin Tang, Umar Asif, Thomas Schaffter, Eren Mehmet, The IBM Epilepsy Consortium◊, Joseph Picone, Iyad Obeid, Bruno De Assis Marques, Stefan Maetschke, Rania Khalaf†, Michal Rosen-Zvi†, Gustavo Stolovitzky†, Mahtab Mirmomeni†, Stefan Harrer†. * These authors contributed equally to this work. † Corresponding authors: rkhalaf@us.ibm.com, rosen@il.ibm.com, gustavo@us.ibm.com, mahtabm@au1.ibm.com, sharrer@au.ibm.com. ◊ Members of the IBM Epilepsy Consortium are listed in the Acknowledgements section. J. Picone and I. Obeid are with Temple University, USA. T. Schaffter is with Sage Bionetworks, USA. E. Mehmet is with the University of Illinois at Urbana-Champaign, USA. All other authors are with IBM Research in the USA, Israel, and Australia.

    Introduction. This decade has seen an ever-growing number of scientific fields benefitting from the advances in machine learning technology and tooling. More recently, this trend reached the medical domain, with applications ranging from cancer diagnosis [1] to the development of brain-machine interfaces [2]. While Kaggle has pioneered the crowd-sourcing of machine learning challenges to incentivise data scientists from around the world to advance algorithm and model design, the increasing complexity of problem statements demands that participants be expert data scientists, deeply knowledgeable in at least one other scientific domain, and competent software engineers with access to large compute resources. People who match this description are few and far between, unfortunately leading to a shrinking pool of possible participants and a loss of experts dedicating their time to solving important problems. Participation is even further restricted in the context of any challenge run on confidential use cases or with sensitive data. Recently, we designed and ran a deep learning challenge to crowd-source the development of an automated labelling system for brain recordings, aiming to advance epilepsy research. A focus of this challenge, run internally in IBM, was the development of a platform that lowers the barrier of entry and therefore mitigates the risk of excluding interested parties from participating.

    The challenge: enabling wide participation. With the goal of running a challenge that mobilises the largest possible pool of participants from IBM (global), we designed a use case around previous work in epileptic seizure prediction [3]. In this "Deep Learning Epilepsy Detection Challenge", participants were asked to develop an automatic labelling system to reduce the time a clinician would need to diagnose patients with epilepsy. Labelled training and blind validation data for the challenge were generously provided by Temple University Hospital (TUH) [4]. TUH also devised a novel scoring metric for the detection of seizures that was used as the basis for algorithm evaluation [5]. In order to provide an experience with a low barrier of entry, we designed a generalisable challenge platform under the following principles: 1. No participant should need to have in-depth knowledge of the specific domain (i.e., no participant should need to be a neuroscientist or epileptologist). 2. No participant should need to be an expert data scientist. 3. No participant should need more than basic programming knowledge (i.e.,
no participant should need to learn how to process fringe data formats and stream data efficiently). 4. No participant should need to provide their own computing resources. In addition to the above, our platform should further • guide participants through the entire process from sign-up to model submission, • facilitate collaboration, and • provide instant feedback to the participants through data visualisation and intermediate online leaderboards.

The platform. The architecture of the platform that was designed and developed is shown in Figure 1. The entire system consists of a number of interacting components. (1) A web portal serves as the entry point to challenge participation, providing challenge information, such as timelines and challenge rules, and scientific background. The portal also facilitated the formation of teams and provided participants with an intermediate leaderboard of submitted results and a final leaderboard at the end of the challenge. (2) IBM Watson Studio [6] is the umbrella term for a number of services offered by IBM. Upon creation of a user account through the web portal, an IBM Watson Studio account was automatically created for each participant, giving users access to IBM's Data Science Experience (DSX), the analytics engine Watson Machine Learning (WML), and IBM's Cloud Object Storage (COS) [7], all of which are described in more detail in further sections. (3) The user interface and starter kit were hosted on IBM's Data Science Experience platform (DSX) and formed the main component for designing and testing models during the challenge. DSX allows for real-time collaboration on shared notebooks between team members. A starter kit in the form of a Python notebook, supporting the popular deep learning libraries TensorFlow [8] and PyTorch [9], was provided to all teams to guide them through the challenge process. Upon instantiation, the starter kit loaded the necessary Python libraries and custom functions for the invisible integration with COS and WML. In dedicated spots in the notebook, participants could write custom pre-processing code, machine learning models, and post-processing algorithms. The starter kit provided instant feedback about participants' custom routines through data visualisations. Using the notebook only, teams were able to run the code on WML, making use of a compute cluster of IBM's resources. The starter kit also enabled submission of the final code to a data storage to which only the challenge team had access. (4) Watson Machine Learning provided access to shared compute resources (GPUs). Code was bundled up automatically in the starter kit and deployed to and run on WML. WML in turn had access to shared storage from which it requested recorded data and to which it stored the participant's code and trained models. (5) IBM's Cloud Object Storage held the data for this challenge. Using the starter kit, participants could investigate their results as well as data samples in order to better design custom algorithms. (6) Utility functions were loaded into the starter kit at instantiation. This set of functions included code to pre-process data into a more common format, to optimise streaming through the use of the NutsFlow and NutsML libraries [10], and to provide seamless access to all IBM services used. Not captured in the diagram is the final code evaluation, which was conducted in an automated way as soon as code was submitted through the starter kit, minimising the burden on the challenge organising team.
Figure 1: High-level architecture of the challenge platform.

Measuring success. The competitive phase of the "Deep Learning Epilepsy Detection Challenge" ran for 6 months. Twenty-five teams, comprising 87 scientists and software engineers from 14 global locations, participated. All participants made use of the starter kit we provided and ran algorithms on IBM's WML infrastructure. Seven teams persisted until the end of the challenge and submitted final solutions. The best performing solutions reached seizure detection performance that would allow a hundred-fold reduction in the time epileptologists need to annotate continuous EEG recordings. Thus, we expect the developed algorithms to aid in the diagnosis of epilepsy by significantly shortening manual labelling time. Detailed results are currently in preparation for publication. Equally important to solving the scientific challenge, however, was to understand whether we managed to encourage participation from non-expert data scientists.

Figure 2: Primary occupation as reported by challenge participants.

Out of the 40 participants for whom we have occupational information, 23 reported Data Science or AI as their main job description, 11 reported being a Software Engineer, and 2 people had expertise in Neuroscience. Figure 2 shows that participants had a variety of specialisations, including some that are in no way related to data science, software engineering, or neuroscience. No participant had deep knowledge and experience in all three of data science, software engineering, and neuroscience.

Conclusion. Given the growing complexity of data science problems and increasing dataset sizes, solving these problems requires collaboration between people with different expertise, with a focus on inclusiveness and a low barrier of entry. We designed, implemented, and tested a challenge platform to address exactly this. Using our platform, we ran a deep-learning challenge for epileptic seizure detection. 87 IBM employees from several business units, including but not limited to IBM Research, with a variety of skills, including sales and design, participated in this highly technical challenge.
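The starter kit left dedicated notebook cells for participants' own pre-processing code and models; the sketch below illustrates the kind of minimal PyTorch baseline a team might drop into such a cell, assuming EEG arrives as fixed-length multichannel windows. The architecture, channel count, and window length are illustrative assumptions, not the actual starter kit code or any submitted solution.

```python
import torch
import torch.nn as nn

class SeizureDetector(nn.Module):
    """Toy 1-D CNN that maps an EEG window to a per-window seizure probability."""
    def __init__(self, n_channels: int = 22):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the time axis
        )
        self.classifier = nn.Linear(64, 1)

    def forward(self, x):              # x: (batch, channels, samples)
        h = self.features(x).squeeze(-1)
        return torch.sigmoid(self.classifier(h))

# Hypothetical batch: 8 windows of 10 s EEG sampled at 250 Hz across 22 channels.
model = SeizureDetector(n_channels=22)
scores = model(torch.randn(8, 22, 2500))
print(scores.shape)   # torch.Size([8, 1])
```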
  3. To remain competitive in the global economy, the United States needs skilled technical workers in occupations requiring a high level of domain-specific technical knowledge to meet the country’s anticipated shortage of 5 million technically credentialed workers. The changing demographics of the country are of increasing importance to addressing this workforce challenge. According to federal data, half the students earning a certificate in 2016-17 received credentials from community colleges, where the percent enrollment of Latinx (a gender-neutral term referencing Latin American cultural or racial identity) students (56%) exceeds that of other post-secondary sectors. If this enrollment rate persists, then by 2050 over 25% of all students enrolled in higher education will be Latinx. Hispanic Serving Institutions (HSIs) are essential points of access, as they enroll 64% of all Latinx college students, and nearly 50% of all HSIs are 2-year institutions. Census estimates predict that Latinxs are the fastest-growing segment, reaching 30% of the U.S. population and becoming the youngest group, comprising 33.5% of those under 18 years, by 2060. The demand for skilled workers in STEM fields will be met when workers reflect the diversity of the population; therefore, more students—of all ages and backgrounds—must be brought into community colleges and supported through graduation: a central focus of community colleges everywhere. While Latinx students of color are as likely as white students to major in STEM, their completion numbers drop dramatically: Latinx students often have distinct needs that evolved from a history of discrimination in the educational system. HSI ATE Hub is a three-year collaborative research project funded by the National Science Foundation Advanced Technological Education Program (NSF ATE) being implemented by Florence Darlington Technical College and the Science Foundation Arizona Center for STEM at Arizona State University to address the imperative that 2-year Hispanic Serving Institutions (HSIs) develop and improve engineering technology and related technician education programs in a way that is culturally inclusive. Interventions focus on strengthening grant-writing skills among CC HSIs to fund advancements in technician education and on connecting 2-year HSIs with resources for faculty development and program improvement. A mixed methods approach will explore the following research questions: 1) What are the unique barriers and challenges for 2-year HSIs related to STEM program development and grant-writing endeavors? 2) How do we build capacity at 2-year HSIs to address these barriers and challenges? 3) How do mentoring efforts/styles need to differ? 4) How do existing ATE resources need to be augmented to better serve 2-year HSIs? 5) How do proposal submission and success rates compare for 2-year HSIs that have gone through the KS STEM planning process but not M-C, through the M-C cohort mentoring process but not KS, and through both interventions? The project will identify HSI-relevant resources, augment existing ATE resources, and create new ones to support 2-year HSI faculty as potential ATE grantees. To address the distinct needs of Latinx students in STEM, resources representing best practices and frameworks for cultural inclusivity, as well as faculty development, will be included. Throughout, the community-based tradition of the ATE Program is being fostered, with particular emphasis on forming, nurturing, and serving participating 2-year HSIs.
This paper will discuss the need, baseline data, and early results for the three-year program, setting the stage for a series of annual papers that report new findings. 
  4.
    The first major goal of this project is to build a state-of-the-art information storage, retrieval, and analysis system that utilizes the latest technology and industry methods. This system is leveraged to accomplish another major goal, supporting modern search and browse capabilities for a large collection of tweets from the Twitter social media platform, web pages, and electronic theses and dissertations (ETDs). The backbone of the information system is a Docker container cluster running with Rancher and Kubernetes. Information retrieval and visualization are accomplished with containers in a pipelined fashion, whether in the cluster or on virtual machines, using Elasticsearch and Kibana, respectively. In addition to traditional searching and browsing, the system supports full-text and metadata searching. Search results include facets as a modern means of browsing among related documents. The system supports text analysis and machine learning to reveal new properties of collection data; these new properties assist in the generation of available facets. Recommendations are also presented with search results, based on associations among documents and on logged user activity. The information system is co-designed by five teams of Virginia Tech graduate students, all members of the same computer science class, CS 5604. Although the project is an academic exercise, it is the practice of the teams to work and interact as though they are groups within a company developing a product. The teams on this project include three collection management groups -- Electronic Theses and Dissertations (ETD), Tweets (TWT), and Web-Pages (WP) -- as well as the Front-end (FE) group and the Integration (INT) group, which helps provide the overarching structure for the application. This submission focuses on the work of the Integration (INT) team, which creates and administers Docker containers for each team in addition to administering the cluster infrastructure. Each container is a customized application environment that is specific to the needs of the corresponding team. Each team will have several of these containers set up in a pipeline formation to allow scaling and extension of the current system. The INT team also contributes to a cross-team effort exploring the use of Elasticsearch and its internally associated database. The INT team administers the integration of the Ceph data storage system into the CS Department Cloud and provides support for interactions between containers and the Ceph filesystem. During formative stages of development, the INT team also has a role in guiding team evaluations of prospective container components and workflows. The INT team is responsible for the overall project architecture and for facilitating the tools and tutorials that assist the other teams in deploying containers in a development environment according to mutual specifications agreed upon with each team. The INT team maintains the status of the Kubernetes cluster, deploying new containers and pods as needed by the collection management teams as they expand their workflows. This team is responsible for utilizing a continuous integration process to update existing containers. During the development stage, the INT team collaborates specifically with the collection management teams to create the pipeline for the ingestion and processing of new collection documents, crossing services between those teams as needed.
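    Facets of the kind described above are typically realized in Elasticsearch as aggregations attached to a full-text query; the sketch below shows that pattern with the official Python client. The endpoint, index name, and field names are hypothetical and are not taken from the CS 5604 deployment.

```python
from elasticsearch import Elasticsearch

# Hypothetical endpoint, index, and field names; not the actual CS 5604 system.
es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="tweets",
    body={
        "query": {"match": {"text": "vaccine rollout"}},      # full-text search
        "aggs": {                                              # facet-style counts
            "by_hashtag": {"terms": {"field": "hashtags.keyword", "size": 10}},
            "by_language": {"terms": {"field": "lang", "size": 5}},
        },
        "size": 10,
    },
)

for hit in response["hits"]["hits"]:
    print(hit["_source"].get("text", ""))
for bucket in response["aggregations"]["by_hashtag"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```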
The INT team develops a reasoner engine to construct workflows with an information goal as input; these workflows are then programmatically authored, scheduled, and monitored using Apache Airflow. The INT team is responsible for the flow, management, and logging of system performance data and for making any adjustments necessary based on the analysis of testing results. The INT team has established a GitLab repository for archival code related to the entire project and has provided the other groups with the documentation to deposit their code in the repository. This repository will be expanded using GitLab CI in order to provide continuous integration and testing once it is available. Finally, the INT team will provide a production distribution that includes all embedded Docker containers and sub-embedded Git source code repositories. The INT team will archive this distribution on the Virginia Tech Docker Container Registry and deploy it on the Virginia Tech CS Cloud. The INT-2020 team owes a sincere debt of gratitude to the INT-2019 team. This is a very large undertaking, and the wrangling of all of the products and processes would not have been possible without their guidance in both direct and written form. We have relied heavily on the foundation they and their predecessors have provided for us. We continue their work with systematic improvements, but also want to acknowledge their efforts. Without them, our progress to date would not have been possible.
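A workflow authored for Apache Airflow, whether by hand or by the reasoner engine described above, is expressed as a DAG of tasks; the sketch below is a minimal hand-written example of such an ingestion pipeline, assuming Airflow 2.x. The DAG id, task names, and callables are hypothetical stand-ins for the services the collection teams actually provide.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables; real tasks would call the collection teams' services.
def fetch_documents(**context):
    print("fetching new collection documents")

def preprocess(**context):
    print("cleaning and normalising documents")

def index_documents(**context):
    print("indexing documents into Elasticsearch")

with DAG(
    dag_id="collection_ingest_pipeline",   # hypothetical DAG id
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    fetch = PythonOperator(task_id="fetch_documents", python_callable=fetch_documents)
    clean = PythonOperator(task_id="preprocess", python_callable=preprocess)
    index = PythonOperator(task_id="index_documents", python_callable=index_documents)

    fetch >> clean >> index   # linear ingestion workflow
```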