Title: Moving from Composable to Programmable
In today's Big Data era, data scientists need modern workflows that can quickly analyze large-scale datasets with complex codes in order to maintain the rate of scientific progress. These scientists often rely on available campus resources or off-the-shelf computational systems for their applications. Unified infrastructure or over-provisioned servers can quickly become bottlenecks for specific tasks, wasting time and resources. Composable infrastructure helps solve these problems by giving users new ways to increase resource utilization: it disaggregates a computer's components (CPU, GPU and other accelerators, storage, and networking) into fluid pools of resources, but it typically relies on infrastructure engineers to architect individual machines. Such infrastructure is managed with specialized command-line utilities, user interfaces, or specification files, management models that are cumbersome and difficult to incorporate into data-science workflows. We developed a high-level software API, Composastructure, which, when integrated into modern workflows, can be used by infrastructure engineers as well as data scientists to reorganize composable resources on demand. Composastructure makes infrastructure programmable, secure, persistent, and reproducible. Our API composes machines, frees resources, supports multi-rack operations, and includes a Python module for Jupyter Notebooks.
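The published API is not reproduced on this page, but as a rough illustration, a notebook session against such an API might look like the sketch below. Everything in it (the Pool and Machine names, the compose/free methods, the endpoint) is an assumption for illustration, not the actual Composastructure interface; the Pool class is an in-file stub standing in for a real fabric client so the sketch runs on its own.

```python
# Hypothetical sketch of a notebook-friendly composable-infrastructure API.
# All names are illustrative assumptions, not the published Composastructure
# interface; Pool is a stub standing in for a real fabric client.
from dataclasses import dataclass, field

@dataclass
class Machine:
    """A bare-metal node assembled from pooled devices."""
    cpus: int
    gpus: int
    nvme_tb: float

@dataclass
class Pool:
    """Stub client for disaggregated resource pools (possibly multi-rack)."""
    endpoint: str
    composed: list = field(default_factory=list)

    def compose(self, cpus: int = 1, gpus: int = 0, nvme_tb: float = 0) -> Machine:
        # A real client would ask the fabric controllers, possibly across
        # racks, to attach these devices to a host and boot it.
        machine = Machine(cpus, gpus, nvme_tb)
        self.composed.append(machine)
        return machine

    def free(self, machine: Machine) -> None:
        # Devices return to the shared pool for other users.
        self.composed.remove(machine)

pool = Pool(endpoint="https://fabric.example.edu")  # placeholder endpoint
node = pool.compose(cpus=2, gpus=4, nvme_tb=8)      # compose a GPU node on demand
print(node)
pool.free(node)                                     # release resources when done
```

The point of such an API is that composing and freeing a machine become ordinary function calls, so resource reorganization can live inside a notebook or script rather than in a separate management console.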
Award ID(s):
1828265
PAR ID:
10356863
Author(s) / Creator(s):
Date Published:
Journal Name:
36th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)
Page Range / eLocation ID:
1215 to 1220
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In today's Big Data era, data scientists require new computational instruments in order to quickly analyze large-scale datasets using complex codes and quicken the rate of scientific progress. While Federally funded computer resources, from supercomputers to clouds, are beneficial, they are often limiting, particularly for deep learning and visualization, because they have few Graphics Processing Units (GPUs). GPUs are at the center of modern high-performance computing and artificial intelligence: they efficiently perform mathematical operations that can be massively parallelized, speeding up codes used for deep learning, visualization, and image processing far more than general-purpose microprocessors (Central Processing Units, or CPUs). The University of Illinois at Chicago is acquiring a much-in-demand GPU-based instrument, COMPaaS DLV (COMposable Platform as a Service Instrument for Deep Learning & Visualization), based on composable infrastructure, an advanced architecture that disaggregates the underlying compute, storage, and network resources for scaling needs but operates as a single cohesive infrastructure for management and workload purposes. We are experimenting with a small system and learning a great deal about composability, and we believe COMPaaS DLV users will benefit from the varied workflows that composable infrastructure allows.
  2. Composable infrastructure holds the promise of accelerating the pace of academic research and discovery by enabling researchers to tailor the resources of a machine (e.g., GPUs, storage, NICs), on demand, to address application needs. We were first introduced to composable infrastructure in 2018; at the same time, there was growing demand among our College of Engineering faculty for GPU systems for data science, artificial intelligence / machine learning / deep learning, and visualization. Many purchased their own individual desktop or deskside systems, a few pursued more costly cloud and HPC solutions, and others looked to the College or campus computer center for GPU resources, which, at the time, were scarce. After surveying the diverse needs of our faculty and studying product offerings by a few nascent startups in the composable infrastructure sector, we applied for and received a grant from the National Science Foundation in November 2019 to purchase a mid-scale system, configured to our specifications, for use by faculty and students for research and research training. This paper describes our composable infrastructure solution and implementation for our academic community. Given how modern workflows are progressively moving to containers and cloud frameworks (using Kubernetes) and to programming notebooks (primarily Jupyter), both for ease of use and for ensuring reproducible experiments, we initially adapted these tools for our system (a minimal sketch of a containerized GPU request appears after this list). We have since made the system simpler to use, and now provide our users with a public-facing JupyterHub server. We also added an expansion chassis to enable composable co-location, a shared central architecture in which our researchers can insert and integrate specialized resources (GPUs, accelerators, networking cards, etc.) needed for their research. In February 2020, our system was installed, made operational, and opened to faculty in the College of Engineering. Now, two years later, it is used by over 40 faculty and students, plus some external collaborators, for research and research training. Their use cases and experiences are briefly described in this paper. Composable infrastructure has proven to be a useful computational system for workload variability, uneven applications, and modern workflows in academic environments.
  3. Modern science depends on computers, but not all scientists have access to the scale of computation they need. A digital divide separates scientists who accelerate their science using large cyberinfrastructure from those who do not, or who do not have access to the compute resources or learning opportunities to develop the skills needed. The exclusionary nature of the digital divide threatens equity and the future of innovation by leaving people out of the scientific process while over-amplifying the voices of a small group who have resources. However, there are potential solutions: recent advancements in public research cyberinfrastructure and resources developed during the open science revolution are providing tools that can help bridge this divide. These tools can enable access to fast and powerful computation with modest internet connections and personal computers. Here we contribute another resource for narrowing the digital divide: scalable virtual machines running on public cloud infrastructure. We describe the tools, infrastructure, and methods that enabled successful deployment of a reproducible and scalable cyberinfrastructure architecture for a collaborative data synthesis working group in February 2023. This platform enabled 45 scientists with varying data and compute skills to leverage 40,000 hours of compute time over a 4-day workshop. Our approach provides an open framework that can be replicated for educational and collaborative data synthesis experiences in any data- and compute-intensive discipline. 
  4. Communications infrastructures and compute resources are critical to enabling advanced science research projects. Science cyberinfrastructures must meet clear performance requirements, must be adjustable to changing requirements, and must facilitate reproducibility. These characteristics can be met by a programmable infrastructure with guaranteed resources, such as the BRIDGES infrastructure enabling cross-Atlantic research projects. While programmability should be a foundational design principle for research cyberinfrastructures, by itself it might not be sufficient to enable scientists who have little or no experience with advanced IT technologies to operate their testbeds independently of IT support teams. The trend of offering "no code" platforms, which enable users without core IT competency to achieve business goals, should manifest itself in the context of research and educational infrastructures as well. In this paper we describe the architecture of a "no code" platform that enables scientists to easily configure and modify a programmable infrastructure through a large language model-based interface integrated with the composable services language of the infrastructure (a hypothetical sketch of this translation step appears after this list). The BRIDGES testbed is used as an example of such an integration, where the functionality benefits projects operated by large, diverse teams.
  5. Computational science today depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist's workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure that can increase scientific productivity. However, applications have not adequately taken advantage of these advanced capabilities. In this work, we have developed a novel network-centric platform that enables high-performance, adaptive data flows and coordinated access to distributed cloud resources and data repositories for atmospheric scientists. We demonstrate the effectiveness of our approach by evaluating time-critical, adaptive weather-sensing workflows, which use advanced networked infrastructure to ingest live weather data from radars and compute data products used for timely response to weather events. The workflows are orchestrated by the Pegasus workflow management system and were chosen because of their diverse resource requirements. We show that our approach results in timely processing of Nowcast workflows under different infrastructure configurations and network conditions. We also show how workflow task clustering choices affect the throughput of an ensemble of Nowcast workflows and improve turnaround times (a toy model of this clustering trade-off appears after this list). Additionally, we find that using our network-centric platform, powered by advanced layer-2 networking techniques, results in faster, more reliable data throughput, makes cloud resources easier to provision, and makes the workflows easier to configure for operational use and automation.
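For the containerized workflows described in item 2 above, a user-facing GPU request on a Kubernetes-managed system typically reduces to a pod specification with an extended resource limit. Below is a minimal sketch using the official kubernetes Python client; it assumes a reachable cluster whose nodes expose GPUs through the NVIDIA device plugin, and the pod name, image, and command are placeholders:

```python
# Minimal sketch: request a 2-GPU pod on a Kubernetes-managed system.
# Assumes a reachable cluster whose nodes expose GPUs via the NVIDIA
# device plugin (the nvidia.com/gpu extended resource).
from kubernetes import client, config

config.load_kube_config()  # use the caller's kubeconfig credentials
api = client.CoreV1Api()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="dl-training"),    # placeholder name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="nvcr.io/nvidia/pytorch:24.01-py3", # placeholder image
                command=["python", "train.py"],           # placeholder command
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "2"}        # two pooled GPUs
                ),
            )
        ],
    ),
)

api.create_namespaced_pod(namespace="default", body=pod)
```

A JupyterHub front end like the one described in item 2 hides this specification behind a profile menu, but the underlying request is of this shape.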
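For the "no code" architecture in item 4 above, the core step is translating a scientist's natural-language request into statements in the infrastructure's composable services language. The sketch below is hypothetical throughout: the prompt, the stub standing in for a real large-language-model call, and the service-language syntax are illustrative assumptions, not the BRIDGES interface.

```python
# Hypothetical sketch of the "no code" translation step: natural language
# is turned into statements in a composable services language. The prompt,
# the stub LLM call, and the statement syntax are illustrative assumptions.

PROMPT = """You translate infrastructure requests into the testbed's
composable services language. Reply with service statements only.

Request: {request}
"""

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call; a deployment would invoke its
    # chosen model here and apply guardrails to the response.
    return "create link site=STAR site=AMST bandwidth=10G vlan=1234"

def compose_from_text(request: str) -> str:
    statements = call_llm(PROMPT.format(request=request))
    # A production system must validate the generated statements against
    # the service-language grammar before applying them to the testbed.
    return statements

print(compose_from_text(
    "Give me a guaranteed 10 Gbps link between Chicago and Amsterdam"))
```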
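Item 5 above observes that task clustering choices affect the throughput of a workflow ensemble. The following toy model (illustrative numbers, not the paper's measurements) captures the underlying trade-off: grouping tasks into fewer jobs amortizes per-job scheduling overhead but coarsens parallelism.

```python
# Toy model (not from the paper) of the task-clustering trade-off:
# clustering amortizes per-job scheduling overhead at the cost of
# coarser parallelism. All numbers are illustrative assumptions.
import math

def makespan(n_tasks, task_s, overhead_s, cluster_size, slots):
    """Time to finish n_tasks of task_s seconds each, paying overhead_s
    of scheduling latency per submitted job, with cluster_size tasks
    grouped into one job and `slots` jobs running in parallel."""
    jobs = math.ceil(n_tasks / cluster_size)
    job_len = cluster_size * task_s + overhead_s
    waves = math.ceil(jobs / slots)
    return waves * job_len

# 500 short radar-processing tasks, 10 s each, 30 s scheduling overhead
# per job, 20 execution slots: compare per-task submission with clustering.
for k in (1, 5, 25):
    print(f"cluster_size={k:2d}: makespan={makespan(500, 10, 30, k, 20):6.0f} s")
```

With these assumed numbers, moderate clustering cuts the makespan substantially, while over-clustering begins to serialize work; in a real deployment, Nowcast deadlines and measured overheads would set the sweet spot.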