Title: Neocortex and Bridges-2: A High Performance AI+HPC Ecosystem for Science, Discovery, and Societal Good
Artificial intelligence (AI) is transforming research through analysis of massive datasets and by accelerating simulations by factors of up to a billion. Such acceleration eclipses the speedups previously made possible through improvements in CPU process technology and design and through other kinds of algorithmic advances. It sets the stage for a new era of discovery in which previously intractable challenges become surmountable, with applications in fields such as discovering the causes of cancer and rare diseases, developing effective, affordable drugs, improving food sustainability, developing a detailed understanding of environmental factors to support the protection of biodiversity, and developing alternative energy sources as a step toward reversing climate change. To succeed, the research community requires a high-performance computational ecosystem that seamlessly and efficiently brings together scalable AI, general-purpose computing, and large-scale data management. The authors, at the Pittsburgh Supercomputing Center (PSC), launched a second-generation computational ecosystem for AI-enabled research, bringing together carefully designed systems and groundbreaking technologies to provide the research community with a uniquely capable platform at no cost. It consists of two major systems: Neocortex and Bridges-2. Neocortex embodies a revolutionary processor architecture to vastly shorten the time required for deep learning training, foster greater integration of deep learning with scientific workflows, and accelerate graph analytics. Bridges-2 integrates additional scalable AI, high-performance computing (HPC), and high-performance parallel file systems for simulation, data pre- and post-processing, visualization, and Big Data as a Service. Neocortex and Bridges-2 are integrated to form a tightly coupled and highly flexible ecosystem for AI- and data-driven research.
Award ID(s): 1833317
NSF-PAR ID: 10274872
Author(s) / Creator(s):
Editor(s): Nesmachnow, S.; Castro, H.; Tchernykh, A.
Date Published:
Journal Name: Communications in Computer and Information Science
Volume: 1327
ISSN: 1865-0929
Page Range / eLocation ID: 205-219
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More like this
  1.
    Today’s landscape of computational science is evolving rapidly, with a need for new, flexible, and responsive supercomputing platforms for addressing the growing areas of artificial intelligence (AI), data analytics (DA) and convergent collaborative research. To support this community, we designed and deployed the Bridges-2 platform. Building on our highly successful Bridges supercomputer, which was a high-performance computing resource supporting new communities and complex workflows, Bridges-2 supports traditional and nontraditional research communities and applications; integrates new technologies for converged, scalable high-performance computing (HPC), AI, and data analytics; prioritizes researcher productivity and ease of use; and provides an extensible architecture for interoperation with complementary data intensive projects, campuses, and clouds. In this report, we describe Bridges-2’s hardware and configuration, user environments, and systems support and present the results of the successful Early User Program. 
  2. MLCommons is an effort to develop and improve the artificial intelligence (AI) ecosystem through benchmarks, public data sets, and research. Its members include start-ups, leading companies, academics, and non-profits from around the world, and its goal is to make machine learning better for everyone. To increase participation by others, educational institutions provide valuable opportunities for engagement. In this article, we identify numerous insights, obtained from different viewpoints, from efforts to use high-performance computing (HPC) big data systems in existing education while developing and conducting science benchmarks for earthquake prediction. Because this activity was conducted across multiple educational efforts, we assess whether and how such efforts could be made available on a wider scale. This includes integrating sophisticated benchmarks into courses and research activities at universities, exposing students and researchers to topics that, in our practical experience across multiple organizations, current course curricula typically do not cover sufficiently. We outline the many lessons we learned throughout these efforts, culminating in the need for benchmark carpentry for scientists using advanced computational resources. The article also presents an analysis of an earthquake prediction code benchmark that focuses on the accuracy of the results and not only on the runtime; notably, this benchmark was created as a result of our lessons learned. Energy traces, which are vital to analyzing power expenditure in HPC environments, were produced throughout these benchmarks. Additionally, one insight is that, given the short duration of the project and limited student availability, the activity was possible only by using a benchmark runtime pipeline while developing and using software that automatically generates jobs from permutations of hyperparameters.
    It integrates a templated job management framework for executing tasks and experiments based on hyperparameters while leveraging hybrid compute resources available at different institutions. The software is part of a collection called cloudmesh, with its newly developed components cloudmesh-ee (experiment executor) and cloudmesh-cc (compute coordinator).
  3.
    To advance knowledge by enabling unprecedented AI speed and scalability, the Pittsburgh Supercomputing Center (PSC), a joint research center of Carnegie Mellon University and the University of Pittsburgh, in partnership with Cerebras Systems and Hewlett Packard Enterprise (HPE), has deployed Neocortex, an innovative computing platform that accelerates scientific discovery by vastly shortening the time required for deep learning training and inference, fosters greater integration of deep AI models with scientific workflows, and provides promising hardware for the development of more efficient algorithms for artificial intelligence and graph analytics. Neocortex advances knowledge by accelerating scientific research, enabling the development of more accurate models and the use of larger training data, scaling model parallelism to unprecedented levels, and focusing on human productivity by simplifying tuning and hyperparameter optimization, creating a transformative hardware and software platform for the exploration of new frontiers. Neocortex has been integrated with PSC’s complementary infrastructure, and this paper shares the experiences, decisions, and findings from that process. The system is serving science and engineering users via an early user access program. Valuable artifacts developed during the integration phase have been made available via a public repository and have been consulted by other AI system deployments that drew inspiration from Neocortex.
  4. The small sizes of most marine plankton necessitate sampling on fine spatial scales, yet our questions often span large spatial areas. Underwater imaging can resolve this sampling conundrum, but it collects large quantities of data that require an automated approach to image analysis. Machine learning for plankton classification and high-performance computing (HPC) infrastructure are critical to rapid image processing; however, these assets, especially HPC infrastructure, are available only post-cruise, leading to an ‘after-the-fact’ view of plankton community structure. To respond to the often-ephemeral nature of oceanographic features and species assemblages in highly dynamic current systems, real-time data are key for adaptive oceanographic sampling. Here we used the new In-situ Ichthyoplankton Imaging System-3 (ISIIS-3) in the Northern California Current (NCC) in conjunction with an edge server to classify imaged plankton into 170 classes in real time. This capability, together with data visualization in a heavy.ai dashboard, makes adaptive real-time decision-making and sampling at sea possible. Dual ISIIS-Deep-focus Particle Imager (DPI) cameras sample 180 L s⁻¹, producing more than 10 GB of video per minute. Imaged organisms range in size from 250 µm to 15 cm and include abundant crustaceans, fragile taxa (e.g., hydromedusae, salps), faster swimmers (e.g., krill), and rarer taxa (e.g., larval fishes). A deep learning pipeline deployed on the edge server used multithreaded CPU-based segmentation and GPU-based classification to process the imagery. Each AVI video contains 50 s of data and between 23,000 and 225,000 particle and plankton segments. Processing one AVI through segmentation and classification takes 3.75 min on average, varying with biological productivity. A heavyDB database monitors for newly processed data and is linked to a heavy.ai dashboard for interactive data visualization.
    We describe several examples in which imaging, AI, and data visualization enable adaptive sampling that can have a transformative effect on oceanography. We anticipate that AI-enabled adaptive sampling will have a high impact on our ability to resolve biological responses to important oceanographic features in the NCC, such as oxygen minimum zones or harmful algal bloom thin layers, which affect the health of the ecosystem, fisheries, and local communities.
  5.
    The National Ecological Observatory Network (NEON) is a continental-scale observatory with sites across the US that collect standardized ecological observations and will operate for multiple decades. To maximize the utility of NEON data, we envision edge computing systems that gather, calibrate, aggregate, and ingest measurements in an integrated fashion. Edge systems will employ machine learning methods to cross-calibrate, gap-fill, and provision data in near-real time to the NEON Data Portal and to High Performance Computing (HPC) systems running ensembles of Earth system models (ESMs) that assimilate the data. For the first time, gridded EC data products and response functions promise to offset pervasive observational biases by enabling evaluation, benchmarking, parameter optimization, and training of new machine learning parameterizations within ESMs, all at the same model-grid scale. Leveraging open-source software for EC data analysis, we are already building software infrastructure for integrating near-real-time data streams into the International Land Model Benchmarking (ILAMB) package for use by the wider research community. We will present a perspective on the design and integration of end-to-end infrastructure for data acquisition, edge computing, HPC simulation, analysis, and validation, in which Artificial Intelligence (AI) approaches are used throughout the distributed workflow to improve accuracy and computational performance.
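The MLCommons item (item 2) mentions software that automatically generates jobs from permutations of hyperparameters using a templated job management framework. The Python sketch below illustrates only that general idea: it takes the Cartesian product of a hyperparameter grid and fills a job-script template once per combination. It is not the cloudmesh-ee API. The template text, the `train.py` script and its flags, the `eq_` naming scheme, and the helper names `permute` and `generate_jobs` are all hypothetical placeholders.

```python
import itertools
from string import Template

# Hypothetical batch-job template; each $placeholder is filled once per combination.
# The #SBATCH lines are ordinary Slurm directives; train.py and its flags are made up.
JOB_TEMPLATE = Template(
    "#!/bin/bash\n"
    "#SBATCH --job-name=$name\n"
    "#SBATCH --time=01:00:00\n"
    "python train.py --epochs $epochs --learning-rate $lr\n"
)


def permute(grid):
    """Yield one {name: value} dict per point in the Cartesian product of the grid."""
    keys = sorted(grid)  # fixed key order so generated job names are deterministic
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def generate_jobs(grid):
    """Return {job_name: rendered_script} for every hyperparameter combination."""
    jobs = {}
    for params in permute(grid):
        # Hypothetical naming scheme: prefix plus each key=value pair, e.g. eq_epochs2_lr0.01
        name = "eq_" + "_".join(f"{k}{v}" for k, v in params.items())
        jobs[name] = JOB_TEMPLATE.substitute(name=name, **params)
    return jobs


if __name__ == "__main__":
    grid = {"epochs": [2, 10], "lr": [0.01, 0.001]}
    for name, script in generate_jobs(grid).items():
        print(f"# {name}")
        print(script)
```

A 2 × 2 grid yields four scripts. Writing each one to disk and submitting it to a batch scheduler would be the next step; that part is left out because submission mechanics differ from site to site.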
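The ISIIS-3 item (item 4) reports that each AVI holds 50 s of imagery and takes 3.75 min on average to segment and classify. The short Python calculation below works through what those two figures imply. It assumes, beyond what the abstract states, that AVIs arrive back-to-back in a single stream and are processed independently. Under that assumption, the pipeline needs about 4.5 s of processing per second of video, so roughly five AVIs would need to be in flight at once for the average processing rate to keep pace with arrivals. The calculation uses mean times only and ignores the variation with biological productivity that the abstract notes. The function names are illustrative.

```python
import math

AVI_SECONDS = 50.0                    # seconds of imagery per AVI file (from the abstract)
MEAN_PROCESSING_SECONDS = 3.75 * 60   # mean segmentation + classification time per AVI (from the abstract)


def processing_ratio(processing_s, recording_s):
    """Seconds of processing needed per second of recorded video."""
    return processing_s / recording_s


def min_concurrent_avis(processing_s, recording_s):
    """Smallest whole number of AVIs processed in parallel so that the mean
    processing throughput matches back-to-back AVI arrivals (an assumption)."""
    return math.ceil(processing_s / recording_s)


if __name__ == "__main__":
    ratio = processing_ratio(MEAN_PROCESSING_SECONDS, AVI_SECONDS)
    workers = min_concurrent_avis(MEAN_PROCESSING_SECONDS, AVI_SECONDS)
    print(f"processing/recording ratio: {ratio:.1f}")  # prints 4.5
    print(f"AVIs in flight to keep pace: {workers}")   # prints 5
```

Because processing time varies with productivity, a real deployment would need headroom above this mean-based figure to keep queues from growing during dense patches.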