Title: Visualizing MPI Collective Communication
Communication collectives are at the heart of distributed-memory parallel algorithms and the Message Passing Interface. In parallel computing courses, students can learn about collectives not only to use them as building blocks for other algorithms, but also as exemplars for designing and analyzing efficient algorithms. We develop a visualization tool that helps students understand different algorithms for collective operations and evaluate and analyze those algorithms' efficiency. Our implementation is written in C++ with OpenMP and uses the Thread Safe Graphics Library. We simulate distributed-memory message passing to implement the algorithms, and the threads concurrently illustrate their local memories and message passing on a shared canvas. Our tool includes visualizations of different algorithms for Scatter, Gather, ReduceScatter, AllGather, Broadcast, Reduce, AllReduce, and AlltoAll.
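As a flavor of the algorithms the tool animates, the sketch below implements Broadcast as a binomial tree built from MPI point-to-point calls. This is a standard textbook algorithm and our own illustration, not code from the tool, which simulates message passing with OpenMP threads rather than calling MPI.

    // Binomial-tree Broadcast: in round k, every rank that already holds
    // the value forwards it to the rank 2^k above it.
    #include <mpi.h>
    #include <cstdio>

    static void tree_bcast(int* value, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        for (int step = 1; step < size; step <<= 1) {
            if (rank < step && rank + step < size) {
                MPI_Send(value, 1, MPI_INT, rank + step, 0, comm);
            } else if (rank >= step && rank < 2 * step) {
                MPI_Recv(value, 1, MPI_INT, rank - step, 0, comm,
                         MPI_STATUS_IGNORE);
            }
        }
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int value = (rank == 0) ? 42 : 0;   // only the root starts with the data
        tree_bcast(&value, MPI_COMM_WORLD);
        std::printf("rank %d has value %d\n", rank, value);
        MPI_Finalize();
        return 0;
    }

The broadcast completes in ceil(log2 P) rounds for P processes, and each round doubles the set of ranks holding the value, which is exactly the kind of structure a round-by-round visualization can make visible.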
Award ID(s):
2106920 1942892
PAR ID:
10618223
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE Computer Society
Date Published:
Format(s):
Medium: X
Location:
Milan, Italy
Sponsoring Org:
National Science Foundation
More Like This
  1.
    The 2019 ABET computer science criteria require that all computing students learn parallel and distributed computing (PDC) as undergraduates, and CS2013 recommends at least fifteen hours of PDC in the undergraduate curriculum. Consequently, many educators look for easy ways to integrate PDC into courses at their institutions. This hands-on workshop introduces Message Passing Interface (MPI) basics in C/C++ and Python using clusters of Raspberry Pis. MPI is a multi-language, platform-independent, industry-standard library for parallel and distributed computing. Raspberry Pis are an inexpensive and engaging hardware platform for studying PDC as early as the first course. Participants will experience how to teach distributed computing essentials with MPI by means of reusable, effective "parallel patterns", including single program multiple data (SPMD) execution, send-receive message passing, the master-worker pattern, parallel loop patterns, and other common patterns (a minimal example follows below), plus longer "exemplar" programs that use MPI to solve significant applied problems. The workshop includes: (i) personal experience with the Raspberry Pi (clusters provided for workshop use); (ii) assembly of Beowulf clusters of Raspberry Pis quickly in the classroom; (iii) self-paced hands-on experimentation with the working MPI programs; and (iv) a discussion of how these may be used to achieve the goals of CS2013 and ABET. No prior experience with MPI, PDC, or the Raspberry Pi is expected. All materials from this workshop will be freely available from CSinParallel.org; participants should bring a laptop to access these materials.
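    As a concrete flavor of those patterns, here is a minimal patternlet in C++ using the MPI C API that combines SPMD execution, send-receive message passing, and the master-worker pattern (our sketch, not taken from the CSinParallel materials):

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char** argv) {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // this process's id
            MPI_Comm_size(MPI_COMM_WORLD, &size);   // total number of processes

            if (rank == 0) {
                // Master: receive one greeting from each worker, in rank order.
                char buf[64];
                for (int src = 1; src < size; src++) {
                    MPI_Recv(buf, sizeof buf, MPI_CHAR, src, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    std::printf("%s\n", buf);
                }
            } else {
                // Worker: send a greeting to the master.
                char msg[64];
                std::snprintf(msg, sizeof msg,
                              "greetings from rank %d of %d", rank, size);
                MPI_Send(msg, sizeof msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
            MPI_Finalize();
            return 0;
        }

    Every process runs this same program (SPMD); the branch on rank splits the master's role from the workers'.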
  2. Kokkos provides in-memory advanced data structures, concurrency, and algorithms to support performance portable C++ parallel programming across CPUs and GPUs. The Message Passing Interface (MPI) provides the most widely used message passing model for inter-node communication. Many programmers use both Kokkos and MPI together. In this paper, Kokkos is integrated within an MPI implementation for ease of use in applications that use both Kokkos and MPI, without sacrificing performance. For instance, this model allows passing first-class Kokkos objects directly to extended C++-based MPI APIs. We prototype this integrated model using ExaMPI, a C++17-based subset implementation of MPI-4. We then demonstrate use of our C++-friendly APIs and Kokkos extensions through benchmarks and a mini-application. We explain why direct use of Kokkos within certain parts of the MPI implementation is crucial to performance and enhanced expressivity. Although the evaluation in this paper focuses on CPU-based examples, we also motivate why making Kokkos memory spaces visible to the MPI implementation generalizes the idea of “CPU memory” and “GPU memory” in ways that enable further optimizations in heterogeneous Exascale architectures. Finally, we describe future goals and show how these mesh both with a possible future C++ API for MPI-5 as well as the potential to accelerate MPI on such architectures.
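    For contrast with the paper's integrated model, the sketch below shows the conventional pattern it improves on: the programmer extracts a raw pointer from a Kokkos::View and hands it to the MPI C API (our illustration, assuming a host-resident View; the paper's extended C++ APIs would instead accept the View directly):

        #include <Kokkos_Core.hpp>
        #include <mpi.h>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            Kokkos::initialize(argc, argv);
            {
                int rank;
                MPI_Comm_rank(MPI_COMM_WORLD, &rank);
                const int n = 1024;
                Kokkos::View<double*, Kokkos::HostSpace> buf("buf", n);

                if (rank == 0) {
                    Kokkos::deep_copy(buf, 1.0);   // fill the send buffer
                    // MPI sees only a raw pointer, not the View's memory space.
                    MPI_Send(buf.data(), n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
                } else if (rank == 1) {
                    MPI_Recv(buf.data(), n, MPI_DOUBLE, 0, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                }
            }
            Kokkos::finalize();
            MPI_Finalize();
            return 0;
        }

    Making the memory space visible to the MPI implementation, as the paper proposes, lets it choose the right transfer path instead of treating every buffer as anonymous host memory.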
  3. Topological deep learning (TDL) has emerged as a powerful tool for modeling higher-order interactions in relational data. However, phenomena such as oversquashing in topological message-passing remain understudied and lack theoretical analysis. We propose a unifying axiomatic framework that bridges graph and topological message-passing by viewing simplicial and cellular complexes and their message-passing schemes through the lens of relational structures. This approach extends graph-theoretic results and algorithms to higher-order structures, facilitating the analysis and mitigation of oversquashing in topological message-passing networks. Through theoretical analysis and empirical studies on simplicial networks, we demonstrate the potential of this framework to advance TDL.
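    As a reference point (our notation, not the paper's), the graph message-passing update that such frameworks generalize to higher-order structures can be written as

        h_v^{(t+1)} = \phi\Big( h_v^{(t)},\; \bigoplus_{u \in \mathcal{N}(v)} \psi\big( h_v^{(t)}, h_u^{(t)} \big) \Big)

    where \mathcal{N}(v) is the neighborhood of v. In the topological setting, v ranges over the cells of a simplicial or cellular complex and \mathcal{N}(v) over the chosen incidence and adjacency relations, which is what the relational-structure view makes uniform.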
  4. Ghafoor, Sheikh; Prasad, Sushil K. (Eds.)
    The ACM/IEEE CS 2013 curriculum recommendations state that every undergraduate CS major should learn about parallel and distributed computing (PDC). One way to accomplish this is to teach students about the Message Passing Interface (MPI), a platform that is commonly used on modern supercomputers and Beowulf clusters, but can also be used on a Network of Workstations (NoW), or a multicore laptop or desktop. MPI incorporates many PDC concepts and can serve as a platform for hands-on learning activities in which students must apply those concepts. The MPI standard defines language bindings for Fortran and C/C++, but many university instructors lack expertise in these languages, preventing them from using MPI in their courses. OpenMPI is a free implementation of the MPI standard that also provides Java bindings for MPI. This paper describes how to install OpenMPI with these Java bindings; to illustrate the use of these bindings, the paper also presents several patternlets—minimalist example programs—that show how to implement PDC design patterns using OpenMPI and Java. This provides a new means of introducing students to PDC concepts. 
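    The paper's patternlets are written in Java against these bindings; as a language-neutral illustration of one of them, this sketch shows a parallel-loop pattern with a reduction using the MPI C API from C++ (our example, not from the paper):

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const int N = 1000000;
            double local = 0.0;
            // Cyclic decomposition: rank r handles iterations r, r+size, r+2*size, ...
            for (int i = rank; i < N; i += size) {
                local += 1.0 / (1.0 + i);   // stand-in for real per-iteration work
            }
            double total = 0.0;
            MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0) std::printf("sum = %f\n", total);
            MPI_Finalize();
            return 0;
        }

    The same cyclic decomposition and reduction carry over directly to the Java bindings.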
  5. Session types guarantee that message-passing processes adhere to predefined communication protocols. Prior work on session types has focused on deterministic languages but many message-passing systems, such as Markov chains and randomized distributed algorithms, are probabilistic. To implement and analyze such systems, this article develops the meta theory of probabilistic session types with an application focus on automatic expected resource analysis. Probabilistic session types describe probability distributions over messages and are a conservative extension of intuitionistic (binary) session types. To send on a probabilistic channel, processes have to utilize internal randomness from a probabilistic branching or external randomness from receiving on a probabilistic channel. The analysis for expected resource bounds is smoothly integrated with the type system and is a variant of automatic amortized resource analysis. Type inference relies on linear constraint solving to automatically derive symbolic bounds for various cost metrics. The technical contributions include the meta theory that is based on a novel nested multiverse semantics and a type-reconstruction algorithm that allows flexible mixing of different sources of randomness without burdening the programmer with complex type annotations. The type system has been implemented in the language NomosPro with linear-time type checking. Experiments demonstrate that NomosPro is applicable in different domains such as cost analysis of randomized distributed algorithms, analysis of Markov chains, probabilistic analysis of amortized data structures and digital contracts. NomosPro is also shown to be scalable by (i) implementing two broadcast protocols and a bounded retransmission protocol where messages are dropped with a fixed probability, and (ii) verifying the limiting distribution of a Markov chain with 64 states and 420 transitions.
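    For intuition about expected bounds (our example, not the paper's): if a process uses internal randomness to take a branch of cost c_1 with probability p and a branch of cost c_2 otherwise, the analysis derives the expected cost

        \mathbb{E}[\mathrm{cost}] = p \cdot c_1 + (1 - p) \cdot c_2

    with p, c_1, and c_2 left symbolic, so the linear constraint solver can instantiate them per program.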