Title: The Structure of a Project-Based Course on the Fundamentals of Distributed Computing
We have developed a novel structure for a course on distributed computing, suitable for juniors, seniors, and graduate students, that covers (a) the use, design, and implementation of state-of-the-art IPC mechanisms, and (b) the implementation of, and experimentation with, state-of-the-art consistency algorithms.
Award ID(s):
1829752
PAR ID:
10104309
Author(s) / Creator(s):
Date Published:
Journal Name:
EduHiPC-18 Workshop at IEEE HiPC Conference, 2018
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We design and implement parallel graph coloring algorithms on the GPU using two different abstractions—one data-centric (Gunrock), the other linear-algebra-based (GraphBLAS). We analyze the impact of variations of a baseline independent-set algorithm on quality and runtime. We study how optimizations such as hashing, avoiding atomics, and a max-min heuristic affect performance. Our Gunrock graph coloring implementation has a peak 2x speed-up, a geomean speed-up of 1.3x and produces 1.6x more colors over previous hardwired state-of-the-art implementations on real-world datasets. Our GraphBLAS implementation of Luby's algorithm produces 1.9x fewer colors than the previous state-of-the-art parallel implementation at the cost of 3x extra runtime, and 1.014x fewer colors than a greedy, sequential algorithm with a geomean speed-up of 2.6x. 
  2. We design and implement parallel graph coloring algorithms on the GPU using two different abstractions: one data-centric (Gunrock), the other linear-algebra-based (GraphBLAS). We analyze the impact of variations of a baseline independent-set algorithm on quality and runtime. We study how optimizations such as hashing, avoiding atomics, and a max-min heuristic affect performance. Our Gunrock graph coloring implementation has a peak 2x speed-up, a geomean speed-up of 1.3x and produces 1.6x more colors over previous hardwired state-of-the-art implementations on real-world datasets. Our GraphBLAS implementation of Luby’s algorithm produces 1.9x fewer colors than the previous state-of-the-art parallel implementation at the cost of 3x extra runtime, and 1.014x fewer colors than a greedy, sequential algorithm with a geomean speed-up of 2.6x.
  3. We present an optimized implementation of the post-quantum Supersingular Isogeny Key Encapsulation (SIKE) for 32-bit ARMv7-A processors supporting the NEON engine (i.e., SIMD instructions). Unlike previous SIKE implementations, finite field arithmetic is efficiently implemented in a redundant representation, which avoids carry propagation and pipeline stalls. Furthermore, we adopted several state-of-the-art engineering techniques as well as hand-crafted assembly implementations for high performance. The optimized implementations are ported to the Microsoft SIKE library, written in “a non-redundant representation”, and evaluated on high-end 32-bit ARMv7-A processors, such as ARM Cortex-A5, A7, and A15. A full key-exchange execution of SIKEp503 is performed in about 109 million cycles on ARM Cortex-A15 processors (i.e., 54.5 ms @2.0 GHz), which is about 1.58× faster than previous state-of-the-art work presented in CHES’18.
  4. In this study, we aimed to democratize access to convolutional neural networks (CNN) for segmenting cartilage volumes, generating state‐of‐the‐art results for specialized, real‐world applications in hospitals and research. Segmentation of cross‐sectional and/or longitudinal magnetic resonance (MR) images of articular cartilage facilitates both clinical management of joint damage/disease and fundamental research. Manual delineation of such images is a time‐consuming task susceptible to high intra‐ and interoperator variability and prone to errors. Thus, enabling reliable and efficient analyses of MRIs of cartilage requires automated segmentation of cartilage volumes. Two main limitations arise in the development of hospital‐ or population‐specific deep learning (DL) models for image segmentation: specialized knowledge and specialized hardware. We present a relatively easy and accessible implementation of a DL model to automatically segment MRIs of human knees with state‐of‐the‐art accuracy. In representative examples, we trained CNN models in 6‐8 h and obtained results quantitatively comparable to the state of the art for every anatomical structure. We established and evaluated our methods using two publicly available MRI data sets originating from the Osteoarthritis Initiative, Stryker Imorphics, and Zuse Institute Berlin (ZIB), as representative test cases. We use Google Colab for editing and adapting the Python code and for selecting the runtime environment, leveraging high‐performance graphical processing units. We designed our solution for novice users to apply to any data set with relatively few adaptations requiring only basic programming skills. To facilitate the adoption of our methods, we provide a complete guideline for using our methods and software, as well as the software tools themselves.
    Clinical significance: We establish and detail methods that clinical personnel can apply to create their own DL models without specialized knowledge of DL or specialized hardware/infrastructure, and obtain results comparable with the state of the art, to facilitate both clinical management of joint damage/disease and fundamental research.
  5. Calibrating with detailed 2D core-collapse supernova (CCSN) simulations, we derive a simple CCSN explosion condition based solely upon the terminal density profiles of state-of-the-art stellar evolution calculations of the progenitor massive stars. This condition captures the vast majority of the behaviour of the one hundred 2D state-of-the-art models we performed to gauge its usefulness. The goal is to predict, without resort to detailed simulation, the explodability of a given massive star. We find that the simple maximum fractional ram pressure jump discriminant we define works well ∼90 per cent of the time and we speculate on the origin of the few false positives and false negatives we witness. The maximum ram pressure jump generally occurs at the time of accretion of the silicon/oxygen interface, but not always. Our results depend upon the fidelity with which the current implementation of our code Fornax adheres to Nature, and issues concerning the neutrino–matter interaction, the nuclear equation of state, the possible effects of neutrino oscillations, grid resolution, the possible role of rotation and magnetic fields, and the accuracy of the numerical algorithms employed remain to be resolved. Nevertheless, the explodability condition we obtain is simple to implement, shows promise that it might be further generalized while still employing data from only the unstable Chandrasekhar progenitors, and is a more credible and robust simple explosion predictor than can currently be found in the literature.