-
Long-running scientific workflows, such as tomographic data analysis pipelines, are prone to a variety of failures, including hardware and network disruptions, as well as software errors. These failures can substantially degrade performance and increase turnaround times, particularly in large-scale, geographically distributed, and time-sensitive environments like synchrotron radiation facilities. In this work, we propose and evaluate resilience strategies aimed at mitigating the impact of failures in tomographic reconstruction workflows. Specifically, we introduce an asynchronous, non-blocking checkpointing mechanism and a dynamic load redistribution technique with lazy recovery, designed to enhance workflow reliability and minimize failure-induced overheads. These approaches facilitate progress preservation, balanced load distribution, and efficient recovery in error-prone environments. To evaluate their effectiveness, we implement a 3D tomographic reconstruction pipeline and deploy it across Argonne's leadership computing infrastructure and synchrotron facilities. Our results demonstrate that the proposed resilience techniques significantly reduce failure impact, by up to 500×, while maintaining negligible overhead (<3%).
Free, publicly accessible full text available June 6, 2026.
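The general idea of asynchronous, non-blocking checkpointing mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `AsyncCheckpointer`, the helper `latest_checkpoint`, the `ckpt_NNNNNN.pkl` file naming, and the use of `pickle` are all assumptions chosen for illustration. The compute loop only takes an in-memory snapshot; a background thread performs the disk writes.

```python
import copy
import os
import pickle
import queue
import threading


class AsyncCheckpointer:
    """Hypothetical helper: writes checkpoints on a background thread
    so the compute loop does not wait on disk I/O."""

    def __init__(self, directory):
        self.directory = directory
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def save(self, step, state):
        # Snapshot the state in memory so later in-place updates by the
        # compute loop cannot alter the checkpoint being written.
        self._pending.put((step, copy.deepcopy(state)))

    def _drain(self):
        while True:
            item = self._pending.get()
            if item is None:  # sentinel: no more checkpoints
                break
            step, state = item
            final_path = os.path.join(self.directory, f"ckpt_{step:06d}.pkl")
            tmp_path = final_path + ".tmp"
            with open(tmp_path, "wb") as f:
                pickle.dump(state, f)
            # Rename only after the write completes, so a crash never
            # leaves a partially written file under the final name.
            os.replace(tmp_path, final_path)

    def close(self):
        # Flush all queued checkpoints, then stop the worker.
        self._pending.put(None)
        self._worker.join()


def latest_checkpoint(directory):
    """Return the state from the highest-numbered checkpoint, or None."""
    names = sorted(
        n for n in os.listdir(directory)
        if n.startswith("ckpt_") and n.endswith(".pkl")
    )
    if not names:
        return None
    with open(os.path.join(directory, names[-1]), "rb") as f:
        return pickle.load(f)
```

In this sketch, a reconstruction loop would call `save(step, state)` after each batch of slices and `close()` at the end; after a failure, `latest_checkpoint(directory)` gives the state to resume from. The in-memory copy still costs time on the compute thread, so a real pipeline would likely need a cheaper snapshot strategy for large volumes.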
-
Programmable networks, aside from carrying out their core network functions, can look deep into the data stream and perform application-layer processing. But, except for a few demonstrations, this capability remains largely underexplored and underutilized. Currently, scientific computing leverages networks only for communication and not for computation. We propose Computing in Transit to unleash the potential of network computing for scientific workflows. Specifically, we investigate computing in transit in the context of light source experiments. Researchers using light sources are interested in rare events, and we intend to leverage computing in transit to detect them. As the compute and memory resources available within the network are scarce, we must use these resources prudently without sacrificing performance. Computing within the network can support significantly higher throughput at low latency, but it may be less accurate, as there are limitations to how deep a network can inspect the payload. We propose a neutralized checksum that takes the TCP checksum as an input to avoid processing the entire payload. We evaluate this approach to identifying rare events by introducing random perturbations to reference frames and measuring how effectively the neutralized checksum identifies the changes. We find that the neutralized checksum identifies all changes and is a very promising approach to rare event detection.
Free, publicly accessible full text available December 15, 2025.
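To make the checksum-based comparison in the abstract above more concrete, here is a minimal Python sketch of the 16-bit ones'-complement Internet checksum (the arithmetic specified in RFC 1071 and used by TCP), applied to a reference frame and a perturbed copy. The abstract does not describe how the checksum is "neutralized", so that step is not shown; the function names `internet_checksum` and `frame_changed` are hypothetical, and the sketch covers only payload bytes, whereas a real TCP checksum also covers the TCP header and a pseudo-header.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum over `data` (RFC 1071 arithmetic)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Combine each pair of bytes into a big-endian 16-bit word.
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def frame_changed(reference: bytes, candidate: bytes) -> bool:
    """Illustrative check: flag a frame whose checksum differs from the
    reference, without comparing the payloads byte by byte."""
    return internet_checksum(reference) != internet_checksum(candidate)


# Usage: perturb one byte of a reference frame and compare checksums.
reference = bytes(range(64))
perturbed = bytearray(reference)
perturbed[10] ^= 0x01
print(frame_changed(reference, bytes(perturbed)))  # True
```

The appeal of this style of check is that the checksum is already carried in each segment, so a network device could compare a small fixed-size value rather than inspect the full payload.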
-
Ultrasound computed tomography (USCT) shows great promise in nondestructive evaluation and medical imaging due to its ability to quickly scan and collect data from a region of interest. However, existing approaches trade off prediction accuracy against the speed at which the data can be analyzed, and processing the collected data into a meaningful image requires both time and computational resources. We propose to develop convolutional neural networks (CNNs) to accelerate and enhance the inversion results to reveal underlying structures or abnormalities that may be located within the region of interest. For training, the ultrasonic signals were first processed using the full waveform inversion (FWI) technique for only a single iteration; the resulting image and the corresponding true model were used as the input and output, respectively. The proposed machine learning approach is based on implementing two-dimensional CNNs to find an approximate solution to the inverse problem of a partial differential equation-based model reconstruction. To alleviate the time-consuming and computationally intensive data generation process, a high-performance computing-based framework has been developed to generate the training data in parallel. At the inference stage, the acquired signals are first processed by FWI for a single iteration; the resulting image is then processed by a pre-trained CNN to instantaneously generate the final output image. The results showed that, once trained, the CNNs can quickly generate the predicted wave speed distributions with significantly enhanced speed and accuracy.
