Parallel Discrete Event Simulation (PDES) using distributed synchronization supports the concurrent execution of discrete event simulation models on parallel processing hardware platforms. The multi-core/many-core era has provided a low-latency "cluster on a chip" architecture for high-performance simulation and modeling of complex systems. The Single-chip Cloud Computer (SCC), a research many-core processor created by Intel Labs, offers several interesting opportunities for PDES research and development. The SCC features of most interest are its low-latency messaging hardware, software-managed cache coherence, and user-controllable, core-independent dynamic frequency and voltage regulation. Each of these features can potentially be exploited to improve the performance of PDES. This paper reports preliminary efforts to migrate an optimistically synchronized parallel simulation kernel called WARPED to RCCE (Rock Creek Communication Environment), an emulation environment for the SCC. The WARPED simulation kernel and several test simulation models have been ported to the RCCE environment. Based on these initial efforts, some preliminary insights on how to exploit the exotic features of the SCC to increase the performance of PDES applications are noted.
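To illustrate the kind of messaging such a port relies on, below is a minimal sketch of exchanging a timestamped event over RCCE's basic blocking send/receive primitives. The `Event` layout and the two-core setup are illustrative assumptions, not WARPED's actual message format:

```cpp
// Hypothetical sketch: exchanging a Time Warp event between two SCC
// cores using RCCE's basic blocking primitives. The Event layout is
// an assumption for illustration; it is not WARPED's actual format.
#include "RCCE.h"

struct Event {
    double recv_time;   // virtual receive timestamp
    int    sender_lp;   // source logical process
    int    dest_lp;     // destination logical process
};

int RCCE_APP(int argc, char **argv) {
    RCCE_init(&argc, &argv);
    int me  = RCCE_ue();        // this core's unit-of-execution id
    int num = RCCE_num_ues();

    if (num >= 2) {
        Event ev;
        if (me == 0) {
            ev = {10.5, /*sender_lp=*/0, /*dest_lp=*/1};
            RCCE_send(reinterpret_cast<char*>(&ev), sizeof(ev), 1);
        } else if (me == 1) {
            RCCE_recv(reinterpret_cast<char*>(&ev), sizeof(ev), 0);
            // ...enqueue ev in the pending event set; roll back if
            // ev.recv_time lies in this LP's simulated past...
        }
    }
    RCCE_finalize();
    return 0;
}
```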
Profile driven partitioning of parallel simulation models
A considerable amount of research on parallel discrete event simulation has been conducted over the past few decades. However, most of this research has targeted the parallel simulation infrastructure, focusing on data structures, algorithms, and synchronization methods for parallel simulation kernels. Unfortunately, distributed environments often have high communication latencies that can reduce the potential performance of parallel simulations. Effective partitioning of the concurrent simulation objects of real-world models can have a large impact on the amount of network traffic the simulation generates and, consequently, on overall performance. This paper presents our studies on profiling the characteristics of simulation models and using the collected data to partition the models for concurrent execution. Our benchmarks show that profile-guided partitioning can yield dramatic performance gains in parallel simulations; in some models, 5-fold improvements in the run time of the concurrently executed simulations were observed.
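As one hypothetical illustration of the idea (not the paper's actual algorithm), profile data can be viewed as edge weights in a communication graph over simulation objects, with a greedy pass placing heavily communicating objects in the same partition:

```cpp
// Hypothetical sketch of profile-guided partitioning: greedily place
// each simulation object in the partition with which it exchanged the
// most events during profiling. Production systems typically use a
// graph partitioner (e.g., METIS); this is only illustrative.
#include <map>
#include <utility>
#include <vector>

using ObjId = int;

std::vector<int> partitionObjects(
    int numObjects, int numParts,
    const std::map<std::pair<ObjId, ObjId>, long>& eventCounts) {
    std::vector<int> part(numObjects, -1);   // part[o] = partition of object o
    std::vector<int> load(numParts, 0);
    int capacity = (numObjects + numParts - 1) / numParts;

    for (ObjId o = 0; o < numObjects; ++o) {
        // Score each partition by profiled traffic to already-placed objects.
        std::vector<long> score(numParts, 0);
        for (const auto& [edge, count] : eventCounts) {
            ObjId peer = (edge.first == o) ? edge.second
                       : (edge.second == o) ? edge.first : -1;
            if (peer >= 0 && part[peer] >= 0) score[part[peer]] += count;
        }
        int best = 0;
        for (int p = 0; p < numParts; ++p)
            if (load[p] < capacity &&
                (load[best] >= capacity || score[p] > score[best]))
                best = p;
        part[o] = best;
        ++load[best];
    }
    return part;
}
```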
- Award ID(s): 0915337
- PAR ID: 10350991
- Date Published:
- Journal Name: Winter Simulation Conference
- Page Range / eLocation ID: 2750 to 2761
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Time Warp synchronized parallel discrete event simulators are organized to operate asynchronously and aggressively, without explicit synchronization between the concurrently executing simulators. In place of an explicit synchronization mechanism, the concurrent simulators maintain a common virtual clock model and implement a rollback/recovery mechanism to restore causal order when out-of-order events are detected. When the critical path of execution of the simulation is balanced across these parallel simulators, this can result in a highly effective, lightweight synchronization mechanism. However, imbalances in the workload across the parallel simulators can cause excessive rollback at some nodes and ultimately slow the overall simulation as prematurely computed and transmitted events are processed. On small shared-memory multi-core systems, a lowest-timestamp-first scheduling policy can effectively balance the workload. However, on larger many-core chips, conventional load balancing and workload migration will once again become necessary. Fortunately, emerging many-core chips contain some interesting features that can potentially be exploited to improve the performance of parallel simulations. For example, the Intel Single-chip Cloud Computer (SCC) provides mechanisms that a running application can use to adjust the frequency/voltage of different regions (called islands) of the chip. These islands are network- and processing-core centric; thus, in a Time Warp simulation, one can increase the frequency of the cores executing threads on the critical path (those experiencing infrequent rollback) and decrease the frequency of the cores executing threads off the critical path (those experiencing excessive rollback). This paper investigates the run-time control and adjustment of core frequency in an AMD Phenom II X6 multi-core processor to explore and demonstrate that dynamic run-time control of core frequency can sometimes improve the performance of a Time Warp synchronized parallel simulation.
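A minimal sketch of this style of rollback-driven frequency control, assuming a Linux cpufreq sysfs interface (with the userspace governor enabled) and a hypothetical per-core rollback counter supplied by the simulation kernel:

```cpp
// Hypothetical sketch: lower the frequency of cores whose Time Warp
// threads roll back often (off the critical path) and raise it for
// cores that rarely roll back. Assumes Linux's cpufreq sysfs files
// are writable under the userspace governor, and that rollback
// counts come from an instrumented simulation kernel.
#include <fstream>
#include <string>
#include <vector>

void setFrequencyKHz(int core, long khz) {
    std::ofstream f("/sys/devices/system/cpu/cpu" + std::to_string(core) +
                    "/cpufreq/scaling_setspeed");
    if (f) f << khz;
}

void adjustFrequencies(const std::vector<long>& rollbacksPerCore,
                       long threshold, long highKHz, long lowKHz) {
    for (std::size_t core = 0; core < rollbacksPerCore.size(); ++core)
        setFrequencyKHz(static_cast<int>(core),
                        rollbacksPerCore[core] < threshold ? highKHz : lowKHz);
}
```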
Magnetic actuation has emerged as a powerful and versatile mechanism for diverse applications, ranging from soft robotics and biomedical devices to functional metamaterials. This highly interdisciplinary research calls for an easy-to-use and efficient modeling/simulation platform that can be leveraged by researchers with different backgrounds. Here we present a lattice model for hard-magnetic soft materials, called 'magttice', which partitions the elastic deformation energy into lattice stretching and volumetric change. Magnetic actuation is realized through prescribed nodal forces in magttice. We further implement the model within the framework of the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) for highly efficient parallel simulations. The magttice is first validated by examining the deformation of ferromagnetic beam structures, and then applied to various smart structures, such as origami plates and magnetic robots. After investigating the static deformation and dynamic motion of a soft robot, the swimming of the magnetic robot in water, resembling a jellyfish's locomotion, is further studied by coupling the magttice with the lattice Boltzmann method (LBM). These examples indicate that the proposed magttice model can enable more efficient mechanical modeling and simulation for the rational design of magnetically driven smart structures.
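As a hedged illustration of prescribed nodal forces (the paper's exact formulation may differ), the force on a lattice node carrying a remanent moment m in a spatially varying applied field B(x) can be taken as F = ∇(m · B), approximated here by central differences; the field function and step size are assumptions for illustration:

```cpp
// Hypothetical sketch: magnetic actuation imposed as a prescribed
// nodal force on a lattice node with remanent moment m in an applied
// field B(x), using F_i = d/dx_i (m . B(x)) via central differences.
#include <array>
#include <functional>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b) {
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

Vec3 magneticNodalForce(const Vec3& x, const Vec3& m,
                        const std::function<Vec3(const Vec3&)>& B,
                        double h = 1e-6) {
    Vec3 f{};
    for (int i = 0; i < 3; ++i) {
        Vec3 xp = x, xm = x;
        xp[i] += h;
        xm[i] -= h;
        f[i] = (dot(m, B(xp)) - dot(m, B(xm))) / (2.0 * h);
    }
    return f;  // applied as an external force on the lattice node
}
```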
This dataset contains the codes and data used in the manuscript "Influence of Subsurface Critical Zone Structure on Hydrological Partitioning in Mountainous Headwater Catchments," submitted to Geophysical Research Letters. The software requirements are summarized in requirement.txt; the hydrologic modeling input data are in the folder TLnewtest2sfb2; the observation data used in the simulation are indicated as comments in the Python scripts. Note that the hydrologic modeling was run on an HPC system (Linux) with parallel computing. Below is the abstract of the manuscript: "Headwater catchments play a vital role in regional water supply and ecohydrology, and a quantitative understanding of the hydrological partitioning in these catchments is critically needed, particularly under a changing climate. Recent studies have highlighted the importance of subsurface critical zone (CZ) structure in modulating the partitioning of precipitation in mountainous catchments; however, few existing studies have explicitly taken into account the 3D subsurface CZ structure. In this study, we designed realistic synthetic catchment models based on seismic-velocity-estimated 3D subsurface CZ structures. Integrated hydrologic modeling is then used to study the effect of the shape of the weathered bedrock bottom on various hydrologic fluxes and storages in mountainous headwater catchments. Numerical results show that the shape of the weathered bedrock bottom affects not only the magnitude but also the peak time of both streamflow and subsurface dynamic storage."
In this article, we present a four-layer distributed simulation system and its adaptation to the Material Point Method (MPM). The system is built upon a performance-portable C++ programming model targeting major High-Performance Computing (HPC) platforms. A key ingredient of our system is a hierarchical block-tile-cell sparse grid data structure that is distributable to an arbitrary number of Message Passing Interface (MPI) ranks. We additionally propose strategies for efficient dynamic load-balance optimization to maximize the efficiency of MPI tasks. Our simulation pipeline can easily switch among backend programming models, including OpenMP and CUDA, and can be effortlessly dispatched onto supercomputers and the cloud. Finally, we construct benchmark experiments and ablation studies on supercomputers and consumer workstations in a local network to evaluate scalability and load balancing. We demonstrate massively parallel, highly scalable, gigascale-resolution MPM simulations of up to 1.01 billion particles at less than 323.25 seconds per frame with 8 OpenSSH-connected workstations.
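As a rough sketch of how such a hierarchy might map cells to blocks and blocks to MPI ranks (the tile/block sizes and the hash-based static assignment are assumptions, not the system's actual scheme):

```cpp
// Hypothetical sketch of a hierarchical block-tile-cell sparse-grid
// index and a simple block-to-rank assignment. The 4-cell tiles and
// 8-tile blocks are illustrative assumptions only.
#include <cstddef>
#include <cstdint>
#include <functional>

constexpr int TILE_W  = 4;  // cells per tile edge
constexpr int BLOCK_W = 8;  // tiles per block edge

struct CellCoord { int64_t x, y, z; };
struct BlockKey  { int64_t bx, by, bz; };

BlockKey blockOf(const CellCoord& c) {
    const int64_t span = TILE_W * BLOCK_W;  // cells per block edge
    // Floor division so negative coordinates map consistently.
    auto fdiv = [](int64_t a, int64_t b) {
        return (a >= 0) ? a / b : -(((-a) + b - 1) / b);
    };
    return {fdiv(c.x, span), fdiv(c.y, span), fdiv(c.z, span)};
}

// Deterministic owner rank for a block: every rank computes the same
// answer without communication. A real system would combine this
// with dynamic load balancing, as the paper describes.
int ownerRank(const BlockKey& b, int numRanks) {
    std::size_t h = std::hash<int64_t>{}(b.bx);
    h ^= std::hash<int64_t>{}(b.by) + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
    h ^= std::hash<int64_t>{}(b.bz) + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
    return static_cast<int>(h % static_cast<std::size_t>(numRanks));
}
```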