Time Warp synchronized parallel discrete event simulators are organized to operate asynchronously and aggressively without explicit synchronization between the concurrently executing simulators. In place of an explicit synchronization mechanism, the concurrent simulators maintain a common virtual clock model and implement a rollback/recovery mechanism to restore causal order when out-of-order events are detected. When the critical path of execution of the simulation is balanced across these parallel simulators, this can result in a highly effective, lightweight synchronization mechanism. However, imbalances in the workload across the parallel simulators can result in excessive rollback at some nodes and ultimately slow the overall simulation as prematurely computed and transmitted events are processed. On small shared-memory multi-core systems, a lowest-timestamp-first scheduling policy can effectively balance the workload. However, on larger many-core chips, conventional load balancing and workload migration will once again become necessary. Fortunately, emerging many-core chips contain some interesting features that can potentially be exploited to improve the performance of parallel simulations. For example, the Intel Single-chip Cloud Computer (SCC) provides mechanisms that a running application can use to adjust the frequency/voltage of different regions (called islands) of the chip. These islands are network and processing core centric and thus, in a Time Warp simulation, one can increase the frequency of the cores executing threads on the critical path (those experiencing infrequent rollback) and decrease the frequency of the cores executing threads off the critical path (those experiencing excessive rollback).
This paper investigates the run-time control and adjustment of core frequency in an AMD Phenom II X6 multi-core processor to explore and demonstrate that the dynamic run-time control of core frequency can sometimes improve the performance of a Time Warp synchronized parallel simulation.
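The rollback/recovery idea described in this abstract can be illustrated with a toy sketch. This is not the paper's implementation: the `LogicalProcess` class, its fields, and the order-sensitive state update are all hypothetical, and anti-messages and global virtual time are omitted. It only shows a process that executes events optimistically, saves state before each event, and restores that state when a straggler (an event with a timestamp earlier than the local virtual time) arrives.

```python
class LogicalProcess:
    """Toy Time Warp logical process (hypothetical, for illustration only)."""

    def __init__(self):
        self.lvt = 0          # local virtual time
        self.state = 0        # simulation state: a single integer
        self.processed = []   # (timestamp, value, state_before_event)
        self.rollbacks = 0

    def receive(self, timestamp, value):
        redo = []
        if timestamp < self.lvt:
            # Straggler: undo every event executed with a later timestamp.
            self.rollbacks += 1
            while self.processed and self.processed[-1][0] > timestamp:
                ts, val, saved = self.processed.pop()
                self.state = saved            # restore the saved state
                redo.append((ts, val))
        # Execute the new event and re-execute undone events in timestamp order.
        for ts, val in sorted([(timestamp, value)] + redo):
            self._execute(ts, val)

    def _execute(self, timestamp, value):
        self.processed.append((timestamp, value, self.state))
        self.state = self.state * 2 + value   # order-sensitive update (assumed)
        self.lvt = timestamp


lp = LogicalProcess()
for ts, val in [(10, 1), (20, 2), (15, 3)]:   # the third event is a straggler
    lp.receive(ts, val)
```

After the straggler at timestamp 15 is handled, the state matches what a strictly timestamp-ordered execution would have produced, at the cost of one rollback. A process that rolls back often is, in the abstract's terms, off the critical path.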
Simulus: Easy Breezy Simulation in Python
This paper introduces Simulus, a full-fledged open-source discrete-event simulator, supporting both event-driven and process-oriented simulation world-views. Simulus is implemented in Python and aspires to be a part of Python's ecosystem supporting scientific computing. Simulus also provides several advanced modeling constructs to ease common simulation tasks (e.g., complex queuing models, interprocess synchronizations, and message-passing communications). In addition, Simulus provides organic support for simultaneously running a time-synchronized group of simulators, either sequentially or in parallel, thereby allowing composable simulation of individual simulators handling different aspects of a target system, and enabling large-scale simulation running on parallel computers. This paper describes the salient features of Simulus and examines its major design decisions.
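For readers unfamiliar with the event-driven world-view mentioned above, the following is a minimal sketch using only the Python standard library. It deliberately does not use the Simulus API; the `TinySimulator` class, its `sched` and `run` methods, and the arrival/departure handlers are hypothetical names chosen for illustration.

```python
import heapq
import itertools


class TinySimulator:
    """Toy event-driven simulator (hypothetical; not the Simulus API)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []                 # min-heap ordered by event time
        self._ids = itertools.count()    # tie-breaker for equal timestamps

    def sched(self, delay, handler, *args):
        # Schedule handler(sim, *args) to run `delay` time units from now.
        heapq.heappush(self._queue, (self.now + delay, next(self._ids), handler, args))

    def run(self, until=float("inf")):
        # Pop events in timestamp order and advance the clock to each one.
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, args = heapq.heappop(self._queue)
            handler(self, *args)


log = []


def arrival(sim, job):
    log.append((sim.now, "arrive", job))
    sim.sched(1.5, departure, job)       # assumed fixed service time of 1.5


def departure(sim, job):
    log.append((sim.now, "depart", job))


sim = TinySimulator()
for job, t in enumerate([0.0, 1.0, 4.0]):
    sim.sched(t, arrival, job)
sim.run()
```

Each handler runs at a single instant of simulated time and may schedule further events. A process-oriented world-view would instead express each job as a routine that suspends itself while time passes.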
- Award ID(s): 2008000
- PAR ID: 10251068
- Date Published:
- Journal Name: 2020 Winter Simulation Conference
- Page Range / eLocation ID: 2329 to 2340
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Due to the increasing complexity of robot swarm algorithms, analyzing their performance theoretically is often very difficult. Instead, simulators are often used to benchmark the performance of robot swarm algorithms. However, we are not aware of simulators that take advantage of the naturally highly parallel nature of distributed robot swarms. This paper presents ParSwarm, a parallel C++ framework for simulating robot swarms at scale on multicore machines. We demonstrate the power of ParSwarm by implementing two applications, task allocation and density estimation, and running simulations on large numbers of agents.
-
The CyberInfrastructure (CI) has been the object of intensive research and development in the last decade, resulting in a rich set of abstractions and interoperable software implementations that are used in production today for supporting ongoing and breakthrough scientific discoveries. A key challenge is the development of tools and application execution frameworks that are robust in current and emerging CI configurations, and that can anticipate the needs of upcoming CI applications. This paper presents WRENCH, a framework that enables simulation-driven engineering for evaluating and developing CI application execution frameworks. WRENCH provides a set of high-level simulation abstractions that serve as building blocks for developing custom simulators. These abstractions rely on the scalable and accurate simulation models that are provided by the SimGrid simulation framework. Consequently, WRENCH makes it possible to build, with minimum software development effort, simulators that can accurately and scalably simulate a wide spectrum of large and complex CI scenarios. These simulators can then be used to evaluate and/or compare alternate platform, system, and algorithm designs, so as to drive the development of CI solutions for current and emerging applications.
-
Contagion dynamics on networks are used to study many problems, including disease and virus epidemics, incarceration, obesity, protests and rebellions, needle sharing in drug use, and hurricane and other natural disaster events. Simulators to study these problems range from smaller-scale serial codes to large-scale distributed systems. In recent years, Python-based simulation systems have been built. In this work, we describe a new Python-based agent-based simulator called CSonNet. It differs from codes such as Epidemics on Networks in that it performs discrete time simulations based on the graph dynamical systems formalism. CSonNet is a parallel code; it implements concurrency through an embarrassingly parallel approach of running multiple simulation instances on a user-specified number of forked processes. It has a modeling framework whereby agent models are composed using a set of pre-defined state transition rules. We provide strong-scaling performance results and case studies to illustrate its features.
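As a rough illustration of a discrete-time simulation in the graph dynamical systems style, the sketch below applies one state transition rule synchronously to every node of a small graph. It is not CSonNet code: the threshold rule, the function names `step` and `simulate`, and the example graph are assumptions made for illustration.

```python
def step(graph, state, threshold):
    """One synchronous update of every node from the previous state.

    Assumed rule: a node in state 0 switches to 1 when at least `threshold`
    of its neighbors are in state 1; a node in state 1 stays in state 1.
    """
    new_state = {}
    for node, neighbors in graph.items():
        active = sum(state[n] for n in neighbors)
        new_state[node] = 1 if state[node] == 1 or active >= threshold else 0
    return new_state


def simulate(graph, state, threshold, max_steps):
    """Iterate until a fixed point or max_steps; return the state history."""
    history = [state]
    for _ in range(max_steps):
        nxt = step(graph, history[-1], threshold)
        if nxt == history[-1]:
            break
        history.append(nxt)
    return history


# Hypothetical 5-node graph as adjacency lists, with two seed nodes.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
seed = {0: 1, 1: 1, 2: 0, 3: 0, 4: 0}
history = simulate(graph, seed, threshold=2, max_steps=10)
```

Because each run depends only on its own seed and parameters, many such instances could be distributed across independent processes, which is the embarrassingly parallel pattern the abstract describes.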

