skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: WARP: On-the-fly Program Synthesis for Agile, Real-time, and Reliable Wireless Networks
Emerging Industrial Internet-of-Things systems require wireless solutions to connect sensors, actuators, and controllers as part of high data rate feedback-control loops over real-time flows. A key challenge is to provide predictable performance and agility in response to fluctuations in link quality, variable workloads, and topology changes. We propose WARP to address this challenge. WARP uses programs to specify a network’s behavior and includes a synthesis procedure to automatically generate such programs from a high-level specification of the system’s workload and topology. WARP has three unique features: (1) WARP uses a domain-specific language to specify stateful programs that include conditional statements to control when a flow’s packets are transmitted. The execution paths of programs depend on the pattern of packet losses observed at runtime, thereby enabling WARP to readily adapt to packet losses due to short-term variations in link quality. (2) Our synthesis technique uses heuristics to improve network performance by considering multiple packet loss patterns and associated execution paths when determining the transmissions performed by nodes. Furthermore, the generated programs ensure that the likelihood of a flow delivering its packets by its deadline exceeds a user-specified threshold. (3) WARP can adapt to workload and topology changes without explicitly reconstructing a network’s program based on the observation that nodes can independently synthesize the same program when they share the same workload and topology information. Simulations show that WARP improves network throughput for data collection, dissemination, and mixed workloads on two realistic topologies. Testbed experiments show that WARP reduces the time to add new flows by 5 times over a state-of-the-art centralized control plane and guarantees the real-time and reliability of all flows.  more » « less
Award ID(s):
1750155
PAR ID:
10233730
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the 20th International Conference on Information Processing in Sensor Networks
Page Range / eLocation ID:
254 to 267
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Previous studies have observed that TCP pacing evenly spacing out packets-minimizes traffic burstiness, reduces packet losses, and increases throughput. However, the main drawback of pacing is that the number of flows and the bottleneck link capacity must be known in advance. With this information, pacing is achieved by manually tuning sender nodes to send at rates that aggregate to the bottleneck capacity. This paper proposes a scheme based on programmable switches by which rates are dynamically adjusted. These switches store the network's state in the data plane and notify sender nodes to update their pacing rates when the network's state changes, e.g., a new flow joins or leaves the network. The scheme uses a custom protocol that is encapsulated inside the IP Options header field and thus is compatible with legacy switches (i.e., the scheme does not require all switches to be programmable). Furthermore, the processing overhead at programmable switches is minimal, as custom packets are only generated when a flow joins or leaves the network. Simulation results conducted in Mininet demonstrate that the proposed scheme is capable of dynamically notifying hosts to adapt the pacing rate with a minimum delay, increasing throughput, mitigating the TCP sawtooth behavior, and achieving better fairness among concurrent flows. The proposed scheme and preliminary results are particularly attractive to applications such as Science DMZ, where typically a small number of large flows must share the bandwidth capacity. 
    more » « less
  2. null (Ed.)
    Extended Berkeley Packet Filter (BPF) has emerged as a powerful method to extend packet-processing functionality in the Linux operating system. BPF allows users to write code in high-level languages (like C or Rust) and execute them at specific hooks in the kernel, such as the network device driver. To ensure safe execution of a user-developed BPF program in kernel context, Linux uses an in-kernel static checker. The checker allows a program to execute only if it can prove that the program is crash-free, always accesses memory within safe bounds, and avoids leaking kernel data. BPF programming is not easy. One, even modest-sized BPF programs are deemed too large to analyze and rejected by the kernel checker. Two, the kernel checker may incorrectly determine that a BPF program exhibits unsafe behaviors. Three, even small performance optimizations to BPF code (e.g., 5% gains) must be meticulously hand-crafted by expert developers. Traditional optimizing compilers for BPF are often inadequate since the kernel checker's safety constraints are incompatible with rule-based optimizations. We present K2, a program-synthesis-based compiler that automatically optimizes BPF bytecode with formal correctness and safety guarantees. K2 produces code with 6--26% reduced size, 1.36%--55.03% lower average packet-processing latency, and 0--4.75% higher throughput (packets per second per core) relative to the best clang-compiled program, across benchmarks drawn from Cilium, Facebook, and the Linux kernel. K2 incorporates several domain-specific techniques to make synthesis practical by accelerating equivalence-checking of BPF programs by 6 orders of magnitude. 
    more » « less
  3. null (Ed.)
    We previously proposed a method to locate high packetdelay variance links for OpenFlow networks by probing multicast measurement packets along a designed route and by collecting flow-stats of the probe packets from selected OpenFlow switches (OFSs). It is worth AQ1 noting that the packet-delay variance of a link is estimated based on arrival time intervals of probe packets without measuring delay times over the link. However, the previously used route scheme based on the shortest path tree may generate a probing route with many branches in a large network, resulting in many accesses to OFSs to locate all high delay variance links. In this paper, therefore, we apply an Eulerian cycle-based scheme which we previously developed, to control the number of branches in a multicast probing route. Our proposal can reduce the load on the control-plane (i.e., the number of accesses to OFSs) while maintaining an acceptable measurement accuracy with a light load on the data-plane. Additionally, the impacts of packet losses and correlated delays over links on those different types of loads are investigated. By comparing our proposal with the shortest path tree-based and the unicursal route schemes through numerical simulations, we evaluate the advantage of our proposal. 
    more » « less
  4. Benchmarks that closely match the behavior of production workloads are crucial to design and provision computer systems. However, current approaches fall short: First, open-source benchmarks use public datasets that cause different behavior from production workloads. Second, blackbox workload cloning techniques generate synthetic code that imitates the target workload, but the resulting program fails to capture most workload characteristics, such as microarchitectural bottlenecks or time-varying behavior. Generating code that mimics a complex application is an extremely hard problem. Instead, we propose a different and easier approach to benchmark synthesis. Our key insight is that, for many production workloads, the program is publicly available or there is a reasonably similar open-source program. In this case, generating the right dataset is sufficient to produce an accurate benchmark. Based on this observation, we present Datamime, a profile-guided approach to generate representative benchmarks for production workloads. Datamime uses the performance profiles of a target workload to generate a dataset that, when used by a benchmark program, behaves very similarly to the target workload in terms of its microarchitectural characteristics. We evaluate Datamime on several datacenter workloads. Datamime generates synthetic benchmarks that closely match the microarchitectural features of these workloads, with a mean absolute percentage error of 3.2% on IPC. Microarchitectural behavior stays close across processor types. Finally, time-varying behaviors are also replicated, making these benchmarks useful to e.g. characterize and optimize tail latency 
    more » « less
  5. Time Warp synchronized parallel discrete event simulators are organized to operate asynchronously and aggressively without explicit synchronization between the concurrently executing simulators. In place of an explicit synchronization mechanism, the concurrent simulators maintain a common virtual clock model and implement a rollback/recovery mechanism to restore causal order when out-of order events are detected. When the critical path of execution of the simulation is balanced across these parallel simulators, this can result in a highly effective, lightweight synchronization mechanism. However, imbalances in the workload across the parallel simulators can result in excessive rollback at some nodes and ultimately result in an overall slowing of the simulation as prematurely computed and transmitted events are processed. On small shared memory multi-core systems, a lowest timestamp first scheduling policy can effectively balance the workload. However, on larger many-core chips, conventional load balancing and workload migration will once again become necessary. Fortunately, emerging many-core chips contain some interesting features that can potentially be exploited to improve the performance of parallel simulations. For example, the Intel Single-chip Cloud Computer (SCC) provides mechanisms that a running application can use to adjust the frequency/voltage of different regions (called islands) of the chip. These islands are network and processing core centric and thus, in a Time Warp simulation, one can increase the frequency of the cores executing threads on the critical path (those experiencing infrequent rollback) and decrease the frequency of the cores executing threads off the critical path (those experiencing excessive rollback). This paper investigates the run-time control and adjustment of core frequency in an AMD Phenom II X6 multi-core processor to explore and demonstrate that the dynamic run-time control of core frequency can sometimes improve the performance of a Time Warp synchronized parallel simulation. 
    more » « less