NSF PAR Search | NSF Public Access Repository

NetVRM: Virtual Register Memory for Programmable Networks

Zhu, Hang; Wang, Tao; Hong, Yi; Ports, Dan R.; Sivaraman, Anirudh; Jin, Xin (April 2022, 19th USENIX Symposium on Networked Systems Design and Implementation)

Programmable networks are enabling a new class of applications that leverage the line-rate processing capability and on-chip register memory of the switch data plane. Yet the status quo is focused on developing approaches that share the register memory statically. We present NetVRM, a network management system that supports dynamic register memory sharing between multiple concurrent applications on a programmable network and is readily deployable on commodity programmable switches. NetVRM provides a virtual register memory abstraction that enables applications to share the register memory in the data plane, and abstracts away the underlying details. In principle, NetVRM supports any memory allocation algorithm given the virtual register memory abstraction. It also provides a default memory allocation algorithm that exploits the observation that applications have diminishing returns on additional memory. NetVRM provides an extension of P4, P4VRM, for developing applications with virtual register memory, and a compiler to generate data plane programs and control plane APIs. Testbed experiments show that NetVRM generalizes to a diverse variety of applications, and that its utility-based dynamic allocation policy outperforms static resource allocation. Specifically, it improves the mean satisfaction ratio (i.e., the fraction of a network application’s lifetime that it meets its utility target) by 1.6–2.2× under a range of workloads.
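The abstract's observation about diminishing returns can be illustrated with a small sketch. This is not NetVRM's allocation algorithm, which the abstract does not specify; it is a generic greedy allocator that assumes each application exposes a concave utility function of its memory share. The names `allocate`, `utilities`, and `total_units` are hypothetical.

```python
import heapq
import math


def allocate(utilities, total_units):
    """Greedily hand out memory units one at a time.

    utilities: list of functions u(m) -> utility with m memory units.
    Each u is assumed (not verified) to be concave, i.e. to have
    diminishing returns, which is what makes the greedy choice sensible.
    Returns the number of units given to each application.
    """
    alloc = [0] * len(utilities)
    # Max-heap (via negated gains) keyed on the gain from one more unit.
    heap = [(-(u(1) - u(0)), i) for i, u in enumerate(utilities)]
    heapq.heapify(heap)
    for _ in range(total_units):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        u = utilities[i]
        next_gain = u(alloc[i] + 1) - u(alloc[i])
        heapq.heappush(heap, (-next_gain, i))
    return alloc


# Two hypothetical applications; the second values memory twice as much.
apps = [lambda m: math.sqrt(m), lambda m: 2 * math.sqrt(m)]
shares = allocate(apps, 10)
print(shares)
```

Because each extra unit is worth less than the previous one, the allocator keeps shifting memory toward whichever application currently benefits most, rather than fixing a static split up front.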
Full Text Available
NetVRM: Virtual Register Memory for Programmable Networks

Zhu, Hang; Wang, Tao; Hong, Yi; Ports, Dan R.K.; Sivaraman, Anirudh; Jin, Xin (January 2022, NSDI)

Full Text Available
Network planning with deep reinforcement learning

https://doi.org/10.1145/3452296.3472902

Zhu, Hang; Gupta, Varun; Ahuja, Satyajeet Singh; Tian, Yuandong; Zhang, Ying; Jin, Xin (August 2021, Proceedings of the 2021 ACM SIGCOMM 2021 Conference)

Network planning is critical to the performance, reliability and cost of web services. This problem is typically formulated as an Integer Linear Programming (ILP) problem. Today's practice relies on hand-tuned heuristics from human experts to address the scalability challenge of ILP solvers. In this paper, we propose NeuroPlan, a deep reinforcement learning (RL) approach to solve the network planning problem. This problem involves multi-step decision making and cost minimization, which can be naturally cast as a deep RL problem. We develop two important domain-specific techniques. First, we use a graph neural network (GNN) and a novel domain-specific node-link transformation for state encoding, in order to handle the dynamic nature of the evolving network topology during planning decision making. Second, we leverage a two-stage hybrid approach that first uses deep RL to prune the search space and then uses an ILP solver to find the optimal solution. This approach resembles today's practice, but replaces the human experts with an RL agent in the first stage. Evaluation on real topologies and setups from large production networks demonstrates that NeuroPlan scales to large topologies beyond the capability of ILP solvers, and reduces the cost by up to 17% compared to hand-tuned heuristics.
Full Text Available
Runtime Recovery of Web Applications under Zero-Day ReDoS Attacks

https://doi.org/10.1109/SP40001.2021.00077

Bai, Zhihao; Wang, Ke; Zhu, Hang; Cao, Yinzhi; Jin, Xin (May 2021, 2021 IEEE Symposium on Security and Privacy (SP))

Regular expression denial of service (ReDoS), which exploits the super-linear running time of matching regular expressions against carefully crafted inputs, is an emerging class of DoS attacks on web services. One challenging question for a victim web service under ReDoS attack is how to quickly recover its normal operation afterward, especially after zero-day attacks that exploit previously unknown vulnerabilities. In this paper, we present RegexNet, the first payload-based, automated, reactive ReDoS recovery system for web services. RegexNet adopts a learning model, which is updated constantly in a feedback loop during runtime, to classify payloads of upcoming requests, including the request contents and database query responses. If a request is detected as a cause of ReDoS, RegexNet migrates it to a sandbox and isolates its execution for a fast, first-measure recovery. We have implemented a RegexNet prototype and integrated it with HAProxy and Node.js. Evaluation results show that RegexNet is effective in recovering the performance of web services against zero-day ReDoS attacks, reacts to attacks in under a minute, and is resilient to different ReDoS attack types, including adaptive ones that are deliberately designed to evade RegexNet.
Full Text Available
RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers

Zhu, Hang; Kaffes, Kostis; Chen, Zixu; Liu, Zhenming; Kozyrakis, Christos; Stoica, Ion (January 2021, OSDI 2021)
Full Text Available
RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers

Zhu, Hang; Kaffes, Kostis; Chen, Zixu; Liu, Zhenming; Kozyrakis, Christos; Stoica, Ion; Jin, Xin (November 2020, 14th USENIX Symposium on Operating Systems Design and Implementation)
Low-latency online services have strict Service Level Objectives (SLOs) that require datacenter systems to support high throughput at microsecond-scale tail latency. Dataplane operating systems have been designed to scale up multi-core servers with minimal overhead for such SLOs. However, as application demands continue to increase, scaling up is not enough, and serving larger demands requires these systems to scale out to multiple servers in a rack. We present RackSched, the first rack-level microsecond-scale scheduler that provides the abstraction of a rack-scale computer (i.e., a huge server with hundreds to thousands of cores) to an external service with network-system co-design. The core of RackSched is a two-layer scheduling framework that integrates inter-server scheduling in the top-of-rack (ToR) switch with intra-server scheduling in each server. We use a combination of analytical results and simulations to show that it provides near-optimal performance, comparable to that of centralized scheduling policies, and is robust for both low-dispersion and high-dispersion workloads. We design a custom switch data plane for the inter-server scheduler, which realizes power-of-k-choices, ensures request affinity, and tracks server loads accurately and efficiently. We implement a RackSched prototype on a cluster of commodity servers connected by a Barefoot Tofino switch. End-to-end experiments on a twelve-server testbed show that RackSched improves the throughput by up to 1.44×, and scales out the throughput near linearly, while maintaining the same tail latency as one server until the system is saturated.
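The power-of-k-choices policy named in the abstract can be sketched in a few lines. This is a simplified software illustration, not RackSched's switch data plane: it assumes load is a simple per-server counter, ignores request completion and request affinity, and uses hypothetical names (`pick_server`, `simulate`).

```python
import random


def pick_server(loads, k, rng):
    """Sample k distinct servers uniformly at random; return the least loaded."""
    candidates = rng.sample(range(len(loads)), k)
    return min(candidates, key=lambda i: loads[i])


def simulate(num_servers, num_requests, k, seed=0):
    """Dispatch requests one by one and return the final per-server load.

    Requests never complete in this toy model, so the load only grows.
    """
    rng = random.Random(seed)
    loads = [0] * num_servers
    for _ in range(num_requests):
        loads[pick_server(loads, k, rng)] += 1
    return loads


# k = 1 is purely random dispatch; k = 2 samples two servers per request.
random_loads = simulate(num_servers=12, num_requests=1200, k=1)
two_choice_loads = simulate(num_servers=12, num_requests=1200, k=2)
print(max(random_loads), max(two_choice_loads))
```

The appeal of the policy is that comparing only k sampled servers per request is much cheaper than scanning every server, while still steering load away from the busiest ones.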
Full Text Available
Multitenancy for Fast and Programmable Networks in the Cloud

Wang, Tao; Zhu, Hang; Ruffy, Fabian; Jin, Xin; Sivaraman, Anirudh; Ports, Dan RK; Panda, Aurojit (July 2020, USENIX Workshop on Hot Topics in Cloud Computing (HotCloud))

Full Text Available
Neural packet classification

https://doi.org/10.1145/3341302.3342221

Liang, Eric; Zhu, Hang; Jin, Xin; Stoica, Ion (August 2019, ACM SIGCOMM)

Full Text Available
Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection

https://doi.org/10.14778/3368289.3368301

Zhu, Hang; Bai, Zhihao; Li, Jialin; Michael, Ellis; Ports, Dan RK; Stoica, Ion; Jin, Xin (November 2019, Proceedings of the VLDB Endowment)

Full Text Available
