Search for: All records

Creators/Authors contains: "Alizadeh, Mohammad"

Counterfactual Identifiability of Bijective Causal Models

Nasr-Esfahany, Arash ; Alizadeh, Mohammad ; Shah, Devavrat ( July 2023 , Proceedings of the 40th International Conference on Machine Learning)

Free, publicly-accessible full text available July 24, 2024
Scalable Tail Latency Estimation for Data Center Networks

Zhao, Kevin ; Goyal, Prateesh ; Alizadeh, Mohammad ; Anderson, Thomas E. ( April 2023 , 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23))

In this paper, we consider how to provide fast estimates of flow-level tail latency performance for very large-scale data center networks. Network tail latency is often a crucial metric for cloud application performance that can be affected by a wide variety of factors, including network load, inter-rack traffic skew, traffic burstiness, flow size distributions, oversubscription, and topology asymmetry. Network simulators such as ns-3 and OMNeT++ can provide accurate answers, but are very hard to parallelize, taking hours or days to answer what-if questions for a single configuration at even moderate scale. Recent work with MimicNet has shown how to use machine learning to improve simulation performance, but at the cost of a long training step per configuration, and with assumptions about workload and topology uniformity that typically do not hold in practice. We address this gap by developing a set of techniques to provide fast performance estimates for large-scale networks with general traffic matrices and topologies. A key step is to decompose the problem into a large number of parallel, independent single-link simulations; we carefully combine these link-level simulations to produce accurate estimates of end-to-end flow-level performance distributions for the entire network. Like MimicNet, we exploit symmetry where possible to gain additional speedups, but without relying on machine learning, so there is no training delay. On a large-scale network where ns-3 takes 11 to 27 hours to simulate five seconds of network behavior, our techniques run in one to two minutes with accuracy within 9% for tail flow completion times.
Full Text Available
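
The abstract above describes decomposing the network into independent single-link simulations and then combining the link-level results, but it does not say how the combination is done. The following is only a rough sketch of the general idea under a strong simplifying assumption (per-link delays are independent and simply add along a path); it is not the authors' method. The names `percentile`, `estimate_path_tail`, and `link_samples` are hypothetical.

```python
import random


def percentile(samples, q):
    """Nearest-rank style percentile of a list of numbers (q in [0, 1])."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


def estimate_path_tail(link_samples, path, q=0.99, draws=10000, seed=0):
    """Estimate a tail percentile of end-to-end delay along `path`.

    link_samples: dict mapping a link id to delay samples, assumed to come
                  from separate single-link simulations (hypothetical input).
    path:         list of link ids that a flow traverses.
    ASSUMPTION:   link delays are independent, so each Monte Carlo draw sums
                  one randomly chosen sample per link.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.choice(link_samples[link]) for link in path)
        for _ in range(draws)
    ]
    return percentile(totals, q)


if __name__ == "__main__":
    samples = {
        "tor-agg": [0.1, 0.1, 0.2, 0.9],
        "agg-core": [0.2, 0.3, 0.3, 1.5],
    }
    print(estimate_path_tail(samples, ["tor-agg", "agg-core"]))
```

Because each link's samples can be produced independently, the per-link simulations are the part that parallelizes; the combination step here is deliberately naive and would need to account for correlation between links to be accurate.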
CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation

Alomar, Abdullah ; Hamadanian, Pouya ; Nasr-Esfahany, Arash ; Agarwal, Anish ; Alizadeh, Mohammad ; Shah, Devavrat ( April 2023 , 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23))

Full Text Available
RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics

Khani, Mehrdad ; Ananthanarayanan, Ganesh ; Hsieh, Kevin ; Jiang, Junchen ; Netravali, Ravi ; Shu, Yuanchao ; Alizadeh, Mohammad ; Bahl, Victor ( April 2023 , 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23))

Full Text Available
RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics

Khani, Mehrdad ; Ananthanarayanan, Ganesh ; Hsieh, Kevin ; Jiang, Junchen ; Netravali, Ravi ; Shu, Yuanchao ; Alizadeh, Mohammad ; Bahl, Victor ( April 2023 , USENIX Association)

Full Text Available
RECL: Responsive Resource-Efficient Continuous Learning for Video Analytics

Khani, Mehrdad ; Ananthanarayanan, Ganesh ; Hsieh, Kevin ; Jiang, Junchen ; Netravali, Ravi ; Shu, Yuanchao ; Alizadeh, Mohammad ; Bahl, Victor ( April 2023 , 20th USENIX Symposium on Networked Systems Design and Implementation)

Full Text Available
Protego: Overload Control for Applications with Unpredictable Lock Contention

Cho, Inho ; Saeed, Ahmed ; Park, Seo Jin ; Alizadeh, Mohammad ; Belay, Adam ( January 2023 , 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23))

Modern datacenter applications are concurrent, so they require synchronization to control access to shared data. Requests can contend for different combinations of locks, depending on application and request state. In this paper, we show that locks, especially blocking synchronization, can squander throughput and harm tail latency, even when the CPU is underutilized. Moreover, the presence of a large number of contention points, and the unpredictability in knowing which locks a request will require, make it difficult to prevent contention through overload control using traditional signals such as queueing delay and CPU utilization. We present Protego, a system that resolves these problems with two key ideas. First, it contributes a new admission control strategy that prevents compute congestion in the presence of lock contention. The key idea is to use marginal improvements in observed throughput, rather than CPU load or latency measurements, within a credit-based admission control algorithm that regulates the rate of incoming requests to a server. Second, it introduces a new latency-aware synchronization abstraction called Active Synchronization Queue Management (ASQM) that allows applications to abort requests if delays exceed latency objectives. We apply Protego to two real-world applications, Lucene and Memcached, and show that it achieves up to 3.3x more goodput and 12.2x lower 99th percentile latency than the state-of-the-art overload control systems while avoiding congestion collapse.
Full Text Available
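
The admission control idea in the abstract above (driving a credit-based controller by marginal improvements in observed throughput rather than by CPU load or latency) can be illustrated with a minimal sketch. This is not Protego's actual algorithm: the class name `CreditController`, the additive step, the backoff factor, and the gain threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CreditController:
    """Toy credit controller driven by marginal throughput gain.

    All parameters are illustrative assumptions, not values from the paper.
    """
    credits: int = 16        # requests admitted per interval (hypothetical)
    step: int = 4            # additive increase per interval (hypothetical)
    min_gain: float = 0.1    # required throughput gain per extra credit
    backoff: float = 0.8     # multiplicative decrease factor
    min_credits: int = 1
    _prev_credits: int = 0
    _prev_throughput: float = 0.0

    def update(self, throughput: float) -> int:
        """Adjust credits from the throughput observed in the last interval."""
        d_credits = self.credits - self._prev_credits
        d_tput = throughput - self._prev_throughput
        self._prev_credits = self.credits
        self._prev_throughput = throughput
        if d_credits > 0 and d_tput / d_credits < self.min_gain:
            # Extra credits bought little or no throughput: likely contention.
            self.credits = max(self.min_credits, int(self.credits * self.backoff))
        else:
            # Throughput still scaling (or we just backed off): keep probing.
            self.credits += self.step
        return self.credits


if __name__ == "__main__":
    ctl = CreditController()
    for observed in (100.0, 140.0, 140.0, 138.0):
        print(ctl.update(observed))
```

The point of the sketch is the signal, not the constants: the controller stops admitting more work once additional credits no longer increase throughput, which can happen under lock contention even when the CPU is underutilized.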
Protego: Overload Control for Applications with Unpredictable Lock Contention

Cho, Inho ; Saeed, Ahmed ; Park, Seo Jin ; Alizadeh, Mohammad ; Belay, Adam ( January 2023 , USENIX Symposium on Networked Systems Design and Implementation (NSDI))
Balakrishnan, Mahesh ; Ghobadi, Manya (Ed.)
Full Text Available