While prior work has explored many proposed datacenter designs, only two designs, Clos-based and expander-based, are generally considered practical because they can scale using commodity switching chips. Prior work has used two different metrics, bisection bandwidth and throughput, for evaluating these topologies at scale. Little is known, theoretically or practically, about how these metrics relate to each other. Exploiting characteristics of these topologies, we prove an upper bound on their throughput, then show that this upper bound better estimates worst-case throughput than all previously proposed throughput estimators and scales better than most of them. Using this upper bound, we show that for expander-based topologies, unlike Clos, beyond a certain network size no topology can have full throughput, even if it has full bisection bandwidth; in fact, even relatively small expander-based topologies fail to achieve full throughput. We conclude by showing that using throughput instead of bisection bandwidth to evaluate datacenter performance can alter conclusions in prior work about datacenter cost, manageability, and reliability.
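To make the cut-based reasoning concrete, here is a minimal Python sketch of a generic cut-capacity upper bound on uniform all-to-all throughput. This illustrates the general technique only; it is not the paper's estimator, and the link-capacity model and brute-force cut enumeration are assumptions of the sketch.

```python
from itertools import combinations

# Toy model: an undirected link {u, v} with capacity c carries c units in
# each direction.  Under uniform all-to-all traffic, every ordered pair of
# nodes sends theta units, so for any cut (S, V \ S) the flows leaving S
# need |S| * (n - |S|) * theta <= capacity(S).  Minimizing
# capacity(S) / (|S| * (n - |S|)) over all cuts therefore upper-bounds
# theta.  Bisection bandwidth corresponds only to the cuts with k = n // 2.

def cut_capacity(links, S):
    """Total capacity of links crossing the cut (S, V \\ S)."""
    S = set(S)
    return sum(c for (u, v), c in links.items() if (u in S) != (v in S))

def throughput_upper_bound(links, nodes):
    """Brute-force cut bound; only viable for tiny graphs."""
    n = len(nodes)
    best = float("inf")
    for k in range(1, n // 2 + 1):
        for S in combinations(nodes, k):
            best = min(best, cut_capacity(links, S) / (k * (n - k)))
    return best

if __name__ == "__main__":
    # 6-node ring with unit-capacity links: every bisection cut has
    # capacity 2, but the binding cut here is a contiguous half-ring,
    # giving a bound of 2 / 9.
    nodes = list(range(6))
    ring = {(i, (i + 1) % 6): 1.0 for i in nodes}
    print(throughput_upper_bound(ring, nodes))
```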
The infrastructure available to large-scale and medium-scale web services now spans dozens of geographically dispersed datacenters. Deploying across many datacenters has the potential to significantly reduce end-user latency by serving users nearer their location. However, deploying across many datacenters requires the backend storage system to be partially replicated. In turn, this can sacrifice the low-latency benefits of many datacenters, especially when the storage system provides guarantees about what operations will observe. We present the K2 storage system, which provides lower latency for large-scale and medium-scale web services using partial replication of data over many datacenters with strong guarantees: causal consistency, read-only transactions, and write-only transactions. K2 provides the best possible worst-case latency for partial replication, a single round trip to remote datacenters, and often avoids sending any requests to faraway datacenters using a novel replication approach, a write-only transaction algorithm, and a read-only transaction algorithm.
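As a rough illustration of the causal-consistency building block (not K2's actual replication approach or metadata format), the following sketch buffers remote writes until their causal dependencies are locally visible:

```python
# Minimal sketch of causal replication, assuming each write carries a
# unique id plus the set of write ids it causally depends on.  A remote
# datacenter applies a write only once all of its dependencies have been
# applied locally, so readers never observe effects before their causes.

class CausalReplica:
    def __init__(self, name):
        self.name = name
        self.store = {}       # key -> (value, write_id)
        self.visible = set()  # write ids applied locally
        self.pending = []     # remote writes waiting on dependencies

    def local_write(self, key, value, write_id, deps):
        self.store[key] = (value, write_id)
        self.visible.add(write_id)
        return (key, value, write_id, frozenset(deps))  # replicate this

    def receive(self, update):
        self.pending.append(update)
        self._drain()

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for u in list(self.pending):
                key, value, write_id, deps = u
                if deps <= self.visible:  # all dependencies applied
                    self.store[key] = (value, write_id)
                    self.visible.add(write_id)
                    self.pending.remove(u)
                    progress = True

# A write that depends on an earlier write is never applied out of order:
east, west = CausalReplica("east"), CausalReplica("west")
w1 = east.local_write("profile", "alice", "w1", deps=[])
w2 = east.local_write("post", "hi!", "w2", deps=["w1"])
west.receive(w2)                 # buffered: w1 not yet visible at west
assert "post" not in west.store
west.receive(w1)                 # applies w1, then drains w2
assert west.store["post"][0] == "hi!"
```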
Replicated state machines are linearizable, fault-tolerant groups of replicas that are coordinated using a consensus algorithm. Copilot replication is the first 1-slowdown-tolerant consensus protocol: it delivers normal latency despite the slowdown of any 1 replica. Copilot uses two distinguished replicas, the pilot and the copilot, to proactively add redundancy to all stages of processing a client's command. It uses dependencies and deduplication to resolve the potentially differing orderings proposed by the two pilots. To prevent dependencies from letting either pilot slow down the group, Copilot uses fast takeovers that allow a fast pilot to complete the ongoing work of a slow pilot. Copilot includes two optimizations, ping-pong batching and null dependency elimination, that improve its performance when there are 0 and 1 slow pilots, respectively. Our evaluation shows Copilot's performance is lower than, but competitive with, Multi-Paxos and EPaxos when no replicas are slow. When a replica is slow, Copilot is the only protocol that avoids high latencies.
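The core intuition, redundant ordering paths plus deduplication, can be shown with a toy simulation. This sketch is not Copilot's protocol (it has no dependencies, fast takeovers, or real consensus); it only shows why proposing each command through two pilots and keeping the first copy masks one slow pilot:

```python
# Toy model of 1-slowdown tolerance: every command is sent to two
# redundant ordering paths ("pilots"); the execution engine applies the
# first copy of each command id and drops the later duplicate, so
# end-to-end latency tracks the faster pilot.

import heapq

def run(commands, pilot_delays):
    """Return per-command completion times given each pilot's delay."""
    events = []  # (arrival_time, command_id, pilot)
    for t, cmd in enumerate(commands):
        for pilot, delay in enumerate(pilot_delays):
            heapq.heappush(events, (t + delay, cmd, pilot))
    executed, completion = set(), {}
    while events:
        when, cmd, pilot = heapq.heappop(events)
        if cmd in executed:
            continue  # duplicate proposal from the slower pilot
        executed.add(cmd)
        completion[cmd] = when
    return completion

fast_both = run(["a", "b"], pilot_delays=[1.0, 1.0])
one_slow  = run(["a", "b"], pilot_delays=[1.0, 50.0])  # pilot 1 is slow
# Latency stays normal despite the slow pilot:
assert one_slow == fast_both
```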
Most recent datacenter topology designs have focused on performance properties such as latency and throughput. In this paper, we explore a new dimension, lifecycle management complexity, which attempts to capture the complexity of deploying a topology and then expanding it. By analyzing current practice in lifecycle management, we devise complexity metrics for lifecycle management and show that existing topology classes have low lifecycle management complexity by some measures but not by others. Motivated by this, we design a new class of topologies, FatClique, that, while being performance-equivalent to existing topologies, is comparable to or better than them by all of our lifecycle management complexity metrics.
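As a purely hypothetical illustration of what lifecycle-complexity metrics might look like in code (the metric definitions below are assumptions of this sketch, not the paper's), one could score a topology by its wiring variety at deployment time and by the links rewired during an expansion:

```python
# Hypothetical lifecycle metrics for a topology given as a link list:
# distinct rack-to-rack cable bundle types (an assumed proxy for
# deployment complexity) and the number of existing links torn up by an
# expansion (an assumed proxy for expansion complexity).

from collections import Counter

def bundle_types(links, rack_of):
    """Distinct (rack, rack) bundle types; fewer types, simpler wiring."""
    kinds = Counter()
    for u, v in links:
        kinds[tuple(sorted((rack_of[u], rack_of[v])))] += 1
    return len(kinds)

def rewired_links(old_links, new_links):
    """Existing links removed by the expansion."""
    return len(set(old_links) - set(new_links))

# Toy example: a 4-switch full mesh expanded to 5 switches.
rack_of = {s: s // 2 for s in range(5)}  # two switches per rack
mesh4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
mesh5 = [(u, v) for u in range(5) for v in range(u + 1, 5)]
print(bundle_types(mesh4, rack_of))  # wiring variety at deploy time
print(rewired_links(mesh4, mesh5))   # 0: growing a mesh only adds links
```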