Short time-to-localize and time-to-fix for production bugs are extremely important for any 24x7 service-oriented application (SOA). Debugging faulty behavior in deployed applications is hard, as it requires careful reproduction of a similar environment and workload. Prior approaches for automatically reproducing production failures do not scale to large SOA systems. Our key insight is that many failures in SOA systems (e.g., many semantic and performance bugs) can be reproduced automatically solely by relaying network packets to replicas of suspect services, an insight we validated through a manual study of 16 real bugs across five different systems. This paper presents Parikshan, an application monitoring framework that leverages user-space virtualization and network proxy technologies to provide a sandboxed "debug" environment. In this "debug" environment, developers are free to attach debuggers and analysis tools without affecting the performance or correctness of the production environment. In contrast to existing monitoring solutions that can slow down production applications, Parikshan enables application monitoring at significantly lower overhead.
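The core mechanism described in the abstract, duplicating live traffic to a sandboxed replica so that debugging never touches the production path, can be illustrated with a small proxy. The sketch below is illustrative only and is not Parikshan's implementation: the addresses, ports, and asyncio-based design are assumptions. The production connection is drained synchronously, while the replica copy is written best-effort, so a slow or failed replica cannot stall clients.

```python
# Illustrative sketch only (not Parikshan's actual implementation): a TCP proxy
# that forwards client traffic to the production service and, on a best-effort
# basis, duplicates it to a debug replica. Host/port values are hypothetical.
import asyncio

PROD = ("10.0.0.10", 8080)     # production service (assumed address)
REPLICA = ("10.0.0.20", 8080)  # sandboxed debug replica (assumed address)

async def pipe(reader, writer, copy_writer=None):
    """Forward bytes from reader to writer; best-effort copy to the replica."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()        # production path is authoritative
            if copy_writer is not None:
                copy_writer.write(data) # replica copy is never drained, so a
                                        # slow replica cannot stall production
    finally:
        writer.close()

async def handle(client_reader, client_writer):
    prod_reader, prod_writer = await asyncio.open_connection(*PROD)
    try:
        _, rep_writer = await asyncio.open_connection(*REPLICA)
    except OSError:
        rep_writer = None               # debugging is optional; production continues
    await asyncio.gather(
        pipe(client_reader, prod_writer, rep_writer),  # client -> production (+ replica)
        pipe(prod_reader, client_writer),              # production -> client
    )

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", 9090)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```

Running such a proxy in front of the production port, with the replica pointed at a clone of the suspect service, would mirror the live workload into the sandbox without adding load to production.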
Toward flexible auditing for in-network functionality
Networks today increasingly support in-network functionality via network function virtualization (NFV) or similar technologies. Such approaches enable a wide range of functionality to be deployed on behalf of end systems, such as offloading Tor services, enforcing network usage policies on encrypted traffic, or new functionality in 5G. An important open problem with such approaches is auditing: these services rely on third-party network providers to faithfully deploy and run their functionality as intended, yet they often have little to no insight into whether providers actually do so. To address this problem, prior work offers point solutions, such as verifiable routing with per-packet overhead or audits of security practices; however, these approaches are not flexible: they are limited to auditing a small set of functionality and do not allow trade-offs between auditing coverage and overhead. In this paper, we propose NFAudit, which audits deployed NFs with a flexible approach in which a wide range of important properties can be checked at configurable, low overhead. Our key insight is that simple, composable, and flexible auditing primitives, combined with limited trust in the form of secure enclaves, can support a wide range of auditing functionality at configurable, and often low, cost.
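As a rough illustration of the composable-primitive idea (not NFAudit's actual API), the sketch below shows a packet-count primitive whose periodic reports are authenticated with a key that, in a real deployment, would be protected by a secure enclave. The primitive names, report format, sampling knob, and HMAC-based signing are all assumptions made for illustration.

```python
# Illustrative sketch (not NFAudit's actual primitives): a composable packet-count
# auditing primitive that periodically emits an authenticated report. The HMAC key
# stands in for an enclave-protected signing key; names and fields are assumptions.
import hmac, hashlib, json, time, random

ENCLAVE_KEY = b"provisioned-inside-enclave"  # assumption: key never leaves the enclave

class CountPrimitive:
    """Counts packets matching a predicate; cost is one predicate call per packet."""
    def __init__(self, name, predicate, sample_rate=1.0):
        self.name, self.predicate, self.sample_rate = name, predicate, sample_rate
        self.count = 0

    def observe(self, pkt):
        # Sampling trades auditing coverage for per-packet overhead.
        if random.random() <= self.sample_rate and self.predicate(pkt):
            self.count += 1

class Auditor:
    """Composes primitives and emits signed reports an external verifier can check."""
    def __init__(self, primitives):
        self.primitives = primitives

    def observe(self, pkt):
        for p in self.primitives:
            p.observe(pkt)

    def report(self):
        body = {"ts": time.time(), "counts": {p.name: p.count for p in self.primitives}}
        payload = json.dumps(body, sort_keys=True).encode()
        tag = hmac.new(ENCLAVE_KEY, payload, hashlib.sha256).hexdigest()
        return payload, tag  # verifier recomputes the HMAC to detect tampering

# Example composition: audit that the NF sees TCP traffic and what it drops on port 25.
auditor = Auditor([
    CountPrimitive("tcp_in", lambda pkt: pkt.get("proto") == "tcp"),
    CountPrimitive("smtp_dropped", lambda pkt: pkt.get("dport") == 25 and pkt.get("dropped")),
])
```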
- Award ID(s): 1750253
- PAR ID: 10429705
- Date Published:
- Journal Name: CoNEXT-SW '22: Proceedings of the 3rd International CoNEXT Student Workshop
- Page Range / eLocation ID: 32 to 34
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Privacy technologies support the provision of online services while protecting user privacy. Cryptography lies at the heart of many such technologies, creating remarkable possibilities in terms of functionality while offering robust guarantees of data confidentiality. The cryptography literature and discourse often represent that these technologies eliminate the need to trust service providers, i.e., they enable users to protect their privacy even against untrusted service providers. Despite their apparent promise, privacy technologies have seen limited adoption in practice, and the most successful ones have been implemented by the very service providers these technologies purportedly protect users from. The adoption of privacy technologies by supposedly adversarial service providers highlights a mismatch between traditional models of trust in cryptography and the trust relationships that underlie deployed technologies in practice. Yet this mismatch, while well known to the cryptography and privacy communities, remains relatively poorly documented and examined in the academic literature, let alone broader media. This paper aims to fill that gap. Firstly, we review how the deployment of cryptographic technologies relies on a chain of trust relationships embedded in the modern computing ecosystem, from the development of software to the provision of online services, that is not fully captured by traditional models of trust in cryptography. Secondly, we turn to two case studies, web search and encrypted messaging, to illustrate how, rather than removing trust in service providers, cryptographic privacy technologies shift trust to a broader community of security and privacy experts and others, which in turn enables service providers to implicitly build and reinforce their trust relationship with users. Finally, concluding that the trust models inherent in the traditional cryptographic paradigm elide certain key trust relationships underlying deployed cryptographic systems, we highlight the need for organizational, policy, and legal safeguards to address that mismatch, and suggest some directions for future work.
Edge data centers are an appealing place for telecommunication providers to offer in-network processing such as VPN services, security monitoring, and 5G. Placing these network services closer to users can reduce latency and core network bandwidth, but the deployment of network functions at the edge poses several important challenges. Edge data centers have limited resource capacity, yet network functions are resource intensive with strict performance requirements. Replicating services at the edge is needed to meet demand, but balancing the load across multiple servers can be challenging due to diverse service costs, server and flow heterogeneity, and dynamic workload conditions. In this paper, we design and implement EdgeBalance, a model-based load balancer for edge network data planes. EdgeBalance predicts the CPU demand of incoming traffic and adaptively distributes flows to servers to keep them evenly balanced. We overcome several challenges specific to network processing at the edge to improve throughput and latency over static load balancing and monitoring-based approaches.
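As an illustration of model-based flow placement (not EdgeBalance's actual model), the sketch below predicts a flow's CPU demand from an assumed per-service cost table and greedily assigns it to the server with the lowest resulting predicted utilization; the service names, cost coefficients, and capacities are made up for the example.

```python
# Illustrative sketch (not EdgeBalance itself): predict the CPU demand of an incoming
# flow from its service type and expected rate, then assign it to the server whose
# predicted utilization stays lowest. Cost coefficients and service names are assumptions.
PER_PACKET_CPU_US = {"vpn": 3.0, "ids": 7.5, "nat": 0.8}  # assumed per-packet cost (microseconds)

class Server:
    def __init__(self, name, capacity_us_per_s=900_000):
        self.name = name
        self.capacity = capacity_us_per_s  # usable CPU budget per second (assumed headroom)
        self.predicted_load = 0.0          # predicted microseconds of CPU per second

def predict_cost(service, expected_pps):
    """CPU demand of a flow = per-packet cost of its service x expected packet rate."""
    return PER_PACKET_CPU_US[service] * expected_pps

def assign(flow_service, expected_pps, servers):
    """Greedy placement: put the flow where the resulting predicted utilization is lowest."""
    cost = predict_cost(flow_service, expected_pps)
    best = min(servers, key=lambda s: (s.predicted_load + cost) / s.capacity)
    best.predicted_load += cost
    return best

servers = [Server("edge-1"), Server("edge-2"), Server("edge-3")]
for service, pps in [("ids", 20_000), ("vpn", 50_000), ("nat", 120_000)]:
    target = assign(service, pps, servers)
    print(f"{service} flow ({pps} pps) -> {target.name}")
```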
Wi-Fi is an integral part of today's Internet infrastructure, enabling a diverse range of applications and services. Prior approaches to Wi-Fi resource allocation optimized Quality of Service (QoS) metrics, which often do not accurately reflect the user's Quality of Experience (QoE). To address the gap between QoS and QoE, we introduce Maestro, an adaptive method that formulates the Wi-Fi resource allocation problem as a partially observable Markov decision process (PO-MDP) to maximize the overall system QoE and QoE fairness. Maestro estimates QoE without using any application or client data; instead, it treats them as black boxes and leverages temporal dependencies in network telemetry data. Maestro dynamically adjusts policies to handle different classes of applications and variable network conditions. Additionally, Maestro uses a simulation environment for practical training. We evaluate Maestro in an enterprise-level Wi-Fi testbed with a variety of applications, and find that Maestro achieves up to 25× and 78% improvement in QoE and fairness, respectively, compared to the widely-deployed Wi-Fi Multimedia (WMM) policy. Compared to the state-of-the-art learning approach QFlow, Maestro increases QoE by up to 69%. Unlike QFlow, which requires modifications to clients, Maestro improves QoE of popular over-the-top services with unseen traffic, without control over clients or servers.
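To make the optimization objective concrete, the toy sketch below combines mean per-client QoE with Jain's fairness index, the kind of reward a PO-MDP policy could maximize. This is not Maestro's learned estimator: the telemetry-to-QoE mapping, the feature weights, and the fairness weighting are stand-ins chosen for illustration.

```python
# Illustrative sketch (not Maestro's learned policy): compute a reward that combines
# mean per-client QoE with Jain's fairness index. The telemetry-to-QoE mapping and
# all weights below are made-up stand-ins for a learned black-box estimator.
def estimate_qoe(telemetry):
    """Toy QoE estimate in [0, 1] from network telemetry (assumed feature weights)."""
    thpt = min(telemetry["throughput_mbps"] / 25.0, 1.0)      # saturate at 25 Mbps
    delay_penalty = min(telemetry["latency_ms"] / 200.0, 1.0)
    loss_penalty = min(telemetry["loss_pct"] / 5.0, 1.0)
    return max(0.0, thpt - 0.5 * delay_penalty - 0.3 * loss_penalty)

def jain_fairness(values):
    """Jain's index: 1.0 when all clients see equal QoE, 1/n in the worst case."""
    n = len(values)
    return (sum(values) ** 2) / (n * sum(v * v for v in values)) if any(values) else 0.0

def reward(per_client_telemetry, fairness_weight=0.5):
    """Blend of average QoE and QoE fairness across clients."""
    qoes = [estimate_qoe(t) for t in per_client_telemetry]
    return (1 - fairness_weight) * (sum(qoes) / len(qoes)) + fairness_weight * jain_fairness(qoes)

# Example: a video client and a web client sharing the same AP.
print(reward([
    {"throughput_mbps": 18.0, "latency_ms": 40.0, "loss_pct": 0.2},
    {"throughput_mbps": 4.0,  "latency_ms": 120.0, "loss_pct": 1.0},
]))
```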
The emergence of big data has created new challenges for researchers transmitting big data sets across campus networks to local (HPC) cloud resources, or over wide area networks to public cloud services. Unlike conventional HPC systems where the network is carefully architected (e.g., a high-speed local interconnect, or a wide area connection between Data Transfer Nodes), today's big data communication often occurs over shared network infrastructures with many external and uncontrolled factors influencing performance. This paper describes our efforts to understand and characterize the performance of various big data transfer tools such as rclone, Cyberduck, and other provider-specific CLI tools when moving data to/from public and private cloud resources. We analyze the various parameter settings available on each of these tools and their impact on performance. Our experimental results give insights into the performance of cloud providers and transfer tools, and provide guidance for parameter settings when using cloud transfer tools. We also explore performance when coming from HPC DTN nodes as well as researcher machines located deep in the campus network, and show that emerging SDN approaches such as the VIP Lanes system can deliver excellent performance even from researchers' machines.
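A parameter sweep of the kind described can be scripted with a small harness. The sketch below uses rclone's --transfers and --checkers flags, which control transfer and checker parallelism; the remote name, source path, and parameter grid are assumptions for illustration, not the paper's actual experimental setup.

```python
# Illustrative harness (assumptions: the "mycloud:" remote and ./dataset path are
# hypothetical; --transfers and --checkers are rclone flags controlling parallelism).
# Times each configuration so their impact on transfer performance can be compared.
import itertools, subprocess, time

SRC = "./dataset"           # assumed local data set
DST = "mycloud:benchmarks"  # assumed pre-configured rclone remote

def run_transfer(transfers, checkers):
    cmd = ["rclone", "copy", SRC, DST,
           "--transfers", str(transfers),  # number of parallel file transfers
           "--checkers", str(checkers)]    # number of parallel checkers
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start

results = {}
for transfers, checkers in itertools.product([4, 8, 16, 32], [4, 8, 16]):
    results[(transfers, checkers)] = run_transfer(transfers, checkers)

# Print configurations fastest-first to guide parameter selection.
for (transfers, checkers), seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"--transfers {transfers:2d} --checkers {checkers:2d}: {seconds:7.1f} s")
```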

