X-Containers: Breaking Down Barriers to Improve Performance and Isolation of Cloud-Native Containers
- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- InProceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems 2019
- Page Range or eLocation-ID:
- 121 to 135
- Sponsoring Org:
- National Science Foundation
More Like this
Doglioni, C. ; Kim, D. ; Stewart, G.A. ; Silvestris, L. ; Jackson, P. ; Kamleh, W. (Ed.)The University of California system maintains excellent networking between its campuses and a number of other Universities in California, including Caltech, most of them being connected at 100 Gbps. UCSD and Caltech Tier2 centers have joined their disk systems into a single logical caching system, with worker nodes from both sites accessing data from disks at either site. This successful setup has been in place for the last two years. However, coherently managing nodes at multiple physical locations is not trivial and requires an update on the operations model used. The Pacific Research Platform (PRP) provides Kubernetes resource pool spanning resources in the science demilitarized zones (DMZs) in several campuses in California and worldwide. We show how we migrated the XCache services from bare-metal deployments into containers using the PRP cluster. This paper presents the reasoning behind our hardware decisions and the experience in migrating to and operating in a mixed environment.
GenApp is an NSF-funded framework for rapid generation of applications including feature rich science gateways. GenApp is being successfully used to produce science gateways wrapping scientific programs. Its organization is designed to simplify the process of adding new features and capabilities to generated applications. A limited set of definition files define application generation. To bring a new executable into GenApp, one creates a single "module" definition file. The executable must run on some compute resource accessible by the generated application. Installations of the executable on target resources may be complex. To simplify portability of execution, we introduce automatic containerization of defined modules and integration of container execution. Abaco is an NSF-funded web service and distributed computing platform providing functions-as-a-service (FaaS) to the research computing community. Abaco implements functions using the Actor Model of concurrent computation. We introduce GenApp integration of execution with Abaco as a resource.
Application containers, such as those provided by Docker, have recently gained popularity as a solution for agile and seamless software deployment. These light-weight virtualization environments run applications that are packed together with their resources and configuration information, and thus can be deployed across various software platforms. Unfortunately, the ease with which containers can be created is oftentimes a double-edged sword, encouraging the packaging of logically distinct applications, and the inclusion of significant amount of unnecessary components, within a single container. These practices needlessly increase the container size - sometimes by orders of magnitude. They also decrease the overall security, as each included component - necessary or not - may bring in security issues of its own, and there is no isolation between multiple applications packaged within the same container image. We propose algorithms and a tool called Cimplifier, which address these concerns: given a container and simple user-defined constraints, our tool partitions it into simpler containers, which (i) are isolated from each other, only communicating as necessary, and (ii) only include enough resources to perform their functionality. Our evaluation on real-world containers demonstrates that Cimplifier preserves the original functionality, leads to reduction in image size of up to 95%, andmore »
Server systems with large amounts of physical memory can benefit from using some of the available memory capacity for in-memory snapshots of the ongoing computations. In-memory snapshots are useful for services such as scaling of new workload instances, debugging, during scheduling, etc., which do not require snapshot persistence across node crashes/reboots. Since increasingly more frequently servers run containerized workloads, using technologies such as Docker, the snapshot, and the subsequent snapshot restore mechanisms, would be applied at granularity of containers. However, CRIU, the current approach to snapshot/restore containers, suffers from expensive filesystem write/read operations on image files containing memory pages, which dominate the runtime costs and impact the potential benefits of manipulating in-memory process state. In this paper, we demonstrate that these overheads can be eliminated by using MVAS -- kernel support for multiple independent virtual address spaces (VAS), designed specifically for machines with large memory capacities. The resulting VAS-CRIU stores application memory as a separate snapshot address space in DRAM and avoids costly file system operations. This accelerates the snapshot/restore of address spaces by two orders of magnitude, resulting in an overall reduction in snapshot time by up to 10× and restore time by up to 9×. We demonstrate themore »