skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Real-World, Self-Hosted Kubernetes Experience
Containerized applications have exploded in popularity in recent years, due to their ease of deployment, reproducible nature, and speed of startup. Accordingly, container orchestration tools such as Kubernetes have emerged as resource providers and users alike try to organize and scale their work across clusters of systems. This paper documents some real-world experiences of building, operating, and using self-hosted Kubernetes Linux clusters. It aims at comparisons between Kubernetes and single-node container solutions and traditional multi-user, batch queue Linux clusters. The authors of this paper have background experience first running traditional HPC Linux clusters and queuing systems like Slurm, and later virtual machines using technologies such as Openstack. Much of the experience and perspective below is informed by this perspective. We will also provide a use-case from a researcher who deployed on Kubernetes without being as opinionated about other potential choices.  more » « less
Award ID(s):
1838901
PAR ID:
10300448
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
PEARC '21: Practice and Experience in Advanced Research Computing
Page Range / eLocation ID:
1 to 5
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mauro Pezzè (Ed.)
    Context: Kubernetes has emerged as the de-facto tool for automated container orchestration. Business and government organizations are increasingly adopting Kubernetes for automated software deployments. Kubernetes is being used to provision applications in a wide range of domains, such as time series forecasting, edge computing, and high performance computing. Due to such a pervasive presence, Kubernetes-related security misconfigurations can cause large-scale security breaches. Thus, a systematic analysis of security misconfigurations in Kubernetes manifests, i.e., configuration files used for Kubernetes, can help practitioners secure their Kubernetes clusters. Objective: The goal of this paper is to help practitioners secure their Kubernetes clusters by identifying security misconfigurations that occur in Kubernetes manifests . Methodology: We conduct an empirical study with 2,039 Kubernetes manifests mined from 92 open-source software repositories to systematically characterize security misconfigurations in Kubernetes manifests. We also construct a static analysis tool called Security Linter for Kubernetes Manifests (SLI-KUBE) to quantify the frequency of the identified security misconfigurations. Results: In all, we identify 11 categories of security misconfigurations, such as absent resource limit, absent securityContext, and activation of hostIPC. Specifically, we identify 1,051 security misconfigurations in 2,039 manifests. We also observe the identified security misconfigurations affect entities that perform mesh-related load balancing, as well as provision pods and stateful applications. Furthermore, practitioners agreed to fix 60% of 10 misconfigurations reported by us. Conclusion: Our empirical study shows Kubernetes manifests to include security misconfigurations, which necessitates security-focused code reviews and application of static analysis when Kubernetes manifests are developed. 
    more » « less
  2. Container isolation is implemented through OS-level virtualization, such as Linux namespaces. Unfortunately, these mechanisms are extremely challenging to implement correctly and, in practice, suffer from functional interference bugs, which compromise container security. In particular, functional interference bugs allow an attacker to extract information from another container running on the same machine or impact its integrity by modifying kernel resources that are incorrectly isolated. Despite their impact, functional interference bugs in OS-level virtualization have received limited attention in part due to the challenges in detecting them. Instead of causing memory errors or crashes, many functional interference bugs involve hard-to-catch logic errors that silently produce semantically incorrect results. This paper proposes KIT, a dynamic testing framework that discovers functional interference bugs in OS-level virtualization mechanisms, such as Linux namespaces. The key idea of KIT is to detect inter-container functional interference by comparing the system call traces of a container across two executions, where it runs with and without the preceding execution of another container. To achieve high efficiency and accuracy, KIT includes two critical components: an efficient algorithm to generate test cases that exercise inter-container data flows and a system call trace analysis framework that detects functional interference bugs and clusters bug reports. KIT discovered 9 functional interference bugs in Linux kernel 5.13, of which 6 have been confirmed. All bugs are caused by logic errors, showing that this approach is able to detect hard-to-catch semantic bugs. 
    more » « less
  3. Containerization is becoming increasingly popular, but unfortunately, containers often fail to deliver the anticipated performance with the allocated resources. In this paper, we first demonstrate the performance variance and degradation are significant (by up to 5x) in a multi-tenant environment where containers are co-located. We then investigate the root cause of such performance degradation. Contrary to the common belief that such degradation is caused by resource contention and interference, we find that there is a gap between the amount of CPU a container reserves and actually gets. The root cause lies in the design choices of today's Linux scheduling mechanism, which we call Forced Runqueue Sharing and Phantom CPU Time. In fact, there are fundamental conflicts between the need to reserve CPU resources and Completely Fair Scheduler's work-conserving nature, and this contradiction prevents a container from fully utilizing its requested CPU resources. As a proof-of-concept, we implement a new resource configuration mechanism atop the widely used Kubernetes and Linux to demonstrate its potential benefits and shed light on future scheduler redesign. Our proof-of-concept, compared to the existing scheduler, improves the performance of both batch and interactive containerized apps by up to 5.6x and 13.7x. 
    more » « less
  4. Generative artificial intelligence (AI) technologies, such as ChatGPT have shown promise in solving software engineering problems. However, these technologies have also shown to be susceptible to generating software artifacts that contain quality issues. A systematic characterization of quality issues, such as smells in ChatGPT-generated artifacts can help in providing recommendations for practitioners who use generative AI for container orchestration.We conduct an empirical study with 98 Kubernetes manifests to quantify smells in manifests generated by ChatGPT. Our empirical study shows: (i) 35.8% of the 98 Kubernetes manifests generated include at least one instance of smell; (ii) two types of objects Kubernetes namely, Deployment and Service are impacted by identified smells; and (iii) the most frequently occurring smell is unset CPU and memory requirements. Based on our findings, we recommend practitioners to apply quality assurance activities for ChatGPT-generated Kubernetes manifests prior to using these manifests for container orchestration. 
    more » « less
  5. While Kubernetes enables practitioners to rapidly deploy their software and perform container orchestration efficiently, security of the Kubernetes-based deployment infrastructure is a concern for industry practitioners. A systematic understanding of how dynamic analysis can be used for securing Kubernetes deployments can aid practitioners in securing their Kubernetes deployments. We present an experience report, where we describe empirical findings from three dynamic application security testing (DAST) tools on a Kubernetes deployment used by 'Company-Z'. From our empirical study, we find (i) 3,442 recommended security configurations are violated in 'Company-Z's' Kubernetes deployment; and (ii) of the three studied DAST tools, Kubescape and Kubebench provide the highest support with respect to detecting 14 types of recommended security configurations. Based on our findings, we recommend practitioners to apply DAST tools for their Kubernetes deployments, and security researchers to investigate how to detect configuration violations dynamically in the Kubernetes deployment. 
    more » « less