Title: Insight from a Docker Container Introspection
Large-scale adoption of virtual containers has raised concerns among practitioners and academics about the viability and reliability of data acquisition, given the shrinking window in which relevant data points can be gathered. These concerns prompted the idea that introspection tools, which can acquire data from a system while it is running, can be used both as an early warning system to protect that system and as a data capture system that collects data valuable from a digital forensic perspective. An exploratory case study was conducted using a Docker engine with Prometheus as the introspection tool. The contribution of this research is two-fold. First, it provides empirical support for the idea that introspection tools can be used to ascertain differences between pristine and infected containers. Second, it provides the groundwork for future research analyzing large-scale containerized applications in a virtual cloud.
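To make the idea of comparing pristine and infected containers concrete, here is a minimal Python sketch. It is an illustration only, not the authors' method. It parses two snapshots in the Prometheus text exposition format, one from a pristine container and one from a suspect container, and flags metrics whose relative change exceeds a threshold.

The metric names follow cAdvisor conventions. The following are all hypothetical:
- the `image` label,
- the sample values,
- the 50% threshold,
- the helper names `parse_samples` and `flag_deviations`.

Counters such as `container_cpu_usage_seconds_total` are cumulative. The sketch therefore assumes both snapshots were taken at the same elapsed time after container start.

```python
import re
from typing import Dict

# Simplified matcher for one sample line of the Prometheus text exposition
# format, e.g.  container_memory_usage_bytes{image="demo-app"} 5e+07
# Assumption: label values contain no '}' characters; an optional trailing
# integer timestamp is accepted and ignored.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+-?\d+)?$'
)


def parse_samples(text: str) -> Dict[str, float]:
    """Map 'metric{labels}' keys to float values, skipping comments and blanks."""
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments carry no sample values
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # ignore lines this simplified parser cannot handle
        key = m.group("name")
        if m.group("labels"):
            key += "{" + m.group("labels") + "}"
        samples[key] = float(m.group("value"))
    return samples


def flag_deviations(baseline: Dict[str, float],
                    observed: Dict[str, float],
                    rel_threshold: float = 0.5) -> Dict[str, float]:
    """Return series whose relative change from the baseline exceeds the threshold."""
    flagged: Dict[str, float] = {}
    for key, base in baseline.items():
        if key not in observed:
            continue  # only compare series present in both snapshots
        denom = abs(base) if base != 0 else 1.0  # avoid division by zero
        change = (observed[key] - base) / denom
        if abs(change) > rel_threshold:
            flagged[key] = change
    return flagged


# Hypothetical snapshots scraped at the same uptime from two containers
# started from the same image.
PRISTINE = """
# HELP container_cpu_usage_seconds_total Cumulative cpu time consumed.
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{image="demo-app"} 12.0
container_memory_usage_bytes{image="demo-app"} 50000000
container_network_transmit_bytes_total{image="demo-app"} 200000
"""

SUSPECT = """
container_cpu_usage_seconds_total{image="demo-app"} 30.0
container_memory_usage_bytes{image="demo-app"} 52000000
container_network_transmit_bytes_total{image="demo-app"} 900000
"""

if __name__ == "__main__":
    flagged = flag_deviations(parse_samples(PRISTINE), parse_samples(SUSPECT))
    for key, change in sorted(flagged.items()):
        print(f"{key}: {change:+.0%}")
```

A fixed relative threshold is the simplest possible baseline comparison. A practical early-warning setup would more likely compare per-second rates over a window, for example with PromQL's `rate()`, and tune thresholds per metric.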
Award ID(s): 1726069
PAR ID: 10093193
Journal Name: Hawaii International Conference on System Sciences 2019
Page Range / eLocation ID: 7194-7203
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Developments in virtual containers, especially in cloud infrastructure, have diversified the jobs that containers are used to support, particularly in the big data and machine learning spaces. This diversification has been driven by the adoption of orchestration systems that marshal fleets of containers to accomplish complex programming tasks. The additional components in the vertical technology stack, together with continued horizontal scaling, have raised questions about how to forensically analyze complicated technology stacks. This paper proposes a solution through the use of introspection. An exploratory case study was conducted on a bare-metal cloud that utilizes Kubernetes, the introspection tool Prometheus, and Apache Spark. The contribution of this research is two-fold. First, it provides empirical support that introspection tools can acquire forensically viable data from different levels of a technology stack. Second, it provides the groundwork for comparisons between different virtual container platforms.
  2. Integrated modeling of vehicle, tire, and terrain is a fundamental challenge to be addressed for off-road autonomous navigation. The complexities arise from the lack of tools and techniques to predict continuously varying terrain and environmental conditions and the resulting non-linearities. A solution to this challenge can now be found in the plethora of data-driven modeling and control techniques that have gained traction in the last decade. Data-driven modeling and control techniques rely on the system’s repeated interaction with the environment to generate large amounts of data, and then use a function approximator to fit a model of the physical system to that data. Obtaining data of sufficient quality and quantity may require extensive experimentation with the physical system, consuming developer resources. The process is computationally expensive, and the time overhead is high. High-fidelity simulators coupled with cloud-based containers can help ease the challenge of data ‘quality’ and ‘quantity’. Project Chrono is a multi-physics simulation engine that provides high-fidelity simulation capabilities with an emphasis on flow and terrain modeling. With a host of libraries and APIs for industry-accepted tools like MATLAB, Simulink, and TensorFlow, Project Chrono proves to be a powerful research bed for data-driven modeling and control development for off-road navigation. Containers are lightweight alternatives to virtual machines that remove repetitive configuration by packaging a computational environment, including all necessary dependencies and libraries. Docker provides an end-to-end platform for the heavy computational demands of deep learning applications and allows fast development and testing. The synergy between the high-fidelity simulator and the compute-outsourcing capabilities of cloud-based containers is highly beneficial for continuous integration and continuous deployment (CI/CD) of data-driven modeling and control tasks.
    In this work, we containerize a high-fidelity simulator (Project Chrono) to develop and validate data-driven modeling and control algorithms for off-road autonomous navigation.
  3. Large-scale, high-throughput computational science faces an accelerating convergence of software and hardware. Software container-based solutions have become common in cloud-based datacenter environments and are considered promising tools for addressing heterogeneity and portability concerns. However, container solutions reflect a set of assumptions that complicate their adoption by developers and users of scientific workflow applications. Nor are containers a universal solution for deployment in high-performance computing (HPC) environments, which have specialized and vertically integrated scheduling and runtime software stacks. In this paper, we present a container design and deployment approach that uses modular layering to ease the deployment of containers into existing HPC environments. This layered approach allows operating system integrations, support for different communication and performance monitoring libraries, and application code to be defined and interchanged in isolation. We describe the details of our approach, including specifics about container deployment and orchestration for different HPC scheduling systems. We also describe how this layering method can be used to build containers for two separate applications, each deployed on clusters with different batch schedulers, MPI networking support, and performance monitoring requirements. Our experience indicates that the layered approach is a viable strategy for building applications intended to provide similar behavior across widely varying deployment targets.
  4. The HPC community is actively researching and evaluating tools to support the execution of scientific applications in cloud-based environments. Among the various technologies, containers have recently gained importance because they offer significantly better performance than full-scale virtualization, support microservices and DevOps, and work seamlessly with workflow and orchestration tools. Docker is currently the leader in containerization technology because it offers low overhead, flexibility, portability of applications, and reproducibility. Singularity is another container solution of interest because it is designed specifically for scientific applications. It is important to conduct performance and feature analyses of container technologies to understand their applicability to each application and target execution environment. This paper presents (1) a performance evaluation of Docker and Singularity on bare-metal nodes in the Chameleon cloud, (2) a mechanism by which Docker containers can be mapped to InfiniBand hardware with RDMA communication, and (3) an analysis of mapping elements of parallel workloads to containers for optimal resource management with container-ready orchestration tools. Our experiments are targeted toward application developers so that they can make informed decisions when choosing the container technologies and approaches suitable for their HPC workloads on cloud infrastructure. Our performance analysis shows that scientific workloads in both Docker- and Singularity-based containers can achieve near-native performance. Singularity is designed specifically for HPC workloads. However, Docker still has advantages over Singularity for use in clouds, as it provides overlay networking and an intuitive way to run MPI applications with one container per rank for fine-grained resource allocation.
    Both Docker and Singularity make it possible to directly use the underlying network fabric from the containers for coarse-grained resource allocation.
  5. As data-driven methods become pervasive in a wide variety of disciplines, there is an urgent need to develop scalable and sustainable tools that simplify the process of data science, make it easier to keep track of the analyses being performed and the datasets being generated, and enable introspection of workflows. In this paper, we describe our vision of a unified provenance and metadata management system to support lifecycle management of complex collaborative data science workflows. We argue that a large amount of information about analysis processes and data artifacts can, and should, be captured in a semi-passive manner; and we show that querying and analyzing this information can not only simplify bookkeeping and debugging tasks for data analysts but also enable a rich new set of capabilities, such as identifying flaws in the data science process itself. It can also significantly reduce the time spent fixing post-deployment problems through automated analysis and monitoring. We have implemented an initial prototype of our system, called ProvDB, on top of git (a version control system) and Neo4j (a graph database), and we describe its key features and capabilities.