Enterprise and cloud environments are rapidly evolving with the use of lightweight virtualization mechanisms such as containers. Containerization allows users to deploy applications in any environment faster and more efficiently than with virtual machines. However, most work in this area has focused on Linux-based containerization such as Docker and LXC, and other mature solutions such as FreeBSD Jails have not been adopted by production-ready environments. In this work we explore the use of FreeBSD virtualization and provide a comparative study with respect to Linux containerization using Apache Spark. Preliminary results show that, while Linux containers provide better performance, FreeBSD solutions provide more stable and consistent results.
Containerization for creating reusable model code
Will you be able to run your computational models in the future? Even with well-documented code, this can be difficult due to changes in the software frameworks and operating systems that your code was built on. In this paper we discuss the use of containers to preserve code and its software dependencies so that simulation results can be reproduced in the future. Containers are standalone, lightweight packages of the original model software and its dependencies that can be run independently of the platform. As such, they are suitable for reuse and for sharing results. However, the use of containers is rare in the field of modeling social-environmental systems. We provide an introduction to the basic principles of containerization, argue why it would be beneficial if this tool became common practice in the field, describe a conceptual walkthrough of the process of containerizing a model, and reflect on near-future directions of containerization workflows.
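As a hedged illustration of what "preserving software dependencies" can involve for a Python-based model, the sketch below records the interpreter version and the exact versions of installed packages, which is the kind of information a container image would need to pin. It is not taken from the paper: the function names `pinned_requirements` and `environment_report` are hypothetical, and the sketch assumes the model runs in a standard Python environment where `importlib.metadata` can see the installed distributions.

```python
import platform
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def environment_report():
    """Collect interpreter and OS details that a container image would have to match."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
        "requirements": pinned_requirements(),
    }


if __name__ == "__main__":
    report = environment_report()
    print(f"# Python {report['python']} on {report['system']} ({report['machine']})")
    print("\n".join(report["requirements"]))
```

The printed list could serve as a pinned requirements file when building an image; dependencies outside Python (compilers, system libraries) are not captured by this sketch and would need to be recorded separately.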
- PAR ID: 10354561
- Date Published:
- Journal Name: Socio-Environmental Systems Modelling
- Volume: 3
- ISSN: 2663-3027
- Page Range / eLocation ID: 18074
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract. Hutton et al. (2016) argued that computational hydrology can only be a proper science if the hydrological community makes sure that hydrological model studies are executed and presented in a reproducible manner. Hut, Drost and van de Giesen replied that, to achieve this, hydrologists should not “re-invent the water wheel” but rather use existing technology from other fields (such as containers and ESMValTool) and open interfaces (such as the Basic Model Interface, BMI) to do their computational science (Hut et al., 2017). With this paper and the associated release of the eWaterCycle platform and software package (available on Zenodo: https://doi.org/10.5281/zenodo.5119389, Verhoeven et al., 2022), we are putting our money where our mouth is and providing the hydrological community with a “FAIR by design” (FAIR meaning findable, accessible, interoperable, and reproducible) platform to do science. The eWaterCycle platform separates the experiments done on the model from the model code. In eWaterCycle, hydrological models are accessed through a common interface (BMI) in Python and run inside of software containers. In this way all models are accessed in a similar manner, facilitating easy switching of models, model comparison and model coupling. Currently the following models and model suites are available through eWaterCycle: PCR-GLOBWB 2.0, wflow, Hype, LISFLOOD, MARRMoT, and WALRUS. While these models are written in different programming languages, they can all be run and interacted with from the Jupyter notebook environment within eWaterCycle. Furthermore, the pre-processing of input data for these models has been streamlined by making use of ESMValTool. Forcing for the models available in eWaterCycle from well-known datasets such as ERA5 can be generated with a single line of code.
To illustrate the type of research that eWaterCycle facilitates, this paper includes five case studies: from a simple “hello world” where only a hydrograph is generated to a complex coupling of models in different languages. In this paper we stipulate the design choices made in building eWaterCycle and provide all the technical details to understand and work with the platform. For system administrators who want to install eWaterCycle on their infrastructure we offer a separate installation guide. For computational hydrologists who want to work with eWaterCycle we also provide a video explaining the platform from a user point of view (https://youtu.be/eE75dtIJ1lk, last access: 28 June 2022). With the eWaterCycle platform we are providing the hydrological community with a platform to conduct their research that is fully compatible with the principles of both Open Science and FAIR science.
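The idea that a common interface lets an experiment switch between models can be sketched in a few lines of Python. The following is a minimal illustration only, not eWaterCycle code: `MiniBMI`, `LinearReservoir`, `BucketModel` and `run_experiment` are hypothetical names, the two toy models are invented for the example, and the interface is a simplification that borrows only a few BMI method names (the actual BMI specification, for instance, initializes from a configuration file and has a different `get_value` signature).

```python
from abc import ABC, abstractmethod


class MiniBMI(ABC):
    """Hypothetical, cut-down interface borrowing a few BMI method names."""

    @abstractmethod
    def initialize(self, config):
        """Set up model state from a configuration mapping."""

    @abstractmethod
    def update(self):
        """Advance the model by one time step."""

    @abstractmethod
    def get_value(self, name):
        """Return the current value of the named variable."""

    def finalize(self):
        """Release resources; nothing to do for these toy models."""


class LinearReservoir(MiniBMI):
    """Toy model: discharge is a fixed fraction k of the storage."""

    def initialize(self, config):
        self.k = config["k"]
        self.storage = config.get("initial_storage", 0.0)
        self.precipitation = list(config["precipitation"])
        self.step = 0
        self.discharge = 0.0

    def update(self):
        self.storage += self.precipitation[self.step]
        self.discharge = self.k * self.storage
        self.storage -= self.discharge
        self.step += 1

    def get_value(self, name):
        return {"discharge": self.discharge, "storage": self.storage}[name]


class BucketModel(MiniBMI):
    """Toy model: water above the bucket capacity spills as discharge."""

    def initialize(self, config):
        self.capacity = config["capacity"]
        self.storage = config.get("initial_storage", 0.0)
        self.precipitation = list(config["precipitation"])
        self.step = 0
        self.discharge = 0.0

    def update(self):
        self.storage += self.precipitation[self.step]
        self.discharge = max(0.0, self.storage - self.capacity)
        self.storage -= self.discharge
        self.step += 1

    def get_value(self, name):
        return {"discharge": self.discharge, "storage": self.storage}[name]


def run_experiment(model, config, n_steps):
    """The experiment only talks to the interface, never to model internals."""
    model.initialize(config)
    hydrograph = []
    for _ in range(n_steps):
        model.update()
        hydrograph.append(model.get_value("discharge"))
    model.finalize()
    return hydrograph


if __name__ == "__main__":
    rain = [10.0, 0.0, 0.0]
    print(run_experiment(LinearReservoir(), {"k": 0.5, "precipitation": rain}, 3))
    print(run_experiment(BucketModel(), {"capacity": 8.0, "precipitation": rain}, 3))
```

Because `run_experiment` depends only on the shared method names, swapping one model for another changes a single argument, which is the property the abstract describes as facilitating model switching and comparison.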
-
The HPC community is actively researching and evaluating tools to support execution of scientific applications in cloud-based environments. Among the various technologies, containers have recently gained importance as they have significantly better performance compared to full-scale virtualization, support for microservices and DevOps, and work seamlessly with workflow and orchestration tools. Docker is currently the leader in containerization technology because it offers low overhead, flexibility, portability of applications, and reproducibility. Singularity is another container solution that is of interest as it is designed specifically for scientific applications. It is important to conduct performance and feature analysis of the container technologies to understand their applicability for each application and target execution environment. This paper presents (1) a performance evaluation of Docker and Singularity on bare metal nodes in the Chameleon cloud, (2) a mechanism by which Docker containers can be mapped with InfiniBand hardware with RDMA communication, and (3) an analysis of mapping elements of parallel workloads to the containers for optimal resource management with container-ready orchestration tools. Our experiments are targeted toward application developers so that they can make informed decisions on choosing the container technologies and approaches that are suitable for their HPC workloads on cloud infrastructure. Our performance analysis shows that scientific workloads for both Docker and Singularity based containers can achieve near-native performance. Singularity is designed specifically for HPC workloads. However, Docker still has advantages over Singularity for use in clouds as it provides overlay networking and an intuitive way to run MPI applications with one container per rank for fine-grained resource allocation. Both Docker and Singularity make it possible to directly use the underlying network fabric from the containers for coarse-grained resource allocation.
-
Harris, F.; Wu, R.; Redei, A. (Ed.)
Networks are pervasive in society: infrastructures (e.g., telephone), commercial sectors (e.g., banking), and biological and genomic systems can be represented as networks. Consequently, there are software libraries that analyze networks. Containers (e.g., Docker, Singularity), which hold both runnable codes and their execution environments, are increasingly utilized by analysts to run codes in a platform-independent fashion. Portability is further enhanced by not only providing software library methods, but also the driver code (i.e., main() method) for each library method. In this way, a user only has to know the invocation for the main() method that is in the container. In this work, we describe an automated approach for generating a main() method for each software library method. A single intermediate representation (IR) format is used for all library methods, and one IR instance is populated for one library method by parsing its comments and method signature. An IR for the main() method is generated from that for the library method. A source code generator uses the main() method IR and a set of small, hand-generated source code templates (with variables in the templates that are automatically customized for a particular library method) to produce the source code main() method. We apply our approach to two widely used software libraries, SNAP and NetworkX, as exemplars, which combined have over 400 library methods.
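The pipeline described in this abstract (populate an IR from a method's signature and comments, then fill a source template to emit a main() driver) can be illustrated with a small Python sketch. This is an assumption-laden toy, not the authors' generator: `build_ir`, `render_main` and `MAIN_TEMPLATE` are hypothetical names, the IR is a plain dictionary, and the standard-library function `fnmatch.fnmatch` stands in for a network library method. Only simple positional-or-keyword parameters are handled.

```python
import fnmatch
import inspect
from string import Template

# Hand-written template; the $-variables are filled in per library function.
MAIN_TEMPLATE = Template('''\
import argparse
from $module import $func


def main(argv=None):
    parser = argparse.ArgumentParser(description=$description)
$arguments
    args = parser.parse_args(argv)
    print($func($call_args))


if __name__ == "__main__":
    main()
''')


def build_ir(func):
    """Populate a small IR (a dict) from a function's signature and docstring."""
    params = []
    for p in inspect.signature(func).parameters.values():
        # Fall back to str when there is no simple type annotation.
        annotation = p.annotation if p.annotation in (int, float, str) else str
        required = p.default is inspect.Parameter.empty
        params.append({
            "name": p.name,
            "type": annotation.__name__,
            "required": required,
            "default": None if required else p.default,
        })
    doc = inspect.getdoc(func) or ""
    return {
        "module": func.__module__,
        "name": func.__name__,
        "summary": doc.splitlines()[0] if doc else "",
        "params": params,
    }


def render_main(ir):
    """Turn the IR into the source code of a command-line driver."""
    lines = []
    for p in ir["params"]:
        if p["required"]:
            lines.append(f'    parser.add_argument("{p["name"]}", type={p["type"]})')
        else:
            lines.append(
                f'    parser.add_argument("--{p["name"]}", '
                f'type={p["type"]}, default={p["default"]!r})'
            )
    return MAIN_TEMPLATE.substitute(
        module=ir["module"],
        func=ir["name"],
        description=repr(ir["summary"]),
        arguments="\n".join(lines),
        call_args=", ".join(f'{p["name"]}=args.{p["name"]}' for p in ir["params"]),
    )


if __name__ == "__main__":
    # Print the generated driver source for the stand-in library function.
    print(render_main(build_ir(fnmatch.fnmatch)))
```

Saving the printed source to a file yields a script whose command-line arguments mirror the library function's parameters, so a container user would only need to know how to invoke that script.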
-
Networks are pervasive in society: infrastructures (e.g., telephone), commercial sectors (e.g., banking), and biological and genomic systems can be represented as networks. Consequently, there are software libraries that analyze networks. Containers (e.g., Docker, Singularity), which hold both runnable codes and their execution environments, are increasingly utilized by analysts to run codes in a platform-independent fashion. Portability is further enhanced by not only providing software library methods, but also the driver code (i.e., main() method) for each library method. In this way, a user only has to know the invocation for the main() method that is in the container. In this work, we describe an automated approach for generating a main() method for each software library method. A single intermediate representation (IR) format is used for all library methods, and one IR instance is populated for one library method by parsing its comments and method signature. An IR for the main() method is generated from that for the library method. A source code generator uses the main() method IR and a set of small, hand-generated source code templates (with variables in the templates that are automatically customized for a particular library method) to produce the source code main() method. We apply our approach to two widely used software libraries, SNAP and NetworkX, as exemplars, which combined have over 400 library methods.

