Title: Overcast: Running Controlled Experiments Spanning Research and Commercial Clouds
The Chameleon project developed a unique experimental testbed by adapting a mainstream cloud implementation to the needs of the systems research community and thereby demonstrated that clouds can be configured to serve as a platform for this type of research. More recently, the CloudBank project embarked on a mission of providing a conduit to commercial clouds for the systems research community that eliminates much of the complexity and some of the cost of using them for research. This creates an opportunity to explore running systems experiments in a combined setting, spanning both research and commercial clouds. In this paper, we present an extension to Chameleon for constructing controlled experiments across its resources and commercial clouds accessible via CloudBank, present a case study of an experiment running across such combined resources, and discuss the impact of using a combined research platform.
Award ID(s):
1743358
NSF-PAR ID:
10273074
Journal Name:
Proceedings of the IEEE Conference Computer and Networking Experimental Research using Testbeds (CNERT 2021)
Sponsoring Org:
National Science Foundation
More Like this
  1. The Chameleon project developed a unique experimental testbed by adapting a mainstream cloud implementation to the needs of the systems research community and thereby demonstrated that clouds can be configured to serve as a platform for this type of research. More recently, the CloudBank project embarked on a mission of providing a conduit to commercial clouds for the systems research community that eliminates much of the complexity and some of the cost of using them for research. This creates an opportunity to explore running systems experiments in a combined setting, spanning both research and commercial clouds. In this paper, we present an extension to Chameleon for constructing controlled experiments across its resources and commercial clouds accessible via CloudBank, present a case study of an experiment running across such combined resources, and discuss the impact of using a combined research platform.
  2. Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mainstream commercial clouds and private research clouds. Our data collection consists of millions of datapoints gathered while transferring over 9 petabytes of data. We characterize the network variability present in our data and show that, even though commercial cloud providers implement mechanisms for quality-of-service enforcement, variability still occurs, and is even exacerbated by such mechanisms and service provider policies. We show how big-data workloads suffer from significant slowdowns and lack predictability and replicability, even when state-of-the-art experimentation techniques are used. We provide guidelines for practitioners to reduce the volatility of big data performance, making experiments more repeatable.
  3. Volunteer Computing (VC) is a computing model that uses donated computing cycles on the devices such as laptops, desktops, and tablets to do scientific computing. BOINC is the most popular software framework for VC and it helps in connecting the projects needing computing cycles with the volunteers interested in donating the computing cycles on their resources. It has already enabled projects with high societal impact to harness several PetaFLOPs of donated computing cycles. Given its potential in elastically augmenting the capacity of existing supercomputing resources for running High-Throughput Computing (HTC) jobs, we have extended the BOINC software infrastructure and have made it amenable for integration with the supercomputing and cloud computing environments. We have named the extension of the BOINC software infrastructure as BOINC@TACC, and are using it to route *qualified* HTC jobs from the supercomputers at the Texas Advanced Computing Center (TACC) to not only the typically volunteered devices but also to the cloud computing resources such as Jetstream and Chameleon. BOINC@TACC can be extremely useful for those researchers/scholars who are running low on allocations of compute-cycles on the supercomputers, or are interested in reducing the turnaround time of their HTC jobs when the supercomputers are over-subscribed. We have also developed a web-application for TACC users so that, through the convenience of their web-browser, they can submit their HTC jobs for running on the resources volunteered by the community. An overview of the BOINC@TACC project is presented in this paper. The BOINC@TACC software infrastructure is open-source and can be easily adapted for use by other supercomputing centers that are interested in building their volunteer community and connecting them with the researchers needing multi-petascale (and even exascale) computing power for their HTC jobs.
  4. Clouds are shareable scientific instruments that create the potential for reproducibility by ensuring that all investigators have access to a common execution platform on which computational experiments can be repeated and compared. By virtue of the interface they present, they also lead to the creation of digital artifacts compatible with the cloud, such as images or orchestration templates, that go a long way, and sometimes all the way, to representing an experiment in a digital, repeatable form. In this article, I describe how we developed these natural advantages of clouds in the Chameleon testbed and argue that we should leverage them to create a digital research marketplace that would make repeating experiments as natural and viable a part of research as sharing ideas via reading papers is today.