Title: Overcast: Running Controlled Experiments Spanning Research and Commercial Clouds
The Chameleon project developed a unique experimental testbed by adapting a mainstream cloud implementation to the needs of the systems research community, thereby demonstrating that clouds can be configured to serve as a platform for this type of research. More recently, the CloudBank project embarked on a mission of providing a conduit to commercial clouds for the systems research community that eliminates much of the complexity and some of the cost of using them for research. This creates an opportunity to explore running systems experiments in a combined setting, spanning both research and commercial clouds. In this paper, we present an extension to Chameleon for constructing controlled experiments across its resources and commercial clouds accessible via CloudBank, present a case study of an experiment running across such combined resources, and discuss the impact of using a combined research platform.
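
To make the combined setting concrete, the sketch below shows one way a single driver script could provision one node on a Chameleon site (an OpenStack cloud, reachable with the standard openstacksdk) and one peer instance on a commercial cloud accessible through a CloudBank-provisioned account (AWS EC2 via boto3). This is a minimal illustration only: the cloud profile name, image, flavor, and key names are placeholders, and it does not reproduce the Overcast extension described in the paper.

    # Hedged sketch: provision one research-cloud node and one commercial-cloud
    # node from a single experiment driver. Assumes a clouds.yaml entry named
    # "chameleon" with valid OpenStack credentials, and AWS credentials in the
    # environment (e.g. from a CloudBank-provisioned account). All image,
    # flavor, and key names are illustrative placeholders.
    import boto3
    import openstack

    def provision_chameleon_node():
        # Boot a node on a Chameleon (OpenStack) site using openstacksdk.
        conn = openstack.connect(cloud="chameleon")
        server = conn.create_server(
            name="overcast-research-node",
            image="CC-Ubuntu20.04",      # placeholder image name
            flavor="m1.large",           # placeholder flavor
            key_name="experiment-key",   # placeholder keypair
            wait=True,
            auto_ip=True,
        )
        return server

    def provision_commercial_node():
        # Launch a peer instance on a commercial cloud reached via CloudBank.
        ec2 = boto3.client("ec2", region_name="us-east-1")
        resp = ec2.run_instances(
            ImageId="ami-0123456789abcdef0",   # placeholder AMI
            InstanceType="m5.large",
            KeyName="experiment-key",          # placeholder keypair
            MinCount=1,
            MaxCount=1,
        )
        return resp["Instances"][0]["InstanceId"]

    if __name__ == "__main__":
        research_node = provision_chameleon_node()
        commercial_instance = provision_commercial_node()
        print("Chameleon node address:", research_node.public_v4)
        print("Commercial instance id:", commercial_instance)

In a real experiment the driver would go on to configure connectivity between the two nodes and launch the workload; those steps are experiment-specific and omitted here.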
Award ID(s):
1743313
PAR ID:
10314447
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Chameleon project developed a unique experimental testbed by adapting a mainstream cloud implementation to the needs of the systems research community, thereby demonstrating that clouds can be configured to serve as a platform for this type of research. More recently, the CloudBank project embarked on a mission of providing a conduit to commercial clouds for the systems research community that eliminates much of the complexity and some of the cost of using them for research. This creates an opportunity to explore running systems experiments in a combined setting, spanning both research and commercial clouds. In this paper, we present an extension to Chameleon for constructing controlled experiments across its resources and commercial clouds accessible via CloudBank, present a case study of an experiment running across such combined resources, and discuss the impact of using a combined research platform.
  2. Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mainstream commercial clouds and private research clouds. Our data collection consists of millions of datapoints gathered while transferring over 9 petabytes of data. We characterize the network variability present in our data and show that, even though commercial cloud providers implement mechanisms for quality-of-service enforcement, variability still occurs, and is even exacerbated by such mechanisms and service provider policies. We show how big-data workloads suffer from significant slowdowns and lack predictability and replicability, even when state-of-the-art experimentation techniques are used. We provide guidelines for practitioners to reduce the volatility of big-data performance, making experiments more repeatable. (A minimal sketch of how such variability can be summarized from throughput traces appears after this list.)
  3. Clouds are shareable scientific instruments that create the potential for reproducibility by ensuring that all investigators have access to a common execution platform on which computational experiments can be repeated and compared. By virtue of the interface they present, they also lead to the creation of digital artifacts compatible with the cloud, such as images or orchestration templates, that go a long way, and sometimes all the way, to representing an experiment in a digital, repeatable form. In this article, I describe how we developed these natural advantages of clouds in the Chameleon testbed and argue that we should leverage them to create a digital research marketplace that would make repeating experiments as natural and viable a part of research as sharing ideas by reading papers is today.
  4. Computational notebooks have gained much popularity as a way of documenting research processes; they allow users to express a research narrative by integrating ideas expressed as text, process expressed as code, and results in one executable document. However, the environments in which the code can run are currently limited, often containing only a fraction of the resources of one node, posing a barrier to many computations. In this paper, we make the case that integrating complex experimental environments, such as virtual clusters or complex networking environments that can be provisioned via infrastructure clouds, into computational notebooks will significantly broaden their reach and at the same time help realize the potential of clouds as a platform for repeatable research. To support our argument, we describe the integration of Jupyter notebooks into the Chameleon cloud testbed, which allows the user to define complex experimental environments and then assign processes to elements of this environment similarly to the way a laptop user may switch between different desktops. We evaluate our approach on an actual experiment from both the development and replication perspectives.
  5. Volunteer Computing (VC) is a computing model that uses donated computing cycles on devices such as laptops, desktops, and tablets to do scientific computing. BOINC is the most popular software framework for VC; it connects projects that need computing cycles with volunteers interested in donating cycles on their resources. It has already enabled projects with high societal impact to harness several PetaFLOPs of donated computing cycles. Given its potential for elastically augmenting the capacity of existing supercomputing resources for running High-Throughput Computing (HTC) jobs, we have extended the BOINC software infrastructure and made it amenable to integration with supercomputing and cloud computing environments. We have named this extension BOINC@TACC and are using it to route *qualified* HTC jobs from the supercomputers at the Texas Advanced Computing Center (TACC) not only to the typically volunteered devices but also to cloud computing resources such as Jetstream and Chameleon. BOINC@TACC can be extremely useful for researchers and scholars who are running low on allocations of compute cycles on the supercomputers, or who are interested in reducing the turnaround time of their HTC jobs when the supercomputers are over-subscribed. We have also developed a web application for TACC users so that, through the convenience of their web browser, they can submit their HTC jobs for running on resources volunteered by the community. An overview of the BOINC@TACC project is presented in this paper. The BOINC@TACC software infrastructure is open-source and can be easily adapted for use by other supercomputing centers that are interested in building their volunteer community and connecting them with researchers needing multi-petascale (and even exascale) computing power for their HTC jobs.
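
As referenced in item 2 above, the following is a minimal, hypothetical sketch of the kind of summary statistics (coefficient of variation, tail-to-median ratios) that can be used to characterize throughput variability in network traces. The CSV file name and column name are assumptions for illustration; they are not the paper's actual data format.

    # Hedged sketch: summarize throughput variability from a trace file.
    # Assumes a CSV with one measurement per row and a "throughput_mbps"
    # column; both the file name and the column name are illustrative only.
    import csv
    import statistics

    def variability_summary(path="network_trace.csv"):
        with open(path, newline="") as f:
            samples = [float(row["throughput_mbps"]) for row in csv.DictReader(f)]
        mean = statistics.mean(samples)
        median = statistics.median(samples)
        q = statistics.quantiles(samples, n=100)  # 99 cut points; q[4]=p5, q[94]=p95
        return {
            "mean_mbps": mean,
            "coefficient_of_variation": statistics.stdev(samples) / mean,
            "p95_over_median": q[94] / median,
            "median_over_p5": median / q[4],
        }

    if __name__ == "__main__":
        for name, value in variability_summary().items():
            print(f"{name}: {value:.3f}")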