skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 10:00 PM ET on Friday, December 8 until 2:00 AM ET on Saturday, December 9 due to maintenance. We apologize for the inconvenience.


Title: Operational Lessons from Chameleon
Chameleon is a large-scale, deeply reconfigurable testbed built to support Computer Science experimentation. Unlike traditional systems of this kind, Chameleon has been configured using an adaptation of a mainstream open source infrastructure cloud system called OpenStack. We show that operating cloud systems requires both more skill and extra effort on the part of the operators - in particular where those systems are expected to evolve quickly - which can make systems of this kind expensive to run. We discuss three ways in which those operations costs can be managed: innovative mon- itoring and automation of systems tasks, building “operator co-ops”, and collaborating with users.  more » « less
Award ID(s):
1743358
NSF-PAR ID:
10107209
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the Humanware Advancing Research in the Cloud
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A scalable storage system is an integral requirement for supporting large-scale cloud computing jobs. The raw space on storage systems is made usable with the help of a software layer which is typically called a filesystem (e.g., Google's Cloud Filestore). In this paper, we present the design and implementation of an open-source and free cloud-based filesystem named as "Greyfish" that can be installed on the Virtual Machines (VMs) hosted on different cloud computing systems, such as Jetstream and Chameleon. Greyfish helps in: (1) storing files and directories for different user-accounts in a shared space on the cloud, (2) managing file-access permissions, and (3) purging files when needed. It is currently being used in the implementation of the Gateway-In-A-Box (GIB) project. A simplified version of Greyfish, known as Reef, is already in production in the BOINC@TACC project. Science gateway developers will find Greyfish useful for creating local filesystems that can be mounted in containers. By doing so, they can independently do quick installations of self-contained software solutions in development and test environments while mounting the filesystems on large-scale storage platforms in the production environments only. 
    more » « less
  2. null (Ed.)
    The Chameleon project developed a unique experi- mental testbed by adapting a mainstream cloud implementation to the needs of systems research community and thereby demon- strated that clouds can be configured to serve as a platform for this type research. More recently, the CloudBank project embarked on a mission of providing a conduit to commercial clouds for the systems research community that eliminates much of the complexity and some of the cost of using them for research. This creates an opportunity to explore running systems experiments in a combined setting, spanning both research and commercial clouds. In this paper, we present an extension to Chameleon for constructing controlled experiments across its resources and commercial clouds accessible via CloudBank, present a case study of an experiment running across such combined resources, and discuss the impact of using a combined research platform. 
    more » « less
  3. The Chameleon project developed a unique experimental testbed by adapting a mainstream cloud implementation to the needs of systems research community and thereby demonstrated that clouds can be configured to serve as a platform for this type research. More recently, the CloudBank project embarked on a mission of providing a conduit to commercial clouds for the systems research community that eliminates much of the complexity and some of the cost of using them for research. This creates an opportunity to explore running systems experiments in a combined setting, spanning both research and commercial clouds. In this paper, we present an extension to Chameleon for constructing controlled experiments across its resources and commercial clouds accessible via CloudBank, present a case study of an experiment running across such combined resources, and discuss the impact of using a combined research platform. 
    more » « less
  4. Volunteer Computing (VC) is a computing model that uses donated computing cycles on the devices such as laptops, desktops, and tablets to do scientific computing. BOINC is the most popular software framework for VC and it helps in connecting the projects needing computing cycles with the volunteers interested in donating the computing cycles on their resources. It has already enabled projects with high societal impact to harness several PetaFLOPs of donated computing cycles. Given its potential in elastically augmenting the capacity of existing supercomputing resources for running High-Throughput Computing (HTC) jobs, we have extended the BOINC software infrastructure and have made it amenable for integration with the supercomputing and cloud computing environments. We have named the extension of the BOINC software infrastructure as BOINC@TACC, and are using it to route *qualified* HTC jobs from the supercomputers at the Texas Advanced Computing Center (TACC) to not only the typically volunteered devices but also to the cloud computing resources such as Jetstream and Chameleon. BOINC@TACC can be extremely useful for those researchers/scholars who are running low on allocations of compute-cycles on the supercomputers, or are interested in reducing the turnaround time of their HTC jobs when the supercomputers are over-subscribed. We have also developed a web-application for TACC users so that, through the convenience of their web-browser, they can submit their HTC jobs for running on the resources volunteered by the community. An overview of the BOINC@TACC project is presented in this paper. The BOINC@TACC software infrastructure is open-source and can be easily adapted for use by other supercomputing centers that are interested in building their volunteer community and connecting them with the researchers needing multi-petascale (and even exascale) computing power for their HTC jobs 
    more » « less
  5. Volunteer Computing (VC) is a computing model that uses donated computing cycles on the devices such as laptops, desktops, and tablets to do scientific computing. BOINC is the most popular software framework for VC and it helps in connecting the projects needing computing cycles with the volunteers interested in donating the computing cycles on their resources. It has already enabled projects with high societal impact to harness several PetaFLOPs of donated computing cycles. Given its potential in elastically augmenting the capacity of existing supercomputing resources for running High-Throughput Computing (HTC) jobs, we have extended the BOINC software infrastructure and have made it amenable for integration with the supercomputing and cloud computing environments. We have named the extension of the BOINC software infrastructure as BOINC@TACC, and are using it to route *qualified* HTC jobs from the supercomputers at the Texas Advanced Computing Center (TACC) to not only the typically volunteered devices but also to the cloud computing resources such as Jetstream and Chameleon. BOINC@TACC can be extremely useful for those researchers/scholars who are running low on allocations of compute-cycles on the supercomputers, or are interested in reducing the turnaround time of their HTC jobs when the supercomputers are over-subscribed. We have also developed a web-application for TACC users so that, through the convenience of their web-browser, they can submit their HTC jobs for running on the resources volunteered by the community. An overview of the BOINC@TACC project is presented in this paper. The BOINC@TACC software infrastructure is open-source and can be easily adapted for use by other supercomputing centers that are interested in building their volunteer community and connecting them with the researchers needing multi-petascale (and even exascale) computing power for their HTC jobs. 
    more » « less