User research for scientific software can inform design and account for the unique concerns of academic researchers. In this study, we explored the user experience on CloudLab, a testbed for cloud computing research. Through 15 semi-structured interviews and observation, we found that time is an important resource for system users. We observed CloudLab users strategically coordinating their time on the platform with other users while navigating the constraints of publication and other academic deadlines. Surprisingly, we found that this coordination may involve altruistic behaviors in which users share CloudLab time that had been allocated for their personal use. In light of prior CSCW literature on how actors seek to harness time, we propose concrete opportunities for design interventions. Our strategy across all of these interventions is to increase users' awareness of the rhythms affecting their peers' platform use, allowing coordination based not only on knowledge of CloudLab reservations but also on users' progress toward deadlines. These findings can inform the design of similar cyberinfrastructure systems in science where users independently coordinate their use of resources.
The Design and Operation of CloudLab
Given the highly empirical nature of research in cloud computing, networked systems, and related fields, testbeds play an important role in the research ecosystem. In this paper, we cover one such facility, CloudLab, which supports systems research by providing raw access to programmable hardware, enabling research at large scales, and creating a shared platform for repeatable research. We present our experiences designing CloudLab and operating it for four years, serving nearly 4,000 users who have run over 79,000 experiments on 2,250 servers, switches, and other pieces of datacenter equipment. From this experience, we draw lessons organized around two themes. The first set comes from analysis of data regarding the use of CloudLab: how users interact with it, what they use it for, and the implications for facility design and operation. Our second set of lessons comes from looking at the ways that algorithms used “under the hood,” such as resource allocation, have important (and sometimes unexpected) effects on user experience and behavior. These lessons can be of value to the designers and operators of IaaS facilities in general, systems testbeds in particular, and users who have a stake in understanding how these systems are built.
- Award ID(s): 1743363
- PAR ID: 10107039
- Date Published:
- Journal Name: The Design and Operation of CloudLab
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like This
- SSH (Secure Shell) is widely used for remote access to systems and cloud services. This access comes with the persistent threat of SSH password-guessing brute-force attacks (BFAs) directed at sshd-enabled devices connected to the Internet. In this work, we present a comprehensive study of such attacks on a production facility (CloudLab), offering previously unreported insight. Our study provides a detailed analysis of SSH BFAs occurring on the Internet today through an in-depth analysis of sshd logs collected over a period of four years from over 500 servers. We report several patterns in attacker behavior, present insight on the targets of the attacks, and devise a method for tracking individual attacks over time across sources. Leveraging our insight, we develop a defense mechanism against SSH BFAs that blocks 99.5% of such attacks, significantly outperforming the 66.1% coverage of current state-of-the-art rate-based blocking while also cutting false positives by 83%. We have deployed our defense in production on CloudLab, where it catches four-fifths of SSH BFAs missed by other defense strategies.
- FABRIC is a unique national research infrastructure to enable cutting-edge and exploratory research at scale in networking, cybersecurity, distributed computing and storage systems, machine learning, and science applications. It is an everywhere-programmable nationwide instrument composed of novel extensible network elements equipped with large amounts of compute and storage, interconnected by high-speed, dedicated optical links. It will connect a number of specialized testbeds for cloud research (NSF Cloud testbeds CloudLab and Chameleon) and for research beyond 5G technologies (Platforms for Advanced Wireless Research, or PAWR), as well as production high-performance computing facilities and science instruments, to create a rich fabric for a wide variety of experimental activities.
- Empirical performance measurements of computer systems almost always exhibit variability and anomalies. Run-to-run and server-to-server variations are common for CPU, memory, disk, and network performance characteristics. In our previous work, we focused on taming performance variability for memory, disk, and network, and established an interactive analysis service at https://confirm.fyi/ to help users of the CloudLab testbed better plan and conduct their experiments. In this paper, we describe our analysis of CPU variability based on over 1.3M performance measurements from nearly 1,800 servers and present our initial findings. The focus of this work is on capturing hardware variability, which can make repeatable experiments more difficult and can affect conclusions; it is thus important for systems researchers to understand. (Although we do not study it in this work, multi-tenancy and resource sharing in the cloud can exacerbate the problem.) Variability also inevitably affects the performance and operation of middleware and high-level applications, contributing to straggler problems in many domains, including HPC, Big Data, and Machine Learning, and on many types of cyberinfrastructure. We analyze data from CloudLab servers allocated exclusively, with no virtualization. While our analysis focuses on a testbed that aims to promote reproducible research, we believe our approach and findings can be of value to people who manage, analyze, and use shared computing resources in supercomputers, clouds, and datacenters.
- While substantial advances have been made in recommender systems (both in general and for news) using datasets, offline analyses, and one-shot experiments, longitudinal studies of real users remain the gold standard and the only way to effectively measure the impact of recommender system designs (algorithmic and otherwise) on long-term user experience and behavior. While such infrastructure exists for studies within some individual organizations, the extensive cost and effort to build the systems, content streams, and user base make it prohibitive for most researchers to conduct such studies. We propose to develop shared research infrastructure for the research community, and have received funding to gather community input on requirements, resources, and research goals for such an infrastructure. If the full infrastructure proposal is funded, it would result in recruiting a community of thousands of users who agree to use a news delivery application within which various researchers would be able to install and conduct experiments. In this short paper, we outline what we have heard and learned so far and present a set of questions for INRA attendees, to gather their feedback at the workshop.
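The SSH brute-force abstract above compares its defense against rate-based blocking, the common baseline that bans a source once it exceeds a failed-login threshold. The abstract describes neither mechanism in detail, so what follows is only a minimal Python sketch of a generic sliding-window rate limiter, not the authors' defense. The thresholds (`MAX_FAILURES`, `WINDOW_SECONDS`), the `FailedLogin` record, and `rate_based_blocklist` are all hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

# Hypothetical thresholds: the abstract does not state which limits the
# rate-based baseline uses; these resemble typical fail2ban-style settings.
MAX_FAILURES = 5        # failed logins tolerated per window
WINDOW_SECONDS = 600    # sliding window length in seconds


@dataclass
class FailedLogin:
    """Hypothetical record parsed from one sshd failed-login log line."""
    timestamp: float  # seconds since epoch
    source_ip: str


def rate_based_blocklist(events, max_failures=MAX_FAILURES, window=WINDOW_SECONDS):
    """Return the source IPs that exceed max_failures failed logins
    within any sliding window of `window` seconds."""
    recent = defaultdict(deque)  # source_ip -> failure timestamps still in the window
    blocked = set()
    for ev in sorted(events, key=lambda e: e.timestamp):
        q = recent[ev.source_ip]
        q.append(ev.timestamp)
        # Discard failures that have slid out of the window.
        while q and ev.timestamp - q[0] > window:
            q.popleft()
        if len(q) > max_failures:
            blocked.add(ev.source_ip)
    return blocked


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737); made-up traffic.
    burst = [FailedLogin(float(t), "198.51.100.7") for t in range(10)]
    slow = [FailedLogin(t * 700.0, "203.0.113.9") for t in range(10)]
    print(rate_based_blocklist(burst + slow))  # {'198.51.100.7'}
```

In the usage example, a burst of ten failures from one address is blocked. A source that spaces its ten attempts 700 seconds apart never trips the 600-second window. This illustrates a commonly noted blind spot of rate limits against slow or distributed guessing, which is one plausible reason a pattern-aware defense could cover more attacks. The abstract itself does not attribute the coverage gap to this.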
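The CPU-variability abstract above separates run-to-run from server-to-server variation. One common way to quantify both is the coefficient of variation (CoV), defined as the standard deviation divided by the mean. The abstract does not say which statistics the authors compute, so this Python sketch shows an illustrative metric rather than their analysis. The input layout (a dict mapping a hypothetical server name to repeated benchmark scores) and the function names are assumptions.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean; needs at least 2 samples."""
    return statistics.stdev(samples) / statistics.mean(samples)


def summarize_variability(measurements):
    """Summarize variability for a hypothetical data layout in which
    measurements maps server name -> scores from repeated runs on that server.

    Returns (run_to_run, server_to_server): run_to_run maps each server to the
    CoV of its own runs; server_to_server is the CoV of the per-server means.
    """
    run_to_run = {
        server: coefficient_of_variation(scores)
        for server, scores in measurements.items()
    }
    server_means = [statistics.mean(scores) for scores in measurements.values()]
    server_to_server = coefficient_of_variation(server_means)
    return run_to_run, server_to_server


if __name__ == "__main__":
    # Made-up scores for three hypothetical servers, for illustration only.
    data = {
        "node-a": [100.0, 100.0, 100.0],
        "node-b": [110.0, 90.0, 100.0],
        "node-c": [120.0, 120.0, 120.0],
    }
    run_to_run, server_to_server = summarize_variability(data)
    print(run_to_run)                  # node-b has CoV 0.1; the others 0.0
    print(round(server_to_server, 4))  # 0.1083
```

Computing CoV within each server before comparing servers keeps two kinds of variability apart. In the example, node-c is perfectly repeatable yet consistently faster than its peers, and only the server-to-server figure reveals that difference.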