

Title: On the Future of Cloud Engineering
Ever since commercial cloud offerings began appearing in 2006, the landscape of cloud computing has undergone remarkable change, with the emergence of many different types of service offerings, developer productivity tools, and new application classes, as well as the manifestation of cloud functionality closer to the user at the edge. The notion of utility computing, however, has remained constant throughout this evolution: cloud users seek to minimize the cost of leasing cloud resources while maximizing their use, whereas cloud providers try to maximize their profits while assuring the service-level objectives of cloud-hosted applications and keeping operational costs low. All these outcomes require systematic and sound cloud engineering principles. The aim of this paper is to highlight the importance of cloud engineering, survey the landscape of best practices in cloud engineering and its evolution, discuss many of the existing cloud engineering advances, and identify both the inherent technical challenges and the research opportunities for the future of cloud computing in general and cloud engineering in particular.
Award ID(s):
2107101 1703560 2027977
NSF-PAR ID:
10334311
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE International Conference on Cloud Engineering
Page Range / eLocation ID:
264 to 275
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cloud computing has motivated renewed interest in resource allocation problems with new consumption models. A common goal is to share a resource, such as CPU or I/O bandwidth, among distinct users with different demand patterns as well as different quality-of-service requirements. To ensure these service requirements, cloud offerings often come with a service level agreement (SLA) between the provider and the users. An SLA specifies the amount of a resource a user is entitled to utilize. In many cloud settings, providers would like to operate resources at high utilization while simultaneously respecting individual SLAs. There is typically a trade-off between these two objectives; for example, utilization can be increased by shifting resources away from idle users to “scavenger” workloads, but with the risk of the former becoming active again. We study this fundamental trade-off by formulating a resource allocation model that captures basic properties of cloud computing systems, including SLAs, highly limited feedback about the state of the system, and variable and unpredictable input sequences. Our main result is a simple and practical algorithm that achieves near-optimal performance on the above two objectives. First, we guarantee nearly optimal utilization of the resource even when compared with the omniscient offline dynamic optimum. Second, we simultaneously satisfy all individual SLAs up to a small error. The main algorithmic tools are a multiplicative weights update algorithm and a primal-dual argument used to obtain its guarantees. We also provide numerical validation on real data to demonstrate the performance of our algorithm in practical applications.
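The paper's actual algorithm and guarantees are not reproduced in this abstract, so the following is only a toy multiplicative-weights allocator to illustrate the flavor of the approach; the function name `mw_allocate`, the update rule, and all parameters are hypothetical and not the authors' implementation.

```python
import random

def mw_allocate(demands, sla_shares, capacity=1.0, eta=0.1):
    """Toy multiplicative-weights allocator (illustrative only).  Each round the
    resource is split in proportion to per-user weights; a user whose allocation
    fell short of what it both wanted and was entitled to under its SLA gets its
    weight boosted for the next round."""
    n = len(sla_shares)
    weights = [1.0] * n
    history = []
    for demand in demands:                      # one demand vector per round
        total = sum(weights)
        alloc = [capacity * w / total for w in weights]
        history.append(alloc)
        for i in range(n):
            entitled = sla_shares[i] * capacity
            shortfall = min(demand[i], entitled) - alloc[i]
            if shortfall > 0:                   # under-served relative to its SLA
                weights[i] *= (1.0 + eta)
    return history

# Two users: the second is a "scavenger" that is idle most of the time.
random.seed(0)
slas = [0.7, 0.3]
demands = [[0.8, 0.5 if random.random() < 0.3 else 0.0] for _ in range(200)]
print("final split:", [round(x, 2) for x in mw_allocate(demands, slas)[-1]])
```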
  2. The computing landscape is evolving rapidly. Exascale computers have arrived, capable of performing 10^18 mathematical operations per second. At the same time, quantum supremacy has been demonstrated, where quantum computers have outperformed the fastest supercomputers on certain problems. Meanwhile, artificial intelligence (AI) is transforming every aspect of science and engineering. A highly anticipated application of the emerging nexus of exascale computing, quantum computing, and AI is the computational design of new materials with desired functionalities, which has been the elusive goal of the federal Materials Genome Initiative. The rapid change in the computing landscape resulting from these developments has not been matched by the pedagogical developments needed to train the next generation of the materials engineering cyberworkforce. This gap in curricula across colleges and universities offers a unique opportunity to create educational tools that enable decentralized training of the cyberworkforce. To achieve this, we have developed training modules for a new generation of quantum materials simulator, named AIQ-XMaS (AI and quantum-computing enabled exascale materials simulator), which integrates exascalable quantum, reactive, and neural-network molecular dynamics simulations with unique AI and quantum-computing capabilities to study a wide range of materials and devices of high societal impact, such as optoelectronics and health. As a single-entry access point to these training modules, we have also built a CyberMAGICS (cyber training on materials genome innovation for computational software) portal, which includes step-by-step instructions in Jupyter notebooks and associated tutorials, while providing an online cloud service for those who do not have access to an adequate computing platform. The modules are incorporated into our open-source AIQ-XMaS software suite as tutorial examples and are piloted in classroom and workshop settings to directly train many users at the University of Southern California (USC) and Howard University, one of the largest historically black colleges and universities (HBCUs), with a strong focus on underrepresented groups. In this paper, we summarize these educational developments, including findings from the first CyberMAGICS Workshop for Underrepresented Groups, along with an introduction to the AIQ-XMaS software suite. Our training modules also cover a new generation of open programming languages for exascale computing (e.g., OpenMP target) and quantum computing (e.g., Qiskit) used in the scalable simulation and AI engines that underlie AIQ-XMaS. The training modules support unique dual-degree opportunities at USC in the emerging exa-quantum-AI era: a Ph.D. in science or engineering pursued concurrently with an MS in computer science specialized in high-performance computing and simulations, an MS in quantum information science, or an MS in materials engineering with machine learning. The developed modular cyber-training pedagogy is applicable to broad engineering education at large.
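For flavor only, and without claiming anything about the actual AIQ-XMaS tutorial content, a first quantum-computing exercise in a Qiskit-based module might resemble the minimal snippet below, which prepares and inspects a two-qubit Bell state.

```python
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

# Prepare the Bell state (|00> + |11>)/sqrt(2) on two qubits.
qc = QuantumCircuit(2)
qc.h(0)      # put qubit 0 into an equal superposition
qc.cx(0, 1)  # entangle qubit 1 with qubit 0

# Inspect the exact statevector -- no simulator backend or hardware needed.
state = Statevector.from_instruction(qc)
print(qc.draw())
print(state.probabilities_dict())  # expected: {'00': 0.5, '11': 0.5}
```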
  3. The landscape of research in science and engineering is heavily reliant on computation and data processing. There is continued and expanded usage by disciplines that have historically used advanced computing resources, new usage by disciplines that have not traditionally used HPC, and new modalities of usage in data science, machine learning, and other areas of AI. Along with these new patterns have come new advanced computing resource methods and approaches, including the availability of commercial cloud resources. The Coalition for Academic Scientific Computation (CASC) has long been an advocate representing the needs of academic researchers using computational resources, sharing best practices and offering advice to create a national cyberinfrastructure that meets US science, engineering, and other academic computing needs. CASC has completed the first of what we intend to be an annual survey of academic cloud and data center usage and of practices in analyzing return on investment in cyberinfrastructure. Critically important findings from this first survey include the following: many of the respondents are engaged in some form of analysis of return on research computing investments, but only a minority currently report the results of such analyses to their upper-level administration. Most respondents are experimenting with the use of commercial cloud resources, but no respondent indicated that they have found commercial cloud services to create financial benefits compared to their current methods. There is a clear correlation between levels of investment in research cyberinfrastructure and both the scale of CPU core-hours delivered and the financial level of supported research grants. Also interesting is that almost every respondent indicated that they participate in some sort of national cooperative or nationally provided research computing infrastructure project, and most were involved in academic computing-related organizations, indicating a high degree of engagement by institutions of higher education in building and maintaining national research computing ecosystems. Institutions continue to evaluate cloud-based HPC service models, despite having generally concluded that, so far, cloud HPC is too expensive to use compared to their current methods.
  4. Collections digitization relies increasingly upon computational and data management resources that occasionally exceed the capacity of natural history collections and their managers and curators. Digitizing many tens of thousands of micropaleontological specimen slides, as evidenced by the effort presented here by the Indiana University Paleontology Collection (IUPC), has required a concerted effort to adhere to recommended practices across the many facets of collections management for both physical and digital collections resources. This presentation highlights the contributions of distributed cyberinfrastructure from the National Science Foundation-supported Extreme Science and Engineering Discovery Environment (XSEDE) for web hosting of collections management system resources and for distributed processing of millions of digital images and metadata records of specimens from our collections. The Indiana University Center for Biological Research Collections is currently hosting its instance of the Specify collections management system (CMS) on a virtual server on Jetstream, the cloud service for on-demand computational resources provisioned by XSEDE. This web service allows the CMS to be flexibly hosted in the cloud, with additional services that can be provisioned on an as-needed basis for generating and integrating digitized collections objects in both web-friendly and digital preservation contexts. On-demand computing resources can be used for the manipulation of digital images: automated file I/O, scripted renaming of files for adherence to file naming conventions, derivative generation, and backup to our local tape archive for digital disaster preparedness and long-term storage. Here, we present our strategies for facilitating reproducible workflows for general collections digitization of the IUPC nomenclatorial types and figured specimens, in addition to the gigapixel-resolution photographs of our large collection of microfossils captured with our GIGAmacro system (e.g., a slide of conodonts). We aim to demonstrate the flexibility and nimbleness of cloud computing resources for replicating this, and other, workflows to enhance the findability, accessibility, interoperability, and reproducibility of the data and metadata contained within our collections.
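As a concrete, if simplified, picture of the scripted renaming and derivative-generation steps mentioned above, the Python sketch below renames archival TIFF masters to a zero-padded convention and writes web-friendly JPEG derivatives; the directory names, the IUPC-prefixed naming pattern, and the use of the Pillow library are assumptions for illustration only, not the collection's actual workflow.

```python
from pathlib import Path
from PIL import Image  # Pillow, assumed available for derivative generation

SRC = Path("raw_captures")     # hypothetical directory of archival TIFF masters
DST = Path("web_derivatives")  # hypothetical output directory for JPEG derivatives
DST.mkdir(exist_ok=True)

for n, tiff in enumerate(sorted(SRC.glob("*.tif")), start=1):
    # Rename to a hypothetical convention: collection acronym + zero-padded number.
    canonical = f"IUPC-{n:06d}"
    archival = tiff.rename(tiff.with_name(canonical + tiff.suffix))

    # Generate a web-friendly JPEG derivative alongside the archival master.
    with Image.open(archival) as img:
        img.thumbnail((2048, 2048))  # cap the longest side at 2048 px
        img.convert("RGB").save(DST / f"{canonical}.jpg", quality=90)
```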
  5. Serverless computing is a rapidly growing paradigm that easily harnesses the power of the cloud. With serverless computing, developers simply provide an event-driven function to cloud providers, and the provider seamlessly scales function invocations to meet demands as event triggers occur. As current and future serverless offerings support a wide variety of serverless applications, effective techniques for managing serverless workloads become an important issue. This work examines current management and scheduling practices of cloud providers, uncovering many issues, including inflated application run times, function drops, inefficient allocations, and other undocumented and unexpected behavior. To fix these issues, a new quality-of-service function scheduling and allocation framework, called Sequoia, is designed. Sequoia allows developers or administrators to easily define how serverless functions and applications should be deployed, capped, prioritized, or altered based on easily configured, flexible policies. Results with controlled and realistic workloads show that Sequoia seamlessly adapts to policies, eliminates mid-chain drops, reduces queuing times by up to 6.4X, enforces tight chain-level fairness, and improves run-time performance by up to 25X.
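The abstract does not show what Sequoia policies actually look like, so the sketch below should not be read as its API; it only illustrates, with hypothetical field names, how per-function priorities and concurrency caps of the kind described might drive a simple dispatch decision.

```python
import heapq
import itertools

# Hypothetical per-function QoS policies (field names and values are illustrative
# only and are not Sequoia's configuration format).
POLICIES = {
    "thumbnail":   {"priority": 0, "max_concurrency": 50},  # 0 = highest priority
    "video-xcode": {"priority": 1, "max_concurrency": 10},
    "audit-log":   {"priority": 2, "max_concurrency": 5},
}

class QoSQueue:
    """Toy dispatcher: admits invocations in priority order while enforcing
    per-function concurrency caps."""
    def __init__(self, policies):
        self.policies = policies
        self.running = {name: 0 for name in policies}
        self.heap = []
        self.seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, fn_name, event):
        prio = self.policies[fn_name]["priority"]
        heapq.heappush(self.heap, (prio, next(self.seq), fn_name, event))

    def dispatch(self):
        """Return the highest-priority queued invocation whose function is under
        its concurrency cap, or None if everything is capped or the queue is empty."""
        deferred, chosen = [], None
        while self.heap:
            prio, seq, fn_name, event = heapq.heappop(self.heap)
            if self.running[fn_name] < self.policies[fn_name]["max_concurrency"]:
                self.running[fn_name] += 1
                chosen = (fn_name, event)
                break
            deferred.append((prio, seq, fn_name, event))  # capped: requeue below
        for item in deferred:
            heapq.heappush(self.heap, item)
        return chosen

    def complete(self, fn_name):
        self.running[fn_name] -= 1

q = QoSQueue(POLICIES)
q.submit("audit-log", {"id": 1})
q.submit("thumbnail", {"id": 2})
print(q.dispatch())  # ('thumbnail', ...) dispatches first despite arriving later
```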