skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, October 10 until 2:00 AM ET on Friday, October 11 due to maintenance. We apologize for the inconvenience.


Title: Detecting Performance Anomalies in Cloud Platform Applications
We present Roots, a full-stack monitoring and analysis system for performance anomaly detection and bottleneck identification in cloud platform-as-a-service (PaaS) systems. Roots facilitates application performance monitoring as a core capability of PaaS clouds, and relieves the developers from having to instrument application code. Roots tracks HTTP/S requests to hosted cloud applications and their use of PaaS services. To do so it employs lightweight monitoring of PaaS service interfaces. Roots processes this data in the background using multiple statistical techniques that in combination detect performance anomalies (i.e. violations of service-level objectives). For each anomaly, Roots determines whether the event was caused by a change in the request workload or by a performance bottleneck in a PaaS service. By correlating data collected across different layers of the PaaS, Roots is able to trace high-level performance anomalies to bottlenecks in specific components in the cloud platform. We implement Roots using the AppScale PaaS and evaluate its overhead and accuracy.  more » « less
Award ID(s):
1703560
NSF-PAR ID:
10057652
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE Transactions on Cloud Computing
ISSN:
2168-7161
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We introduce a novel methodology for anomaly detection in time-series data. The method uses persistence diagrams and bottleneck distances to identify anomalies. Specifically, we generate multiple predictors by randomly bagging the data (reference bags), then for each data point replacing the data point for a randomly chosen point in each bag (modified bags). The predictors then are the set of bottleneck distances for the reference/modified bag pairs. We prove the stability of the predictors as the number of bags increases. We apply our methodology to traffic data and measure the performance for identifying known incidents. 
    more » « less
  2. Today's serverless provides "function-as-a-service" with dynamic scaling and fine-grained resource charging, enabling new cloud applications. Serverless functions are invoked as a best-effort service. We propose an extension to serverless, called real-time serverless that provides an invocation rate guarantee, a service-level objective (SLO) specified by the application, and delivered by the underlying implementation. Real-time serverless allows applications to guarantee real-time performance. We study real-time serverless behavior analytically and empirically to characterize its ability to support bursty, real-time cloud and edge applications efficiently. Finally, we use a case study, traffic monitoring, to illustrate the use and benefits of real-time serverless, on our prototype implementation. 
    more » « less
  3. Abstract

    The CMS detector is a general-purpose apparatus that detects high-energy collisions produced at the LHC. Online data quality monitoring of the CMS electromagnetic calorimeter is a vital operational tool that allows detector experts to quickly identify, localize, and diagnose a broad range of detector issues that could affect the quality of physics data. A real-time autoencoder-based anomaly detection system using semi-supervised machine learning is presented enabling the detection of anomalies in the CMS electromagnetic calorimeter data. A novel method is introduced which maximizes the anomaly detection performance by exploiting the time-dependent evolution of anomalies as well as spatial variations in the detector response. The autoencoder-based system is able to efficiently detect anomalies, while maintaining a very low false discovery rate. The performance of the system is validated with anomalies found in 2018 and 2022 LHC collision data. In addition, the first results from deploying the autoencoder-based system in the CMS online data quality monitoring workflow during the beginning of RunĀ 3 of the LHC are presented, showing its ability to detect issues missed by the existing system.

     
    more » « less
  4. In the last few years, Cloud computing technology has benefited many organizations that have embraced it as a basis for revamping the IT infrastructure. Cloud computing utilizes Internet capabilities in order to use other computing resources. Amazon Web Services (AWS) is one of the most widely used cloud providers that leverages the endless computing capabilities that the cloud technology has to offer. AWS is continuously evolving to offer a variety of services, including but not limited to, infrastructure as a service (IaaS), platform as a service (PaaS) and packaged software as a service. Among the other important services offered by AWS is Video Surveillance as a Service (VSaaS) that is a hosted cloud-based video surveillance service. Even though this technology is complex and widely used, some security experts have pointed out that some of its vulnerabilities can be exploited in launching attacks aimed at cloud technologies. In this paper, we present a holistic security analysis of cloud-based video surveillance systems by examining the vulnerabilities, threats, and attacks that these technologies are susceptible to. We illustrate our findings by implementing several of these attacks on a test bed representing an AWS-based video surveillance system. The main contributions of our paper are: (1) we provided a holistic view of the security model of cloud based video surveillance summarizing the underlying threats, vulnerabilities and mitigation techniques (2) we proposed a novel taxonomy of attacks targeting such systems (3) we implemented several related attacks targeting cloud-based video surveillance system based on an AWS test environment and provide some guidelines for attack mitigation. The outcome of the conducted experiments showed that the vulnerabilities of the Internet Protocol (IP) and other protocols granted access to unauthorized VSaaS files. We aim that our proposed work on the security of cloud-based video surveillance systems will serve as a reference for cybersecurity researchers and practitioners who aim to conduct research in this field. 
    more » « less
  5. We present 3MileBeach, a tracing and fault injection platform designed for microservice-based architectures. 3Mile-Beach interposes on the message serialization libraries that are ubiquitous in this environment, avoiding the application code instrumentation that tracing and fault injection infrastructures typically require. 3MileBeach provides message-level distributed tracing at less than 50% of the overhead of the state-of-the-art tracing frameworks, and fault injection that allows higher precision experiments than existing solutions. We measure the overhead of 3MileBeach as a tracer and its efficacy as a fault injector. We qualitatively measure its promise as a platform for tuning and debugging by sharing concrete use cases in the context of bottleneck identification, performance tuning, and bug finding. Finally, we use 3MileBeach to perform a novel type of fault injection - Temporal Fault Injection (TFI), which more precisely controls individual inter-service message flow with temporal prerequisites, and makes it possible to catch an entirely new class of fault tolerance bugs. 
    more » « less