Title: Performance Health Index for Complex Cyber Infrastructures
Most IT systems depend on a set of configuration variables (CVs), each expressed as a name/value pair, that collectively define the resource allocation for the system. While the ill effects of misconfiguration or improper resource allocation are well known, there are no effective a priori metrics to quantify the impact of the configuration on desired system attributes such as performance and availability. In this paper, we propose a Configuration Health Index (CHI) framework specifically attuned to the performance attribute to capture the influence of CVs on the performance aspects of the system. We show how CHI, which is defined as a configuration scoring system, can take advantage of domain knowledge and the available (but rather limited) performance data to produce important insights into the configuration settings. We compare CHI with both well-advertised segmented non-linear models and state-of-the-art data-driven models, and show that CHI not only consistently provides better results but also avoids the dangers of a purely data-driven approach, which may predict incorrect behavior or eliminate some essential configuration variables from consideration.
Award ID(s):
2011252
PAR ID:
10419605
Journal Name:
ACM Transactions on Modeling and Performance Evaluation of Computing Systems
Volume:
7
Issue:
1
ISSN:
2376-3639
Page Range / eLocation ID:
1 to 32
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Accurate prediction of parallel application performance in HPC systems is essential for efficient resource allocation and system design. Classical performance models estimate speedup based on theoretical assumptions, but their applicability is limited by parameter estimation, data acquisition, and real-world system issues such as latency and network congestion. This paper describes performance prediction using classical performance models boosted by a trainable machine learning framework. Domain-informed machine-learning models estimate the overhead of an application for a given problem size and resource configuration as a coefficient of the estimated speedup provided by performance laws. We evaluate this approach on two HPC mini-applications and two full applications with varying patterns of computation and communication, and also evaluate the prediction accuracy on runs with varying processors-per-node configurations. Our results show that this method significantly improves the accuracy of performance predictions over standard analytical models and black-box regressors, while remaining robust even with limited training data.
  2. Given the difficulty in obtaining adequate data from production systems, characterizing performance as a function of configuration variables (CVs) via supervised learning is difficult, and the use of standard semi-supervised learning (SSL) techniques may or may not help. In this paper, we describe a knowledge-assisted (KA) SSL algorithm that determines the confidence level of the generated data independently based on the domain knowledge. We demonstrate that such an approach outperforms plain SSL with the most popular SSL algorithms for all the workloads used in this study. 
  3. Optimal resource allocation in wireless systems still stands as a rather challenging task due to the inherent statistical characteristics of channel fading. On the one hand, minimax/outage-optimal policies are often overconservative and analytically intractable, despite advertising maximally reliable system performance. On the other hand, ergodic-optimal resource allocation policies are often susceptible to the statistical dispersion of heavy-tailed fading channels, leading to relatively frequent drastic performance drops. We investigate a new risk-aware formulation of the classical stochastic resource allocation problem for point-to-point power-constrained communication networks over fading channels with no cross-interference, by leveraging the Conditional Value-at-Risk (CV@R) as a coherent measure of risk. We rigorously derive closed-form expressions for the CV@R-optimal risk-aware resource allocation policy, as well as the optimal associated quantiles of the corresponding user rate functions by capitalizing on the underlying fading distribution, parameterized by dual variables. We then develop a purely dual tail waterfilling scheme, achieving significantly more rapid and assured convergence of dual variables, as compared with the primal-dual tail waterfilling algorithm, recently proposed in the literature. The effectiveness of the proposed scheme is also readily confirmed via detailed numerical simulations. 
  4. Polynomial chaos expansions (PCE) provide stochastic representations of quantities of interest (QoI) in terms of a vector of standardized random variables that represent all uncertainties influencing the QoI. These uncertainties could reflect statistical scatter in the estimated probabilistic model (of which the mean, variance, or PCE coefficients are but examples), or errors in the underlying functional model between input and output (e.g., physics models). In this paper, we show how PCE permit the evaluation of sensitivities with respect to all these uncertainties, and provide a rational paradigm for resource allocation aimed at model validation. We demonstrate the methodologies on examples drawn from across science and engineering.
  5. This research proposes a dynamic resource allocation method for vehicle-to-everything (V2X) communications in sixth-generation (6G) cellular networks. Cellular V2X (C-V2X) communications empower advanced applications but at the same time bring unprecedented challenges in how to fully utilize the limited physical-layer resources, given that most of the applications require ultra-low latency, high data rates, and high reliability. Resource allocation plays a pivotal role in satisfying such requirements and guaranteeing quality of service (QoS). Based on this observation, a novel fuzzy-logic-assisted Q-learning model (FAQ) is proposed to intelligently and dynamically allocate resources by taking advantage of the centralized allocation mode. The proposed FAQ model reuses the resources to maximize the network throughput while minimizing the interference caused by concurrent transmissions. The fuzzy-logic module expedites the learning and improves the performance of the Q-learning. A mathematical model is developed to analyze the network throughput considering the interference. To evaluate the performance, a system model for V2X communications is built for urban areas, where various V2X services are deployed in the network. Simulation results show that the proposed FAQ algorithm can significantly outperform deep reinforcement learning, Q-learning, and other advanced allocation strategies in convergence speed and network throughput.