skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 2011252

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Misconfiguration is a known and increasingly serious problem in enterprise systems due to frequent code updates and retuning of the configuration parameters. Diagnosing complex, residual misconfiguration problems that lead to inaccessible services or failed transactions often starts with either a user complaint or observation by administrators, followed by a largely manual process of deciding what tests to run and how to proceed with further testing based on the test results. The goal of this paper is to automate this process and thereby make root-cause analysis of accessibility related misconfigurations much speedier and much more effective. We explore a domain-knowledge-driven methodology, called ConfExp using a network emulator that runs real enterprise networking protocols. Thus, by using commonly used tests, we show that the root-cause can be determined in all cases where discriminative tests exist. The methodology also highlights areas where more discriminative tests are needed to pinpoint the precise configuration variables at fault. 
    more » « less
  2. Free, publicly-accessible full text available December 1, 2025
  3. Modern data center storage systems are invariably networked to allow for consolidation and flexible management of storage. They also include high-performance storage devices based on flash or other emerging technologies, generally accessed through low-latency and high-throughput protocols such as Non-volatile Memory Express (NVMe) (or its derivatives) carried over the network. With the increasing complexity and data-centric nature of the applications, properly configuring the quality of service (QoS) for the storage path has become crucial for ensuring the desired application performance. Such QoS is substantially influenced by the QoS in the network path, in the access protocol, and in the storage device. In this article, we define a new transport-level QoS mechanism for the network segment and demonstrate how it can augment and coordinate with the access-level QoS mechanism defined for NVMe, and a similar QoS mechanism configured in the device. We show that the transport QoS mechanism not only provides the desired QoS to different classes of storage accesses but is also able to protect the access to the shared persistent memory devices located along with the storage but requiring much lower latency than storage. We demonstrate that a proper coordinated configuration of the three QoS’s on the path is crucial to achieve the desired differentiation, depending on where the bottlenecks appear. 
    more » « less
  4. Given the difficulty in obtaining adequate data from production systems, characterizing performance as a function of configuration variables (CVs) via supervised learning is difficult, and the use of standard semi-supervised learning (SSL) techniques may or may not help. In this paper, we describe a knowledge-assisted (KA) SSL algorithm that determines the confidence level of the generated data independently based on the domain knowledge. We demonstrate that such an approach outperforms plain SSL with the most popular SSL algorithms for all the workloads used in this study. 
    more » « less
  5. The increasing use of the DevOps paradigm in software systems has substantially increased the frequency of configuration parameter setting changes. Ensuring the correctness of such settings is generally a very challenging problem due to the complex interdependencies, and calls for an automated mechanism that can both run quickly and provide accurate settings. In this paper, we propose an efficient discrete combinatorial optimization technique that makes two unique contributions: (a) an improved and extended metaheuristic that exploits the application domain knowledge for fast convergence, and (b) the development and quantification of a discrete version of the classical tunneling mechanism to improve the accuracy of the solution. Our extensive evaluation using available workload traces that do include configuration information shows that the proposed technique can provide a lower-cost solution (by ~60%) with faster convergence (by ~48%) as compared to the traditional metaheuristic algorithms. Also, our solution succeeds in finding a feasible solution in approximately 30% more cases than the baseline algorithm. 
    more » « less
  6. Most IT systems depend on a set of configuration variables (CVs) , expressed as a name/value pair that collectively defines the resource allocation for the system. While the ill effects of misconfiguration or improper resource allocation are well-known, there are no effective a priori metrics to quantify the impact of the configuration on the desired system attributes such as performance, availability, etc. In this paper, we propose a Configuration Health Index (CHI) framework specifically attuned to the performance attribute to capture the influence of CVs on the performance aspects of the system. We show how CHI , which is defined as a configuration scoring system, can take advantage of the domain knowledge and the available (but rather limited) performance data to produce important insights into the configuration settings. We compare the CHI with both well-advertised segmented non-linear models and state-of-the-art data-driven models, and show that the CHI not only consistently provides better results but also avoids the dangers of a pure data drive approach which may predict incorrect behavior or eliminate some essential configuration variables from consideration. 
    more » « less