

Search results: Creators/Authors contains "Hong, Yili"


  1. Abstract

    Although high-performance computing (HPC) systems have been scaled to meet the exponentially growing demand for scientific computing, HPC performance variability remains a major challenge in computer science. Statistically, performance variability can be characterized by a distribution. Predicting performance variability is a critical step in HPC performance variability management. In this article, we propose a new framework to predict performance distributions. The proposed framework is a modified Gaussian process that can predict the distribution function of the input/output (I/O) throughput under a specific HPC system configuration. We also impose a monotonic constraint so that the predicted function is nondecreasing, which is a property of the cumulative distribution function. Additionally, the proposed model can incorporate both quantitative and qualitative input variables. We predict the HPC I/O distribution using the proposed method for the IOzone variability data. Data analysis results show that our framework generates accurate predictions and outperforms existing methods. We also show how the predicted functional output can be used to generate predictions for a scalar summary of the performance distribution, such as the mean, standard deviation, and quantiles. Our prediction results can further be used for HPC system variability monitoring and optimization. This article has online supplementary materials.
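The core idea above, predicting a nondecreasing distribution function and reading scalar summaries off it, can be sketched in a few lines. The snippet below is not the authors' model: it fits a standard Gaussian process to an empirical CDF of simulated throughput and enforces monotonicity with isotonic regression as a stand-in for the paper's monotonic constraint. All data and parameter choices are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
# Hypothetical I/O throughput sample for one system configuration.
throughput = rng.lognormal(mean=4.0, sigma=0.3, size=200)

# Empirical CDF evaluated on a grid of throughput values.
grid = np.linspace(throughput.min(), throughput.max(), 50)
ecdf = np.array([(throughput <= g).mean() for g in grid])

# Smooth the empirical CDF with a GP, then enforce the nondecreasing
# property of a CDF via isotonic regression, clipped to [0, 1].
gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0), alpha=1e-4)
gp.fit(grid.reshape(-1, 1), ecdf)
smooth = gp.predict(grid.reshape(-1, 1))
cdf_hat = IsotonicRegression(y_min=0.0, y_max=1.0).fit_transform(grid, smooth)

# A scalar summary (here the median) read off the predicted CDF.
median_hat = grid[np.searchsorted(cdf_hat, 0.5)]
```

The isotonic step is one simple way to impose the monotone constraint after the fact; the paper builds the constraint into the model itself.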

     
  2. The Standard Performance Evaluation Corporation (SPEC) CPU benchmark has been widely used as a measure of computing performance for decades. SPEC CPU is an industry-standardized, CPU-intensive benchmark suite, and its collective results provide a proxy for the history of worldwide CPU and system performance. Past efforts have not answered questions such as: how has the SPEC benchmark suite evolved empirically over time, and which micro-architecture artifacts have had the most influence on performance? Have any micro-benchmarks within the suite had undue influence on the results and on comparisons among the codes? Can the answers to these questions provide insight into the future of computer system performance? To answer these questions, we detail our historical and statistical analysis of the effects of specific hardware artifacts (clock frequencies, core counts, etc.) on the performance of the SPEC benchmarks since 1995. We discuss in detail several methods to normalize across benchmark evolutions. We perform both isolated and collective sensitivity analyses for various hardware artifacts, and we identify one benchmark (libquantum) that had somewhat undue influence on performance outcomes. We also present the use of SPEC data to predict future performance.
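A sensitivity analysis of hardware artifacts can be illustrated with a simple log-log regression, where slopes act as elasticities of score with respect to each artifact. The data below are synthetic stand-ins (real SPEC submissions are not reproduced here), and the artifact effects are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
# Synthetic hardware artifacts for n hypothetical SPEC submissions.
log_clock = np.log(rng.uniform(1.0, 5.0, size=n))              # clock in GHz
log_cores = np.log(rng.integers(1, 65, size=n).astype(float))  # core count
# Synthetic log scores: clock is given more influence than core count.
log_score = 1.2 * log_clock + 0.4 * log_cores + rng.normal(0.0, 0.1, size=n)

# Collective sensitivity: least-squares fit of log score on both artifacts.
# beta[1] and beta[2] estimate the elasticities w.r.t. clock and cores.
X = np.column_stack([np.ones(n), log_clock, log_cores])
beta, *_ = np.linalg.lstsq(X, log_score, rcond=None)
```

Fitting each artifact separately instead of jointly would correspond to the isolated sensitivity analyses mentioned in the abstract.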
    Free, publicly-accessible full text available January 31, 2025
  3.
    Multi-type recurrent events are often encountered in medical applications when two or more different event types could repeatedly occur over an observation period. For example, patients may experience recurrences of multi-type nonmelanoma skin cancers in a clinical trial for skin cancer prevention. The aims in those applications are to characterize features of the marginal processes, evaluate covariate effects, and quantify both the within-subject recurrence dependence and the dependence among different event types. We use copula-frailty models to analyze correlated recurrent events of different types. Parameter estimation and inference are carried out using a Monte Carlo expectation-maximization (MCEM) algorithm, which can handle a relatively large number (i.e., three or more) of event types. The performance of the proposed methods is evaluated via extensive simulation studies. The developed methods are used to model recurrences of different types of skin cancer.
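The dependence that a shared frailty induces can be demonstrated with a minimal simulation: a subject-level gamma frailty multiplies the event rates of two recurrent event types, making their counts positively correlated. This is a toy sketch of the frailty component only, not the copula-frailty model or its MCEM estimation, and all rates are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, horizon = 2000, 10.0

# Shared gamma frailty per subject (mean 1, variance 0.5); a large frailty
# raises the rates of BOTH event types, inducing cross-type dependence.
frailty = rng.gamma(shape=2.0, scale=0.5, size=n_subjects)

# Recurrent event counts over [0, horizon] for two event types with
# made-up baseline rates, each scaled by the subject's frailty.
type1 = rng.poisson(0.5 * frailty * horizon)
type2 = rng.poisson(0.3 * frailty * horizon)

# Positive correlation between the counts reflects the shared frailty.
corr = np.corrcoef(type1, type2)[0, 1]
```

With independent frailties per event type the correlation would vanish, which is why the full model adds a copula to control cross-type dependence separately from the within-subject dependence.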
  4.
    Geyser eruptions are among the most popular signature attractions at Yellowstone National Park. The interdependence of geyser eruptions and the impacts of covariates are of interest to researchers in geyser studies. In this paper, we propose a parametric covariate-adjusted recurrent event model for estimating eruption gap times. We describe a general bivariate recurrent event process in which a bivariate lognormal distribution and a Gumbel copula with different marginal distributions are used to model an interdependent dual-type event system. The maximum likelihood approach is used to estimate model parameters. The proposed method is applied to the Yellowstone geyser eruption data for a bivariate geyser system and offers a deeper understanding of the event occurrence mechanism of individual events as well as the system as a whole. A comprehensive simulation study is conducted to evaluate the performance of the proposed method.
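A Gumbel copula with lognormal marginals, as described above, can be simulated with the Marshall-Olkin algorithm, using the Chambers-Mallows-Stuck sampler for the required positive stable variable. The marginal parameters below are hypothetical, not estimates from the Yellowstone data; for Gumbel parameter theta, Kendall's tau should come out near 1 - 1/theta.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
theta, n = 2.0, 5000   # Gumbel dependence parameter (theta >= 1); tau = 0.5
alpha = 1.0 / theta

# Chambers-Mallows-Stuck sampler for a positive stable(alpha) variable S,
# whose Laplace transform exp(-t**alpha) matches the Gumbel generator.
V = rng.uniform(0.0, np.pi, size=n)
W = rng.exponential(1.0, size=n)
S = (np.sin(alpha * V) / np.sin(V) ** (1.0 / alpha)) * (
    np.sin((1.0 - alpha) * V) / W
) ** ((1.0 - alpha) / alpha)

# Marshall-Olkin: U1, U2 are uniforms with Gumbel(theta) dependence.
E1, E2 = rng.exponential(1.0, size=n), rng.exponential(1.0, size=n)
U1 = np.exp(-((E1 / S) ** alpha))
U2 = np.exp(-((E2 / S) ** alpha))

# Hypothetical lognormal marginals for the two geysers' gap times (hours).
gap1 = stats.lognorm.ppf(U1, s=0.4, scale=np.exp(1.5))
gap2 = stats.lognorm.ppf(U2, s=0.6, scale=np.exp(2.0))

tau, _ = stats.kendalltau(gap1, gap2)  # should be near 1 - 1/theta = 0.5
```

Because Kendall's tau is invariant under the monotone marginal transforms, it measures the copula dependence directly, which is one reason the Gumbel family is convenient for this kind of dual-type system.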