Understanding the different workload-dependent factors that impact the latency or reliability of a storage system is essential for SLA satisfaction and fair resource provisioning. However, due to the volatility of system behavior under multiple workloads, determining even the number of concurrent types of workload functions, a necessary precursor to workload separation, is an unsolved problem in the general case. We introduce CENSUS, a novel classification framework that combines time-series analysis with gradient boosting to identify the number of functional workloads in a shared storage system by projecting workload traces into a high-dimensional feature representation space. We show that CENSUS can distinguish the number of interleaved workloads in a real-world trace segment with up to 95% accuracy, leading to a decrement of the mean square error to as little as 5% compared to the 
                    
                            
                            Storage System Trace Characterization, Compression, and Synthesis using Machine Learning – An Extended Abstract
                        
                    
    
            This study addresses the knowledge gap in request-level storage trace analysis by incorporating workload characterization, compression, and synthesis. The aim is to better understand workload behavior and provide unique workloads for storage system testing under different scenarios. Machine learning techniques such as K-means clustering and principal component analysis (PCA) are employed to understand trace properties and reduce manual workload selection. By generating synthetic workloads, the proposed method facilitates simulation and modeling-based studies of storage systems, especially for emerging technologies like Storage Class Memory (SCM) with limited workload availability.
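As a rough illustration of the kind of characterization the abstract describes, the sketch below summarizes fixed-length windows of a request trace as feature vectors, projects them with PCA, and groups them with K-means. It is not the authors' pipeline: the feature set (request count, mean request size, read fraction), the 10-second window, the helper names `window_features`, `pca`, and `kmeans`, and the two-phase toy trace are all assumptions made for this example.

```python
import numpy as np

def window_features(timestamps, sizes, is_read, window):
    """Summarize each non-empty time window as [request count, mean size, read fraction]."""
    idx = (timestamps // window).astype(int)  # window index of every request
    feats = []
    for w in np.unique(idx):
        m = idx == w
        feats.append([m.sum(), sizes[m].mean(), is_read[m].mean()])
    return np.asarray(feats, dtype=float)

def pca(X, k):
    """Standardize the columns, then project onto the top-k principal components."""
    Xc = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm with randomly chosen initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    # Final assignment against the converged centers.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers

# Toy trace (made up for illustration): 100 s of frequent 4 KiB reads,
# followed by 100 s of infrequent 1 MiB writes.
rng = np.random.default_rng(1)
timestamps = np.concatenate([rng.uniform(0, 100, 5000), rng.uniform(100, 200, 500)])
sizes = np.concatenate([np.full(5000, 4096.0), np.full(500, 1048576.0)])
is_read = np.concatenate([np.ones(5000), np.zeros(500)])

features = window_features(timestamps, sizes, is_read, window=10.0)
projected = pca(features, k=2)
labels, _ = kmeans(projected, k=2)
print(labels)  # one cluster label per 10-second window
```

In this toy setup the windows from the two phases end up in separate clusters, so one representative window per cluster could stand in for the whole trace when selecting workloads for testing.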
        
    
                            - Award ID(s):
- 1956229
- PAR ID:
- 10413957
- Date Published:
- Journal Name:
- International Conference on Principles of Advanced Discrete Simulation (PADS)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            Identifying the characteristics of a storage workload is critical for resource provisioning for metrics including performance, reliability, and utilization. Although multi-tenant systems are increasingly commonplace, characterization of multiple workloads within a single system trace is difficult because workloads are highly dynamic and typically not labeled. We show that, by converting a block I/O workload to a signal and applying blind source separation, we are able to successfully separate many application workloads.
- 
            Traditional workload analysis uses discrete times measured by data accesses. An example is the classic independent reference model (IRM). Effective solutions have been developed to model workloads with stochastic access patterns, but they incur a high cost for Zipfian workloads, which may contain millions of items each accessed with a different frequency. This paper first presents a continuous-time model of locality for workloads with stochastic access patterns. It shows that two previous techniques by Dan and Towsley and by Denning and Schwartz can be interpreted as a single model using different discrete times. Using continuous time, it derives a closed-form solution for an item and a general solution that is a differentiable function. In addition, the paper presents an approximation technique by grouping items into partitions. When evaluated using Zipfian workloads, it shows that a workload with millions of items can be approximated using a small number of partitions, and the continuous-time model has greater accuracy and is faster to compute numerically. For the largest data size verifiable using trace generation and simulation, the new techniques reduce the time of locality analysis by 6 orders of magnitude.
 An official website of the United States government