- NSF-PAR ID:
- 10302620
- Date Published:
- Journal Name:
- the ASME International Manufacturing Science and Engineering Conference (MSEC)
- Volume:
- 85062
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Accurate prediction of product failures and of the need for repair services has become critical for various reasons, including understanding the warranty performance of manufacturers, defining cost-efficient repair strategies, and complying with safety standards. The purpose of this study is to use machine learning tools to analyze several parameters crucial for achieving a robust repair service system, including the number of repairs, the time of the next repair ticket or product failure, and the time to repair. A large data set of over 530,000 repair and maintenance records for medical devices is investigated using the Support Vector Machine (SVM) tool. SVM with four kernel functions is used to forecast the timing of the next failure or repair request in the system for two different products and two different failure types, namely random failure and physical damage. Frequency analysis is also conducted to explore the product quality level based on product failures and the time to repair them. In addition, the best-fitting probability distributions are identified for the failure count, the time between failures, and the time to repair. The results reveal the value of data analytics and machine learning tools in analyzing post-market product performance and the cost of repair and maintenance operations.
-
Products often experience different failures and repair needs during their lifespan. Predicting the type of failure is crucial to the maintenance team for various reasons, such as understanding device performance, creating standard strategies for repair, and analyzing the trade-off between the cost and profit of repair. This study aims to apply machine learning tools to forecast failure types of medical devices and to help the maintenance team decide on repair strategies based on a limited dataset. Two types of medical devices are used as the case study. The main challenge lies in using the limited attributes of the dataset to forecast the product failure type. First, a multilayer perceptron (MLP) algorithm is used as a regression model to forecast three attributes: the time of the next failure, the repair time, and the repair time z-score. Then, eight classification models are used to forecast the failure type: Naïve Bayes with Bernoulli (NB-Bernoulli), Gaussian (NB-Gaussian), and Multinomial (NB-Multinomial) models; Support Vector Machine with linear (SVM-Linear), polynomial (SVM-Poly), sigmoid (SVM-Sigmoid), and radial basis function (SVM-RBF) kernels; and K-Nearest Neighbors (KNN). Finally, a Gaussian Mixture Model (GMM) is used to identify maintenance conditions for each product. The results reveal that the classification models could forecast the failure type with similar performance, although the attributes of the dataset were limited.
-
Smart manufacturing systems are considered the next generation of manufacturing applications. One important goal of a smart manufacturing system is to rapidly detect and anticipate failures in order to reduce maintenance cost and minimize machine downtime. This often boils down to detecting anomalies within the sensor data acquired from the system, whose characteristics vary with the operating point of the environment or machines, such as the RPM of the motor. In this paper, we analyze four datasets from sensors deployed in manufacturing testbeds. We detect the level of defect in each sensor's data using deep learning techniques. We also evaluate the performance of several traditional and ML-based forecasting models for predicting the time series of sensor data. We show that careful selection of training data by aggregating multiple predictive RPM values is beneficial. Then, considering the sparse data from one kind of sensor, we apply transfer learning from a high-data-rate sensor to perform defect type classification. We release our manufacturing database corpus (4 datasets) and code for anomaly detection and defect type classification for the community to build on. Taken together, we show that predictive failure classification can be achieved, paving the way for predictive maintenance.
-
Large-scale high-performance computing systems frequently experience a wide range of failure modes, such as reliability failures (e.g., hang or crash) and resource overload-related failures (e.g., congestion collapse), impacting systems and applications. Despite the adverse effects of these failures, current systems do not provide methodologies for proactively detecting, localizing, and diagnosing failures. We present Kaleidoscope, a near real-time failure detection and diagnosis framework consisting of hierarchical domain-guided machine learning models that identify the failing components and the corresponding failure mode, and point to the most likely cause of the failure in near real-time (within one minute of failure occurrence). Kaleidoscope has been deployed on the Blue Waters supercomputer and evaluated with more than two years of production telemetry data. Our evaluation shows that Kaleidoscope successfully localized 99.3% and pinpointed the root causes of 95.8% of 843 real-world production issues, with less than 0.01% runtime overhead.
-
Data reliability, availability, and serviceability (RAS) of erasure-coded data centers are highly affected by data repair induced by node failures. Compared to the recovery phase of data repair, which is widely studied and well optimized, the failure identification phase is less investigated. Moreover, in a traditional failure identification scheme, all chunks share the same identification time threshold, thus losing opportunities to further improve the RAS. To solve this problem, we propose RAFI, a novel risk-aware failure identification scheme. In RAFI, chunk failures in stripes experiencing different numbers of failed chunks are identified using different time thresholds. For chunks in a high-risk stripe (a stripe with many failed chunks), a shorter identification time is adopted, thus improving the overall data reliability and availability. For chunks in a low-risk stripe (one with only a few failed chunks), a longer identification time is adopted, thus reducing the repair network traffic. Therefore, the RAS can be improved simultaneously. We use both simulations and a prototype implementation to evaluate RAFI. Results collected from extensive simulations demonstrate the effectiveness and efficiency of RAFI in improving the RAS. We implement a prototype on HDFS to verify the correctness and evaluate the computational cost of RAFI.
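
The risk-aware idea in the last abstract (a shorter identification time for stripes with many failed chunks, a longer one for stripes with few) can be illustrated with a minimal sketch. This is not the RAFI implementation: the function name, the two threshold constants, and the linear interpolation between them are all assumptions made for illustration, since the abstract gives no concrete values or formula.

```python
# Hypothetical bounds in seconds; the abstract does not state actual values.
BASE_THRESHOLD_S = 900.0  # assumed wait for a stripe with no other failed chunks
MIN_THRESHOLD_S = 60.0    # assumed wait for a stripe at its fault-tolerance limit


def identification_threshold(failed_chunks: int, tolerable_failures: int) -> float:
    """Return how long to wait before declaring a missing chunk failed.

    failed_chunks: chunks in the stripe already identified as failed.
    tolerable_failures: failures the erasure code can survive (e.g. 3 for RS(6,3)).
    The linear mapping from risk to threshold is an illustrative assumption.
    """
    if failed_chunks < 0 or tolerable_failures < 1:
        raise ValueError("failed_chunks must be >= 0 and tolerable_failures >= 1")
    # Risk grows from 0.0 (healthy stripe) to 1.0 (no redundancy left to lose).
    risk = min(failed_chunks, tolerable_failures) / tolerable_failures
    return BASE_THRESHOLD_S - risk * (BASE_THRESHOLD_S - MIN_THRESHOLD_S)


if __name__ == "__main__":
    # Higher-risk stripes get shorter identification times.
    for failed in range(4):
        print(failed, identification_threshold(failed, tolerable_failures=3))
```

Under these assumptions, a low-risk stripe waits longer before repair is triggered (avoiding repair traffic for transient outages), while a high-risk stripe is identified quickly.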