Title: Fault Classification of Nonlinear Small Sample Data through Feature Sub-Space Neighbor Vote
The fault classification of small, high-dimensional samples is challenging, especially for a nonlinear and non-Gaussian manufacturing process. In this paper, a similarity-based feature selection and sub-space neighbor vote method is proposed to solve this problem. To capture the dynamics, nonlinearity, and non-Gaussianity in the irregular time series data, high-order spectral features and fractal dimension features are extracted, selected, and stacked in a regular matrix. To address the small-sample problem, all labeled fault data are used in the similarity decision for a specific fault type. The distances between the new data and all fault types are calculated in their feature subspaces. The new data are classified to the nearest fault type by majority probability voting over the distances. Meanwhile, the selected features, from their respective measured variables, indicate the cause of the fault. The proposed method is evaluated on a publicly available benchmark of a real semiconductor etching dataset. It is demonstrated that, by using the high-order spectral features and fractal dimension features, the proposed method achieves more than 84% fault recognition accuracy. The resulting feature subspace can be used to match any new fault data to the fingerprint feature subspace of each fault type, and hence can pinpoint the root cause of a fault in a manufacturing process.
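
As an illustration of the sub-space neighbor vote described above, the following Python sketch classifies a new sample by measuring distances to each fault type's labeled data inside that type's own feature subspace. The data layout, the Euclidean metric, and the distance-to-vote conversion are assumptions for illustration; the paper's exact feature selection and voting weights are not reproduced here.

```python
# A minimal sketch of sub-space neighbor voting (illustrative, not the paper's exact method).
import numpy as np

def subspace_neighbor_vote(x_new, fault_data, fault_subspaces):
    """Classify x_new by voting over distances computed in each
    fault type's own feature subspace.

    fault_data: dict mapping fault type -> (n_samples, n_features) array
    fault_subspaces: dict mapping fault type -> list of selected feature indices
    """
    votes = {}
    for fault, samples in fault_data.items():
        idx = fault_subspaces[fault]              # fingerprint subspace for this fault
        # Euclidean distances from the new point to every labeled sample,
        # measured only in the selected feature subspace
        d = np.linalg.norm(samples[:, idx] - x_new[idx], axis=1)
        # Convert distances to a similarity score; smaller distance -> larger vote
        votes[fault] = np.mean(1.0 / (1.0 + d))
    return max(votes, key=votes.get)              # nearest fault type wins the vote
```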
Award ID(s):
1916866
NSF-PAR ID:
10265908
Author(s) / Creator(s):
Date Published:
Journal Name:
Electronics
Volume:
9
Issue:
11
ISSN:
2079-9292
Page Range / eLocation ID:
1952
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Electrical Impedance Tomography (EIT) is a well-known imaging technique for detecting the electrical properties of an object in order to detect anomalies, such as conductive or resistive targets. More specifically, EIT has many applications in medical imaging for the detection and location of bodily tumors, since it is an affordable and non-invasive method that aims to recover the internal conductivity of a body using voltage measurements resulting from applying low-frequency current at electrodes placed on its surface. Mathematically, the reconstruction of the internal conductivity is a severely ill-posed inverse problem and yields a poor-quality image reconstruction. To remedy this difficulty, at least in part, we regularize and solve the nonlinear minimization problem with the aid of a Krylov subspace-type method for the linear subproblem at each iteration. In EIT, a tumor or general anomaly can be modeled as a piecewise constant perturbation of a smooth background; hence, we solve the regularized problem on a subspace of relatively small dimension using the Flexible Golub–Kahan process, which provides solutions with a sparse representation. For comparison, we use a well-known modified Gauss–Newton algorithm as a benchmark. Using simulations, we demonstrate the effectiveness of the proposed method. The obtained reconstructions indicate that the Krylov subspace method is better adapted to solve the ill-posed EIT problem, resulting in higher-resolution images and faster convergence than reconstructions using the modified Gauss–Newton algorithm.
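
The following sketch illustrates the outer regularized Gauss–Newton iteration for EIT described in this item, with a generic Krylov solver (SciPy's LSQR) standing in for the Flexible Golub–Kahan process. The forward model `forward` and its Jacobian `jacobian` are hypothetical names assumed to be supplied by an EIT solver; the sparsity-promoting subspace construction of the paper is not reproduced.

```python
# A schematic outer Gauss-Newton loop with a Krylov-solved linear subproblem
# (LSQR here is a placeholder for the paper's Flexible Golub-Kahan process).
import numpy as np
from scipy.sparse.linalg import lsqr

def gauss_newton_eit(sigma0, forward, jacobian, v_measured, lam=1e-2, n_iter=10):
    """Recover conductivity sigma from boundary voltage measurements."""
    sigma = sigma0.copy()
    for _ in range(n_iter):
        r = v_measured - forward(sigma)          # residual of the nonlinear model
        J = jacobian(sigma)                      # linearization at the current iterate
        # Linear subproblem: min ||J ds - r||^2 + lam ||ds||^2,
        # solved by a Krylov method on a small subspace
        ds = lsqr(J, r, damp=np.sqrt(lam))[0]
        sigma += ds                              # Gauss-Newton update
    return sigma
```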
  2. Data of huge size present great challenges in modeling, inference, and computation. In handling big data, much attention has been directed to settings with "large p, small n", where p represents the number of variables and n stands for the sample size; relatively less work has been done to address problems with p and n both being large, though data with such a feature have now become more accessible than before. The big volume of data does not automatically ensure good quality of inferences, because a large number of unimportant variables may be collected in the process of gathering informative variables. To carry out valid statistical analysis, it is imperative to screen out noisy variables that have no predictive value for explaining the outcome variable. In this paper, we develop a screening method for handling large-sized survival data, where the sample size n is large and the dimension p of covariates is of non-polynomial order of the sample size n, the so-called NP-dimension. We rigorously establish theoretical results for the proposed method and conduct numerical studies to assess its performance. Our research offers multiple extensions of existing work and enlarges the scope of high-dimensional data analysis. The proposed method capitalizes on the connections among useful regression settings and offers a computationally efficient screening procedure. Our method can be applied to different situations with large-scale data, including genomic data.
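
In the spirit of the screening procedure described in this item, the sketch below ranks covariates by a marginal utility and keeps the top d = O(n / log n) of them. The utility used (absolute correlation with an outcome proxy) is a placeholder; the paper's actual screening statistic for censored survival data is more involved.

```python
# A generic marginal-screening sketch: keep the covariates with the largest
# marginal association with the outcome. Placeholder utility, not the paper's.
import numpy as np

def marginal_screen(X, y, keep=None):
    """X: (n, p) covariates; y: (n,) outcome proxy. Returns kept column indices."""
    n, p = X.shape
    if keep is None:
        keep = int(n / np.log(n))                # a common screening-size choice
    Xc = X - X.mean(axis=0)                      # center columns
    yc = y - y.mean()
    # Marginal utility of each covariate: |corr(X_j, y)|
    utility = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(utility)[-keep:]           # indices of the top-ranked variables
```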

     
  3.
    Accurate, precise, and timely estimation of crop yield is key to a grower's ability to proactively manage crop growth and predict harvest logistics. Such yield predictions typically are based on multi-parametric models and in-situ sampling. Here we investigate the extension of a greenhouse study to low-altitude unmanned aerial systems (UAS). Our principal objective was to investigate snap bean crop (Phaseolus vulgaris) yield using imaging spectroscopy (hyperspectral imaging) in the visible to near-infrared (VNIR; 400–1000 nm) region via UAS. We aimed to solve the problem of crop yield modelling by identifying spectral features that explain yield and evaluating the earliest time period at which yield can be predicted accurately. We introduced a Python library, named Jostar, for spectral feature selection. Within Jostar, we proposed a new ranking method for selected features that reaches an agreement between multiple optimization models. Moreover, we implemented a well-known denoising algorithm for the spectral data used in this study. This study benefited from two years of remotely sensed data, captured at multiple instances over the summers of 2019 and 2020, with 24 plots and 18 plots, respectively. Two harvest stage models, early and late harvest, were assessed at two different locations in upstate New York, USA. Six varieties of snap bean were quantified using two components of yield, pod weight and seed length. We used two different vegetation detection algorithms, the Red-Edge Normalized Difference Vegetation Index (RENDVI) and Spectral Angle Mapper (SAM), to subset the fields into vegetation vs. non-vegetation pixels. Partial least squares regression (PLSR) was used as the regression model. Among nine different optimization models embedded in Jostar, we selected the Genetic Algorithm (GA), Ant Colony Optimization (ACO), Simulated Annealing (SA), and Particle Swarm Optimization (PSO), and used their resulting joint ranking. The findings show that pod weight can be explained with a high coefficient of determination (R2 = 0.78–0.93) and low root-mean-square error (RMSE = 940–1369 kg/ha) for the two years of data. Seed length yield assessment resulted in higher accuracies (R2 = 0.83–0.98) and lower errors (RMSE = 4.245–6.018 mm). Among the optimization models used, ACO and SA outperformed the others, and the SAM vegetation detection approach showed improved results compared to the RENDVI approach when dense canopies were examined. Wavelengths at 450, 500, 520, 650, 700, and 760 nm were identified in almost all data sets and harvest stage models used. The period between 44 and 55 days after planting (DAP) was identified as the optimal time period for yield assessment. Future work should involve transferring the learned concepts to a multispectral system for eventual operational use; further attention should also be paid to seed length as a ground-truth data collection technique, since this yield indicator is far more rapid and straightforward.
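
A minimal sketch of the PLSR step in this item, restricted to the wavelengths the study found informative, might look as follows. The 1 nm band grid, the number of PLS components, and the stand-in data are assumptions; Jostar's optimization-based feature ranking is not reproduced.

```python
# PLSR yield model on the study's reported informative bands (illustrative).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

wavelengths = np.arange(400, 1001)               # VNIR grid, 1 nm spacing (assumed)
selected_nm = [450, 500, 520, 650, 700, 760]     # bands reported in the study
cols = [int(np.argmin(np.abs(wavelengths - w))) for w in selected_nm]

# X: (n_plots, n_bands) mean canopy spectra; y: (n_plots,) pod weight in kg/ha
X, y = np.random.rand(42, wavelengths.size), np.random.rand(42)   # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X[:, cols], y, test_size=0.3, random_state=0)

pls = PLSRegression(n_components=3).fit(X_tr, y_tr)
print("R^2 on held-out plots:", pls.score(X_te, y_te))
```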
  4. Kernel dimensionality reduction (KDR) algorithms find a low-dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships. The standard strategy is to first map the data into a high-dimensional feature space using kernels prior to a projection onto a low-dimensional space. While KDR methods can be easily solved by keeping the most dominant eigenvectors of the kernel matrix, the resulting features are no longer easy to interpret. Alternatively, Interpretable KDR (IKDR) differs in that it projects onto a subspace before the kernel feature mapping; therefore, the projection matrix can indicate how the original features linearly combine to form the new features. Unfortunately, the IKDR objective requires a non-convex manifold optimization that is difficult to solve and can no longer be handled by eigendecomposition alone. Recently, an efficient iterative spectral (eigendecomposition) method (ISM) has been proposed for this objective in the context of alternative clustering. However, ISM only provides theoretical guarantees for the Gaussian kernel. This greatly constrains ISM's usage, since any kernel method using ISM is limited to a single kernel. This work extends the theoretical guarantees of ISM to an entire family of kernels, thereby empowering ISM to solve any kernel method with the same objective. In identifying this family, we prove that each kernel within the family has a surrogate Φ matrix and that the optimal projection is formed by its most dominant eigenvectors. With this extension, we establish how a wide range of IKDR applications across different learning paradigms can be solved by ISM. To support reproducible results, the source code is made publicly available at https://github.com/ANONYMIZED
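
A schematic of the ISM fixed-point loop described in this item follows: form a surrogate Φ matrix from the current projection, update the projection to its most dominant eigenvectors, and iterate to convergence. The surrogate construction `compute_phi` is kernel-specific and left abstract here; this is an illustration of the eigendecomposition loop, not the paper's derivation.

```python
# A fixed-point sketch of the iterative spectral method (ISM) loop.
import numpy as np

def ism(X, compute_phi, q, n_iter=50, tol=1e-6):
    """X: (n, d) data; compute_phi(X, W) -> (d, d) surrogate matrix;
    q: target subspace dimension. Returns an orthonormal (d, q) projection."""
    d = X.shape[1]
    W = np.linalg.qr(np.random.randn(d, q))[0]   # random orthonormal start
    for _ in range(n_iter):
        phi = compute_phi(X, W)                  # surrogate of the kernel objective
        vals, vecs = np.linalg.eigh(phi)         # symmetric eigendecomposition
        W_new = vecs[:, -q:]                     # the q most dominant eigenvectors
        # Stop when the spanned subspace (not the basis) stops changing
        if np.linalg.norm(W_new @ W_new.T - W @ W.T) < tol:
            return W_new
        W = W_new
    return W
```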
  5.
    Due to the growing complexity and numerous manufacturing variations in safety-critical analog and mixed-signal (AMS) circuit design, rare failure detection in the high-dimensional variational space is one of the major challenges in AMS verification. Efficient AMS failure detection with limited samples is very demanding, on account of high simulation and manufacturing costs. In this work, we combine a reversible network and a gating architecture to identify essential features from datasets and reduce feature dimension for fast failure detection. While reversible residual networks (RevNets) have been actively studied for their ability to restore the input from the output without loss of information, the gating network steers the RevNet toward effective dimension reduction. We incorporate the proposed reversible gating architecture into a Bayesian optimization (BO) framework, reducing the dimensionality of the BO embedding to the important features identified by the gating fusion weights, so that failure points can be efficiently located. Furthermore, we propose a conditional density estimation of important and non-important features to recover the high-dimensional original input features from the low-dimensional important features, improving the efficiency of the proposed method. The improvement of our proposed approach in rare failure detection is demonstrated on AMS data under high-dimensional process variations.
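
The restoration property this item relies on can be seen in a minimal additive-coupling reversible block, sketched below: the input is recovered exactly from the output, so no information is lost in the forward pass. The residual functions `f` and `g` are arbitrary placeholders; the paper's gating architecture and BO embedding are not shown.

```python
# An additive-coupling reversible block: exactly invertible by construction.
import numpy as np

def rev_forward(x1, x2, f, g):
    """One reversible block: (x1, x2) -> (y1, y2)."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2, f, g):
    """Exact inversion: recover (x1, x2) from (y1, y2)."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

# Quick check of the restoration property with toy residual functions
f = lambda z: np.tanh(z)
g = lambda z: 0.5 * z
x1, x2 = np.random.randn(4), np.random.randn(4)
assert np.allclose((x1, x2), rev_inverse(*rev_forward(x1, x2, f, g), f, g))
```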