Title: Prediction of Protein Tertiary Structure via Regularized Template Classification Techniques
We discuss the use of regularized linear discriminant analysis (LDA) as a model reduction technique combined with particle swarm optimization (PSO) in protein tertiary structure prediction, followed by structure refinement based on singular value decomposition (SVD) and PSO. The algorithm presented in this paper corresponds to the category of template-based modeling. The algorithm performs a preselection of protein templates before constructing a lower dimensional subspace via a regularized LDA. The protein coordinates in the reduced space are sampled using a highly explorative optimization algorithm, regressive-regressive PSO (RR-PSO). The obtained structure is then projected onto a reduced space via singular value decomposition and further optimized via RR-PSO to carry out a structure refinement. The final structures are similar to those predicted by the best structure prediction tools, such as the Rosetta and Zhang servers. The main advantage of our methodology is that it alleviates the ill-posed character of protein structure prediction problems related to high dimensional optimization. It is also capable of sampling a wide range of conformational space due to the application of a regularized linear discriminant analysis, which allows us to expand the differences over a reduced basis set.
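To make the reduced-space idea concrete, here is a minimal Python sketch: discriminant directions are built from flattened template coordinates with a ridge-regularized within-class scatter, and the reduced coordinates are then sampled with a plain global-best PSO. The `energy` scoring function, the regularization weight `alpha`, and the PSO hyperparameters are illustrative assumptions; the paper's RR-PSO variant uses its own velocity update, which is not reproduced here.

```python
import numpy as np
from scipy.linalg import eigh

def regularized_lda_basis(X, y, n_components, alpha=1e-2):
    """Discriminant directions from (S_w + alpha*I)^{-1} S_b.
    X: (n_templates, 3*n_atoms) flattened coordinates; y: class labels
    from the template preselection. At most n_classes-1 directions carry
    discriminant information."""
    classes = np.unique(y)
    p = X.shape[1]
    mu = X.mean(axis=0)
    Sw = np.zeros((p, p))
    Sb = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    Sw += alpha * np.eye(p)                    # ridge regularization
    evals, evecs = eigh(Sb, Sw)                # generalized eigenproblem
    order = np.argsort(evals)[::-1]
    return mu, evecs[:, order[:n_components]]  # columns span reduced space

def pso_minimize(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    """Plain global-best PSO over the reduced coordinates."""
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pval = x.copy(), np.array([f(p) for p in x])
    g = pbest[pval.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        better = fx < pval
        pbest[better], pval[better] = x[better], fx[better]
        g = pbest[pval.argmin()].copy()
    return g
```

A candidate structure is rebuilt as `mu + basis @ z` and scored with the chosen energy model, so only the handful of reduced coordinates `z` is optimized rather than the full set of atomic coordinates.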
Award ID(s):
1661391
NSF-PAR ID:
10172567
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Molecules
Volume:
25
Issue:
11
ISSN:
1420-3049
Page Range / eLocation ID:
2467
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We discuss the use of the singular value decomposition (SVD) as a model reduction technique in protein tertiary structure prediction, alongside the uncertainty analysis associated with the tertiary protein predictions via particle swarm optimization (PSO). The algorithm presented in this paper corresponds to the category of decoy-based modeling, since it first finds a good protein model located in the low energy region of the protein energy landscape, which is used to establish a three-dimensional space where the free-energy optimization and search is performed via an exploratory version of PSO. The ultimate goal of this algorithm is to obtain a representative sample of the protein backbone structure and the alternate states in an energy region equivalent to or lower than the one corresponding to the protein model used to establish the expansion (model reduction), obtaining as a result other protein structures that are closer to the native structure and a measure of the uncertainty in the protein tertiary structure reconstruction. The strength of this methodology is that it is simple and fast, and serves to alleviate the ill-posed character of the protein structure prediction problem, which is very high dimensional, improving the results when it is performed on a good protein model of the low energy region. To prove this fact numerically, we present the results of the application of the SVD-PSO algorithm to a set of proteins of the CASP competition whose native structures are known. A schematic sketch of the SVD-plus-PSO search follows below.
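A minimal sketch of the SVD reduction step, under the same assumptions as the sketch after the main abstract (an illustrative `energy` scorer and the `pso_minimize` helper defined there):

```python
import numpy as np
# assumes pso_minimize and an energy(model) scorer as in the
# sketch after the main abstract (illustrative names)

def svd_reduced_space(decoys, n_modes):
    """decoys: (n_decoys, 3*n_atoms) flattened coordinates of
    low-energy models around the reference structure."""
    mean = decoys.mean(axis=0)
    U, s, Vt = np.linalg.svd(decoys - mean, full_matrices=False)
    return mean, Vt[:n_modes].T          # (3*n_atoms, n_modes) basis

def sample_structure(decoys, energy, n_modes=10):
    mean, B = svd_reduced_space(decoys, n_modes)
    rebuild = lambda z: mean + B @ z     # structure from reduced coords
    z_best = pso_minimize(lambda z: energy(rebuild(z)), dim=n_modes)
    return rebuild(z_best)
```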
  2. We discuss the applicability of principal component analysis (PCA) and particle swarm optimization (PSO) in protein tertiary structure prediction. The proposed algorithm is based on establishing a low-dimensional space where the sampling (and optimization) is carried out via a particle swarm optimizer. The reduced space is found via PCA performed on a set of previously found low-energy protein models. A high frequency term is added into this expansion by projecting the best decoy onto the PCA basis set and calculating the residual model. Our results show that PSO improves the energy of the best decoy used in the PCA when an adequate number of PCA terms is considered. A sketch of this basis construction follows below.
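A hedged sketch of the basis construction just described: PCA over low-energy models, with the projection residual of the best decoy appended as the extra high-frequency term. Names and array shapes are illustrative assumptions.

```python
import numpy as np

def pca_basis_with_residual(models, best_decoy, n_pc):
    """models: (n_models, d) flattened low-energy models; best_decoy: (d,).
    Returns the mean and a basis whose last column is the residual of the
    best decoy after projection onto the first n_pc principal components."""
    mean = models.mean(axis=0)
    U, s, Vt = np.linalg.svd(models - mean, full_matrices=False)
    B = Vt[:n_pc].T                          # (d, n_pc) PCA basis
    c = best_decoy - mean
    r = c - B @ (B.T @ c)                    # high-frequency residual
    nrm = np.linalg.norm(r)
    if nrm > 1e-12:                          # append normalized residual mode
        B = np.column_stack([B, r / nrm])
    return mean, B
```

Sampling then proceeds over the combined coefficients, e.g. with the PSO helper from the first sketch.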
  3. Summary

    We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high dimensional setting where p≫n, LDA is not appropriate for two reasons. First, the standard estimate for the within-class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule that is obtained from LDA, since it involves all p features. We propose penalized LDA, which is a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization-maximization approach to optimize it efficiently when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher's discriminant problem as a biconvex problem. We evaluate the performance of the resulting methods on a simulation study, and on three gene expression data sets. We also survey past methods for extending LDA to the high dimensional setting and explore their relationships with our proposal. A simplified sketch of the minorization-maximization step follows below.

     
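As a rough illustration of the minorization-maximization idea for a single L1-penalized discriminant vector, the sketch below assumes a diagonal within-class covariance estimate; the authors' full method (fused lasso penalties, multiple discriminant vectors) is not reproduced here.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator for the L1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def penalized_lda_vector(X, y, lam, iters=50):
    """One L1-penalized discriminant vector: maximize b'Sb b - lam*||b||_1
    subject to b'D b <= 1, with D a diagonal within-class variance estimate,
    via minorization-maximization (a simplification of the full method)."""
    n, p = X.shape
    classes = np.unique(y)
    mu = X.mean(axis=0)
    Sb = np.zeros((p, p))
    d = np.zeros(p)                        # diagonal within-class variances
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
        d += ((Xc - mc) ** 2).sum(axis=0)
    Sb /= n
    d = d / n + 1e-6                       # small ridge keeps d positive
    beta = np.ones(p) / np.sqrt(d.sum())   # feasible start: beta'D beta = 1
    for _ in range(iters):
        a = Sb @ beta                      # gradient of the linear minorizer
        u = soft(2.0 * a, lam) / d         # closed-form maximizer, then...
        norm = np.sqrt((d * u * u).sum())
        if norm < 1e-12:                   # penalty zeroed everything out
            return np.zeros(p)
        beta = u / norm                    # ...rescale onto the constraint
    return beta
```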
  4. Multilinear discriminant analysis (MLDA), a novel approach based upon recent developments in tensor-tensor decomposition, has recently been proposed and has shown better performance than traditional matrix linear discriminant analysis (LDA). The current paper presents a nonlinear generalization of MLDA (referred to as KMLDA) by extending the well-known "kernel trick" to multilinear data. The approach proceeds by defining a new dot product based on new tensor operators for third-order tensors. Experimental results on the ORL, extended Yale B, and COIL-100 data sets demonstrate that performing MLDA in feature space provides more class separability. It is also shown that the proposed KMLDA approach performs better than the Tucker-based discriminant analysis methods in terms of image classification. An illustrative tensor-product sketch follows below.
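The abstract does not reproduce the paper's tensor operators, so as a hedged illustration the sketch below shows the t-product of Kilmer and Martin, a standard third-order tensor product computed facewise in the Fourier domain, together with one possible scalar kernel on tensors; the paper's actual dot product and kernel may differ.

```python
import numpy as np

def t_product(A, B):
    """t-product of third-order tensors (Kilmer & Martin): FFT along the
    third mode, facewise matrix products, inverse FFT.
    A: (n1, n2, n3), B: (n2, n4, n3) -> (n1, n4, n3)."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('ijk,jlk->ilk', Ah, Bh)   # per-slice matrix product
    return np.real(np.fft.ifft(Ch, axis=2))

def gaussian_tensor_kernel(A, B, gamma=1e-3):
    """One possible kernel on same-shaped third-order tensors, built from
    the Frobenius norm of the difference (an illustrative choice)."""
    return float(np.exp(-gamma * np.sum((A - B) ** 2)))
```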
  5. Linear discriminant analysis (LDA) is widely used for dimensionality reduction under supervised learning settings. The traditional LDA objective aims to minimize the ratio of squared Euclidean distances, which may not perform optimally on noisy data sets. Multiple robust LDA objectives have been proposed to address this problem, but their implementations have two major limitations. One is that their mean calculations use the squared l2-norm distance to center the data, which is not valid when the objective does not use the Euclidean distance. The second problem is that there is no generalized optimization algorithm to solve different robust LDA objectives. In addition, most existing algorithms can only guarantee the solution to be locally optimal, rather than globally optimal. In this paper, we review multiple robust loss functions and propose a new and generalized robust objective for LDA. Moreover, to better remove the mean value within data, our objective uses an optimal way to center the data through learning. As one important algorithmic contribution, we derive an efficient iterative algorithm to optimize the resulting non-smooth and non-convex objective function. We theoretically prove that our solution algorithm guarantees that both the objective and the solution sequences converge to globally optimal solutions at a sub-linear convergence rate. The experimental results demonstrate the effectiveness of our new method, achieving significant improvements compared to the other competing methods. A generic sketch of an iteratively reweighted robust LDA follows below.
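The abstract does not spell out the generalized objective, so the sketch below shows a generic iteratively reweighted LDA with learned (weighted) class centering, a common device for robust, non-squared-distance objectives; it is an illustration under stated assumptions, not the authors' algorithm, and all names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def robust_lda(X, y, n_components, iters=20, eps=1e-8):
    """Iteratively reweighted LDA: points far from their weighted class
    mean in the current subspace are down-weighted, mimicking an
    l2,1-style robust objective with learned centering."""
    n, p = X.shape
    classes = np.unique(y)
    w = np.ones(n)                            # per-sample weights
    W = np.eye(p)[:, :n_components]           # initial projection
    for _ in range(iters):
        Sw = np.zeros((p, p))
        Sb = np.zeros((p, p))
        mu = np.average(X, axis=0, weights=w)
        for c in classes:
            idx = (y == c)
            mc = np.average(X[idx], axis=0, weights=w[idx])  # learned center
            R = X[idx] - mc
            Sw += (R * w[idx, None]).T @ R
            Sb += w[idx].sum() * np.outer(mc - mu, mc - mu)
        evals, evecs = eigh(Sb, Sw + eps * np.eye(p))
        W = evecs[:, np.argsort(evals)[::-1][:n_components]]
        for c in classes:                     # reweight by projected residual
            idx = (y == c)
            mc = np.average(X[idx], axis=0, weights=w[idx])
            r = np.linalg.norm((X[idx] - mc) @ W, axis=1)
            w[idx] = 1.0 / (2.0 * r + eps)
    return W
```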