Thu. Nov 21st, 2024

... and small error, as reflected by the fact that using the GM(1,1) model has remarkably improved the success rates in predicting protein structural classes [59]. However, if the series concerned is not monotonic, the GM(1,1) model simulates it poorly and its error may be quite large. To overcome this shortcoming, in this study we use a different grey system model called GM(2,1) [33], which can deal effectively with oscillating series.

To extract the serial information of Eq.4, let us consider the L components in its j-th column, i.e., $m^{(1)}_{1,j}, m^{(1)}_{2,j}, \ldots, m^{(1)}_{L,j}$, as an initial series. The j-th column of Eq.4 is an oscillating series rather than a monotonic one as in the case investigated in [59], so we adopt the GM(2,1) model here instead of GM(1,1). According to the GM(2,1) model [33], we have the following 2nd-order grey differential equation with one variable:

$$\alpha^{(1)} m^{(1)}_{k,j} + a^{j}_{1}\, m^{(1)}_{k,j} + a^{j}_{2}\, z^{(1)}(k) = b^{j} \qquad (k = 2, 3, \ldots, L;\ j = 1, 2, \ldots, 20)$$

In the standard GM(2,1) formulation, $\alpha^{(1)} m^{(1)}_{k,j} = m^{(1)}_{k,j} - m^{(1)}_{k-1,j}$ is the first-order difference of the series, and $z^{(1)}(k)$ is the mean of the k-th and (k-1)-th terms of its accumulated series. The coefficients $a^{j}_{1}$, $a^{j}_{2}$, and $b^{j}$ are the least-squares solution of $U = B\,[a^{j}_{1}\ a^{j}_{2}\ b^{j}]^{T}$, where

$$B = \begin{bmatrix} -m^{(1)}_{2,j} & -z^{(1)}(2) & 1 \\ -m^{(1)}_{3,j} & -z^{(1)}(3) & 1 \\ \vdots & \vdots & \vdots \\ -m^{(1)}_{L,j} & -z^{(1)}(L) & 1 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} \alpha^{(1)} m^{(1)}_{2,j} \\ \alpha^{(1)} m^{(1)}_{3,j} \\ \vdots \\ \alpha^{(1)} m^{(1)}_{L,j} \end{bmatrix}$$

Accordingly, the $\Omega$ elements in Eq.2 are given by

$$\begin{cases} \psi_{3j-2} = w_1\, f_j\, a^{j}_{1} \\ \psi_{3j-1} = w_2\, f_j\, a^{j}_{2} \\ \psi_{3j} = w_3\, f_j\, b^{j} \end{cases} \qquad (j = 1, 2, \ldots, 20) \tag{13}$$

where $f_j$ (j = 1, 2, ..., 20) are the occurrence frequencies of the 20 different types of amino acids in the protein sample concerned, and $w_1$, $w_2$, and $w_3$ are weight factors determined by optimizing the performance of the predictor; their concrete values are given in the footnote of Table 1. Substituting Eq.13 into Eq.2, we immediately obtain a feature vector with $\Omega = 3 \times 20 = 60$ components. The 60-D feature vector thus derived will be used to represent the samples of protein sequences for further study.
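The GM(2,1) feature extraction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`solve3`, `gm21_coefficients`, `grey_features`) are hypothetical, the difference term is taken as the first-order difference of the series and z(k) as the mean of consecutive terms of the accumulated series (the standard GM(2,1) convention), the coefficients are obtained by solving the least-squares normal equations, and the default weights of 1.0 are placeholders because the optimized values are not reproduced in this excerpt.

```python
from typing import List, Sequence, Tuple


def solve3(A: List[List[float]], y: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def gm21_coefficients(series: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares estimate of (a1, a2, b) for one column series."""
    if len(series) < 4:
        raise ValueError("need at least 4 points to fit 3 coefficients")
    # Accumulated series: running sums of the input series.
    acc, total = [], 0.0
    for value in series:
        total += value
        acc.append(total)
    rows, u = [], []
    for k in range(1, len(series)):
        z_k = 0.5 * (acc[k] + acc[k - 1])      # assumed definition of z(k)
        rows.append([-series[k], -z_k, 1.0])   # one row of B
        u.append(series[k] - series[k - 1])    # one entry of U
    # Normal equations: (B^T B) x = B^T U
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    btu = [sum(r[i] * ui for r, ui in zip(rows, u)) for i in range(3)]
    a1, a2, b = solve3(btb, btu)
    return a1, a2, b


def grey_features(matrix: Sequence[Sequence[float]],
                  freqs: Sequence[float],
                  weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[float]:
    """Build the feature vector: three weighted components per column."""
    w1, w2, w3 = weights  # placeholder weights, not the paper's optimized values
    features: List[float] = []
    for j, f_j in enumerate(freqs):
        column = [row[j] for row in matrix]
        a1, a2, b = gm21_coefficients(column)
        features += [w1 * f_j * a1, w2 * f_j * a2, w3 * f_j * b]
    return features
```

For an L x 20 matrix and 20 amino acid frequencies, `grey_features` returns 60 values ordered as in Eq.13. Solving the normal equations keeps the sketch dependency-free; a library least-squares routine would be the more robust choice for ill-conditioned columns.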
3. The SVM Operation Engine

In this study, the Support Vector Machine (SVM) algorithm was adopted to perform the prediction. The SVM implementation was taken from the LIBSVM package [60], which provides a simple interface by which users can easily ...

Predicting Secretory Proteins of Malaria Parasite

5. Performance Evaluation

In statistical prediction, the following three cross-validation methods are often used to examine a predictor for its effectiveness in practical application: the independent dataset test, the subsampling (K-fold cross-validation) test, and the jackknife test. However, as elaborated in a recent review [34] and demonstrated by Eqs.28-32 therein, among the three cross-validation methods the jackknife test is deemed the least arbitrary and most objective because it always yields a unique result for a given benchmark dataset. It has hence been widely recognized and increasingly used by investigators for examining the accuracy of various predictors (see, e.g., [36,38,39,41,44,47,61,62,63,64,65,66]). Accordingly, the jackknife test was also adopted in this study to examine the anticipated success rates of the current predictor.

Also, to measure the prediction quality in a more intuitive and easier-to-understand way, the rates of correct predictions for the secretory proteins of malaria parasite in dataset $P^{+}$ and the non-secretory proteins of malaria parasite in dataset $P^{-}$ are respectively defined by [67]

$$\begin{cases} \Lambda^{+} = \dfrac{N^{+} - m^{+}}{N^{+}} \\[2ex] \Lambda^{-} = \dfrac{N^{-} - m^{-}}{N^{-}} \end{cases}$$

Table 1. A comparison between iSMP-Grey and K-MID by the jackknife test.

Predictor    iSMP-Grey^a    K-MID^b
Sn (%)       93.25          81.
Sp (%)       96.46          99.
Ac.
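The jackknife test and the two success rates can be illustrated with a short Python sketch. Several things here are assumptions made for illustration only: labels are encoded as +1 (secretory) and -1 (non-secretory), N is read as the number of samples in a class and m as the number of those samples predicted incorrectly, all function names are hypothetical, and a 1-nearest-neighbour rule stands in for the SVM so that the example needs no external library.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
FitPredict = Callable[[List[Vector], List[int], Vector], int]


def jackknife_predictions(X: List[Vector], y: List[int],
                          fit_predict: FitPredict) -> List[int]:
    """Leave each sample out in turn, train on the rest, predict the held-out one."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(fit_predict(train_X, train_y, X[i]))
    return predictions


def success_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (rate for the positive set, rate for the negative set)."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == -1)
    # m counts the samples of each class that were predicted incorrectly.
    m_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    m_neg = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p != -1)
    return (n_pos - m_pos) / n_pos, (n_neg - m_neg) / n_neg


def nearest_neighbor(train_X: List[Vector], train_y: List[int], query: Vector) -> int:
    """Stand-in classifier (NOT the SVM used in the study): label of the closest sample."""
    def dist2(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: dist2(train_X[i], query))
    return train_y[best]
```

Because every sample is held out exactly once, `jackknife_predictions` gives the same result on every run for a fixed dataset and a deterministic classifier, which is the uniqueness property the text attributes to the jackknife test. Any classifier with the same `fit_predict` signature, including a wrapper around an SVM library, can be passed in place of `nearest_neighbor`.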