Combining diverse classifiers using class-specific precision index function
In this work, the overall precision index (PIN) (Ko and Windle 2011) is extended to a class-specific precision index function (PIC) and used to combine the results of diverse classifiers. The PIN is a measure of overall local prediction accuracy for a classifier, whereas the PIC is a measure of local class prediction accuracy. The main motivation for making the precision index function class-specific is to generate high-precision prediction sets with higher recall that can be used to identify potential candidates, such as when classifying proteins to sub-cellular location sites and genes to biological processes. This work compares the performance of the PIC and PIN combining methods to that of other combining methods, including majority voting, stacked generalization, and cluster-selection, on two well-known data sets: 1) the vowel recognition data (Hastie, Tibshirani and Friedman 2009), which is a balanced data set, and 2) the yeast protein localization data (Frank and Asuncion 2010), which is an unbalanced data set. When compared to other combining methods on the vowel recognition data, the PIC method was not able to generate high-precision prediction sets. Similar results were obtained with an extension of the static cluster-selection method to the class level. Modified PIC curves were then generated that, for each prediction point, used the two best classes predicted by a classifier with their associated posterior probabilities, instead of using only the best predicted class with the maximum posterior probability. This enhancement increased the overall precision of the PIC method and extended the results to higher precisions for the vowel recognition data. A new weighted precision index, which combines the PIC and PIN indexes, was also developed and further extended the PIC results to higher precisions. The weighted precision index method outperformed all other combining methods at higher precisions.
Even though the PIC method was outperformed by other combining methods at lower precisions for the yeast protein localization data, it generated higher recalls in the high-precision range. The class-specific cluster-selection extension, developed as part of this work, outperformed all other combining methods for the yeast protein localization data set, demonstrating great potential for this method to leverage class-specific performance. The overall precisions obtained for the yeast protein localization data set with both the precision index methods and the cluster-selection methods exceeded previous results reported by Chen (2010), in which several classification methods were considered, including decision trees, neural networks, naive Bayes, and Bayesian model averaging.
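
To make the notion of a high-precision prediction set concrete, the following is a minimal sketch and not the method developed in this work. It assumes that each prediction point carries some confidence index (for example a PIN or PIC value), that the prediction set consists of the points whose index meets a threshold, and that recall is measured as the fraction of all points that are both in the set and correctly predicted. The function name, the threshold rule, and this definition of recall are illustrative assumptions.

```python
from typing import Sequence, Tuple


def prediction_set_precision_recall(
    index: Sequence[float],
    predicted: Sequence[str],
    actual: Sequence[str],
    threshold: float,
) -> Tuple[float, float]:
    """Precision and recall of the set of points whose index >= threshold.

    Assumptions (illustrative only):
      * precision = correct predictions in the set / size of the set
      * recall    = correct predictions in the set / total number of points
      * an empty set is given precision 0.0 by convention
    """
    # Points confident enough to enter the prediction set.
    selected = [i for i, value in enumerate(index) if value >= threshold]
    # Correct predictions among the selected points.
    correct = sum(1 for i in selected if predicted[i] == actual[i])
    precision = correct / len(selected) if selected else 0.0
    recall = correct / len(actual) if len(actual) > 0 else 0.0
    return precision, recall


# Toy data: four prediction points with hypothetical index values.
toy_index = [0.9, 0.8, 0.6, 0.4]
toy_predicted = ["a", "b", "a", "b"]
toy_actual = ["a", "b", "b", "b"]

strict = prediction_set_precision_recall(toy_index, toy_predicted, toy_actual, 0.7)
loose = prediction_set_precision_recall(toy_index, toy_predicted, toy_actual, 0.0)
```

Under these assumptions, raising the threshold trades recall for precision: on the toy data, the strict threshold keeps only two points, both correct, while the loose threshold keeps all four points, one of which is wrong. A combining method that yields higher recall at the same precision therefore returns a larger set of equally reliable candidates.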