Classifier Accuracy Measures In Data Mining


Classifier Accuracy

Estimating the accuracy of a classifier is important because it indicates how accurately the classifier will label future data, that is, data on which it has not been trained.

For example, suppose you used data from previous sales to train a classifier to predict customer purchasing behavior.

You would like an estimate of how accurately the classifier can predict the purchasing behavior of future customers, that is, future customer data on which the classifier has not been trained.

Accuracy estimates also help in comparing different classifiers.

Methods To Find Accuracy Of The Classifiers

  • Holdout Method
  • Random Subsampling 
  • K-fold Cross-Validation 
  • Bootstrap Methods

Estimating The Classifier Accuracy

Holdout & Random Subsampling

The holdout method is what we have alluded to so far in our discussions about accuracy.

In this method, the given data are randomly partitioned into two independent sets, a training set, and a test set. 

Typically, two-thirds of the data are allocated to the training set, and the remaining one-third is allocated to the test set. 

The training set is used to derive the model, whose accuracy is estimated with the test set.

The estimate is pessimistic because only a portion of the initial data is used to derive the model.

Random subsampling is a variation of the holdout method in which the holdout method is repeated k times. 

The overall accuracy estimate is taken as the average of the accuracies obtained from each iteration. (For prediction, we can take the average of the predictor error rates.)
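The two procedures above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (holdout_split, train_1nn, accuracy, random_subsampling) and the toy 1-nearest-neighbour learner are assumptions made for the example and do not come from any library.

```python
import random

def holdout_split(data, train_fraction=2/3, rng=None):
    """Randomly partition data into a training set and a test set."""
    rng = rng or random.Random()
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train_1nn(train):
    """Toy learner (assumption): predict the label of the nearest training value."""
    return lambda x: min(train, key=lambda t: abs(t[0] - x))[1]

def accuracy(model, test):
    """Fraction of test tuples whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in test) / len(test)

def random_subsampling(data, k, train_fn, seed=0):
    """Repeat the holdout method k times and average the accuracies."""
    rng = random.Random(seed)
    scores = []
    for _ in range(k):
        train, test = holdout_split(data, rng=rng)
        scores.append(accuracy(train_fn(train), test))
    return sum(scores) / k

# Hypothetical data set: (value, label) tuples.
data = [(x, "low" if x < 50 else "high") for x in range(100)]
print(random_subsampling(data, k=10, train_fn=train_1nn))
```

Each call to holdout_split draws a fresh random partition, so the k accuracies differ from one iteration to the next; averaging them smooths out the luck of any single split.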



K-fold Cross-Validation

In k-fold cross-validation, the initial data are randomly partitioned into k mutually exclusive subsets or “folds,” D1, D2, ..., Dk, each of approximately equal size.

Training and testing are performed k times. In iteration i, partition Di is reserved as the test set, and the remaining partitions are collectively used to train the model.

That is, in the first iteration, subsets D2, ..., Dk collectively serve as the training set to obtain a first model, which is tested on D1; the second iteration is trained on subsets D1, D3, ..., Dk and tested on D2; and so on.

Unlike the holdout and random subsampling methods above, here, each sample is used the same number of times for training and once for testing.

For classification, the accuracy estimate is the overall number of correct classifications from the k iterations, divided by the total number of tuples in the initial data. 

For prediction, the error estimate can be computed as the total loss from the k iterations, divided by the total number of initial tuples.
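As a rough sketch of the procedure, the Python below partitions the data into k folds, trains on k-1 of them, tests on the remaining one, and divides the total number of correct classifications by the number of tuples. The function names and the toy 1-nearest-neighbour learner are assumptions made for illustration only.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_1nn(train):
    """Toy learner (assumption): predict the label of the nearest training value."""
    return lambda x: min(train, key=lambda t: abs(t[0] - x))[1]

def cross_validate(data, k, train_fn, seed=0):
    """Return total correct classifications over k iterations / number of tuples."""
    folds = k_fold_indices(len(data), k, seed)
    correct = 0
    for i, test_idx in enumerate(folds):
        # Every fold except fold i is used for training in iteration i.
        train = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_fn(train)
        correct += sum(model(data[j][0]) == data[j][1] for j in test_idx)
    return correct / len(data)

# Hypothetical data set: (value, label) tuples.
data = [(x, "low" if x < 50 else "high") for x in range(100)]
print(cross_validate(data, k=10, train_fn=train_1nn))
```

Because the folds are mutually exclusive and together cover the whole data set, every tuple is tested exactly once.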


Bootstrap

Unlike the accuracy estimation methods mentioned above, the bootstrap method samples the given training tuples uniformly with replacement.

That is, each time a tuple is selected, it is returned to the pool and is equally likely to be selected again and re-added to the training set.

For instance, imagine a machine that randomly selects tuples for our training set. In sampling with replacement, the machine is allowed to select the same tuple more than once.
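Sampling with replacement can be sketched as follows. The function name bootstrap_sample is made up for this example. Using the tuples that were never drawn as a test set is an assumption here: it is a common convention, but the text above does not spell it out. For large data sets, roughly 36.8% of the tuples (about 1/e) end up never being drawn.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) tuples uniformly with replacement.

    Returns the sampled training set and the tuples that were never drawn.
    """
    n = len(data)
    chosen = [rng.randrange(n) for _ in range(n)]
    chosen_set = set(chosen)
    train = [data[i] for i in chosen]
    # Assumption: tuples that were never selected can serve as a test set.
    never_drawn = [data[i] for i in range(n) if i not in chosen_set]
    return train, never_drawn

rng = random.Random(0)
train, never_drawn = bootstrap_sample(list(range(1000)), rng)
print(len(train), len(set(train)), len(never_drawn))
```

The training set has the same size as the original data, but it contains duplicates, which is exactly what the machine in the example above is allowed to produce.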

Ensemble Methods -  Increasing The Accuracy

Are there general strategies for improving classifier and predictor accuracy?

Yes. Bagging and boosting are two such techniques.


Bagging

We first take an intuitive look at how bagging works as a method of increasing accuracy.

For ease of explanation, we will assume at first that our model is a classifier. Suppose that you are a patient and would like to have a diagnosis made based on your symptoms.

Instead of asking one doctor, you may choose to ask several. If a certain diagnosis occurs more than any of the others, you may choose this as the final or best diagnosis.

That is, the final diagnosis is made based on a majority vote, where each doctor gets an equal vote. Now replace each doctor by a classifier, and you have the basic idea behind bagging. 

Intuitively, a majority vote made by a large group of doctors may be more reliable than a majority vote made by a small group.

Given a set, D, of d tuples, bagging works as follows.

For iteration i (i = 1, 2,..., k), a training set, Di, of d tuples is sampled with replacement from the original set of tuples, D.

Note that the term bagging stands for bootstrap aggregation.

A classifier model, Mi, is learned for each training set, Di.

To classify an unknown tuple, X, each classifier, Mi, returns its class prediction, which counts as one vote.

The bagged classifier, M, counts the votes and assigns the class with the most votes to X.

Bagging can be applied to the prediction of continuous values by taking the average value of each prediction for a given test tuple.

The bagged classifier often has significantly greater accuracy than a single classifier derived from D, the original training data.
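The bagging steps described above can be sketched like this. It is only an illustration: the function names and the toy 1-nearest-neighbour base learner are assumptions, and any learner that returns a prediction function could be plugged in instead.

```python
import random
from collections import Counter

def train_1nn(train):
    """Toy base learner (assumption): label of the nearest training value."""
    return lambda x: min(train, key=lambda t: abs(t[0] - x))[1]

def bagging(data, k, train_fn, seed=0):
    """Learn k models on bootstrap samples of data; combine them by majority vote."""
    rng = random.Random(seed)
    d = len(data)
    models = []
    for _ in range(k):
        # Training set Di: d tuples sampled with replacement from D.
        sample = [data[rng.randrange(d)] for _ in range(d)]
        models.append(train_fn(sample))

    def bagged(x):
        # Each model Mi casts one equally weighted vote.
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return bagged

# Hypothetical data set: (value, label) tuples.
data = [(x, "low" if x < 50 else "high") for x in range(100)]
M = bagging(data, k=11, train_fn=train_1nn)
print(M(10), M(90))
```

For continuous values, the vote in bagged would be replaced by the average of the individual predictions.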


Boosting

We now look at the ensemble method of boosting. As in the previous section, suppose that as a patient, you have certain symptoms.

Instead of consulting one doctor, you choose to consult several. 

Suppose you assign weights to the value or worth of each doctor’s diagnosis, based on the accuracies of previous diagnoses they have made. 

The final diagnosis is then a combination of the weighted diagnoses. This is the essence behind boosting.

In boosting, weights are assigned to each training tuple.

A series of k classifiers is iteratively learned. After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to “pay more attention” to the training tuples that were misclassified by Mi. 

The final boosted classifier, M, combines the votes of each individual classifier, where the weight of each classifier’s vote is a function of its accuracy. 

The boosting algorithm can be extended for the prediction of continuous values.
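The text above does not name a specific boosting algorithm. As one possible illustration, the sketch below follows the well-known AdaBoost scheme with one-dimensional threshold rules ("decision stumps") as the individual classifiers. The choice of AdaBoost, the stump learner, the +1/-1 label encoding, and all function names are assumptions made for this example.

```python
import math

def best_stump(xs, ys, w):
    """Pick the threshold/polarity stump with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= t else -polarity for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(xs, ys, k):
    """Learn k weighted stumps; labels in ys must be +1 or -1."""
    n = len(xs)
    w = [1.0 / n] * n          # one weight per training tuple
    ensemble = []              # (vote weight, threshold, polarity)
    for _ in range(k):
        err, t, pol = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate => bigger vote
        ensemble.append((alpha, t, pol))
        # Raise the weights of misclassified tuples, lower the rest.
        w = [wi * math.exp(-alpha * y * (pol if x >= t else -pol))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x):
        score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
        return 1 if score >= 0 else -1

    return predict

# Hypothetical data: no single threshold separates the two classes.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]
M = adaboost(xs, ys, k=3)
print([M(x) for x in xs])
```

On this toy data no single stump can classify every tuple correctly, but the weighted vote of three stumps can, because each later stump concentrates on the tuples the earlier ones got wrong.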


Partition: Training-and-Testing
  • It uses two independent data sets, e.g., a training set (2/3) and a test set (1/3).
  • It is used for data sets with a large number of samples.

Cross-Validation
  • It divides the data set into k subsamples.
  • It uses k-1 subsamples as training data and one subsample as test data (k-fold cross-validation).
  • It is used for data sets of moderate size.

Bootstrapping (or leave-one-out cross-validation)
  • It is used for small data sets.

