Evaluation Metrics for Classification (Accuracy Score, Precision, Recall, Confusion Matrix, F1-Score)

Choosing an evaluation metric is one of the most critical decisions in any machine learning project. For classification, we can evaluate a model with various metrics, such as the accuracy score, precision, recall, and F1-score. The right metric depends on the use case: in medical applications, for example, recall is very important. If the dataset is highly imbalanced, the F1-score can be a good measure of performance, although it is harder to interpret. The confusion matrix is another performance evaluation tool, and it helps in visualizing and understanding classification results. By the end of this post, you should know the following:

1. What are precision and recall?
2. What are the various components of the confusion matrix?
3. What is the accuracy score?
4. What is the F1-score?
5. When should each of these performance evaluation metrics be used?

Accuracy Score:

The accuracy score is the ratio of the number of correctly predicted data points to the total number of data points in the dataset.

Accuracy Score = Correctly Predicted/Total Number of Data Points
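To make the formula concrete, here is a minimal hand-written sketch in Python. The function name `accuracy` and the example labels are hypothetical, chosen only for illustration (0 = no cancer, 1 = cancer, matching the example used later in this post).

```python
def accuracy(y_true, y_pred):
    """Fraction of data points whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one data point")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels: 0 = no cancer, 1 = cancer
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]
print(accuracy(y_true, y_pred))  # 0.75 (6 of 8 predictions are correct)
```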

Note: When the dataset is imbalanced, the accuracy score can be misleading, because a model that always predicts the majority class still gets a high score. So for an imbalanced dataset, we use the F1-score to evaluate the model.
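The following sketch illustrates the note above with a made-up dataset of 100 patients, of whom only 5 have cancer, scored against a model that always predicts "no cancer". The F1 line uses the identity F1 = 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall whenever TP > 0; treating F1 as 0 when the denominator is 0 is a convention assumed here.

```python
# Hypothetical imbalanced dataset: 95 healthy patients (0) and 5 with cancer (1)
y_true = [0] * 95 + [1] * 5
# A useless "model" that always predicts the majority class
y_pred = [0] * 100

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
# F1 = 2TP / (2TP + FP + FN); assumed to be 0.0 when the denominator is 0
denominator = 2 * tp + fp + fn
f1 = 2 * tp / denominator if denominator > 0 else 0.0

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.95
print(f"F1-score = {f1:.2f}")        # F1-score = 0.00
```

The model looks excellent by accuracy, yet it never detects a single cancer case, and the F1-score exposes this.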

Confusion Matrix:

The confusion matrix is one of the most widely used performance measurement tools for classification tasks. It is a table that compares the actual classes with the classes predicted by the model. Here we will use a binary classification example to understand the confusion matrix better.

We will understand each cell of the confusion matrix with the help of an example. Suppose we are evaluating a cancer detection model in which 0 means the patient does not have cancer and 1 means the patient has cancer.

True Negative (TN):

The patient did not have cancer, and the model also predicted that the patient does not have cancer.

False Negative (FN):

The patient had cancer, but the model predicted that the patient does not have the disease. This can be very dangerous: the model fails to detect the cancer, so the patient may go untreated and could die. In most medical applications, false negatives should be kept as low as possible.

False Positive (FP):

The patient did not have cancer, but the model predicted that the patient has cancer. In the medical domain, this error is usually less hazardous than a false negative, because further diagnosis will show the doctor that the patient does not have cancer, so the mistake does not put the patient's life at risk.

True Positive (TP):

The patient had cancer, and the model detected it correctly.
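The four cells above can be counted directly from lists of actual and predicted labels. This is a sketch: `confusion_counts` is a hypothetical helper, and the labels are made up. The 2x2 layout (rows = actual class, columns = predicted class) is the same one scikit-learn's `confusion_matrix` returns for labels 0 and 1.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Return (TN, FP, FN, TP) for binary labels where 1 is the positive class."""
    counts = Counter(zip(y_true, y_pred))  # missing pairs count as 0
    tn = counts[(0, 0)]  # no cancer, predicted no cancer
    fp = counts[(0, 1)]  # no cancer, predicted cancer
    fn = counts[(1, 0)]  # cancer, predicted no cancer
    tp = counts[(1, 1)]  # cancer, predicted cancer
    return tn, fp, fn, tp

y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]
tn, fp, fn, tp = confusion_counts(y_true, y_pred)

# Rows = actual class (0, 1), columns = predicted class (0, 1)
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)  # [[4, 1], [1, 2]]
```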

Precision:

Precision is the ratio of true positives to the total number of data points that our model predicted as positive.

Precision = TP/(TP+FP)
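A minimal sketch of the precision formula, assuming the TP and FP counts have already been computed. Returning 0.0 when the model predicts no positives at all is a convention assumed here, since the formula is otherwise 0/0.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); assumed 0.0 when nothing is predicted positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0

# Hypothetical counts: 2 true positives, 1 false positive
print(precision(tp=2, fp=1))  # 0.6666666666666666
```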

Precision is an important criterion when we want to reduce false positives. Suppose we trained a model to predict rain: if the model says it will rain every day, it produces a very large number of false positives, and such a model is useless.
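The rain example can be checked with made-up numbers. In this sketch (hypothetical data, 3 rainy days out of 10), a model that always predicts rain never misses a rainy day, yet its precision drops to the fraction of days that actually were rainy.

```python
# Hypothetical 10 days of weather: 1 = rain, 0 = no rain
actual_rain = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
# A model that predicts rain every single day
always_rain = [1] * len(actual_rain)

tp = sum(1 for a, p in zip(actual_rain, always_rain) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual_rain, always_rain) if a == 0 and p == 1)
precision = tp / (tp + fp)
print(tp, fp, precision)  # 3 7 0.3
```

Because this model never predicts "no rain", it has no false negatives, so its recall is a perfect 1.0; only precision reveals how unhelpful it is.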

Recall:

Recall is the ratio of true positives to the total number of actually positive data points in the data.

Recall = TP/(TP+FN)
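A matching sketch for recall, again assuming precomputed counts and returning 0.0 when there are no actual positives (an assumed convention). The second call mirrors a model that never predicts cancer: it has no true positives, so its recall is 0.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN); assumed 0.0 when there are no actual positives."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0

# Hypothetical counts: 2 true positives, 1 false negative
print(recall(tp=2, fn=1))  # 0.6666666666666666
# A model that never predicts cancer misses all 5 actual cancer cases
print(recall(tp=0, fn=5))  # 0.0
```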

Recall is an important criterion when we want to reduce false negatives. Suppose I want to train a model to predict whether a patient has cancer. If the model always says that the patient does not have cancer, it misses every actual cancer case, which can be fatal for those patients.

F1-Score:

As discussed in the accuracy score section, if the dataset is imbalanced, the accuracy score can be a misleading criterion for evaluating the model. In such cases, we use the F1-score, which is the harmonic mean of precision and recall.

F1-score = (2 * precision * recall) / (precision + recall)

Like the accuracy score, the F1-score ranges from 0 to 1, and the higher the F1-score, the better the model.
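A short sketch of the F1 formula. The helper name `f1_from_pr` is hypothetical, and returning 0.0 when precision and recall are both zero is an assumed convention. The second call shows why the harmonic mean is used: when one of the two metrics is low, F1 stays low instead of averaging the problem away.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall; assumed 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_from_pr(0.5, 0.5))  # 0.5
# About 0.18, far below the arithmetic mean of 0.55
print(f1_from_pr(1.0, 0.1))
```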

I have implemented all these metrics in Python on a dataset. You can see the code on my GitHub profile: YouTube Video Category Classification using Machine Learning

If there is a mistake in any section of this blog, please let me know in the comments, and I will try to fix it as soon as possible.