With this confusion matrix calculator, we aim to help you calculate various metrics that can be used to assess your machine learning model's performance. The confusion matrix is one of the most common ways of analyzing the results of a classification machine learning model. It is thus a critical topic to understand in this field.
We have prepared this article to help you understand what a confusion matrix is and how to calculate a confusion matrix. We will also explain how to interpret the confusion matrix examples to make sure you understand the concept thoroughly.
What is a confusion matrix in machine learning?
You can see a confusion matrix as a way of measuring the performance of a classification machine learning model. It summarizes the results of a classification problem using four metrics: true positive, false negative, false positive, and true negative.
However, the use of a confusion matrix goes way beyond just these four metrics. Using these four metrics, the confusion matrix allows us to assess the performance of the classification machine learning model using more versatile metrics, such as accuracy, precision, recall, and more.
We will talk about the definitions of these metrics in detail in the next section. You will be able, for example, to calculate accuracy from the confusion matrix all by yourself!
How to read a confusion matrix?
After understanding the definition of a confusion matrix in machine learning, it's time to talk about how to read a confusion matrix.
A confusion matrix has four components:
- True positive (TP) - These are the correct predictions made that are labeled as positive. You can input this and the below values in the confusion matrix calculator's first section.
- False negative (FN) - These are the wrong predictions made that are labeled as negative.
- False positive (FP) - These are the wrong predictions made that are labeled as positive.
- True negative (TN) - These are the correct predictions made that are labeled as negative.
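As a minimal sketch of how these four components arise, the snippet below counts them from a list of actual classes and a list of predicted classes. The function name `confusion_counts`, the `positive` parameter, and the sample labels are all hypothetical and only serve as an illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count TP, FN, FP, and TN for a binary classification problem."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual class is positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: wrong if the actual class is positive.
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FN", "FP", "TN")}


# Hypothetical labels: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))
# {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```

The four counts always add up to the total number of predictions, which is a quick sanity check before computing any further metric.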
Using these four components, we can calculate various metrics to help us in analyzing the performance of the machine learning model:
- accuracy - Accuracy is the proportion of correct predictions out of all predictions made. You can calculate accuracy from the confusion matrix, as well as other metrics, using our tool.
- precision - Precision is the proportion of correct positive predictions out of all positive predictions.
- recall - Recall is the proportion of correct positive predictions out of all actual positive cases.
- F1 score - The F1 score combines precision and recall into a single number by taking their harmonic mean, which punishes extreme values. This lets you compare, for example, a low-precision, high-recall model with a high-precision, low-recall one.
- TPR - True positive rate is the probability that an actual positive case will be predicted as positive. It is the same quantity as recall.
- FNR - False negative rate is the probability of getting a type II error, which is wrongly labeling a positive case as negative.
- FPR - False positive rate is the probability of getting a type I error, which is wrongly labeling a negative case as positive.
- TNR - True negative rate is the probability that an actual negative case will be predicted as negative.
- FDR - False discovery rate is the ratio of the number of false positives to the total number of positive predictions.
- MCC - Matthews correlation coefficient, also known as the phi coefficient, is a metric that measures the association between two binary variables, here the actual and the predicted classes.
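To see why the F1 score uses the harmonic mean rather than a plain average, here is a small sketch that compares the two. The precision and recall values are hypothetical, and the helper names `f1_score` and `arithmetic_mean` are chosen only for this illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def arithmetic_mean(precision, recall):
    """Plain average, shown only for comparison."""
    return (precision + recall) / 2


# Hypothetical (precision, recall) pairs with the same plain average.
balanced = (0.55, 0.55)
lopsided = (1.0, 0.1)

for p, r in (balanced, lopsided):
    print(
        f"precision={p}, recall={r}: "
        f"mean={arithmetic_mean(p, r):.2f}, F1={f1_score(p, r):.2f}"
    )
```

Both pairs have a plain average of 0.55, but the lopsided pair gets an F1 score of only about 0.18, which is how the harmonic mean punishes extreme values.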
Next, let's look at the calculations of these metrics using the confusion matrix example.
Confusion matrix calculator with an example
Finally, it is time to talk about the calculations. We will use the confusion matrix example below to demonstrate them. Let's take the classification results below as an example:
- True positive (TP): 80
- False negative (FN): 70
- False positive (FP): 20
- True negative (TN): 30
The calculations of the metrics are shown below:
- Accuracy
To calculate accuracy from the confusion matrix, use the formula below:
accuracy = (TP + TN) / (TP + FN + FP + TN)
The accuracy for this example is (80 + 30) / (80 + 70 + 20 + 30) = 0.55.
- Precision
The precision can be calculated using the formula below:
precision = TP / (TP + FP)
The precision for this example is 80 / (80 + 20) = 0.8.
- Recall
Find the recall using the formula below:
recall = TP / (TP + FN)
The recall for this example is 80 / (80 + 70) = 0.53.
- F1 score
To estimate the F1 score, use the following formula:
F1 score = (2 * precision * recall) / (precision + recall)
The F1 score for this example is (2 * 0.8 * 0.53) / (0.8 + 0.53) = 0.64.
- True positive rate
The true positive rate TPR (also called sensitivity) can be calculated using the formula below:
TPR = TP / (TP + FN)
The TPR for this example is 80 / (80 + 70) = 0.53.
- False negative rate
We express the false negative rate FNR in a similar way:
FNR = FN / (TP + FN)
The FNR for this example is 70 / (80 + 70) = 0.47.
- False positive rate
The false positive rate FPR is as follows:
FPR = FP / (FP + TN)
The FPR for this example is 20 / (20 + 30) = 0.4.
- True negative rate
The true negative rate TNR (also called specificity) is:
TNR = TN / (TN + FP)
The TNR for this example is 30 / (30 + 20) = 0.6.
- False discovery rate
We can calculate the false discovery rate as follows:
FDR = FP / (TP + FP)
The FDR for our example is 20 / (80 + 20) = 0.2.
- Matthews correlation coefficient
Finally, we can calculate the Matthews correlation coefficient using the formula below:
MCC = (TP * TN - FP * FN) / √((TP + FP) * (TN + FN) * (FP + TN) * (TP + FN))
The MCC for this example is (80 * 30 - 20 * 70) / √((80 + 20) * (30 + 70) * (20 + 30) * (80 + 70)) = 0.11547.
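If you would like to check these results yourself, the formulas above can be sketched in a few lines of Python. The function name `confusion_matrix_metrics` and the dictionary keys are hypothetical choices for this illustration, and the sketch assumes that no denominator is zero (it does no error handling for empty classes).

```python
import math


def confusion_matrix_metrics(tp, fn, fp, tn):
    """Apply the formulas from this section to the four counts.

    Assumes every denominator is non-zero; no guarding is done here.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tn + fn) * (fp + tn) * (tp + fn)
    )
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "tpr": recall,  # same formula as recall
        "fnr": fn / (tp + fn),
        "fpr": fp / (fp + tn),
        "tnr": tn / (tn + fp),
        "fdr": fp / (tp + fp),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Counts taken from the worked example in this section.
metrics = confusion_matrix_metrics(tp=80, fn=70, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.5f}")
```

Note that TPR and FNR add up to 1, as do FPR and TNR, because each pair splits the same group of actual positives or actual negatives.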
If all of these confusion matrix calculations look complicated, just use the confusion matrix calculator we have built for you!
What is machine learning?
Machine learning is a branch of artificial intelligence that involves using algorithms or statistical models to automate the data analysis process. It can be used to perform predictions using techniques such as regression and classification.
What is accuracy?
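In the field of machine learning, accuracy is a metric used to assess the performance of a machine learning model: it is the proportion of all predictions that are correct. In general, the higher the accuracy, the more reliable the model, although accuracy alone can be misleading when one class is much more common than the other.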
What is classification?
Classification is a machine learning operation that involves taking in data and separating the data points into different groups based on their characteristics. For example, an algorithm can tell you if an email is considered a spam email or not.
What is regression?
Regression is a machine learning operation that involves using data points to predict a continuous outcome. For instance, a regression model can be used to predict future stock prices based on economic variables.
How do I find precision for 80 true positive and 20 false positive samples?
The precision for 80 true positive and 20 false positive samples is 0.8. You can find the answer following the steps below:
- Determine the number of true positives: TP = 80
- Determine the number of false positives: FP = 20
- Apply the precision formula:
precision = TP / (TP + FP) = 80 / (80 + 20) = 0.8
What is the difference between accuracy and precision?
Precision is the percentage of correct predictions made out of all positive predictions, whereas accuracy is the percentage of correct predictions made out of all predictions.
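The gap between the two metrics is easiest to see with numbers. The sketch below uses hypothetical counts, made up for this illustration, in which positive cases are rare; the helper names `accuracy` and `precision` are likewise only illustrative.

```python
def accuracy(tp, fn, fp, tn):
    """Correct predictions out of all predictions."""
    return (tp + tn) / (tp + fn + fp + tn)


def precision(tp, fp):
    """Correct positive predictions out of all positive predictions."""
    return tp / (tp + fp)


# Hypothetical counts for a problem where positives are rare.
tp, fn, fp, tn = 5, 5, 15, 975

print(f"accuracy  = {accuracy(tp, fn, fp, tn):.2f}")  # 0.98
print(f"precision = {precision(tp, fp):.2f}")         # 0.25
```

Here almost all predictions are correct because the many true negatives dominate, yet only a quarter of the positive predictions are right, so the two metrics can tell very different stories about the same model.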