Correlation Coefficient Calculator (Matthews)
This correlation coefficient calculator can help you explore the world of statistics by explaining what a correlation coefficient is and how to calculate one. Unlike many other correlation coefficients, the Matthews coefficient works on binary classifications (yes/no outcomes) rather than on continuous values. The text below covers the Matthews correlation formula along with other useful metrics for evaluating a classification.
Statistics is a branch of mathematics that collects, analyzes, and interprets data. It's used in medicine and physics, as well as by governments and many other organizations looking for the best way to spend their time and money. The Matthews correlation coefficient is a common measure for evaluating how well a yes/no classification matches reality. In factories, it's used for quality control; in medicine, it helps evaluate tests for disease.
What is a correlation coefficient? - correlation coefficient definition
A correlation coefficient is a measure of the strength of a correlation, the statistical connection between two variables. In other words, it describes how strongly the value of one variable tends to change together with the value of another. There are many types of correlation coefficients, such as Pearson, intraclass, and rank correlation. Most of them are normalized, i.e., they operate on the same scale from -1 to +1, where:
- 0 means no correlation between the variables
- +1 means a perfect positive relation, i.e., the variables change in the same direction
- -1 means a perfect negative relation, i.e., the variables change in opposite directions
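To see the -1 to +1 scale in action, here is a minimal Python sketch of the Pearson coefficient, one of the types named above. The function name `pearson_r` and the sample data are illustrative choices, not part of the calculator.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (spread of each sequence)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # same direction      -> 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))  # opposite directions -> -1.0
print(pearson_r(x, [1, 3, 5, 3, 1]))   # rises, then falls
```

The third pair rises and then falls symmetrically, so its Pearson coefficient comes out at (or extremely close to) 0 even though the two sequences are clearly related: Pearson's r only captures linear association.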
How to find the correlation coefficient?
Our correlation coefficient calculator uses the Matthews correlation formula, which, like relative risk, is often used in medicine, for example, to evaluate the applicability of drugs. It is also used in the biological sciences and in machine learning, the scientific field that combines statistical models and algorithms to build computer systems that learn.
So, what is the correlation coefficient proposed by Matthews? It measures the correlation between the predicted and observed binary classification of a sample. The Matthews correlation coefficient formula is based on the so-called confusion matrix:
| | Said is | Said is not |
|---|---|---|
| Actually is | True positive (TP) | False negative (FN) |
| Actually is not | False positive (FP) | True negative (TN) |
Confusing, right? Think of the columns as the prediction and the rows as the true result.
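If you start from raw results rather than ready-made counts, filling in the matrix is a matter of tallying each (actual, predicted) pair into one of the four cells. Below is a minimal Python sketch, assuming labels are given as booleans (`True` for "is"); `confusion_counts` is a hypothetical helper name.

```python
from typing import Sequence, Tuple

def confusion_counts(actual: Sequence[bool],
                     predicted: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) for paired actual/predicted binary labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fn = fp = tn = 0
    for truth, said in zip(actual, predicted):
        if truth and said:
            tp += 1  # actually is, said is
        elif truth:
            fn += 1  # actually is, said is not
        elif said:
            fp += 1  # actually is not, said is
        else:
            tn += 1  # actually is not, said is not
    return tp, fn, fp, tn

actual = [True, True, False, False, True]
predicted = [True, False, True, False, True]
print(confusion_counts(actual, predicted))  # -> (2, 1, 1, 1)
```

The tuple is returned in the same order as you would read the matrix row by row: TP, FN, then FP, TN.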
The relation between these classifications is expressed with the Matthews correlation formula:
MCC = [(TP * TN) - (FP * FN)] / √[(TP + FP)(TP + FN)(TN + FP)(TN + FN)]
TP - number of true positives
FP - number of false positives
TN - number of true negatives
FN - number of false negatives
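The formula translates directly into code. This is a minimal Python sketch; the function name is illustrative, and returning 0 when the denominator is zero (which happens when any row or column of the confusion matrix sums to zero) is an assumed convention, since the formula itself is undefined there.

```python
import math

def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumption: an empty row or column makes the formula 0/0;
        # we follow the common convention of returning 0.
        return 0.0
    return numerator / denominator

print(round(matthews_corrcoef(tp=10, tn=70, fp=5, fn=15), 4))  # -> 0.4042
```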
The scale of this coefficient is the same as in the general definition above, but its values are read a little differently:
- +1 describes a perfect prediction
- 0 means the prediction is no better than random guessing
- -1 represents total disagreement between prediction and observation
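A quick numeric check of the three reference points, using hypothetical counts with 50 actual positives and 50 actual negatives in each scenario. The helper `mcc` is an illustrative name, and it returns 0 for an undefined (0/0) result, which is an assumed convention.

```python
import math

def mcc(tp, tn, fp, fn):
    # Assumed convention: return 0 when the denominator is zero
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

print(mcc(tp=50, tn=50, fp=0, fn=0))    # every prediction right -> 1.0
print(mcc(tp=0, tn=0, fp=50, fn=50))    # every prediction wrong -> -1.0
print(mcc(tp=25, tn=25, fp=25, fn=25))  # coin-flip predictions  -> 0.0
```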
If you still have doubts about how to find the correlation coefficient, keep reading: we'll go through an example calculation a bit further down.
Other correlation statistics
The Matthews correlation coefficient is often considered one of the most informative single measures of the quality of a binary classification, because it uses all four cells of the confusion matrix. If you aren't new to solving statistical problems, you might also find other scores relevant. In our calculator, you can find them by clicking the advanced mode button. They all range from 0% to 100%:
Sensitivity (true positive rate, recall) - the proportion of actual positives that have been correctly labeled as such:
Sensitivity = TP / (TP + FN)
Specificity (true negative rate, selectivity) - the proportion of actual negatives that have been correctly labeled as negative:
Specificity = TN / (TN + FP)
Precision - the proportion of items predicted to be positive that are actually positive:
Precision = TP / (TP + FP)
Accuracy - the ratio of correct results, both true positives and true negatives, to all elements:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1 score - a measure of the test's accuracy, calculated as the harmonic mean of precision and recall:
F1 score = (2 * TP) / (2 * TP + FN + FP)
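All five scores can be computed from the same four counts. Here is a minimal Python sketch; `classification_metrics` is a hypothetical name, and returning 0 for an empty denominator (for example, precision when nothing was predicted positive) is an assumption rather than a standard rule.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return the five scores above as fractions between 0 and 1."""
    def ratio(num, den):
        # Assumption: report 0 for an empty denominator instead of raising
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "f1": ratio(2 * tp, 2 * tp + fn + fp),
    }

for name, value in classification_metrics(tp=10, tn=70, fp=5, fn=15).items():
    print(f"{name}: {value:.1%}")
```

With the counts from the plate example later on this page (TP = 10, TN = 70, FP = 5, FN = 15), it prints sensitivity 40.0%, specificity 93.3%, precision 66.7%, accuracy 80.0%, and F1 50.0%.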
Which of these gives you the most information? Well, it depends on the data in your research. The F1 score, as a function of precision and recall, is a better measure than accuracy when most of the data points are actually negative, because it ignores true negatives, which would otherwise inflate accuracy. Precision is valuable when you can't afford many false positives. Sensitivity plays the same role when false negatives are costly. In the end, it is up to you to decide which metric is the most significant.
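To illustrate why accuracy can mislead when most items are negative, here is a small Python computation on made-up numbers (1,000 items, only 10 of them actually positive):

```python
# Hypothetical screening of 1,000 items, of which only 10 are actually positive
tp, fn = 1, 9      # 1 of the 10 positives caught, 9 missed
fp, tn = 9, 981    # 9 false alarms among the 990 negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = (2 * tp) / (2 * tp + fn + fp)

print(f"accuracy = {accuracy:.1%}")  # -> accuracy = 98.2%
print(f"F1 score = {f1:.1%}")        # -> F1 score = 10.0%
```

The 981 true negatives push accuracy above 98%, while the F1 score, which leaves true negatives out, shows that the test finds only 1 of the 10 positives.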
How to use our calculator - a correlation coefficient example
You already have answers to "what does correlation mean?", "what is the correlation coefficient formula?", and "what are some other correlation statistics?", but you may still be unsure how to calculate the correlation coefficient on your own. Let's have a look at this correlation coefficient example:
Let's say that you work in a ceramic factory, and you need to check whether some plates are correctly manufactured. You checked 100 plates and said that 15 of them have defects, but, in fact, 25 of them are defective. Only 10 of the 15 plates you flagged were actually defective. The confusion matrix looks like this:
| | Said is defective | Said is not defective |
|---|---|---|
| Actually is defective | 10 (TP) | 15 (FN) |
| Actually is not defective | 5 (FP) | 70 (TN) |
MCC = [(10 * 70) - (5 * 15)] / √[(10 + 5)(10 + 15)(70 + 5)(70 + 15)] = 625 / √2,390,625 ≈ 0.4042
It's not a bad outcome, but it's one that would probably cost you your job! When you look at sensitivity (10 / 25 = 40%), you see that you correctly identified only 40% of the defective plates.
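To reproduce the whole example from scratch, the Python sketch below rebuilds the 100 inspections as (actually defective, said defective) pairs, counts the four cells, and evaluates the MCC and sensitivity formulas. Representing each plate as a pair of booleans is an illustrative assumption.

```python
import math

# Rebuild the plate inspection as 100 (actually_defective, said_defective) pairs
plates = (
    [(True, True)] * 10      # defective, flagged       -> TP
    + [(True, False)] * 15   # defective, missed        -> FN
    + [(False, True)] * 5    # fine, flagged anyway     -> FP
    + [(False, False)] * 70  # fine, passed             -> TN
)

tp = sum(1 for actual, said in plates if actual and said)
fn = sum(1 for actual, said in plates if actual and not said)
fp = sum(1 for actual, said in plates if not actual and said)
tn = sum(1 for actual, said in plates if not actual and not said)

mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
sensitivity = tp / (tp + fn)

# Total plates, plates said to be defective, plates actually defective
print(len(plates), sum(said for _, said in plates), sum(actual for actual, _ in plates))  # -> 100 15 25
print(f"MCC = {mcc:.4f}, sensitivity = {sensitivity:.0%}")  # -> MCC = 0.4042, sensitivity = 40%
```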