Correlation Coefficient Calculator (Matthews)

This correlation coefficient calculator can help you explore the world of statistics by explaining what a correlation coefficient is and how to calculate one. Unlike most other correlation coefficients, the Matthews equation is based on a binary classification, not on continuous values. The text below covers the Matthews correlation formula and other useful correlation statistics or metrics.

Statistics is a branch of mathematics that collects, analyzes, and interprets data. It's used in medicine and physics, as well as by governments and many other types of organizations looking to find the best way to spend their time and money. The Matthews correlation is a common measure for interpreting data. In factories, it's used for quality control; in medicine, it helps with testing for disease.

Before we dive in, you should know that we have many other statistics calculators! Check out our Pearson correlation calculator and Spearman's correlation calculator to discover other correlation coefficients. Our p-value calculator may come in handy during your statistical journey as well.

What is a correlation coefficient? - correlation coefficient definition

A correlation coefficient is a measure of the strength of a correlation, the statistical connection between two variables. In other words, it describes how closely a change in the value of one variable is associated with a change in the value of another. There are many types of correlation coefficients, for example, Pearson, intraclass, and rank. They're all normalized, i.e., they operate on the same scale from -1 to +1, where:

0 means no relationship between the variables
+1 means a perfect positive relation, i.e., variables change in the same direction
-1 means a perfect negative relation, i.e., variables change in the opposite direction
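
To make the scale concrete, here is a minimal sketch that computes the Pearson coefficient for three tiny made-up data sets. The function name `pearson` and the sample values are illustrative assumptions, not part of the calculator.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample data, chosen only to hit the three reference points.
x = [1, 2, 3, 4, 5]
same_direction = [10, 20, 30, 40, 50]
opposite_direction = [50, 40, 30, 20, 10]
no_linear_relation = [3, 1, 2, 1, 3]

r_pos = pearson(x, same_direction)       # perfect positive relation
r_neg = pearson(x, opposite_direction)   # perfect negative relation
r_zero = pearson(x, no_linear_relation)  # no linear relation

print(r_pos, r_neg, r_zero)
```

The sketch does not guard against constant input, where the denominator would be zero.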

How to find the correlation coefficient?

Our correlation coefficient calculator uses the Matthews correlation formula, which, alongside relative risk, is often used in medicine to do such things as evaluate the applicability of drugs. It also finds use in the biological sciences as well as in machine learning - the scientific field that combines statistical models and algorithms to build computer systems that learn.

So, what is the correlation coefficient proposed by Matthews? It measures the correlation between the predicted and observed binary classification of a sample. The Matthews correlation coefficient formula is based on the so-called confusion matrix:

	Said is	Said is not
Actually is	True positive	False negative
Actually is not	False positive	True negative

Confusing, right? Try to think of the columns as a prediction, while the rows are the true result.
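
As a rough illustration of how the four cells are filled, the sketch below tallies them from two lists of labels. The function name `confusion_counts` and the sample labels are assumptions made for this example.

```python
def confusion_counts(actual, predicted):
    """Tally TP, FN, FP, TN from two equal-length lists of booleans.

    `actual` holds the true result (rows of the matrix),
    `predicted` holds what was said (columns of the matrix).
    """
    tp = fn = fp = tn = 0
    for is_positive, said_positive in zip(actual, predicted):
        if is_positive and said_positive:
            tp += 1   # actually is, said is
        elif is_positive:
            fn += 1   # actually is, said is not
        elif said_positive:
            fp += 1   # actually is not, said is
        else:
            tn += 1   # actually is not, said is not
    return tp, fn, fp, tn

# Hypothetical labels for eight items.
actual    = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

tp, fn, fp, tn = confusion_counts(actual, predicted)
print(tp, fn, fp, tn)
```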

The relation between these classifications is expressed with the Matthews correlation formula:

MCC = [(TP × TN) - (FP × FN)] / √[(TP + FP)(TP + FN)(TN + FP)(TN + FN)]

where:

TP — true positive
FP — false positive
TN — true negative
FN — false negative

The scale of this coefficient is defined a little differently from the correlation coefficient definition we mentioned before:

+1 describes a perfect prediction
0 means the prediction is no better than random guessing, so it doesn't give you any valid information
-1 represents a complete inconsistency between prediction and outcome
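
The formula and its scale can be sketched in a few lines. The function name `mcc` and the sample counts are assumptions for illustration; returning 0 when the denominator is zero is a common convention, not something the formula itself defines.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: treat an undefined result as 0.
        return 0.0
    return numerator / denominator

# Hypothetical counts that land on the three reference points of the scale.
perfect = mcc(tp=50, fp=0, tn=50, fn=0)          # every item labeled correctly
uninformative = mcc(tp=25, fp=25, tn=25, fn=25)  # no better than chance
inverted = mcc(tp=0, fp=50, tn=0, fn=50)         # every item labeled wrongly

print(perfect, uninformative, inverted)
```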

If you still have doubts about how to find the correlation coefficient, keep reading: we'll give you an example calculation a bit further down.

Other correlation statistics

The Matthews correlation coefficient is often regarded as one of the most informative single measures of the quality of a binary classification, because it takes all four cells of the confusion matrix into account. If you aren't new to solving statistical problems, you might also find other scores relevant. You can find them by opening the advanced metrics section of the calculator. They all range from 0% to 100%:

Sensitivity (true positive rate, recall) - the share of actual positives that have been correctly labeled as positive:

Sensitivity = TP / (TP + FN)

Specificity (true negative rate, selectivity) - the share of actual negatives that have been correctly labeled as negative:

Specificity = TN / (TN + FP)

Precision - the proportion of true positives among all the items predicted to be positive:

Precision = TP / (TP + FP)

Accuracy - the ratio of correct results, both true positives and true negatives, to all elements:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

F1 score - a measure of the test's accuracy, calculated as the harmonic mean of precision and recall:

F1 score = (2 × TP) / (2 × TP + FN + FP)
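
The five formulas above translate directly into code. This is a minimal sketch with assumed function names and made-up counts; it does not handle empty categories, where a denominator would be zero.

```python
def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

def precision(tp, fp):
    return tp / (tp + fp)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    return (2 * tp) / (2 * tp + fn + fp)

# Hypothetical counts for 100 classified items.
tp, fp, tn, fn = 40, 10, 45, 5

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
prec = precision(tp, fp)
acc = accuracy(tp, tn, fp, fn)
f1 = f1_score(tp, fp, fn)

# The F1 score is the harmonic mean of precision and recall (sensitivity).
harmonic_mean = 2 * prec * sens / (prec + sens)

for name, value in [("Sensitivity", sens), ("Specificity", spec),
                    ("Precision", prec), ("Accuracy", acc), ("F1 score", f1)]:
    print(f"{name}: {value:.1%}")
```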

Which of these gives you the most information? Well, it depends on the data in your research. The F1 score, as a function of precision and recall, is a better measure than accuracy when the classes are imbalanced, for example, when most points are actually negative, because it does not reward the classifier for the many easy true negatives. Precision is valuable when you can't afford many false positives. Sensitivity plays the same role when you can't afford many false negatives. In the end, it is down to you to decide which metric is the most significant.
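
A small made-up example shows why accuracy can mislead when almost all items are negative. The counts below are hypothetical: 1000 items, of which only 10 are actually positive.

```python
# Hypothetical, heavily imbalanced data: 10 actual positives among 1000 items.
tp, fn = 1, 9      # only 1 of the 10 actual positives was found
fp, tn = 1, 989    # almost all negatives were labeled correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = (2 * tp) / (2 * tp + fn + fp)

# Accuracy looks excellent because true negatives dominate the total,
# while sensitivity and F1 reveal that most positives were missed.
print(f"Accuracy:    {accuracy:.1%}")
print(f"Sensitivity: {sensitivity:.1%}")
print(f"Precision:   {precision:.1%}")
print(f"F1 score:    {f1:.1%}")
```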

How to use our calculator — a correlation coefficient example

You already have answers to "what does correlation mean?", "what is the correlation coefficient formula?", and "what are some other correlation statistics?", but you may still not know how to calculate a correlation coefficient on your own. Let's have a look at this correlation coefficient example:

Let's say that you work in a ceramic factory, and you need to check if some plates are correctly manufactured. You checked 100 plates and said that 15 of them have defects, but, in fact, 25 of them are defective, and only 10 of the plates you flagged are among them. So you were right about only 10 of the defective plates. The confusion matrix looks like this:

	Said is defective	Said is not defective
Actually is	10 - TP	15 - FN
Actually is not	5 - FP	70 - TN

MCC = [(10 × 70) - (5 × 15)] / √[(10 + 5)(10 + 15)(70 + 5)(70 + 15)] = 0.4042

It's not a bad outcome, but it's one that would probably cost you your job! When you look at sensitivity, you see that you correctly identified only 40% of the defective plates (10 out of 25).
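
If you'd like to check the arithmetic yourself, the sketch below derives the four cells from the numbers in the story and recomputes both results. The variable names are assumptions made for this example.

```python
import math

# Numbers taken from the plate example.
total_plates = 100
said_defective = 15
actually_defective = 25
correctly_flagged = 10

tp = correctly_flagged                # defective and flagged
fp = said_defective - tp              # flagged but fine
fn = actually_defective - tp          # defective but missed
tn = total_plates - tp - fp - fn      # fine and not flagged

mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
sensitivity = tp / (tp + fn)

print(tp, fp, fn, tn)            # 10 5 15 70
print(round(mcc, 4))             # 0.4042
print(f"{sensitivity:.0%}")      # 40%
```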

Have you enjoyed this calculator? Check out the birthday paradox calculator next!