Cosine Similarity Calculator

Created by Rijk de Wet
Reviewed by Anna Szczepanek, PhD and Steven Wooding
Last updated: Mar 09, 2023

The cosine similarity calculator will teach you about the cosine similarity measure, which is widely used in machine learning and other fields of data science. Read on to learn:

• What the cosine similarity is;
• What the formula for the cosine similarity is;
• Whether the cosine similarity can be negative; and
• How to calculate the cosine similarity in Python.

How to use the cosine similarity calculator

Here's how to use this cosine similarity calculator:

1. Enter your vectors $\vec{a}$ and $\vec{b}$ into the calculator, one element at a time.

• More fields will appear as you need them.

• Empty fields are treated as zeroes.

• The vectors will automatically be extended to matching lengths.

2. The cosine similarity $\rm S_C$ and the values derived from it, such as the angle between the vectors, $\theta$, and the cosine distance, $\rm D_C$, are displayed below the vector inputs.

3. The calculations for finding the cosine similarity are shown below the results so that you may understand your specific result.
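If you'd like to mimic the input handling described above in your own code, here's a minimal sketch. The calculator's actual implementation isn't shown on this page, so the helper name `pad_to_same_length` and the choice to treat `None` and empty strings as "empty fields" are assumptions made purely for illustration.

```python
def pad_to_same_length(a, b):
    """Zero-pad the shorter vector; treat None or "" entries as 0 (assumed behaviour)."""
    def clean(vector):
        # Replace empty entries with zero and convert everything else to float
        return [0.0 if x is None or x == "" else float(x) for x in vector]

    a, b = clean(a), clean(b)
    n = max(len(a), len(b))
    # Append zeroes until both vectors have n elements
    return a + [0.0] * (n - len(a)), b + [0.0] * (n - len(b))


a, b = pad_to_same_length([1, "", 3], [4, 5])
print(a, b)  # [1.0, 0.0, 3.0] [4.0, 5.0, 0.0]
```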

What is the cosine similarity?

The cosine similarity measure indicates how similar two vectors are using the cosine of the angle between them. It gives no information on the comparative magnitudes of the vectors.

Cosine similarity is widely used in data analysis and data science, particularly in the field of natural language processing.
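To give a flavor of the natural language processing use case, here's a rough sketch that compares two short sentences. It assumes each sentence is turned into a vector of simple word counts, which is only one of many possible text representations; the function name `cosine_similarity_of_texts` and the example sentences are made up for this illustration.

```python
import math
from collections import Counter


def cosine_similarity_of_texts(text_a, text_b):
    # Count how often each lowercase word occurs in each text
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Build both vectors over the shared vocabulary (missing words count as 0)
    vocabulary = sorted(set(counts_a) | set(counts_b))
    a = [counts_a[word] for word in vocabulary]
    b = [counts_b[word] for word in vocabulary]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


score = cosine_similarity_of_texts("the cat sat on the mat", "the cat lay on the rug")
print(round(score, 2))  # 0.75
```

The two sentences share most of their words, so their count vectors point in similar directions and the score is well above zero.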

🔎 Remember what the cosine is? No? Then head on over to our cosine calculator.

The cosine similarity formula

It helps to know what the cosine similarity is conceptually, but how do we calculate it? Let's explore the formula.

The cosine similarity between two $N$-dimensional vectors $\vec{a}$ and $\vec{b}$, which is denoted as ${\rm S_C}(\vec{a}, \vec{b})$, is defined as the cosine of the angle between the two vectors, $\theta$:

$\small {\rm S_C}(\vec{a}, \vec{b}) = \cos \theta$

However, we don't always know the angle $\theta$ — then what? Well, a more complex yet more helpful formula can be derived from the dot product. Let's investigate!

The dot product of the two vectors is denoted as $\vec{a}\cdot\vec{b}$, and is defined as:

$\small \vec{a} \cdot \vec{b} = \Vert\vec{a}\Vert\ \Vert\vec{b}\Vert\ \cos\theta,$

where $\Vert\vec{a}\Vert$ is the magnitude of the vector $\vec{a}$ (and similar for $\Vert\vec{b}\Vert$). We can rearrange this equation to become:

$\small \cos\theta = \frac{ \vec{a}\cdot\vec{b} }{ \Vert\vec{a}\Vert\ \Vert\vec{b}\Vert }$

And so we have a handy formula for the cosine similarity that doesn't rely on the angle directly:

$\small {\rm S_C} = \frac{ \vec{a}\cdot\vec{b} }{ \Vert\vec{a}\Vert\ \Vert\vec{b}\Vert }$

And what's more, we can rewrite the formula with sums:

$\small {\rm S_C} = \frac{ \sum_{i=1}^N a_i b_i }{ \sqrt{\sum_{i=1}^N a_i^2} \sqrt{\sum_{i=1}^N b_i^2} }$

Lovely! A formula for the cosine similarity that relies only on the known elements of the vectors.
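The sum form translates almost line by line into code. Here's a minimal sketch in plain Python; the function name `cosine_similarity` and the length check are our own choices, not something the formula prescribes.

```python
import math


def cosine_similarity(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Numerator: sum of a_i * b_i over all elements
    numerator = sum(a_i * b_i for a_i, b_i in zip(a, b))
    # Denominator: product of the square roots of the sums of squares
    denominator = math.sqrt(sum(a_i ** 2 for a_i in a)) * math.sqrt(sum(b_i ** 2 for b_i in b))
    return numerator / denominator


print(round(cosine_similarity([1, 1, 1], [3, 4, 5]), 4))  # 0.9798
```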

🔎 Want a refresher on the dot product and vector magnitudes? Check out our dot product calculator and the vector magnitude calculator.

The cosine similarity, $\rm S_C$, lies in the range $[-1, 1]$, which is, of course, the range of the cosine function.

• When the two vectors are in the same direction, $\theta = 0^\circ$ and so $\rm S_C = 1$.

• When the two vectors are orthogonal, $\theta = 90^\circ$ and $\rm S_C = 0$.

• When the two vectors are in opposite directions, $\theta = 180^\circ$ and so the cosine similarity is -1.
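The three cases above are easy to check numerically. The sketch below uses example vectors we picked ourselves and rounds the results, because floating-point arithmetic may land a hair away from the exact values.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1, 2], [2, 4])    # b is a stretched copy of a
orthogonal = cosine_similarity([1, 0], [0, 3])        # right angle between a and b
opposite = cosine_similarity([1, 2], [-1, -2])        # b points the other way

print(round(same_direction, 6))  # 1.0
print(round(orthogonal, 6))      # 0.0
print(round(opposite, 6))        # -1.0
```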

🔎 Don't care about the cosine similarity and only want the angle between two vectors? Perhaps you'd like to visit our angle between two vectors calculator.

Note that we say "similar" and not "identical": $\rm S_C$ only measures the angle and is not influenced by the comparative magnitudes. $\rm S_C = 1$ only means the two vectors point in the same direction, not that the two vectors are equal. See if you can prove this mathematically!
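If you'd like a hint, here's one way to sketch the argument. Assume $\vec{b}$ is a positively scaled copy of $\vec{a}$, i.e., $\vec{b} = k\vec{a}$ with a scale factor $k > 0$ (the symbol $k$ is introduced here only for illustration). Using $\vec{a}\cdot\vec{a} = \Vert\vec{a}\Vert^2$ and $\Vert k\vec{a}\Vert = k\Vert\vec{a}\Vert$, we get:

$\small {\rm S_C}(\vec{a}, k\vec{a}) = \frac{ \vec{a}\cdot k\vec{a} }{ \Vert\vec{a}\Vert\ \Vert k\vec{a}\Vert } = \frac{ k\Vert\vec{a}\Vert^2 }{ k\Vert\vec{a}\Vert^2 } = 1$

So the similarity is 1 for every $k > 0$, even though $\vec{a}$ and $k\vec{a}$ are only equal when $k = 1$.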

The cosine similarity is not defined when either vector is a zero-vector — a vector with all elements as zeroes and thus zero magnitude.
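In code, the zero-vector case shows up as a division by zero, so it's worth guarding against. Here's one possible sketch; raising a `ValueError` is our own design choice, and you might prefer to return a sentinel value instead.

```python
import math


def safe_cosine_similarity(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero magnitude would make the denominator zero
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


try:
    safe_cosine_similarity([0, 0], [1, 2])
except ValueError as error:
    print(error)  # cosine similarity is undefined for a zero vector
```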

How do I calculate the cosine similarity?

To calculate the cosine similarity between two vectors, follow these steps:

1. If you know the angle between the vectors, the cosine similarity is the cosine of that angle.

2. If you don't know the angle, calculate the dot product of the two vectors.

3. Calculate both vectors' magnitudes.

4. Divide the dot product by the product of the magnitudes.

5. The result is the cosine similarity.
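The steps above can be sketched as two small functions, one for each route. The function names are hypothetical, and the example assumes the angle is given in degrees.

```python
import math


def similarity_from_angle(theta_degrees):
    # Step 1: with a known angle, the similarity is just its cosine
    return math.cos(math.radians(theta_degrees))


def similarity_from_vectors(a, b):
    dot_product = sum(x * y for x, y in zip(a, b))    # step 2
    magnitude_a = math.sqrt(sum(x * x for x in a))    # step 3
    magnitude_b = math.sqrt(sum(y * y for y in b))
    return dot_product / (magnitude_a * magnitude_b)  # steps 4 and 5


# The vectors [1, 0] and [1, 1] are 45 degrees apart, so both routes should agree
print(round(similarity_from_angle(45), 4))                # 0.7071
print(round(similarity_from_vectors([1, 0], [1, 1]), 4))  # 0.7071
```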

An example of the cosine similarity

Let's look at an example of two 2D vectors and their cosine similarity. Let's use:

• $\vec{a} = [1, 5]$ and
• $\vec{b} = [-1, 3]$.

Already, we can visualize that the two vectors point in the same general direction, i.e., up. We can guess that $\theta < 90^\circ$ and therefore that ${\rm S_C} > 0$, but let's calculate it properly using the formula we learned above.

1. The dot product is:

$\vec{a}\cdot\vec{b} = 1\cdot (-1) + 5 \cdot 3 = 14$

2. The vectors' magnitudes are:

$\Vert\vec{a}\Vert = \sqrt{1^2+5^2} \approx 5.099$

and

$\Vert\vec{b}\Vert = \sqrt{(-1)^2+3^2} \approx 3.162$

3. The cosine similarity is, therefore:

${\rm S_C} = (\vec{a}\cdot\vec{b}) / (\Vert\vec{a}\Vert\ \Vert\vec{b}\Vert)$
$\textcolor{transparent}{\rm S_C} \approx 14 / (5.099 \cdot 3.162)$
$\textcolor{transparent}{\rm S_C} \approx 0.868$

Our guesses were right!
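If you want to double-check the arithmetic, the short script below reproduces each step of the example using only Python's standard library. The angle at the end is an extra we added, obtained by taking the inverse cosine of the result.

```python
import math

a = [1, 5]
b = [-1, 3]

dot_product = sum(x * y for x, y in zip(a, b))
magnitude_a = math.sqrt(sum(x * x for x in a))
magnitude_b = math.sqrt(sum(y * y for y in b))
similarity = dot_product / (magnitude_a * magnitude_b)
theta = math.degrees(math.acos(similarity))  # angle between the vectors

print(dot_product)            # 14
print(round(magnitude_a, 3))  # 5.099
print(round(magnitude_b, 3))  # 3.162
print(round(similarity, 3))   # 0.868
print(round(theta, 1))        # 29.7
```

The angle is indeed below 90°, in line with the initial guess.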

How to calculate the cosine similarity with Python

Python is arguably the best language for data science, so you may well need to calculate the cosine similarity in it. If you're implementing it yourself, you can use NumPy's dot function for the dot product and the norm function from the numpy.linalg submodule for the vector magnitudes. Here's how it might be done:

from numpy import dot
from numpy.linalg import norm

def calc_cosine_similarity(a, b):
    # Dot product divided by the product of the two magnitudes
    return dot(a, b) / (norm(a) * norm(b))


Then you can call the function as:

a = [1, 1, 1]
b = [3, 4, 5]
calc_cosine_similarity(a, b)
# delivers 0.9797958971132713


What is the cosine distance?

The cosine distance is used to measure the dissimilarity between two vectors. It's simply the complement of the cosine similarity, i.e.,

$\small {\rm D_C}(\vec{a},\vec{b}) = 1 - {\rm S_C}(\vec{a},\vec{b})$

However, the cosine distance is not a true distance metric, because it does not have the triangle inequality property, i.e., the inequality:

$\small {\rm D_C}(\vec{a},\vec{c}) \le {\rm D_C}(\vec{a},\vec{b}) + {\rm D_C}(\vec{b},\vec{c}),$

does not hold for all possible values of $\vec{a}$, $\vec{b}$, and $\vec{c}$.
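A single counterexample is enough to show this. The sketch below uses three 2D vectors we chose ourselves, where $\vec{b}$ sits halfway between $\vec{a}$ and $\vec{c}$; the function name `cosine_distance` is our own.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


a, b, c = [1, 0], [1, 1], [0, 1]

d_ac = cosine_distance(a, c)  # a and c are orthogonal
d_ab = cosine_distance(a, b)  # 45 degrees apart
d_bc = cosine_distance(b, c)  # 45 degrees apart

print(round(d_ac, 4))         # 1.0
print(round(d_ab + d_bc, 4))  # 0.5858
print(d_ac <= d_ab + d_bc)    # False
```

Going directly from $\vec{a}$ to $\vec{c}$ "costs" more than taking the detour via $\vec{b}$, which a true distance metric would never allow.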

FAQ

Can cosine similarity be negative?

Yes, cosine similarity can be negative because the cosine of some angles can be negative. A negative cosine similarity means that the two vectors are more dissimilar than similar and that the angle between them is greater than 90°.
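Here's a quick illustration with two vectors of our own choosing that are more than 90° apart:

```python
import math

a = [1, 0]
b = [-1, 1]

dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(y * y for y in b))
similarity = dot / (norm_a * norm_b)
angle = math.degrees(math.acos(similarity))

print(round(similarity, 4))  # -0.7071
print(round(angle, 1))       # 135.0
```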

What does a cosine similarity of -1 mean?

A cosine similarity of -1 means that the two vectors point in opposite directions. This does not mean that their magnitudes are equal, but simply that the angle between them is 180°.
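The sketch below checks this for two opposite vectors with different magnitudes. One practical caveat: floating-point rounding can push the computed similarity a tiny bit outside $[-1, 1]$, and `math.acos` raises an error for such inputs, so the (hypothetical) helper `angle_between` clamps the value first.

```python
import math


def angle_between(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    similarity = dot / (norm_a * norm_b)
    # Clamp to [-1, 1] so rounding errors cannot break math.acos
    similarity = max(-1.0, min(1.0, similarity))
    return math.degrees(math.acos(similarity))


# b is three times as long as a and points the opposite way
print(round(angle_between([1, 2], [-3, -6]), 3))  # 180.0
```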
