The cosine similarity calculator explains the cosine similarity measure, which is widely used in machine learning and other fields of data science.

Read on to discover:

  • What the cosine similarity is;
  • What the formula for the cosine similarity is;
  • Whether the cosine similarity can be negative; and
  • How to calculate the cosine similarity in Python.

How to use the cosine similarity calculator

Here's how to use this cosine similarity calculator:

  1. Enter your vectors $\vec{a}$ and $\vec{b}$ into the calculator, one element at a time.

    • More fields will appear as you need them.

    • Empty fields are treated as zeroes.

    • The vectors will automatically be extended to matching lengths.

  2. The cosine similarity ${\rm S_C}$ and derived values, such as the angle between the vectors, $\theta$, and the cosine distance, ${\rm D_C}$, are displayed below the vector inputs.

  3. The calculations for finding the cosine similarity are shown below the results so that you may understand your specific result.
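
The input handling described in step 1 (empty fields count as zeroes, and the shorter vector is extended to match the longer one) can be sketched as below. This is only an illustration: the helper name `pad_to_same_length` is hypothetical and this is not the calculator's actual code.

```python
def pad_to_same_length(a, b):
    """Extend the shorter vector with zeroes so both have equal length."""
    n = max(len(a), len(b))
    padded_a = list(a) + [0] * (n - len(a))
    padded_b = list(b) + [0] * (n - len(b))
    return padded_a, padded_b

a, b = pad_to_same_length([1, 2, 3], [4, 5])
print(a, b)  # [1, 2, 3] [4, 5, 0]
```

Padding with zeroes changes neither the dot product nor either magnitude, so it does not affect the cosine similarity.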

What is the cosine similarity?

The cosine similarity measure indicates how similar two vectors are using the cosine of the angle between them. It gives no information on the comparative magnitudes of the vectors.

Cosine similarity is widely used in data analysis and data science, particularly in the field of natural language processing.

🔎 Remember what the cosine is? No? Then head on over to our cosine calculator.

The cosine similarity formula

It helps to know what the cosine similarity is conceptually, but how do we calculate it? Let's explore the formula.

The cosine similarity between two $N$-dimensional vectors $\vec{a}$ and $\vec{b}$, denoted ${\rm S_C}(\vec{a}, \vec{b})$, is defined as the cosine of the angle $\theta$ between the two vectors:

$${\rm S_C}(\vec{a}, \vec{b}) = \cos\theta$$

However, we don't always know the angle $\theta$. Then what? Well, a more complex yet more helpful formula can be derived from the dot product. Let's investigate!

The dot product of the two vectors is denoted as $\vec{a}\cdot\vec{b}$, and is defined as:

$$\vec{a} \cdot \vec{b} = \Vert\vec{a}\Vert\ \Vert\vec{b}\Vert\ \cos\theta,$$

where $\Vert\vec{a}\Vert$ is the magnitude of the vector $\vec{a}$ (and similarly for $\Vert\vec{b}\Vert$). We can rearrange this equation to become:

$$\cos\theta = \frac{ \vec{a}\cdot\vec{b} }{ \Vert\vec{a}\Vert\ \Vert\vec{b}\Vert }$$

And so we have a handy formula for the cosine similarity that doesn't rely on the angle directly:

$${\rm S_C} = \frac{ \vec{a}\cdot\vec{b} }{ \Vert\vec{a}\Vert\ \Vert\vec{b}\Vert }$$

And what's more, we can rewrite the formula with sums:

$${\rm S_C} = \frac{ \sum_{i=1}^N a_i b_i }{ \sqrt{\sum_{i=1}^N a_i^2} \sqrt{\sum_{i=1}^N b_i^2} }$$

Lovely! A formula for the cosine similarity that relies only on the known elements of the vectors.
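
As a rough illustration, the sum form of the formula translates almost term by term into plain Python. The function name `cosine_similarity` is our own choice for this sketch, and it assumes both vectors have the same length and neither is a zero vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity computed from the element-wise sum formula."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    # Numerator: sum of a_i * b_i (the dot product)
    numerator = sum(x * y for x, y in zip(a, b))
    # Denominator: product of the two magnitudes
    # (a zero vector would make this 0 and raise ZeroDivisionError)
    denominator = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return numerator / denominator

print(round(cosine_similarity([1, 1, 1], [3, 4, 5]), 4))  # 0.9798
```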

🔎 Want a refresher on the dot product and vector magnitudes? Check out our dot product calculator and the vector magnitude calculator.

The cosine similarity, ${\rm S_C}$, falls within the range $[-1, 1]$, which are, of course, the limits of the cosine function.

  • When the two vectors are in the same direction, $\theta = 0^\circ$ and so ${\rm S_C} = 1$.

  • When the two vectors are orthogonal, $\theta = 90^\circ$ and ${\rm S_C} = 0$.

  • When the two vectors are in opposite directions, $\theta = 180^\circ$ and so ${\rm S_C} = -1$.
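
The three cases above can be checked numerically with a few hand-picked 2D vectors. This is a small sketch; the helper `cos_sim` and the example vectors are our own choices.

```python
import math

def cos_sim(a, b):
    """Dot product divided by the product of the magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cos_sim([2, 0], [5, 0]))   # same direction: 1.0
print(cos_sim([2, 0], [0, 3]))   # orthogonal: 0.0
print(cos_sim([2, 0], [-4, 0]))  # opposite directions: -1.0
```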

🔎 Don't care about the cosine similarity and only want the angle between two vectors? Perhaps you'd like to visit our angle between two vectors calculator.

Note that we say "similar" and not "identical": ${\rm S_C}$ only measures the angle and is not influenced by the comparative magnitudes. ${\rm S_C} = 1$ only means that the two vectors point in the same direction, not that the two vectors are equal. See if you can prove this mathematically!
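
One possible way to approach that exercise, offered as a sketch: compare a non-zero vector $\vec{a}$ with a scaled copy $k\vec{a}$, where $k > 0$ is a constant introduced here purely for illustration. Using $\vec{a}\cdot\vec{a} = \Vert\vec{a}\Vert^2$ and $\Vert k\vec{a}\Vert = |k|\,\Vert\vec{a}\Vert$:

```latex
\begin{aligned}
{\rm S_C}(\vec{a}, k\vec{a})
  &= \frac{\vec{a}\cdot(k\vec{a})}{\Vert\vec{a}\Vert\ \Vert k\vec{a}\Vert} \\
  &= \frac{k\,\Vert\vec{a}\Vert^2}{\Vert\vec{a}\Vert \cdot |k|\,\Vert\vec{a}\Vert} \\
  &= \frac{k}{|k|} = 1 \quad \text{for } k > 0.
\end{aligned}
```

So $\vec{a}$ and $k\vec{a}$ have a cosine similarity of 1 even though they are equal only when $k = 1$.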

The cosine similarity is not defined when either vector is a zero-vector — a vector with all elements as zeroes and thus zero magnitude.
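
In code, the zero-vector case shows up as a division by zero, so it is usually worth checking for explicitly. The sketch below raises an error in that case; the function name `cosine_similarity_or_error` and the choice to raise `ValueError` are assumptions made for this example, not a convention from any particular library.

```python
import math

def cosine_similarity_or_error(a, b):
    """Cosine similarity that rejects zero vectors explicitly."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero magnitude would put 0 in the denominator
        raise ValueError("cosine similarity is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)

try:
    cosine_similarity_or_error([0, 0], [1, 2])
except ValueError as error:
    print(error)  # cosine similarity is undefined for a zero vector
```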

How do I calculate the cosine similarity?

To calculate the cosine similarity between two vectors, follow these steps:

  1. If you know the angle between the vectors, the cosine similarity is the cosine of that angle.

  2. If you don't know the angle, calculate the dot product of the two vectors.

  3. Calculate both vectors' magnitudes.

  4. Divide the dot product by the product of the magnitudes.

  5. The result is the cosine similarity.

An example of the cosine similarity

Let's look at an example of two 2D vectors and their cosine similarity. Let's use:

  • $\vec{a} = [1, 5]$ and
  • $\vec{b} = [-1, 3]$.

Already, we can visualize that the two vectors point in the same general direction, i.e., up. We can guess that $\theta < 90^\circ$ and therefore that ${\rm S_C} > 0$, but let's calculate it properly using the formula we learned above.

  1. The dot product is:

    $\vec{a}\cdot\vec{b} = 1\cdot (-1) + 5 \cdot 3 = 14$

  2. The vectors' magnitudes are:

    $\Vert\vec{a}\Vert = \sqrt{1^2+5^2} \approx 5.099$

    and

    $\Vert\vec{b}\Vert = \sqrt{(-1)^2+3^2} \approx 3.162$

  3. The cosine similarity is, therefore:

    ${\rm S_C} = (\vec{a}\cdot\vec{b}) / (\Vert\vec{a}\Vert\ \Vert\vec{b}\Vert)$
    ${\rm S_C} = 14 / (5.099 \cdot 3.162)$
    ${\rm S_C} \approx 0.868$

Our guesses were right!
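
The worked example can be reproduced in a few lines of plain Python, which also recovers the angle with `math.acos`. This is a quick check rather than a reference implementation, and the variable names are our own.

```python
import math

a = [1, 5]
b = [-1, 3]

dot = sum(x * y for x, y in zip(a, b))      # 1*(-1) + 5*3
norm_a = math.sqrt(sum(x * x for x in a))   # sqrt(26)
norm_b = math.sqrt(sum(y * y for y in b))   # sqrt(10)
similarity = dot / (norm_a * norm_b)
angle_deg = math.degrees(math.acos(similarity))

print(dot)                   # 14
print(round(norm_a, 3))      # 5.099
print(round(norm_b, 3))      # 3.162
print(round(similarity, 3))  # 0.868
print(angle_deg < 90)        # True
```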

How to calculate the cosine similarity with Python

Python is one of the most widely used languages for data science, so you might need to calculate the cosine similarity in Python. If you're implementing it yourself, you can use NumPy's dot function for the dot product and the norm function from the numpy.linalg submodule for the vector magnitude. Here's how it might be done:

from numpy import dot
from numpy.linalg import norm

def calc_cosine_similarity(a, b):
    # Dot product divided by the product of the two magnitudes (Euclidean norms)
    return dot(a, b) / (norm(a) * norm(b))

Then you can call the function as:

a = [1, 1, 1]
b = [3, 4, 5]
calc_cosine_similarity(a, b)
# delivers 0.9797958971132713

What is the cosine distance?

The cosine distance is used to measure the dissimilarity between two vectors. It's simply the complement of the cosine similarity, i.e.,

$${\rm D_C}(\vec{a},\vec{b}) = 1 - {\rm S_C}(\vec{a},\vec{b})$$

However, the cosine distance is not a true distance metric, because it does not have the triangle inequality property, i.e., the inequality:

$${\rm D_C}(\vec{a},\vec{c}) \le {\rm D_C}(\vec{a},\vec{b}) + {\rm D_C}(\vec{b},\vec{c}),$$

does not hold for all possible values of $\vec{a}$, $\vec{b}$, and $\vec{c}$.
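
A small counterexample makes this concrete. The sketch below uses three hand-picked 2D vectors (our own choice) for which going "via" the middle vector is shorter than going directly; the helper name `cosine_distance` is likewise just for this example.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

a, b, c = [1, 0], [1, 1], [0, 1]

d_ac = cosine_distance(a, c)  # vectors 90 degrees apart: 1 - 0
d_ab = cosine_distance(a, b)  # vectors 45 degrees apart: 1 - cos(45 degrees)
d_bc = cosine_distance(b, c)  # vectors 45 degrees apart: 1 - cos(45 degrees)

print(round(d_ac, 4))         # 1.0
print(round(d_ab + d_bc, 4))  # 0.5858
print(d_ac > d_ab + d_bc)     # True: the triangle inequality is violated
```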

🔎 Visit our triangle inequality theorem calculator for more information on this theorem.

FAQ

Can cosine similarity be negative?

Yes, cosine similarity can be negative because the cosine of some angles can be negative. A negative cosine similarity means that the two vectors are more dissimilar than similar and that the angle between them is greater than 90°.

What does a cosine similarity of -1 mean?

A cosine similarity of -1 means that the two vectors point in opposite directions. This does not mean that their magnitudes are equal, but simply that their angle is 180°.

Rijk de Wet
Copyright by Omni Calculator sp. z o.o.