Omni Calculator logo

How to Decode the Vigenère Cipher Without a Key

Hiding information away from undesired eyes and ears has been a constant need of humanity since at least the dawn of written language. And as sure as eggs are eggs, with attempts at hiding a message, inevitably come attempts at illicitly discovering its content.

Cryptography, the study and practice of altering information to prevent its interception, has been, since its inception, a quiet arms race between Alice and Bob, who try to transmit information securely, and Eve, the eavesdropper, who tries to find out what is not meant for her. This competition caused ciphers to become increasingly efficient and more and more difficult to break, prompting codebreakers to continually improve, refine their technologies, and develop new techniques. This is not dissimilar from the process of evolution, where predator and prey are inexorably bound in a cycle of improvements to maintain the upper hand.

The Vigenère cipher is no exception to this rule. Although it had been conceived in the 15th century, it gained prominence only in the 19th century, coinciding with a more rigorous approach to cryptography that was emerging in the Western world. With its fame came its damnation, and by the second half of the century, Charles Babbage (of Lovelace and Babbage fame) and — independently — Friedrich Kasiski developed a rigorous attack that made the cipher quite vulnerable.

Keep reading to learn how to decode the Vigenère cipher without a key: we will discover:

  • How to use the Vigenère cipher to decrypt and encrypt a message;
  • How to use brute force and dictionary attacks to decrypt the Vigenère cipher without the key;
  • How to use the properties of the cipher to discover the length of the key;

and much more! A simple example will help us to understand how to decode the Vigenère cipher​.

Before breaking something, it's better to get a good look. The Vigenère cipher is a textbook example of a polyalphabetic cipher (read more about them in our article about this cryptographic technique), a type of cipher that performs encryption by changing each letter of a message using individual substitution ciphers. You can imagine the Vigenère cipher as a series of Caesar ciphers with varying shifts.

To perform encryption using the Vigenère cipher, you need both the plaintext message and a key. From here on, we will assume that your key is shorter than the message itself (or we'd get dangerously close to the one-time pad cipher, the only method of encryption mathematically safe from breaking). Repeat the key (e.g., if your key is "CAT", use "CATCATCATCA...") as many times as it's needed to cover your entire message, and you have a set of letter pairs that will enable you to define:

  • The shift of the "base" alphabet; and
  • The shift of each individual substitution cipher.

This process means, in other words, that if you have a pair "H-C" (for example, if your message is "hello" and your key is "cat"), you can see the encryption process as a simple substitution cipher where the first letter of your encrypted alphabet becomes "H", which means H, I, J, (for A, B, C). Thus, the pair "H-C" becomes "J".

Decrypting an encoded message knowing the key is as easy as pie. Again, you have a pair of letters, one from the key, one from the cryptogram. This is another substitution cipher, where we start from the encrypted letter and shift this time backward by the key. If the message is "R" and the key is "C", that's R, Q, P (as in C, B, A), and we get our plaintext letter P. You can try encrypting and decrypting a text using the Vigenère cipher with our Vigenère cipher calculator, or learn more about how it works at our article about its operation!

It's not that complex, yet an encrypted message will appear as absolute gibberish at first glance. Not for long!

🙋 In our attempts at breaking the Vigenère cipher, we will assume that the plaintext is written in correct English. This step solves a lot of problems, trust us.

As per many ciphers, you can use brute force to break the Vigenère cipher. Brute force is a technical term that means simply trying without any logic until you find a solution. By the way, when you break the Vigenère cipher without a key, you may discover multiple solutions for a single ciphertext. For example, the ciphertext "regiubp" is either "penguin" if the key is "cat", or "article" if the key is "rnnasql". As a rule of thumb, this uncertainty decreases as the length of the ciphertext increases (and, more precisely, if at the same time the ratio between the lengths of the text and the key increases).

How to decrypt the Vigenère cipher using brute force? Start with one-letter keys, there are 26 of them, then move to two-letter combinations (technically, permutations with repetition of a sample of 2 over a total of 26; they amount to 676), and so on. Eventually, you'll find your plain text in front of you, but that can require quite a bit of time. Trying all five letter keys, one key per minute, would take you more than 22 years of uninterrupted work. A computer can significantly speed things up, and brute force is effective for keys up to a certain length, which depends on the computing power of your machine.

However, there are ways to significantly shorten the time needed, even for this simple attack. Let's see how to decode the Vigenère cipher using he dictionary.

Humans are humans, and humans are not good at being random (at least, scientists are pretty sure we are not). This trait means that, statistically, we may end up choosing keys that are not random sequences of characters; for example, words from a dictionary.

This knowledge serves as the foundation for a standard dictionary attack. If you know that the plain text is in English, it won't harm starting your brute force attack for four-letter keys from aahs, the first four-letter word that is not an acronym.

This attack can significantly shorten the time needed to successfully decrypt the Vigenère cipher without a key. The Merriam-Webster dictionary lists 6089 four-letter words. Pure permutations would give us 264=45697626^4=456976. That's 75 times more than the first number. Depending on the dictionary used, the list may be even shorter, but we'd pay the price of a heightened risk of the key missing from there.

This strategy, however, requires running through all the one-letter, two-letter, and so on, keys before reaching the correct key length. Any way to improve on this? Continue reading to discover how to decrypt the Vigenère cipher when the key length is known.

The Vigenère cipher has a weak spot: repeating patterns. As you repeat the key to cover the message, you run the risk of defining the same letter pairs again and again (note that each letter pair and the matching encoding is unique: "T" and "C" will always encode to "V", and decrypting "V" using "C" will always return "T". This means that **for long enough messages, you will eventually – thanks to the law of very large numbers – end up with the key covering the same letters of the plain text with the same letters of the repeated key. This happens more often as we consider:

  • Short, repeating group of letters (for example, "the", "and", "-ing"); and
  • Short keys (the number of repetitions goes to 0 as the length of the key matches the length of the message).

Take this ciphertext:

Example of a ciphertext

Do you notice something interesting? If you didn't, we've got you covered: here is the same ciphertext where every sequence of letters more than two characters long that repeats more than once is highlighted. That's quite a lot, isn't it?

Repeating patterns in a ciphertext

Since we don't know which letters of the key produced the first and second sequences "sui", we can't be sure that "sui" is decoded using the same three letters in the plain text.

However, we can count the number of characters between each instance of the same letters in a repeating sequence. That is to say, the number of letters between each "s" in the two "sui" sequences.

Repeat this enough times, and you'll find something interesting: the intervals are, by an overwhelming amount, multiples of four. If we had looked into shorter sequences (for example, two letters), there would have been exceptions. However, all the longer sequences don't suffer from this problem.

Here is a chart showing the spaces between repeating sequences.

Distance between repetition of sequences in the ciphertext

What is this number telling us? That it is very likely that the key length is four. This is to say that repeating words (or letter sequences) are being encoded by the same key letters. Imagine taking the classic "the quick brown fox jumps over the lazy dog", and using the key "refur" ("fox" in Icelandic). Here's the ciphertext you get:

klj klzgp vifas zfo nzggj sayi klj frqc iix

You can see that the word "the" and the first letters "ref" of the key match in both occurrences klj.

🙋 Read our article about the Kasiski examination to learn more about this analysis of ciphertexts.

Any language is vulnerable to this: English maybe even more so, as it's rich in letter combinations that repeat instead of single characters (we should thank the printing press for this, and the Normans): think of "th" in "this", "that", "there", etc. (originally the runic character thorn was used instead, þ), "ch" in "child", "chill", etc. (in old English often expressed by only "c", as in "cild", but gaining an "h" later on). Or even just combinations of letters as "sh" in "should", "she", etc.; or the ending "-ing" (this section alone contains 14 up to this point).

Shorter keys increase the risk of this being successful, as there are more chances of it repeating the same way on the same characters.

We can use this vulnerability in a straightforward manner: reduce the search time of brute force and dictionary attacks. Finding the key length of a ciphertext allows us to skip all the other possible lengths. In our case, we jump straight to four-letter keys.

In the case of our ciphertext, the brute force attack returned 6399 four-letter words that produced the sequence "the" in the decrypted plaintext. Using a dictionary attack reduced this number to 118.

Yes, this is a problem: how do we know when we succeeded at decrypting the message?

What would you think if we told you "etaoin shrdlu"? Probably nothing much. However, this sequence of apparently random letters is one of the many quirks of linguistics that can be used in attacking a cipher.

We are talking letter frequency. Take a sufficiently long English text, and start counting the number of occurrences of each letter: after not so long, it will be apparent that there's a clear hierarchy, with "E" and "T" being at the top ("E" with a significant lead), followed closely by "A", "O", "I", "N", "H", "S", and "R".

💡 Fun fact: this peculiarity of languages has been used by Samuel Morse when creating the Morse code: "E" and "T", being the most common letters, were awarded the shortest encoding (a dot and a dash respectively), with the six following letters in terms of frequency having two-symbol codes. Read more about this method of encoding at our Morse code translator.

We can use the letter frequency distribution to assess whether a decoded text is close to our target language. Simply take the decrypted text, count the frequency, and compute the Euclidean distance between the two arrays.

In our case, the best fit occurs for the word "bing" in a dictionary attack and "rtng" in a brute force attack. However, neither "bing" nor "rtng" returns English at all. On the other hand, "bing" gives us the word "the", which can suggest to us that at least part of the key is correct. This would work much better for a longer ciphertext, where the frequency of letters would more closely match the one computed on a larger corpus.

Continuing our dictionary attack, we can make some tweaks and expand our search to keywords that are close enough to the current best fit, let's say up to 25% larger. This returns a much smaller list of possible keys, 23 words, which we can quickly examine. Most of them end in "-ing", so let's focus on these.

Success!
The key is "ring", which returns a readable plain text: the poem that opens The Lord of the Rings.

Highlighting where key and text match

In the text above, you can see where sequences of letters as "the", "them", "these", and even longer groups like "oneringto", repeat nicely with the key.

Here is the plain text, without highlights.

Successful decryption of the ciphertext

Counting letters is not the only way to break the Vigenère cipher; we can also try comparing groups of four letters, for example, or other measures of fitness compared to the English language.

Note that all these methods, at one point or another, require you to know what is the target language.

To decrypt the Vigenère cipher without a key, you can use:

  • Brute force attacks (slow but never missing the key);
  • Dictionary attacks (faster, but if the key is not a word, they won't work); or
  • Statistics-based approaches (very efficient, but only on longer ciphertexts).

As the Vigenère cipher encodes a message by repeating the key multiple times as needed, on sufficiently long messages, it will likely happen that the same letters of the plain text are encoded by the same letters of the key. The distance between repeating sequences of letters encoded this way is a multiple of the key length.

To improve your chances against attempts to decrypt the Vigenère cipher without a key, you can increase your key length. In the ideal case of a key as long as the text, the decryption becomes virtually impossible as there would be no repetition of the key's letters, and attacks based on the Kasiski examination would fail.

This article was written by Davide Borchia and reviewed by Steven Wooding.