Kasiski Examination: Unlocking the Vigenère Cipher
Before the mid-19th century, the Vigenère cipher held a fearsome reputation in cryptanalysis. It was explicitly designed to defeat the significant weakness of simple substitution ciphers: frequency analysis. Using a repeated keyword to implement different Caesar shifts on the text resulted in a polyalphabetic substitution, which effectively covered the actual statistical letter frequency, earning it the name "le chiffre indéchiffrable", or "the indecipherable cipher", for nearly three centuries.
Nevertheless, the essential framework of the Vigenère cipher had one critical flaw: it was based on a relatively short key that repeated. This periodicity introduced a subtle, repeating pattern into the ciphertext itself. The specific cryptanalytic technique developed to expose and exploit this vulnerability is the Kasiski examination (sometimes referred to as Kasiski's method or Kasiski's test). This application of the Kasiski method in cryptography transformed the supposedly unbreakable cipher into a solvable problem.
🙋 If you're interested in the Kasiski method and cryptography, don't forget also to check out our Vigenère cipher calculator!
The fundamental concept underpinning the Kasiski test was first successfully used by Charles Babbage, the pioneer of computing, in 1846. Babbage successfully applied the technique to break Vigenère ciphers, but due to his focus on other projects, he chose not to publish his work.
Credit for formalizing and publishing the complete cryptanalytic procedure went instead to Prussian army officer Friedrich W. Kasiski. He published his complete technique in 1863, marking the very first recognized solution to the polyalphabetic substitution cipher. Kasiski ensured that his complete solution became universally recognized under the name "Kasiski examination".
The principle behind Kasiski's analysis is based on the idea of synchronized key repetition. A Vigenère cipher utilizes the key repeatedly across the plaintext. If, for instance, the trigraph "THE" is repeated twice in the original message in such a way that it fits the same portion of the key each time, the resulting characters in the ciphertext will be identical.
For such a repetition to happen in synchronization, the distance (δ) between the initial points of the identical portions of the ciphertext needs to be an integer multiple of the unknown key length, m. In mathematical terms, this relationship is expressed as δ=k×m, where k is an integer. As a result, the actual key length must be a factor of the recorded distance. What is most significant, however, is that not all recurring ciphertexts automatically result from repeated plaintexts and key sequences, while others happen simply by random occurrence. It is on these premises that Kasiski examinations must go beyond absolute conclusions to probability.
The execution of the Kasiski examination for Vigenère ciphers involves a strict procedure designed solely to reveal the key length; the procedure consists of identifying recurring patterns. Before examining a precise Kasiski method example, let's review the steps together.
-
Find repeated sequences (n-grams): The first process in Kasiski analysis is to look for repeated sequences of letters in the ciphertext, most commonly beginning with trigraphs or larger n-grams. More extended sequences offer higher confidence because the probability of a lengthy segment repeating purely by chance is statistically low.
-
Measure the distance: For every identified repeated n-gram, the analyst notes the starting position of each occurrence. The difference between these positions yields the distance δ. This distance is the core data point for the cryptanalysis, representing a potential multiple of the key length.
-
Factor the distances: This is the heart of the Kasiski method. Each distance δ is completely factored. The factors represent all possible key lengths that could produce that distance. The analyst can now compile these factors to identify the number that occurs with the highest frequency across the different distances calculated. The highest frequency indicates that it has the highest probability of being the real key length.
-
Tabulate and tally: Gather all the factors from the repeated distances and see which one has the highest frequency. The factor with the highest frequency is probably the key length (m).
Practical Kasiski method example
Let's apply this Kasiski method to a fictional ciphertext segment to understand how the process reveals the hidden key length.
Suppose we analyze a long ciphertext and find three distinct repetitions, measuring the distance (δ) between the start of each pair:
Repeated string (n-gram) | Starting positions | Distance (δ) |
|---|---|---|
YSZWXA | 17, 353 | 336 |
MJEG | 103, 215 | 112 |
ELM | 292, 306 | 14 |
The next step in the Kasiski analysis is factoring these distances. We are hunting for a factor common to all three distances, as this will represent the actual keyword length m:
Distance (δ) | Key Length Factors (m) |
|---|---|
336 | 2, 3, 4, 6, 7, 8, 14, 16, 21, 24,... |
112 | 2, 4, 7, 8, 14, 16, 28, 56,... |
14 | 2, 7, 14 |
In the Kasiski examination above, the only factor common to the three distances was the number 14, so it became the most likely length for the keyword: m=14. The prominent occurrence of the numbers 2 and 7 should not be overlooked, as they are factors of 14; however, the highest common factor is 14 itself, which provides even stronger indicators for the periodicity of the key.
🔎 Did you know that you can also decode the Vigenère cipher without knowing the key? Check out our guide to discover how to do it!
Once the Kasiski examination yields a strong candidate for the key length, the result is typically confirmed using statistical measures, such as the Index of Coincidence (IoC), often associated with the Fisher-Yates shuffle or the Friedman test. The IoC is employed to validate the supposed length by rearranging the ciphertext in m columns. If the length is valid, every column should exhibit frequency patterns comparable to those of natural language text, such as the English language, thus validating the fact that the Vigenère cipher has been properly broken down into m independent Caesar ciphers.
Now that the length of the key is known to be m, the Vigenère cipher can be considered to be broken. The cryptanalyst can now perform frequency analysis on each column to determine which letter is used in each column to represent the key values. The resulting key values can now be used to unscramble the code. What was once considered a complex polyalphabetic cipher becomes simply m individual monoalphabetic cryptograms to be solved.
The Kasiski examination for Vigenère ciphers remains a foundational topic in cryptography because it formalized the concept that structural patterns are vulnerabilities. The entire premise of the Kasiski method in cryptography lies in reducing complexity by isolating repetition. It showed that machines can exploit even the most sophisticated substitution schemes if they rely on a repeated key.
For modern cryptography, it is imperative to grasp the Kasiski test. It is an essential historical reminder that any modern cryptography implementation must utilize key streams that are unpredictable, statistically random, and preferably of the same length as the message itself.
Friedrich Kasiski published the method in 1863. However, Charles Babbage discovered the technique independently much earlier, around 1846, but did not publish his findings.
The primary goal is to determine the length (m) of the keyword used in polyalphabetic substitution ciphers, such as the Vigenère cipher. Finding m allows the cipher to be broken into simpler monoalphabetic components.
This article was written by Agata Flak and reviewed by Steven Wooding.