Virtual Labs

Breaking the Mono-alphabetic Substitution Cipher

The mono-alphabetic substitution cipher is a classical encryption technique in which each letter of the plaintext is replaced by a corresponding letter from a fixed substitution alphabet. Unlike the shift cipher, which uses a single fixed offset, a substitution cipher uses an arbitrary permutation of the alphabet. This gives it a far larger key space, but it remains vulnerable to cryptanalytic attacks.

How It Works

Key Space: The key is a permutation of the alphabet (26! possible keys for English)
Encryption: Each plaintext letter is replaced by its corresponding cipher letter
- Example: If a→J, b→I, c→B, then "cab" becomes "BJI"
Decryption: Each cipher letter is replaced back with its corresponding plaintext letter
- Example: "BJI" becomes "cab" using the reverse mapping
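
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (generate_key, invert_key, substitute) are hypothetical, and the fixed random seed is an assumption made only so that the run is repeatable.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def generate_key(rng):
    """Return a random permutation of the alphabet as a plain -> cipher dict."""
    shuffled = list(ALPHABET.upper())
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def invert_key(key):
    """Build the reverse (cipher -> plain) mapping used for decryption."""
    return {cipher: plain for plain, cipher in key.items()}


def substitute(text, mapping):
    """Replace every character found in the mapping; leave the rest unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


# Assumption: a seeded generator, used here only for reproducibility.
key = generate_key(random.Random(0))
ciphertext = substitute("cab", key)
recovered = substitute(ciphertext, invert_key(key))
print(ciphertext, recovered)
```

Because the key is a permutation, the reverse mapping always exists and decryption recovers the plaintext exactly.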

Mathematical Representation

Consider the plaintext "cryptography". Using the substitution table below, we encrypt it letter by letter:

Plain:  a b c d e f g h i j k l m n o p q r s t u v w x y z
Cipher: J I B R K T C N O F Q Y G A U Z H S V W M X L D E P

Encryption Process:

plaintext:  c r y p t o g r a p h y
ciphertext: B S E Z W U C S J Z N E

Hence we obtain the ciphertext as "BSEZWUCSJZNE".
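
The worked example can be checked with a short Python sketch. The two strings below are copied from the substitution table; the function names encrypt and decrypt are illustrative choices, and the built-in str.maketrans and str.translate do the letter-by-letter replacement.

```python
PLAIN = "abcdefghijklmnopqrstuvwxyz"
CIPHER = "JIBRKTCNOFQYGAUZHSVWMXLDEP"  # cipher row of the substitution table

ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)
DECRYPT_TABLE = str.maketrans(CIPHER, PLAIN)


def encrypt(plaintext):
    """Map each lowercase plaintext letter to its uppercase cipher letter."""
    return plaintext.lower().translate(ENCRYPT_TABLE)


def decrypt(ciphertext):
    """Map each uppercase cipher letter back to its plaintext letter."""
    return ciphertext.upper().translate(DECRYPT_TABLE)


print(encrypt("cryptography"))  # BSEZWUCSJZNE
print(decrypt("BSEZWUCSJZNE"))  # cryptography
```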

Security Analysis

The mono-alphabetic substitution cipher has several characteristics:

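Large Key Space: 26! ≈ 4 × 10²⁶ possible keys, far more than the shift cipher offers, so trying every key is impractical
Frequency Preservation: Each plaintext letter always maps to the same ciphertext letter, so letter frequencies carry over to the ciphertext
Vulnerable to Statistical Analysis: Despite the large key space, the cipher can be broken using frequency analysis
Pattern Preservation: Repeated letters and letter patterns within words are maintained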
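
Two of these properties are easy to check numerically. The sketch below computes the size of the key space and shows that the letter counts of a message are unchanged by encryption, only relabelled. The sample message "attackatdawn" is an arbitrary assumption; the key is the substitution table used in the worked example.

```python
import math
from collections import Counter

# Size of the key space: the number of permutations of 26 letters.
key_space = math.factorial(26)
print(f"{key_space:.3e}")  # 4.033e+26

PLAIN = "abcdefghijklmnopqrstuvwxyz"
CIPHER = "JIBRKTCNOFQYGAUZHSVWMXLDEP"  # key from the worked example
ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)

plaintext = "attackatdawn"  # arbitrary sample message (assumption)
ciphertext = plaintext.translate(ENCRYPT_TABLE)

# The counts are the same; only the letters they belong to have changed.
plain_counts = sorted(Counter(plaintext).values(), reverse=True)
cipher_counts = sorted(Counter(ciphertext).values(), reverse=True)
print(plain_counts)
print(cipher_counts)
```

This is why the large key space does not help: an attacker never needs to search it, because the frequency profile leaks the key one letter at a time.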

Cryptanalysis Techniques

Note: A mono-alphabetic substitution only relabels letters, so the frequency of occurrence of each plaintext character is preserved in the ciphertext. For instance, in a sufficiently long English ciphertext, the most frequent character is likely to be the encryption of "e", the most frequently occurring letter in English.

The substitution cipher can be broken using:

Frequency Analysis:
- Compare letter frequencies with standard English
- Most frequent ciphertext letter likely maps to 'e'
- Second most frequent likely maps to 't' or 'a'
Pattern Analysis:
- Look for common English patterns (th, er, on, an, etc.)
- Identify repeated letter sequences
- Analyze word structure and length
Contextual Analysis:
- Use partial decryption to guess remaining letters
- Look for common short words (a, an, the, and, etc.)
- Apply linguistic knowledge and context clues
Dictionary Attack:
- Try common words and phrases
- Use known-plaintext attacks if some plaintext-ciphertext pairs are available
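
The first two techniques can be sketched as follows. This is a rough starting point under stated assumptions, not a complete attack: the ordering in ENGLISH_BY_FREQUENCY is a commonly quoted ranking of English letters (an assumption beyond the first ten letters), and the helper names are hypothetical. On short ciphertexts the frequency guess is usually wrong for many letters and has to be corrected by hand using pattern and context clues.

```python
import string
from collections import Counter

# Assumption: a commonly quoted ranking of English letters, most frequent first.
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def initial_guess(ciphertext):
    """Pair ciphertext letters, most frequent first, with the English ranking."""
    letters = [ch for ch in ciphertext.upper() if ch in string.ascii_uppercase]
    ranked = [letter for letter, _ in Counter(letters).most_common()]
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))


def apply_guess(ciphertext, guess):
    """Partially decrypt: replace guessed letters, keep everything else."""
    return "".join(guess.get(ch, ch) for ch in ciphertext.upper())


def word_pattern(word):
    """Number letters by first appearance, e.g. 'hello' -> (0, 1, 2, 2, 3)."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word)


guess = initial_guess("KKKKWWWJJ")
print(apply_guess("KKKKWWWJJ", guess))  # eeeetttaa
print(word_pattern("hello"))            # (0, 1, 2, 2, 3)
```

A word and its encryption always share the same pattern, so a ciphertext word can be matched against dictionary words with an identical pattern to narrow down candidates.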

Breaking the Cipher

Unlike the shift cipher, whose 25 non-trivial keys can simply be tried one by one, the substitution cipher has far too many keys for exhaustive search and requires more sophisticated analysis:

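Statistical Attack: Use frequency analysis as the primary method
Hill Climbing: Start from a guessed key and repeatedly swap letter assignments, keeping only the changes that make the decryption look more like English
Genetic Algorithms: Evolve a population of candidate keys, favouring those whose decryptions score best as English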
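
A minimal hill-climbing sketch is shown below to make the idea concrete. It rests on several assumptions: the scoring function simply counts a handful of common bigrams (a practical attack would score with n-gram statistics from a large corpus), the iteration count and seed are arbitrary, and all names are hypothetical. With such a crude score and a short ciphertext it should not be expected to recover the plaintext; it only demonstrates the swap-and-keep-if-better loop.

```python
import random
import string

ALPHABET = string.ascii_lowercase
# Assumption: a tiny illustrative list, not real corpus statistics.
COMMON_BIGRAMS = ("th", "he", "in", "er", "an", "re", "on", "at", "en", "nd")


def decrypt(ciphertext, key):
    """key[i] is the plaintext guess for cipher letter ALPHABET[i]."""
    return ciphertext.lower().translate(str.maketrans(ALPHABET, key))


def score(text):
    """Crude 'Englishness' score: number of common bigrams in the text."""
    return sum(text.count(bigram) for bigram in COMMON_BIGRAMS)


def hill_climb(ciphertext, rng, iterations=2000):
    """Swap two key letters at a time; keep a swap only if the score improves."""
    key = list(ALPHABET)
    rng.shuffle(key)
    best = score(decrypt(ciphertext, "".join(key)))
    for _ in range(iterations):
        i, j = rng.sample(range(26), 2)
        key[i], key[j] = key[j], key[i]
        candidate = score(decrypt(ciphertext, "".join(key)))
        if candidate > best:
            best = candidate
        else:
            key[i], key[j] = key[j], key[i]  # undo the swap
    return "".join(key), best


# "the rain in spain" encrypted with the table from the worked example.
ciphertext = "WNK SJOA OA VZJOA"
key, best = hill_climb(ciphertext, random.Random(1))
print(best, decrypt(ciphertext, key))
```

Hill climbing can stall at a local maximum, which is why practical solvers restart from several random keys and keep the best result.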

English Letter Frequencies

Approximate frequencies of the ten most common English letters, as commonly used in cryptanalysis:

E: 12.70%, T: 9.06%, A: 8.17%, O: 7.51%, I: 6.97%
N: 6.75%, S: 6.33%, H: 6.09%, R: 5.99%, D: 4.25%
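
These figures can be used to measure how "English-like" a candidate decryption is. The sketch below is one simple way to do so, assuming a sum-of-squared-differences distance over only the ten tabulated letters (which together account for about 73.8% of English text). The function names are hypothetical, and other measures such as the chi-squared statistic are equally valid choices.

```python
import string
from collections import Counter

# Percentages from the table above (ten most common letters only).
ENGLISH_FREQ = {
    "e": 12.70, "t": 9.06, "a": 8.17, "o": 7.51, "i": 6.97,
    "n": 6.75, "s": 6.33, "h": 6.09, "r": 5.99, "d": 4.25,
}


def observed_percentages(text):
    """Return the percentage of each letter among the letters of the text."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    if not letters:
        return {}
    total = len(letters)
    return {letter: 100 * n / total for letter, n in Counter(letters).items()}


def frequency_distance(text):
    """Sum of squared differences from the table; lower means closer."""
    observed = observed_percentages(text)
    return sum(
        (observed.get(letter, 0.0) - expected) ** 2
        for letter, expected in ENGLISH_FREQ.items()
    )


print(observed_percentages("eeet"))  # {'e': 75.0, 't': 25.0}
print(frequency_distance("etaoinshrd") < frequency_distance("zzzz"))  # True
```

The distance is only meaningful for reasonably long texts; on a few dozen letters, random variation dominates.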

Historical Context

Mono-alphabetic substitution ciphers were widely used throughout history but became vulnerable once frequency analysis was developed. Later classical ciphers used polyalphabetic substitution to flatten letter frequencies, and modern cryptographic systems rely on block ciphers and other advanced techniques designed to resist statistical attacks.