
Basic Cryptanalysis

Posted September 15, 2008 at 6:42 pm in Encryption, Programming

My example is very basic. It is intended as an interesting way to begin the complicated, and often impossible, task of deciphering encrypted messages and codes.

The following C++ program accepts character input from the keyboard or via file redirection. It counts every occurrence of each character and reports the number of times each one was used.
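As a rough illustration of the counting step, here is a minimal sketch. It is not the original program: the name `countLetters` and the choice to ignore everything except the letters a-z are assumptions made for this example.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not the original program): tallies the letters a-z
// read from `in`, folding upper case into lower case. Everything else
// (digits, punctuation, whitespace) is skipped.
std::array<std::size_t, 26> countLetters(std::istream& in) {
    std::array<std::size_t, 26> counts{};  // value-initialized to all zeros
    char ch;
    while (in.get(ch)) {
        // Convert to unsigned char first: passing a negative char to
        // std::tolower is undefined behaviour.
        const unsigned char uc = static_cast<unsigned char>(ch);
        const int lower = std::tolower(uc);
        if (lower >= 'a' && lower <= 'z') {
            ++counts[static_cast<std::size_t>(lower - 'a')];
        }
    }
    return counts;
}

// Convenience overload for text that is already in memory.
std::array<std::size_t, 26> countLetters(const std::string& text) {
    std::istringstream in(text);
    return countLetters(in);
}
```

Calling `countLetters(std::cin)` from `main` would cover both keyboard input and file redirection, since a redirected file simply arrives on standard input.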

Why would anyone want to do this? Depending on how the original message was encoded, it may help to determine which characters in the ciphertext represent specific characters in the plaintext. Certain letters and combinations of letters are used much more frequently than others in the English language. The twenty most used words in English are: “the of to in and a for was is that on at he with by be it an as his”. The most used letters in the English language, in descending order, are: “e t a o i n s r h l d c u m f p g w y b v k x j q z”. The most common first letters of a word, in descending order, are “t o a w b c d s f m r h i y e g l n p u j k”, the most common second letters are “h o e i a u n r t” and the most common third letters are “e s a r n i”.
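The positional lists above (first, second and third letter of a word) can be measured in much the same way as overall frequency. The sketch below is only an illustration: `countAtPosition` is a hypothetical name, and it assumes a "word" is any unbroken run of the letters a-z, so an apostrophe or hyphen starts a new word.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts which letter appears at a given 0-based
// position inside each word of `text` (0 = first letter, 1 = second, ...).
// Assumption: a word is a maximal run of the letters a-z, case-insensitive.
std::array<std::size_t, 26> countAtPosition(const std::string& text,
                                            std::size_t position) {
    std::array<std::size_t, 26> counts{};  // all zeros
    std::size_t indexInWord = 0;           // position of the current letter
    for (char ch : text) {
        const unsigned char uc = static_cast<unsigned char>(ch);
        const int lower = std::tolower(uc);
        if (lower >= 'a' && lower <= 'z') {
            if (indexInWord == position) {
                ++counts[static_cast<std::size_t>(lower - 'a')];
            }
            ++indexInWord;
        } else {
            indexInWord = 0;  // any non-letter ends the current word
        }
    }
    return counts;
}
```

Running this with positions 0, 1 and 2 over a ciphertext that preserves word breaks gives three tallies that can be compared with the three lists above.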

By using this program to compute the letter frequencies of a ciphertext and comparing them with the known lists presented above, we can gain some insight into the message and possibly crack it if it was encoded weakly (for example, with a simple substitution cipher).
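One way to sketch that comparison is to rank the ciphertext letters by how often they occur and pair them, rank for rank, with the English ordering given earlier. The function `guessKey` below is a hypothetical helper written for this illustration. It only produces a first guess that usually needs manual correction, especially for short messages.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// English letters from most to least frequent, as listed in the post.
const std::string kEnglishOrder = "etaoinsrhldcumfpgwybvkxjqz";

// Hypothetical helper: returns a 26-character key in which
// key[c - 'a'] is the guessed plaintext letter for ciphertext letter c.
// Ciphertext letters that never occur are left as '?'.
std::string guessKey(const std::string& ciphertext) {
    std::array<std::size_t, 26> counts{};
    for (char ch : ciphertext) {
        const unsigned char uc = static_cast<unsigned char>(ch);
        const int lower = std::tolower(uc);
        if (lower >= 'a' && lower <= 'z') {
            ++counts[static_cast<std::size_t>(lower - 'a')];
        }
    }

    // Rank the letter indices 0..25 by descending count. stable_sort keeps
    // alphabetical order among letters with equal counts.
    std::array<std::size_t, 26> order{};
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(),
                     [&counts](std::size_t a, std::size_t b) {
                         return counts[a] > counts[b];
                     });

    // Pair the n-th most common ciphertext letter with the n-th most
    // common English letter.
    std::string key(26, '?');
    for (std::size_t rank = 0; rank < order.size(); ++rank) {
        if (counts[order[rank]] == 0) break;  // remaining letters unused
        key[order[rank]] = kEnglishOrder[rank];
    }
    return key;
}
```

A pure rank-for-rank pairing is the simplest possible design. Checking the guess against the common-word and first-letter lists is what typically sorts out the letters whose frequencies are close together.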

The program has a few limitations. It treats characters as case-insensitive, although it can easily be modified to be case-sensitive, and it only works with English alphabetic characters (a-z and A-Z). It could just as easily be modified to accept non-alphabetic characters (#, !, etc.).

