Title : Monoalphabetic cipher cryptanalysis
Author : mythrandir
---[ Phrack Magazine Volume 7, Issue 51 September 01, 1997, article 13 of 17
-------------------------[ Monoalphabetic Cryptanalysis (Cyphers, Part One)
--------[ Jeff Thompson aka 'Mythrandir' <jwthomp@cu-online.com>
Written for Phrack and completed on Sunday, August 31st, 1997.
---------
First a quick hello to all of those I met at DefCon this year. It was
incredible fun to finally put faces to many of the people I have been talking
with for some time. It was truly was a treat to meet so many others who are
alive with the spirit of discovery.
----------
This is the first in a series of articles on Cryptology that I am writing.
The goals of these articles will be to attempt to convey some of the excitement
and fun of cyphers. A topic of much discussion in regards to cryptography
currently, is about computer based cyphers such as DES, RSA, and the PGP
implementation. I will not be discussing these. Rather, these articles will
cover what I will term classical cryptology. Or cryptology as it existed
before fast number crunching machines came into existance. These are the sorts
of cyphers which interested cryptographers throughout time and continue to be
found even to this very day. Even today, companies are producing software
whose encryption methods are attackable. You will find these commonly among
password protection schemes for software programs. Through the course of these
articles I will explain in practical terms several common cypher types and
various implementations of them as well as cryptanalytic techniques for
breaking these cyphers.
Creating cyphers is fun and all, but the real excitement and often times tedium
is found in Cryptanalysis. Many of the ideas presented in these articles will
based on three sources. The following two books: The Codebreakers by David
Kahn (ISBN: 0-684-83130-9) and Decrypted Secrets by F.L. Bauer
(ISBN: 3-540-60418-9). Both authors have put together wonderful books which
both cover the history and methods of Cryptology. Do yourself and the authors
a favor and purchase these books. You will be very pleased with the lot.
Finally, a miniscule amount of these articles will be written based on my own
personal experience.
The fun is in the journey and I welcome you on what is certain to be an
interesting trip. Please feel free to raise questions, engage me in
discussions, correct me, or simply offer suggestions at jwthomp@cu-online.com.
Please be patient with me as I am traveling extensively currently, and may be
away from the computer at length occasionally.
Out the door and into the wild...
--Monoalphabetic Cyphers
Monoalphabetic cyphers are often currently found in simple cryptograms in books
and magazines. These are just simple substitution cyphers. This does not
mean that they are always simple for the beginning amateur to solve.
Three common monoalphabetic cyphers which are used are substitution, cyclical,
and keyed cyphers.
-Substitution Cyphers
By taking an alphabet and replacing each letter with another letter in a
unique fashion you create a simple monoalphabetic cypher.
Plaintext Alphabet A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cypher Alphabet Z I K M O Q S U W Y A C E B D F H J L N P R T V X G
Plaintext Message
The blue cow will rise during the second moon from the west field.
Cyphertext Message
nuo icpo kdt twcc jwlo mpjwbs nuo lokdbm eddb qjde nuo toln qwocm.
-Cyclical Cyphers
By taking an alphabet and aligning it with a rotated alphabet you get a
cyclical cypher. For example:
Plaintext Alphabet A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cypher Alphabet N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
Indeed, you may recognize this cypher as a ROT13 which is commonly used on
news groups to obscure messages.
-Keyed Cypher
Another way to create a monoalphabetic cypher is to choose a keyword or phrase
as the beginning of the cypher alphabet. Usually, only the unique letters from
the phrase are used in order to make sure the plaintext to cyphertext behaves
in a one to one fashion.
For example:
Plaintext Alphabet: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cypher Alphabet L E T O S H D G F W A R B C I J K M N P Q U V X Y Z
The passphrase in this cypher is "Let loose the dogs of war" The advantage of
such a system is that the encryption method is easy to remember. Also, a
method of key change can be created without ever having to distribute the keys.
For example, one could use the 4 words at a time of some piece of literature.
Every message could use the next four words. Indeed, this change could occur
more frequently, but that is a subject for another article.
-Bipartite Substitution
Bipartite substition is the use of symbol pairs to represent plaintext. Later
we will see that this sort of substitution lends itself to be easily made more
difficult to analyze. Two examples of this are:
1 2 3 4 5 A B C D E
1 A B C D E A A B C D E
2 F G H I J B F G H I J
3 K L M N O C K L M N O
4 P Q R S T or D P Q R S T
5 U V W X Y E U V W X Y
6 Z 0 1 2 3 F Z 0 1 2 3
7 4 5 6 7 8 G 4 5 6 7 8
9 9 . - ? , H 9 . - ? ,
Obviously, the letters do not need to be placed in this order as their solutions
would not be that difficult to guess.
--Cryptanalysis
Previously we created a cyphered message:
nuo icpo kdt twcc jwlo mpjwbs nuo lokdbm eddb qjde nuo toln qwocm.
If one were to receive this message, figuring out its contents might seem
fairly daunting. However, there are some very good methods for recovering the
plaintext from the cyphertext. The following discussion will work under the
assumption that we know the cyphers with which we are dealing are
monoalphabetics.
-Frequency Analysis
The first method we will use is frequency analysis. Natural languages have
many qualities which are very useful for the analysis of cyphertext. Languages
have letters which occur more commonly in text, collections of letters which
are more frequent, patterns in words, and other related letter occurances.
Counting up the occurances of letters we find that there are...
letter occurances
b 3
c 4
d 5
e 2
i 1
j 3
k 2
l 3
m 3
n 4
o 8
p 2
q 2
s 1
t 3
u 3
w 4
The order of greatest frequency to least is:
8 5 4 3 2 1
{o} {d} {c n w} {b j l m t u} {e k p q} {i s}
If this sort of analysis were run on many volumes of english you would find that
a pattern would emerge. It would look like this:
{e} {t} {a o i n} {s r h} {l d} {c u m f} {p g w y b} {v k} {x j q z}
You will notice an immediate correlation between e and o. However, for the
rest of the letters we can not be very certain. In fact, we can not be very
certain about e either.
Since this text is short it is helpful to take a look at some of the other
behaviors of this text.
Counting up the first, second, third, and last letters of the words in this
text we find the following frequencies:
First Letter in word Occurances
e 1
i 1
j 1
k 1
l 1
m 1
n 3
q 2
t 2
Order:
n q t e i j k l m
Second letter in word Occurances
c 1
d 2
i 1
n 1
o 2
p 1
u 3
w 3
Order:
u w d o c i n p
Third letter in word Occurances
c 1
d 2
i 1
k 1
l 2
o 4
p 1
t 1
u 1
Order:
o d l c i k p t u
Last letter in word Occurances
b 1
c 1
e 1
m 1
n 1
o 5
s 1
t 1
English frequency for first letter:
t a o m h w
Second letter:
h o e i a u
Third letter:
e s a r n i
Last letter:
e t s d n r
Noticing the higher frequency count for 'o' in the third and last letters of
words in addition to its absence as a first letter in any words gives us strong
reason to believe that 'o' substitutes for 'e'. This is the first wedge into
solving this cypher.
However, do not be fooled by the apparent strengths of frequency analysis.
Entire books have been written without the use of some letters in the English
alphabet. For instance The Great Gatsby was written without using the letter
'e' in one word of the book.
Other items to analyze in cyphertext documents is the appearance of letters in
groups. These are called bigrams and trigrams. For example, 'th' is a very
common letter pairing in the english language. Also, as no surprise 'the' is
a very common trigram. Analysis of english documents will find these results
for you.
So now that that we have developed a simple way of starting to attack cyphers
lets examine a few ways to make them more difficult to break.
--Strengthening Cyphers
-Removing word and sentence boundaries
A simple way to complicate decypherment of a cyphertext is to remove all
spacing and punctuation. This makes it more difficult to perform a frequency
analysis on letter positions. However, it is possible to make reasonable
guesses as to word positions once yoy begin to study the document. Another
method is to break the cyphertext into fixed blocks. For example after every
four letters a space is placed.
The previous cypher text would appear as this:
nuoicpokdttwccjwlompjwbsnuolokdbmeddbqjdenuotolnqwocm.
or this:
nuoi cpok dttw ccjw lomp jwbs nuol okdb medd bqjd enuo toln qwoc m
You will notice that the above line ends with a single character. This gives
away the end of the text and would be better served by the placement of nulls,
or garbage characters. The above line becomes:
nuoi cpok dttw ccjw lomp jwbs nuol okdb medd bqjd enuo toln qwoc mhew
'hew' will decypher to 'qmi' which will clearly appear to be nulls to the
intended recipient.
-Nulls
Nulls are characters used in messages which have no meanings. A message could
be sent which uses numbers as nulls. This makes decypherment more difficult as
part of the message has no meaning. Until the decypherer realizes this, he
may have a hard time of solving the message.
-Polyphony
Another method that can be applied is the use of polyphones. Polyphones are
simply using a piece of cyphertext to represent more than one piece of
plaintext. For example a cyphertext 'e' may represent an 'a' and a 'r'. This
does complicate decypherment and may result in multiple messages. This is
dangerous as these messages are prone to errors and may even decypher into
multiple texts.
A new cyphertext alphabet would be
Cyphertext alphabet A B C D E F G H I J L N P
Plaintext alphabet Z X U S Q O M K H N R V W
B D F G I A C E L P J T Y
Our old plaintext message becomes
nih aich gfp peii ledh bclejd nih dhgfjb gffj clfg nih phdn cehib
This decypherment becomes very tricky for someone to accomplish. Having some
knowledge of the text would be a great help.
If it appears that very few letters are being used in a document then you may
wish to suspect the use of polyphones within a document.
-Homophones
Homophones are similar to polyphones except that there is more than one
cyphertext letter for every plaintext letter. They are useful to use in that
they can reduce the frequencies of letters in a message so that an analysis
yields little information. This is very easy to do with bipartite
substitution cyphers. For example:
a b c d e
a a b c d e
b f g h i j
c k l m n o
d p q r s t
e u v w x y
f z * * * *
*(fb, fc, fd, fe are NULLS)
We can add homophones to the message like this:
a b c d e
i h g a a b c d e
k j b f g h i j
n l c k l m n o
o m d p q r s t
p e u v w x y
f z * * * *
The optimal way to set up these homophones is to calculate the frequency of
appearance in the natural language you are using of each row of letters.
Homophones should be added so that the cyphertext appearance of each homophone
is reduced to a level where frequency analysis would yield little information.
-Code Words
One final method which can be used is that of code words. Simply replace
important words in the plaintext with code words which represent another word.
For example the nonsense plaintext that has been chosen for this document could
actually mean:
The blue cow will rise during the second moon from the west field.
The king is angry and will attack in two weeks with the 1st calvary by way of
the foothills.
blue is angry
cow is king
rise is attack
second is two weeks
moon is 1st calvary
west field stands for some foothills on the west side of the kingdom.
Throughout this document I have mentioned frequency analysis of english
documents. This is a fairly tedious task to do by hand, and so I am
developing software to aid in frequency analysis of documents. I will be
making it available via my website at http://www.cu-online.com/~jwthomp/ on
Monday, September 8th. Please watch for it in the Cryptography section.
Ok, now to try your hand at a few cyphertexts..
This one has to do with war.
1)
kau noelb'd oerf xmtt okkopw ok qoxb euoqf kau kurhtoe wbmcakds, obq dkemwu amd
podktu xamtu xu altq amr
This one is an excerpt from a technical document.
2)
etdsalwqs kpjsjljdq gwur orrh frurdjkrf sj qtkkjps npjtk ljeethalwsajhq
sgrqr kpjsjljdq tqr w jhr sj ewhy kwpwfane ijp spwhqeaqqajh sykalwddy tqahn
ldwqq f ahsrphrs kpjsjljd wffprqqrq sj qkrlaiy qkrlaial etdsalwqs npjtkq
Mail me your answers and I'll put the first person who solves each cypher in
the next Phrack.
In fact, I would enjoy seeing some participation in this for the next Phrack.
After reading this, I welcome the submission of any "Monoalphabetic" cypher
based on the discussions of this article. Please do not yet submit any
polyalphabetic cyphers (Next article). When submitting to me, please send me
two letters. The first mail should include only the encyphered text. Make
sure it is enough so that a reasonable examination can be made of the cypher.
This first mail should have a subject "Cyphertext submission". If you are
using a method of encypherment not found in this article, please enclose a
brief description of the type of method you used. Follow this mail up with
another entitled "Cyphertext Solution" along with a description of the
encyphering method as well as the key or table used.
I will select a number of these texts to be printed in the next Phrack, where
readers may have a chance at solving the cyphers. The reason I ask for two
seperate mailing is that I will want to take a crack at these myself. Finally,
the names of individuals will be placed in the following phrack of the first
to solve each cypher, and whomever solves the most cyphers prior to the next
Phrack release (real name or pseudonym is fine).
Please mail all submissions to jwthomp@cu-online.com
I welcome any comments, suggestions, questions, or whatever at
jwthomp@cu-online.com
----[ EOF