Monome-Dinome Tutorial

Introduction

The Monome-Dinome cipher (Mo-Di) is an example of an alphanumeric substitution cipher. This cipher is similar to the Nihilist Substitution cipher in that a unique number represents each letter. In the case of the Nihilist, a two-digit number represents each letter. In the case of the Mo-Di, one third of the letters are represented by a single-digit number (a Monome) and the remainder by a two-digit number (a Dinome).

Myszkowski Tutorial

Introduction

The Myszkowski (Mysz) cipher is a transposition cipher similar to the Incomplete Columnar (IC) cipher. Like the IC, the Mysz uses a keyword to order the plain text columns for removal as cipher text. Unlike the IC, the Mysz takes off the columns with identical key letters at the same time, further mixing up the plain text.
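Here is a minimal Python sketch of Myszkowski encipherment. It is an illustration, not the ACA's reference procedure. The plain text is written in rows under the keyword. Then, for each distinct key letter in alphabetical order, all columns carrying that letter are taken off together, row by row from left to right. The keyword TOMATO, the message, and the function name are hypothetical examples.

```python
def myszkowski_encipher(plaintext: str, keyword: str) -> str:
    """Sketch of Myszkowski transposition; the last row may be short (incomplete block)."""
    letters = [c for c in plaintext.upper() if c.isalpha()]
    key = keyword.upper()
    width = len(key)
    # Write the plain text in rows under the keyword.
    rows = [letters[i:i + width] for i in range(0, len(letters), width)]
    out = []
    # Distinct key letters in alphabetical order give the order of removal.
    for key_letter in sorted(set(key)):
        cols = [i for i, ch in enumerate(key) if ch == key_letter]
        # Columns sharing a key letter come off together, row by row, left to right.
        for row in rows:
            out.extend(row[c] for c in cols if c < len(row))
    return "".join(out)


print(myszkowski_encipher("WE ARE DISCOVERED FLEE AT ONCE", "TOMATO"))
# ROFOACDTEDSEEEACWEIVRLENE
```

If every letter of the keyword is different, each key letter owns a single column. The routine then reduces to an ordinary incomplete columnar transposition.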

Playfair Tutorial

Introduction

The Playfair cipher is a digraphic substitution cipher based on a 5 x 5 Polybius square formed with a keyword (N.B., I/J are considered equivalent). The plain text is broken up into two-letter combinations (digraphs). Double letters within a digraph are not permitted and must be separated by inserting a null (usually an ‘X’). If a single letter is left at the end of the plain text, a null is also added there to complete the digraph.
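The digraph-preparation step can be sketched in Python as follows. The sketch assumes J is folded into I and X is the null, as described above. It only prepares the plain text; it does not build the 5 x 5 square or encipher. A doubled X would need a different null, and this sketch does not handle that case. The function name is our own.

```python
def playfair_digraphs(plaintext: str, null: str = "X") -> list[str]:
    """Split plain text into Playfair digraphs (preparation only, no encipherment)."""
    # Keep letters only and treat J as I.
    letters = ["I" if c == "J" else c for c in plaintext.upper() if c.isalpha()]
    digraphs = []
    i = 0
    while i < len(letters):
        first = letters[i]
        if i + 1 == len(letters):
            digraphs.append(first + null)  # odd letter left at the end: pad with a null
            i += 1
        elif letters[i + 1] == first:
            digraphs.append(first + null)  # double letter: separate it with a null
            i += 1
        else:
            digraphs.append(first + letters[i + 1])
            i += 2
    return digraphs


print(playfair_digraphs("balloon"))  # ['BA', 'LX', 'LO', 'ON']
```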

Slidefair Tutorial

Introduction

The Slidefair cipher is a bit different from the ciphers attacked in the past. It is a periodic substitution cipher similar to Vigenère or Beaufort. However, the major difference between this cipher and those mentioned above is that encipherment is by digraphs (two-letter combinations) as opposed to single letters. The Slidefair system can be used with any type of periodic system.

1. SIMPLE SUBSTITUTION

INTRODUCTION

Cryptography is the science of writing messages that no one except the intended receiver can read. Cryptanalysis is the science of reading them anyway. “Crypto” comes from the Greek ‘krypte’ meaning hidden or vault, and “graphy” comes from the Greek ‘graphein’ meaning to write. The words, characters or letters of the original intelligible message constitute the Plain Text (PT). The words, characters or letters of the secret form of the message are called Cipher Text (CT) and together constitute a Cryptogram.

Cryptograms are roughly divided into Ciphers and Codes.

William F. Friedman defines a Cipher message as one produced by applying a method of cryptography to the individual letters of the plain text taken either singly or in groups of constant length. Practically every cipher message is the result of the joint application of a General System (or Algorithm), or method of treatment, which is invariable, and a Specific Key, which is variable at the will of the correspondents and controls the exact steps followed under the general system. It is assumed that the general system is known to both the correspondents and the cryptanalyst. [FR1]

A Code message is a cryptogram which has been produced by using a code book consisting of arbitrary combinations of letters, entire words, or figures substituted for the words, partial words, and phrases of the PT. Whereas a cipher system acts upon individual letters or definite groups taken as units, a code deals with entire words, phrases or even sentences taken as units. We will look at both types of systems in this course.

The process of converting PT into CT is Encipherment. The reverse process of reducing CT into PT is Decipherment. Cipher systems are divided into two classes: substitution and transposition. A Substitution cipher is a cryptogram in which the original letters of the plain text, taken either singly or in groups of constant length, have been replaced by other letters, figures, signs, or combinations of them in accordance with a definite system and key. A Transposition cipher is a cryptogram in which the original letters of the plain text have merely been rearranged according to a definite system. Modern cipher systems use both substitution and transposition to create secret messages.

SUBSTITUTION AND TRANSPOSITION CIPHERS COMPARED

The fundamental difference between substitution and transposition methods is this. In the former, the normal or conventional values of the letters of the PT are changed, without any change in the relative positions of the letters in their original sequences. In the latter, only the relative positions of the letters of the PT in the original sequences are changed, without any change to the conventional values of the letters. Since the methods of encipherment are radically different in the two cases, the principles involved in the cryptanalysis of the two types of ciphers are fundamentally different. We will look at the methods for determining whether a cipher has been enciphered by substitution or transposition.
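The distinction can be shown in a few lines of Python. A Caesar shift stands in for substitution and simple reversal stands in for transposition; both are chosen only for brevity, and the message is a made-up example. A substitution keeps the pattern of repeated letters in place but changes which letters appear. A transposition keeps the letter counts but breaks the positional pattern.

```python
from collections import Counter


def letter_pattern(text: str) -> tuple[int, ...]:
    """Number letters by order of first appearance, e.g. 'THAT' -> (0, 1, 2, 0)."""
    first_seen: dict[str, int] = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in text)


def caesar(text: str, shift: int) -> str:
    """Stand-in for any one-to-one letter substitution (uppercase A-Z only)."""
    return "".join(chr((ord(c) - ord("A") + shift) % 26 + ord("A")) for c in text)


plain = "DEFENDTHEEASTWALL"
substituted = caesar(plain, 3)   # letter values change, positions stay
transposed = plain[::-1]         # positions change, letter values stay

print(letter_pattern(substituted) == letter_pattern(plain))  # True
print(Counter(substituted) == Counter(plain))                # False
print(Counter(transposed) == Counter(plain))                 # True
print(letter_pattern(transposed) == letter_pattern(plain))   # False
```

This is why a cryptogram whose letter frequencies look like normal plain English points toward transposition.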

SIMPLE SUBSTITUTION

Probably the most popular amateur cipher is the simple substitution cipher. We see them in newspapers. Kids use them to fool teachers, and lovers send them to each other to arrange special meetings. They have been used by the Masons, secret Greek societies and fraternal organizations. Current gangs in the Southwest use them to do drug deals. They are found in literature, such as “The Gold-Bug” by Edgar Allan Poe, and in the death threats of the infamous Zodiac killer in San Francisco in the late 1960s.

The Aristocrats (A1-A25) in the Aristocrats Column of “The Cryptogram” are all simple substitution ciphers in English. Each English plain text letter, in all its occurrences in the message, is replaced by a unique English ciphertext letter. Mathematically, this is a one-to-one mapping. It is considered unethical (and it would give the analyst a possible wedge) for a plaintext letter to be replaced by the same letter in the ciphertext.
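In code, an Aristocrat key is simply a permutation of the alphabet. The Python sketch below checks the two conditions just described before enciphering: a one-to-one correspondence, and no letter standing for itself. The function names are our own. The sample CT alphabet is the K2 alphabet recovered for A-1 later in this lesson.

```python
import string


def check_aristocrat_alphabet(ct_alphabet: str) -> None:
    """Raise ValueError unless the CT alphabet is one-to-one with no letter standing for itself."""
    ct = ct_alphabet.upper()
    if sorted(ct) != list(string.ascii_uppercase):
        raise ValueError("the CT alphabet must use each of the 26 letters exactly once")
    fixed = [p for p, c in zip(string.ascii_uppercase, ct) if p == c]
    if fixed:
        raise ValueError(f"letters enciphered as themselves: {fixed}")


def encipher(plaintext: str, ct_alphabet: str) -> str:
    """Replace each plain letter a-z by the CT letter in the same position."""
    check_aristocrat_alphabet(ct_alphabet)
    table = str.maketrans(string.ascii_lowercase, ct_alphabet.upper())
    return plaintext.lower().translate(table)


print(encipher("the highest knowledge", "QRSUVWXYZLIGHTABCDEFJKMNOP"))
# FYV YZXYVEF ITAMGVUXV
```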

A recurring theme of my lectures is that all substitution ciphers have a common basis in mathematics and probability theory. The base language of the cipher doesn’t matter as long as it can be characterized mathematically. Mathematics is the common link for deciphering a substitution cipher in any language. Based on mathematical principles, we can identify the language of the cryptogram and then break open its contents.

FOUR BASIC OPERATIONS OF CRYPTANALYSIS

William F. Friedman presents the fundamental operations for the solution of practically every cryptogram:

  1. The determination of the language employed in the plain text version.
  2. The determination of the general system of cryptography employed.
  3. The reconstruction of the specific key in the case of a cipher system, or the partial or complete reconstruction of the code book in the case of a code system, or both in the case of an enciphered code system.
  4. The reconstruction or establishment of the plain text.

In some cases, step (2) may precede step (1). This is the classical approach to cryptanalysis. It may be further reduced to:

  1. Arrangement and rearrangement of data to disclose non-random characteristics or manifestations (i.e., frequency counts, repetitions, patterns, symmetrical phenomena).
  2. Recognition of the non-random characteristics or manifestations when disclosed (via statistics or other techniques).
  3. Explanation of non-random characteristics when recognized (by luck, intelligence, or perseverance).

Much of the work is in determining the general system. In the final analysis, the solution of every cryptogram involving a form of substitution depends upon its reduction to mono-alphabetic terms, if it is not originally in those terms. [FR1]

OUTLINE OF CIPHER SOLUTION

According to the Navy Department OP-20-G Course in Cryptanalysis, the solution of a substitution cipher generally progresses through the following stages:

  1. Analysis of the cryptogram(s)
    1. Preparation of a frequency table.
    2. Search for repetitions.
    3. Determination of the type of system used.
    4. Preparation of a work sheet.
    5. Preparation of individual alphabets (if more than one)
    6. Tabulation of long repetitions and peculiar letter distributions.
  2. Classification of vowels and consonants by a study of:
    1. Frequencies
    2. Spacing
    3. Letter combinations
    4. Repetitions
  3. Identification of letters.
    1. Breaking in or wedge process
    2. Verification of assumptions.
    3. Filling in good values throughout messages
    4. Recovery of new values to complete the solution.
  4. Reconstruction of the system.
    1. Rebuilding the enciphering table.
    2. Recovery of the key(s) used in the operation of the system
    3. Recovery of the key or keyword(s) used to construct the alphabet sequences.

All of the steps above are to be carried out with orderly reasoning; it is not an exact mechanical process. [OP20]

Since this is a course in Cryptanalysis, let’s start cracking some open.

EYEBALL

While reading the newspaper, you see the following cryptogram. Train your eye to look for wedges or ‘ins’ into the cryptogram. Assume that we are dealing with English and that we have simple substitution. What do we know? Although the cryptogram is short, there are several entries for solution. Number the words. Note that it is a quotation: words 12 and 13, marked with *, represent a proper name in ACA lingo.

A-1. Elevated thinker. K2 (71) LANAKI

1      2        3     4  5   5    6    7

FYV YZXYVEF ITAMGVUXV ZE FA ITAM FYQF MV

8 9 10 11 12

QDV EJDDAJTUVU RO HOEFVDO. *QGRVDF

13

*ESYMVZFPVD

ANALYSIS OF A-1.

Note that words 1 and 6 could be ‘The … That’, and that words 3 and 5 use the same 4 letters, I T A M. Note also that there is a flow to this cryptogram: The _ _ is? _ _ and? _ _. Titles either help or should be ignored as red herrings. “Elevated” might mean “high”, and the thinker could be the person named at the end. We could also attack this cipher using pattern words for words 2, 3, 6, 9, and 11. Pattern word lists are lists of words with repeated letters, arranged like a thesaurus and indexed by pattern and word length.

Filling in the cryptogram using [ The… That] assumption we have:

1      2        3     4  5   5    6    7
the h hest e e s t that e
FYV YZXYVEF ITAMGVUXV ZE FA ITAM FYQF MV

8 9 10 11 12
a e e te a e t
QDV EJDDAJTUVU RO HOEFVDO. *QGRVDF

13
h e t e
*ESYMVZFPVD

Not bad for a start. We find that the ending e_t might be ‘est’. A two-letter word starting with t_ is ‘to’. Word 8 is ‘are’. So we add this part of the puzzle. Note how each wedge leads to the next wedge. Always look for confirmation that your assumptions are correct. Have an eraser ready to go back a step if necessary. Keep a tally of which letters have been placed correctly, and mark unconfirmed guesses with a ?. Piece by piece, we build on the opening wedge.

1      2        3     4  5   5    6    7
the h hest o e e s to o that e
FYV YZXYVEF ITAMGVUXV ZE FA ITAM FYQF MV

8 9 10 11 12
are s rr e ster a ert
QDV EJDDAJTUVU RO HOEFVDO. *QGRVDF

13
s h e t er
*ESYMVZFPVD

Now we have some bigger wedges. The s_h is a possible ‘sch’ from German. Word 9 could be ‘surrounded’. Z = i. The name could be Albert Schweitzer. Let’s try these guesses. Word 2 might be ‘highest’, which goes with the title.

1      2        3     4  5   5    6    7
the highest nowledge is to now that we
FYV YZXYVEF ITAMGVUXV ZE FA ITAM FYQF MV

8 9 10 11 12
are surrounded ster albert
QDV EJDDAJTUVU RO HOEFVDO. *QGRVDF

13
schweitzer
*ESYMVZFPVD

The final message is: The highest knowledge is to know that we are surrounded by mystery. Albert Schweitzer. OK, that’s the message, but what do we know about the keying method?

KEYING CONVENTIONS

Ciphertext alphabets are generally mixed, both for more security and to give an easy mnemonic to remember as a translation key. ACA ciphers are keyed K1, K2, K3, K4, or K()M for the mixed variety. K1 means that a keyword is used in the PT alphabet to scramble it. K2, the most popular, uses the keyword to scramble the CT alphabet. K3 uses the same keyword in both the PT and CT alphabets; K4 uses different keywords in the PT and CT alphabets. A keyword or phrase is chosen that can easily be remembered. Duplicate letters after the first occurrence are deleted.

Following the keyword, the balance of the letters is written out in normal order. A one-to-one correspondence with the regular alphabet is maintained. A K2M mixed keyword sequence using the word METAL to number the columns and the key DEMOCRAT might look like this:

4 2 5 1 3
M E T A L
=========
D E M O C
R A T B F
G H I J K
L N P Q S
U V W X Y
Z

The CT alphabet is then taken off by columns, in the numbered order, and used:

CT: OBJQX EAHNV CFKSY DRGLUZ MTIPW
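The K2M construction above is mechanical enough to sketch in Python. The function names are our own. The text does not say how tied key letters are numbered (METAL has no repeats), so numbering them left to right is an assumption.

```python
import string


def keyword_sequence(keyword: str) -> str:
    """Keyword with repeated letters dropped, followed by the rest of A-Z in order."""
    seq: list[str] = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch.isalpha() and ch not in seq:
            seq.append(ch)
    return "".join(seq)


def columnar_mixed_alphabet(keyword: str, column_key: str) -> str:
    """Write the keyword sequence in rows under column_key; take columns off in key order."""
    seq = keyword_sequence(keyword)
    width = len(column_key)
    rows = [seq[i:i + width] for i in range(0, len(seq), width)]
    # Number the columns alphabetically by key letter (ties, if any, left to right).
    order = sorted(range(width), key=lambda i: column_key.upper()[i])
    return "".join(row[c] for c in order for row in rows if c < len(row))


print(columnar_mixed_alphabet("DEMOCRAT", "METAL"))
# OBJQXEAHNVCFKSYDRGLUZMTIPW
```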

Going back to A-1. Since it is keyed as a K-2, we set up the PT alphabet as a normal sequence and fill in the CT letters below it. Do you see the keyword LIGHT?

PT  abcdefghijklmnopqrstuvwxyz
CT  QRSUVWXYZLIGHTABCDEFJKMNOP
----------
KW = LIGHT

In tough ciphers, we use the above key recovery procedure to go back and forth between the cryptogram and keying alphabet to yield additional information.
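Going the other way, once the keyword is known, the whole K2 alphabet can be regenerated and A-1 read off mechanically. In this Python sketch (the function names are hypothetical), the alignment letter Q is the CT letter that sits under plain a in the recovered alphabet. A solver who has only the keyword would still have to find that shift. The proper-name asterisks are dropped from the ciphertext.

```python
import string


def keyword_sequence(keyword: str) -> str:
    """Keyword with repeated letters dropped, followed by the rest of A-Z in order."""
    seq: list[str] = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch.isalpha() and ch not in seq:
            seq.append(ch)
    return "".join(seq)


def k2_cipher_alphabet(keyword: str, ct_under_plain_a: str) -> str:
    """Rotate the keyed sequence so that ct_under_plain_a sits under plain 'a'."""
    seq = keyword_sequence(keyword)
    start = seq.index(ct_under_plain_a.upper())
    return seq[start:] + seq[:start]


def decipher(ciphertext: str, ct_alphabet: str) -> str:
    """Map each CT letter back to the plain letter above it; other characters pass through."""
    return ciphertext.translate(str.maketrans(ct_alphabet, string.ascii_lowercase))


ct_alphabet = k2_cipher_alphabet("LIGHT", "Q")
print(ct_alphabet)  # QRSUVWXYZLIGHTABCDEFJKMNOP

a1 = ("FYV YZXYVEF ITAMGVUXV ZE FA ITAM FYQF MV "
      "QDV EJDDAJTUVU RO HOEFVDO. QGRVDF ESYMVZFPVD")
print(decipher(a1, ct_alphabet))
# the highest knowledge is to know that we are surrounded by mystery. albert schweitzer
```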

To summarize the eyeball method:

  1. Common letters appear frequently throughout the message but don’t expect an exact correspondence in popularity.
  2. Look for short, common words (the, and, are, that, is, to) and common endings (tion, ing, ers, ded, ted, ess).
  3. Make a guess, try out the substitutions, keep track of your progress. Look for readability.

GENERAL NATURE OF ENGLISH LANGUAGE

A working knowledge of the letters, characteristics, relations with each other, and their favorite positions in words is very valuable in solving substitution ciphers.

Friedman was the first to employ the principle that English letters are mathematically distributed in a unilateral frequency distribution:

13  9  8  8  7  7  7  6  6  4  4  3  3  3  3  2  2  2  1  1  1  -  -  -  -  -
 E  T  A  O  N  I  R  S  H  L  D  C  U  P  F  M  W  Y  B  G  V  K  Q  X  J  Z

That is, in each 100 letters of text, E has a frequency (or number of appearances) of about 13; T, a frequency of about 9; K Q X J Z appear so seldom, that their frequency is a low decimal.

Other important data on English ( based on Hitt’s Military Text):

 6 Vowels: A E I O U Y                          = 40 %
20 Consonants:
     5 High Frequency (D N R S T)               = 35 %
    10 Medium Frequency (B C F G H L M P V W)   = 24 %
     5 Low Frequency (J K Q X Z)                =  1 %
                                                 ====
                                                 100 %

The four vowels A, E, I, O and the four consonants N, R, S, T form 2/3 of the normal English plain text. [FR1]

Friedman gives a digraph chart taken from Parker Hitt’s Manual on p. 22 of the reference. [FR2]

The most frequent English digraphs per 20,000 letters are:

TH 50   AT 25   ST 20
ER 40   EN 25   IO 18
ON 39   ES 25   LE 18
AN 38   OF 25   IS 17
RE 36   OR 25   OU 17
HE 33   NT 24   AR 16
IN 31   EA 22   AS 16
ED 30   TI 22   DE 16
ND 30   TO 22   RT 16
HA 26   IT 20   VE 16

The most frequent English trigraphs per 200 letters are:

THE 89   TIO 33   EDT 27
AND 54   FOR 33   TIS 25
THA 47   NDE 31   OFT 23
ENT 39   HAS 28   STH 21
ION 36   NCE 27   MEN 20
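The digraph and trigraph tables above are counts of letter groups scaled to a fixed amount of text. A Python sketch of such a count follows. The trigraph list includes EDT, OFT and STH, which normally arise across word breaks (ed t…, of t…, …s th…). The sketch therefore assumes spaces are removed before counting. That is our reading of the tables, not a documented rule of the sources. The function names are our own.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count letter n-grams in running text, ignoring spaces and punctuation."""
    letters = re.sub(r"[^a-z]", "", text.lower())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))


def per_base(counts: Counter, text_length: int, base: int) -> dict[str, float]:
    """Rescale raw counts to occurrences per `base` letters of text."""
    return {gram: count * base / text_length for gram, count in counts.items()}


digraphs = ngram_counts("The theme of the test", 2)
trigraphs = ngram_counts("of the", 3)   # the cross-word trigraph OFT appears here
print(digraphs.most_common(3))
print(sorted(trigraphs))
```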

Frequency of Initial and Final Letters

Letters-- A B C D  E F G H I J K L M N  O P Q R S  T U V W X Y Z
Initial-- 9 6 6 5 2 4 2 3 3 1 1 2 4 2 10 2 - 4 5 17 2 - 7 - 3 -
Final -- 1 - 10 17 6 4 2 - - 1 6 1 9 4 1 - 8 9 11 1 - 1 - 8 -

Relative Frequencies of Vowels.

A 19.5% E 32.0% I 16.7% O 20.2% U 8.0% Y 3.6%

Average number of vowels per 20 letters, 8.

Becker and Piper partition the 26 letters of English into 5 groups based on their Table 1.1. [STIN], [BP82]

Table 1.1
Probability of Occurrence of 26 Letters
Letter Probability   Letter Probability
A .082 N .067
B .015 O .075
C .028 P .019
D .043 Q .001
E .127 R .060
F .022 S .063
G .020 T .091
H .061 U .028
I .070 V .010
J .002 W .023
K .008 X .001
L .040 Y .020
M .024 Z .001

Groups

  1. E, having a probability of about 0.127
  2. T, A, O, I, N, S, H, R, each having probabilities between 0.06 – 0.09
  3. D, L, having probabilities around 0.04
  4. C, U, M, W, F, G, Y, P, B, each having probabilities between 0.015 – 0.028.
  5. V, K, J, X, Q, Z, each having probabilities less than 0.01.
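The group boundaries can be checked against Table 1.1 with a few lines of Python. So can the claim quoted earlier from [FR1] that A, E, I, O and N, R, S, T form two thirds of English text. The probabilities are copied from the table; nothing else is assumed.

```python
# Becker and Piper's Table 1.1 probabilities, copied from above.
TABLE_1_1 = {
    "A": .082, "B": .015, "C": .028, "D": .043, "E": .127, "F": .022, "G": .020,
    "H": .061, "I": .070, "J": .002, "K": .008, "L": .040, "M": .024, "N": .067,
    "O": .075, "P": .019, "Q": .001, "R": .060, "S": .063, "T": .091, "U": .028,
    "V": .010, "W": .023, "X": .001, "Y": .020, "Z": .001,
}

GROUPS = {1: "E", 2: "TAOINSHR", 3: "DL", 4: "CUMWFGYPB", 5: "VKJXQZ"}

# Total probability carried by each group.
for number, letters in GROUPS.items():
    share = sum(TABLE_1_1[c] for c in letters)
    print(f"group {number}: {letters:<9} {share:.3f}")

# The "four vowels and four consonants" check against the same table.
eight = sum(TABLE_1_1[c] for c in "AEIONRST")
print(f"A E I O + N R S T: {eight:.3f}")  # 0.635, close to two thirds
```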

LETTER CHARACTERISTICS AND INTERACTIONS

ELCY gives data for English, German, French, Italian, Spanish, and Portuguese in her Appendices, p. 218 ff. She also gives tables of letter contact data. [ELCY]

LANAKI published data on English and 10 different languages as well as expanded work on Chinese. It is available at the CDB. [NIC1] [NIC2]

S-TUCK gives detailed English, French and Spanish letter characteristics in her book. [TUCK]

Friedman in his Military Cryptanalytics Part I – Volume 1 gives charts showing the lower and upper limits of deviation from theoretical (random) for the number of vowels, high, low, medium frequency consonants, blanks in distributions for plain text and random text for messages of various lengths. [FR1]

Friedman, in his Military Cryptanalytics Part I – Volume 2, gives a veritable potpourri of statistical data on letter frequencies, digraphs, trigraphs, tetragraphs, grouped letters, relative log data, special purpose data, pattern words, idiomorphic data, standard endings, initials, foreign language data [German, French, Italian, Spanish, Portuguese and Russian], classification of systems used in concealment, nulls and literals. [FR2]

Sinkov assigns log frequencies to digraphs to aid in identification. The procedure is explained by Friedman. [FR1] [SINK]

“ACA and You” presents general properties of English letters. [ACA]

Foster presents detailed letter characteristics based on the Brown Corpus. [CCF]

Don L. Dow puts out a clever computer cryptogram game which does frequency analysis and is user friendly for very simple Aristocrats. {Available as shareware} [DOW]

Depending on the base text we choose, we find variations in the frequency of letters. For example, literary English gives slightly different results than frequencies based on military or ordinary English text.

Hagn presented Literary English Letter Usage Statistics based on “A Tale of Two Cities” by Charles Dickens as follows:

Total letter count = 586747   Total doubled letter count = 14421
Letter use frequencies: Doubled letter frequencies:
E 72881 12.40% LL 2979 20.60%
T 52397 8.90% EE 2146 14.80%
A 47072 8.00% SS 2128 14.70%
O 45116 7.60% OO 2064 14.30%
N 41316 7.00% TT 1169 8.10%
I 39710 6.70% RR 1068 7.40%
H 38334 6.50% PP 628 4.30%
S 36770 6.20% FF 430 2.90%
R 35946 6.10% NN 301 2.00%
D 27487 4.60% CC 243 1.60%
L 21479 3.60% MM 207 1.40%
U 16218 2.70% DD 201 1.30%
M 14928 2.50% GG 99 0.60%
W 13835 2.30% BB 41 0.20%
C 13223 2.20% ZZ 13 0.00%
F 13152 2.20% AA 2 0.00%
G 12121 2.00% HH 1 0.00%
Y 11849 2.00%  
P 9452 1.60%
B 8163 1.30%
V 5044 0.80%
K 4631 0.70%
Q 655 0.10%
X 637 0.10%
J 623 0.10%
Z 213 0.00%

[HAGN]

Total initial letters = 135664   Total ending letters = 135759
Initial letter frequencies: Ending letter frequencies:
T 20665 15.2% E 26439 19.4%
A 15564 11.4% D 17313 12.7%
H 11623 8.5% S 14737 10.8%
W 9597 7.0% T 13685 10.0%
I 9468 6.9% N 10525 7.7%
S 9376 6.9% R 9491 6.9%
O 8205 6.0% Y 7915 5.8%
M 6293 4.6% O 6226 4.5%
B 5831 4.2% F 5133 3.7%
C 4962 3.6% G 4463 3.2%
F 4843 3.5% H 3579 2.6%

Top digraphs:

TH 17783   RE 8139   ED 6217   IS 5566
HE 17226   ND 7793   AT 6200   NG 5564
IN 10783   HA 6611   EN 5849   IT 5559
ER 10172   ON 6464   HI 5730   OR 4915
AN  9974   OU 6418   TO 5703   AS 4836

POSITION AND FREQUENCY TABLE

Time to put to good use the barrage of data presented. Given the next slightly harder cryptogram, and ignoring again a pattern word attack, we can develop some useful tools. [Much of what I am covering can be done automatically by computer but then your brain goes mushy for failure to understand the process.]

A-2. [no clue] S-TUCK

VWHAZSJXIH SKIMF MWCGMV WOJSIFAGFJAQ

QMNRJKZMGRSWMF. JATW XHAWF. FIQQWFFXIH

FKHBAOZ JSMAHHF. TGAHPKD XMAWOVFSARF

XHKIMAFS.

First we perform a CT Frequency Count.

 F  A H M W S I J K X G Q O R V Z T B C D N P
13 11 9 9 8 7 6 6 5 5 4 4 3 3 3 3 2 1 1 1 1 1

We have 106 letters. The letters making up the lowest 20% of the text are considered low frequency: 20% of 106 ≈ 21. Counting from right to left, we have O, R, V, Z, T, B, C, D, N, P (19 occurrences in all). We mark each appearance of these letters in A-2 with a dot, and we also enter the frequency data under the CT.
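The count and the 20% cut can be sketched in Python. Breaking ties alphabetically is our choice; it happens to reproduce the order shown above. Stopping before a letter that would push the total past 20% is one reasonable reading of the rule. It selects the same ten letters, 19 occurrences in all. The function names are hypothetical.

```python
from collections import Counter

A2 = ("VWHAZSJXIH SKIMF MWCGMV WOJSIFAGFJAQ QMNRJKZMGRSWMF. JATW XHAWF. "
      "FIQQWFFXIH FKHBAOZ JSMAHHF. TGAHPKD XMAWOVFSARF XHKIMAFS.")


def frequency_table(ct: str) -> list[tuple[str, int]]:
    """CT letter counts, highest first; ties broken alphabetically."""
    counts = Counter(c for c in ct if c.isalpha())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def low_frequency_letters(table: list[tuple[str, int]], fraction: float = 0.2) -> list[str]:
    """Take letters from the low end while their total stays within `fraction` of the text."""
    limit = fraction * sum(n for _, n in table)
    low: list[str] = []
    running = 0
    for letter, n in reversed(table):
        if running + n > limit:
            break
        low.append(letter)
        running += n
    return low


table = frequency_table(A2)
print(" ".join(letter for letter, _ in table))
# F A H M W S I J K X G Q O R V Z T B C D N P
print(sorted(low_frequency_letters(table)))
# ['B', 'C', 'D', 'N', 'O', 'P', 'R', 'T', 'V', 'Z']
```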

Vowels contact the low-frequency letters more often than consonants do, about 80% of the time. We use the S-TUCK method combined with our text. [ELCY] [TUCK]

We go through A-2, writing down the letters that contact each low-frequency CT letter on either side, and tally one for each contact. A CT letter standing between two low-frequency letters is tallied 2. Low-frequency letters touching each other score 0. So we do not count N or R in word 5, where they touch each other. In word 1, W contacts V, so W is tallied 1, and A and S both contact Z, so both A and S are credited. We get:

W /////   A ////   S //   G ///   M ///   J //   K ///   H //   F //
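The tally can be mechanized in Python. This sketch assumes contacts are counted only within words, since the word breaks of an Aristocrat are real. It uses the low-frequency set selected above, and the function name is our own. Under those rules it credits both W and A with 5 contacts. That is one more for A than the hand tally: in word 9 (FKHBAOZ), A sits between B and O. The conclusion that W and A are the likeliest vowels is unchanged.

```python
from collections import Counter

A2_WORDS = ("VWHAZSJXIH SKIMF MWCGMV WOJSIFAGFJAQ QMNRJKZMGRSWMF JATW XHAWF "
            "FIQQWFFXIH FKHBAOZ JSMAHHF TGAHPKD XMAWOVFSARF XHKIMAFS").split()
LOW = set("ORVZTBCDNP")  # the low-frequency CT letters selected above


def low_frequency_contacts(words: list[str], low: set[str]) -> Counter:
    """Tally, within each word, every non-low letter touching a low-frequency letter."""
    tally: Counter = Counter()
    for word in words:
        for i, ch in enumerate(word):
            if ch not in low:
                continue
            for j in (i - 1, i + 1):  # left and right neighbours
                # A letter between two low letters is reached twice (tally 2);
                # low letters touching each other score nothing.
                if 0 <= j < len(word) and word[j] not in low:
                    tally[word[j]] += 1
    return tally


print(low_frequency_contacts(A2_WORDS, LOW).most_common())
```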

Low Frequency Contacts for A-2.

From the Brown Corpus, vowel contact as percentage of total number of digrams is low: [CCF]

                 Second letter
First letter   A    E    I    O    U    Y
     A         0    0   .4    0   .1   .3
     E        .7   .4   .2   .1    0   .2
     I        .2   .4    0   .7    0    0
     O        .1   .1   .1   .3  1.0    0
     U        .1   .1   .1    0    0    0
     Y         0   .1    0   .2    0    0

Total nonpairs = 5.1%   Pairs = 0.7%

ELCY tells us quite a bit about vowel behavior.

  1. A, E, I, O, are normally high frequency, U is moderate and Y is low frequency.
  2. Letters contacting low frequency letters are usually vowels.
  3. Letters showing a wide variety of contact-letters are usually vowels.
  4. In repeated digrams, one letter is usually a vowel.
  5. In reversed digrams, one letter is usually a vowel.
  6. Doubled consonants are usually flanked by vowels, and vice versa (cvvc or vccv).
  7. It is unusual to find more than 5 consonants in succession.
  8. Vowels do not often contact each other.
  9. If the CT letter with the highest frequency is assumed to be E, any other high-frequency letter which never touches E can be assumed to be a vowel. A letter that contacts it very often cannot be a vowel.
  10. E is the most frequent vowel and rarely touches O. Both double freely.
  11. The vowel that follows and rarely precedes E is A.
  12. The vowel that reverses with E is I.
  13. Observations 11 and 12 apply to the vowel O. To find U, note that it precedes E and follows O.
  14. The only vowel-vowel digrams of consequence are OU,EA,IO.
  15. Three vowels in sequence may be IOU, EOU, UOU, EAU.

NYPHO’s Robot says that the first four or last four letters of a word contain a vowel. [TUCK]

ELCY defines high frequency letter behavior.

About 70% of the language is made up of E, T, A, O, N, I, R, S, H. This high frequency group has three cliques.

  • Class I. T, O, S appear frequently both as initials and as finals; O is terminal in short words like to. All double freely.
  • Class II. A, I, H appear frequently as initials but rarely as finals, especially A and I. They do not readily double.
  • Class III. E, N, R appear frequently as finals and less frequently as initials. They double frequently, especially E; N and R not so often.

When one of these letters changes its class, the least likely exchange is one occurring between Class II and III.

ELCY gives us tips for identifying consonants:

  1. Those letters still remaining in the high-frequency section will usually include T, N, R, S, H. H is the easiest to identify: it precedes all vowels and forms TH, HE, HA.
  2. R is also recognizable because it reverses freely with all vowels and links with the Class I club.
  3. T is usually found by frequency. It precedes vowels rather than following them, and it precedes consonants. S has a similar pattern to a lesser degree. N confuses this picture.
  4. ST-TS and RT-TR are the only frequent consonant reversals.
  5. TT and SS are the most frequent doubles in the language.

Having all this information, we are well armed against even the most resistant Aristocrat. We return now to solution of A-2.

From the number of their contacts, W and A are most likely vowels. G, K, M are next most likely. We look at these letters in the position table.

W. has the looks of E even though it is not the most frequent.

A cannot be A (in an Aristocrat no letter stands for itself), so it might be I, but its frequency may be too high.

G. and K. have inside positions and look like vowels but cannot yet be identified.

M. might be O by frequency but is confused with R.

A study of A-2. shows that W and A reverse which might be ei and ie. AG reverses which might be io or ia. M repeats, and reverses with W and G. It most likely is R not O. K does not contact W A G or M. We mark the cipher with W A G K as vowels and M as a consonant, putting in the assumed values.

A-2. [no clue] S-TUCK

     1       2      3          4
delightful hours re ard e th siastic
.vcv. cvc vvcc cv.vc. v. vcvvc vc
VWHAZSJXIH SKIMF MWCGMV WOJSIFAGFJAQ
389+376569 7569* 981493 83676*+4*6+4

5 6 7 8
cr togr hers ti flies successful
cc.. v.cv. vcc v.v ccvvc cvccvcccvc
QMNRJKZMGRSWMF. JATW XHAWF. FIQQWFFXIH
4913653943789* 6+28 59+8* *6448**569

9 10 11 12
sol i g thrills ail o frie dshi s
cvc.v.. cvccc .vvc.v. ccvv..c v.c
FKHBAOZ JSMAHHF. TGAHPKD XMAWOVFSARF
*591+33 679+99* 24+9151 59+833*7+3*

13
fl u ish
ccvvcvc
XHKIMAFS.
59569+*7 (two-digit frequencies are abbreviated: F = 13 = *, A = 11 = +)

Using NYPHO’s Robot rule, in word 1 one of J X I H must be a vowel. In word 8, F X I H must also contain a vowel. Word 1 suggests the ending ‘ful’, so X = f and H = l. In X I H, the I is in the vowel (inner) position. So the vowels are now W, A, G, K, I. From its end position, F = s. In words 4 and 11, GA reverses, so G cannot be u, for ui is not a reversal. We try KI = ou; therefore G = a. Put these into the cipher tableau above. Word 5 breaks the two c’s, so Q = c. Word 1 might be delightful, so V = d and ZSJ = ght. Remember that the second letter position favors vowels. [ROBO]

The message reads: Delightful hours reward enthusiastic cryptographers. Time flies. Successful solving thrills. Mailbox friendships flourish. KW =K1=salutory.

PATTERN WORD ATTACK

Pattern words are words in which one or more letters are repeated, such as awkward, successful, interesting, unusually. Aegean Park Press publishes pattern word books for words of 3 – 16 letters. Pattern word lists are indexed by key letters or figures, or by vowel-consonant relationships. [BARK] Pattern words give a quick wedge into the cryptogram. One of the best pattern word dictionaries is the Cryptodyct. [GODD]

The Crypto Drop Box has the TEA computer program, which gives automated pattern searching and anagramming up to 20 words. It is a very effective tool.

In A-2, we find a prize in word 8. Using a key-letter approach:

A B C C D A A E B F
F I Q Q W F F X I H

or

1 2 3 3 4 1 1 5 2 6     =  (334)   11526  [10L]
F I Q Q W F F X I H

The first pattern found on page 310 Appendix of [CCF] is successful. The Cryptodyct uses the latter indexing method and under 10 letter words we find that the 334 11526 pattern equals successful.
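The key-letter form of the pattern is easy to compute, and it is essentially what pattern word dictionaries and programs index on. Here is a Python sketch with a small hypothetical word list; a real attack would use a full pattern dictionary. The function names are our own.

```python
import string


def letter_pattern(word: str) -> str:
    """Relabel letters A, B, C... in order of first appearance: FIQQWFFXIH -> ABCCDAAEBF."""
    labels: dict[str, str] = {}
    out = []
    for ch in word.upper():
        if ch not in labels:
            labels[ch] = string.ascii_uppercase[len(labels)]
        out.append(labels[ch])
    return "".join(out)


def pattern_matches(cipher_word: str, candidates: list[str]) -> list[str]:
    """Candidates whose repeated-letter pattern matches the cipher word exactly."""
    target = letter_pattern(cipher_word)
    return [w for w in candidates if letter_pattern(w) == target]


words = ["awkward", "successful", "interesting", "unusually", "juggernaut"]
print(letter_pattern("FIQQWFFXIH"))          # ABCCDAAEBF
print(pattern_matches("FIQQWFFXIH", words))  # ['successful']
```

Requiring the full pattern to match enforces both halves of the one-to-one rule. Repeated CT letters must be repeated PT letters, and different CT letters must be different PT letters. A non-pattern word such as wrath simply yields ABCDE. That is why non-pattern words need a different attack.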

Cryptographers generate their own special lists:

  • Transposals: from, form; night, thing; mate, meat;
  • Queer words: adieu, crwth, eggglass, giaour, meaow
  • Consonant sequences: dths, lcht, ncht, rids, ngst, rths
  • Favorite ins: people, crypt, success

Using the TEA model, it was necessary to assume the vowels at u and e for a 1u22e445u6 template to get successful and juggernaut on the first try.

Non-pattern word lists are those with words that do not have even one repeated letter, such as come, wrath, journey. They are very useful in attacking Patristocrats and very difficult Risties.

OMAR gave us this fine list in order of frequency:

CRYPT WORDS ABOUT KNOWS BELOW OKAPI SWORD
BLACK ALONG AFTER NEGRO EXTRA PLACE THREW
WATCH CRAZY CAUSE UNDER FIRST SIXTY WRONG
WHILE CROWD DRUNK UPSET FOUND STUDY
ANGRY PLUMB EMPTY YIELD

We will come back to it in the Patty section.

Also in the CDB is a program called ASOLVER which automates the Digram solution method to get the best fit.

MORE ABOUT VOWEL POSITION PREFERENCES

Dr. Raj Wal summarized Barker’s vowel preference data. He also developed cross-correlation coefficients for each letter. Foster details this work in his book. [CCF]

This handy little table gives us an entry when needed. It is correct more times than it fails.

Word Length Position Preferences
one 1  
V
 
two 1 2  
V C
 
three 1 2 3  
C C
 
four 1 2 3 4  
C V C
 
five 1 2 3 4 5  
C C V C C
 
six 1 2 3 4 5 6  
C V C C
 
seven 1 2 3 4 5 6 7  
C V C C C
 
eight
plus
1 2 3 4 5 . . Final
C C C

Variety of Contact Table (VOC):

Freq:  8  7  6  5  4  4  6  5  4  7  /  3  3  6  3  /  2  1  1  1  1  1
VOC:  10  9  8  8  7  7  7  6  6  6  /  5  5  5  5  /  4  2  1  1  1  1
CT:    X  Y  B  V  W  K  A  U  H  Z  /  M  R  C  S  /  T  N  E  O  D  P

We start from the position that the lowest 20% of the text, ranked by variety count, are consonants. A-3 has 78 letters, and 20% of 78 is about 16. The line of demarcation is between R and C, but 4 letters share the same VOC of 5: M, R, S, C. If we take one, we must take all, and one of these is most likely a vowel. The key to solution is the VOC “step up” versus “step down” observation: vowels tend to step up, and consonants tend to step down. [i.e., 3M5 is a step up of 2 points, and 6C5 is a step down of one point.]
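The frequency and VOC rows can be computed in Python. This sketch assumes, as the table appears to, that contacts are counted only within words. It also takes VOC to be the number of different letters touching a CT letter on either side. Under those assumptions it reproduces the table above and confirms that A-3 has 78 letters. M (3 → 5) steps up, and C (6 → 5) steps down. The function name is hypothetical.

```python
from collections import Counter, defaultdict

A3_WORDS = ("UWYMNXKA EHXRBZ UVXMUWBZ OYZTWHVCXYA CYAUZ DBRAHVKBA "
            "ZWSVAHKUZBKC MSCX CYXBS XVZYTRYCXP").split()


def frequency_and_voc(words: list[str]) -> tuple[Counter, dict[str, int]]:
    """Letter frequencies and variety of contact (distinct neighbours within words)."""
    freq: Counter = Counter()
    neighbours: defaultdict[str, set[str]] = defaultdict(set)
    for word in words:
        for i, ch in enumerate(word):
            freq[ch] += 1
            if i > 0:
                neighbours[ch].add(word[i - 1])
            if i + 1 < len(word):
                neighbours[ch].add(word[i + 1])
    return freq, {ch: len(neighbours[ch]) for ch in freq}


freq, voc = frequency_and_voc(A3_WORDS)
print(sum(freq.values()), "letters")
for ch in sorted(freq, key=lambda c: (-voc[c], c)):
    step = voc[ch] - freq[ch]  # positive: "steps up"; negative: "steps down"
    print(f"{ch}: freq {freq[ch]}  VOC {voc[ch]}  step {step:+d}")
```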

M, R, S all step up, C steps down 1 point and most likely is a consonant. We develop a separation line and place the contacts on each side of the consonant line starting from the right of the VOC table.

First Consonant Line
C T N E O D P
--------------------
V |
X | XXXX
YY | YYY
K |
S |
Z |
| W
| R
M |
| H
| B

If a letter does not appear at all below the line, that letter is most likely a consonant. A and U fall into this category. We add these to the analysis:

Second Consonant Line
C T N E O D P A U
---------------------
VV | V mark X and Y as Vowels
X | XXXX (vowel) both step up
YYYY | YYY (vowel) with high VOC
KKK |
S |
Z | ZZ consonant (step down)
| WWW test as h
R | R
MM |
| HHH
B | B
| U
A |
|

We shift to A-3 and mark in the suspected consonants.

A-3. No clue. Author Bosley No. 19. CM. June 1936. cont

    1      2       3          4
UWYMNXKA EHXRBZ UVXMUWBZ OYZTWHVCXYA
--o--o-- -oo-o- --o---o- -o---o--oo-

5 6 7 8
CYAUZ DBRAHVKBA; ZWSVAHKUZBKC, MSCX
-o--- -o--o--o- --o--o---o-- -o-o

9 10
CYXBS, XVZYTRYCXP. (78L)
-oooo o--o--o-o-

n and h turn up freely on both the right and left sides of the consonant line. w and h are candidates. Since h=H, then w might equal h. Digrams such as sh or ch are prevalent. W is in the second position in word 7, which tentatively confirms the PT h and suggests that Z is a consonant (step down). B is a step up, as is S. The third word confirms this, but word 9 has four vowels. Hmm? K and H are both possibilities for vowels. Word 4 tends to favor H. So:

Final Consonant Line
C T N E O D P A U W Z
---------------------
VVV | V mark X and Y as Vowels
X | XXXX (vowel) both step up
YYYYY | YYYYY (vowel) with high VOC
KKK |
S | S vowel low freq? =u?
ZZ | ZZ consonant (step down)
| WWWW test as h
R | R
MM |
| HHHH
BBB | BBB vowel
UUUU | U consonant
A | consonant
T | T consonant

Let me fill in where ELCY stops. A-3 has its vowels and consonants separated, and we have the PT letter h. Word 9 is either clever or wrong. Using Barker’s Pattern List on p. 39, we find bayou and miaou. The same reference gives us thunderclaps for word 7. Although that is not correct, we find thunderstorm matching the pattern under 819710/12W, and word 8 suggests puma. The final message reads: shipyard zealot snapshot kitchenmaid midst goldenrod; thunderstorm, puma miaou, anticlimax.

The TEA database yields words: thunderstorm and anticlimax. The reader is invited to reconstruct the keywords, if any.

NON-PATTERN WORD ATTACK

Try this Aristocrat.

A-4. Fire, fire burning bright. by Ah Tin Dhu.

  1     2     3     4     5     6     7
ABCDE ACFGH ICJFH KCIBL KFBHL KCMJN OMJPI

8 9 10 11 12 13 14
BHLMC MRSPE BCAIH TIAUH. KUMCE VDUHP. SCFGD

15 16 17 18 19
JWBIL JSUML DUVNP, VEOML CFGLE.

To solve using non-pattern words, choose 3 or 4 words in the cipher that have several letters in common. Under one of these, write 5 or 6 words from the non-pattern list; we will use OMAR’s list given previously. Note the initial and final letters and the letter positions of the trial words. In A-4, K is an initial and L is a terminal, so choose non-pattern words that conform to this requirement. We write the common letters under the trial word and try to make a clear message out of the balance of the CT. Word 5 has K, BHL and F.

     KFBHL  ACFGH  KCIBL  BHLMC
  1  black  --l-c  b--ak  ack--
  2  crazy  --r-z  c--ay  azy--
  3  wrong  --r-n  w--og  ong--
  4  crowd  --r-w  c--od  owd--
  5  drunk  --r-n  d--uk  unk--
  6  found  --o-n  f--ud  und--
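Projecting a trial word onto the other cipher words is purely mechanical, so it is easy to sketch. In this minimal Python version, `trial_key` and `project` are hypothetical names. A trial is rejected if one cipher letter would stand for two plain letters or two cipher letters for one, which is exactly why a non-pattern cipher word needs a non-pattern trial word. Unknown positions are shown as dashes.

```python
def trial_key(ct_word, pt_word):
    """Build the partial key implied by placing pt_word under ct_word.

    Returns None if the trial is inconsistent: lengths differ, one cipher
    letter would stand for two plain letters, or two cipher letters for one.
    """
    if len(ct_word) != len(pt_word):
        return None
    key = {}
    for c, p in zip(ct_word, pt_word):
        if key.get(c, p) != p:
            return None
        key[c] = p
    if len(set(key.values())) != len(key):
        return None
    return key


def project(key, ct_word, unknown="-"):
    """Write known plain letters under ct_word, with a dash for each unknown."""
    return "".join(key.get(c, unknown) for c in ct_word)


# Reproduce the rows of the trial-word table for A-4, word 5 (KFBHL).
trials = ["black", "crazy", "wrong", "crowd", "drunk", "found"]
for n, trial in enumerate(trials, 1):
    key = trial_key("KFBHL", trial)
    print(n, trial, *(project(key, w) for w in ["ACFGH", "KCIBL", "BHLMC"]))
```

Line 6 comes out as `--o-n f--ud und--`, the row that points to brown, fraud and under.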

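Line 6 gives brown (word 2), fraud (word 4) and under (word 8). With I=a from fraud, word 3 (ICJFH) reads ar-on, which suggests arson. Putting this into the risties, together with a few more easy guesses such as urban, cabin, fiery and rowdy, we get: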

  1     2     3     4     5     6     7
bur-y brown arson fraud found fres- -es-a
ABCDE ACFGH ICJFH KCIBL KFBHL KCMJN OMJPI

  8     9    10    11     12    13     14
under e---y urban cabin. fiery --in-. -row-
BHLMC MRSPE BCAIH TIAUH. KUMCE VDUHP. SCFGD

  15    16    17     18    19
s-uad s-ied -i---, -y-ed rowdy.
JWBIL JSUML DUVNP, VEOML CFGLE.

At this point all the vowels are identified, along with r and n. The message is “Burly brown arson fraud found fresh vesta under empty urban cabin. Fiery glint. Prowl squad spied light, gyved rowdy.”
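Before sending in a solution, it is worth checking that it really is a consistent simple substitution. The sketch below pairs each A-4 cipher word with its solution word, merges the letter pairs into one key, and rejects any clash. It also rejects a letter standing for itself, assuming the ACA Aristocrat convention. `build_key` and `decrypt` are hypothetical helper names; the cipher text and solution words are taken from A-4.

```python
CT = ("ABCDE ACFGH ICJFH KCIBL KFBHL KCMJN OMJPI BHLMC MRSPE BCAIH TIAUH. "
      "KUMCE VDUHP. SCFGD JWBIL JSUML DUVNP, VEOML CFGLE.")
PT_WORDS = ("burly brown arson fraud found fresh vesta under empty urban cabin "
            "fiery glint prowl squad spied light gyved rowdy").split()


def build_key(ct, pt_words):
    """Merge word-by-word letter pairs into a key, raising ValueError on any clash."""
    ct_words = [w.strip(".,") for w in ct.split()]
    if len(ct_words) != len(pt_words):
        raise ValueError("word counts differ")
    key = {}
    for cw, pw in zip(ct_words, pt_words):
        if len(cw) != len(pw):
            raise ValueError(f"length mismatch: {cw} / {pw}")
        for c, p in zip(cw, pw):
            if key.setdefault(c, p) != p:
                raise ValueError(f"{c} would stand for both {key[c]} and {p}")
    if len(set(key.values())) != len(key):
        raise ValueError("two cipher letters stand for the same plain letter")
    if any(c.lower() == p for c, p in key.items()):
        raise ValueError("a letter enciphers as itself")
    return key


def decrypt(ct, key):
    """Replace cipher letters with plain letters; keep spaces and punctuation."""
    return "".join(key.get(ch, ch) for ch in ct)


key = build_key(CT, PT_WORDS)
print(decrypt(CT, key))
```

The check passes with 22 distinct cipher letters, and the decryption includes urban as word 10.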

RECAP

  1. High-frequency letters appear often in a message, but not necessarily in exact correspondence with the standard English frequency distribution.
  2. Start working with shorter words and common endings.
  3. Look for repetitions of bigrams, trigrams and reversals.
  4. Go with the flow of the cipher text and extract all the information on frequency, position and contacts.
  5. Eliminate all but a few possibilities. Test and confirm. Test and confirm.
  6. Work back and forth between the cryptogram and the keyword alphabets. Expect the message to make some kind of sense.
  7. Look for pattern or non-pattern words. Separate vowels and consonants. Try brute force. Use lists.
  8. Persevere.
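RECAP items 1 and 4 (frequency and position) are easy to tabulate by machine. This minimal sketch, with the hypothetical helper name `tallies`, counts letter frequencies and word-initial and word-final letters in a cryptogram with word divisions. It ignores punctuation. A-4 is used as sample input.

```python
from collections import Counter


def tallies(ct):
    """Return letter frequencies and word-initial / word-final letter counts."""
    # Strip punctuation from each cipher word, keeping letters only.
    words = ["".join(ch for ch in w if ch.isalpha()) for w in ct.split()]
    words = [w for w in words if w]
    freq = Counter(ch for w in words for ch in w)
    initials = Counter(w[0] for w in words)
    finals = Counter(w[-1] for w in words)
    return freq, initials, finals


A4 = ("ABCDE ACFGH ICJFH KCIBL KFBHL KCMJN OMJPI BHLMC MRSPE BCAIH TIAUH. "
      "KUMCE VDUHP. SCFGD JWBIL JSUML DUVNP, VEOML CFGLE.")
freq, initials, finals = tallies(A4)
print("most frequent:", freq.most_common(5))
print("top initial:", initials.most_common(1), "top final:", finals.most_common(1))
```

On A-4 the most common initial is K (four times) and the most common final is L (five times), which is the observation that guided the choice of non-pattern trial words.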

CM REFERENCES

PHOENIX has compiled a list of articles (page 2) concerning ARISTOCRATS between 1932 – 1993 in “The Cryptogram Index,” available through the ACA. On page 27, he lists additional references on simple substitution. Articles by B.NATURAL and S-TUCK are especially useful. [INDE]

HOMEWORK PROBLEMS

Solve these cryptograms, recover the keywords, and send your solutions to me for credit. Be sure to show how you cracked them. If you used a computer program, please provide the “gut” details. Answers do not need to be typed but should be generously spaced and not written in red. Let me know which part of the problem was the “ah ha,” i.e., the light of inspiration that brought forth the message.

A-1. Bad design. K2 (91) AURION

VGS EULZK WUFGZ GON GM VDGXZAJUXUVBZ

HBUKNDW VON DK XDKUHHGDFNZX UK YDK VGUN

AJUXOUBBS XDKKGBPZK DF NYZ BULZ.

A-2. Not now. K1 (92) BRASSPOUNDER

KDCY LQZKTLJQX CY MDBCYJQL: "TR HYD FKXC,

FQ MKX RLQQIQ HYDL MKL DXCTW RDCDLQ

JQMNKXTMB PTBMYEQL K FKH CY LQZKTL TC."

A-3. Ms. Packman really works! K4 (101) APEX DX

*ZDDYYDQT QMARPAC,*QAKCMK *TDVSVK. BP WVG

QNVOMCMVB: LDXV KQAMSPDLVQU, LDBZI UVKQF

PO WAMUXV, EMUVP XQNV, UAMOZ NQKLMOV

(SAPZVO).

A-4. Money value. K4 (80) PETROUSHKA

DVTUWEFSYZ CVSHWBDXP UYTCQPV EVZFDA ESTUWX

QVSPFDBY PQYVDAFS, HYBPQ PFYVCD QSFITX PXBJ

DHWYZ.

A-5. Zoology lesson. K4 (78) MICROPOD

ASPDGULW, JYCR SKUQ NBHYQI XSPIN

OCBZAYWN=OGSJQ OSRYUW, JNYXU OBZA (BCWS

DURBC) TBGAW UQESL. *CBSW

REFERENCES

[ACA] ACA and You, Handbook For Members of the American Cryptogram Association, 1995.

[BARK] Barker, Wayne G., “Cryptanalysis of The Simple Substitution Cipher with Word Divisions,” Aegean Park Press, Laguna Hills, CA. 1973.

[BAR1] Barker, Wayne G., “Course No 201, Cryptanalysis of The Simple Substitution Cipher with Word Divisions,” Aegean Park Press, Laguna Hills, CA. 1975.

[B201] Barker, Wayne G., “Cryptanalysis of The Simple Substitution Cipher with Word Divisions,” Course #201, Aegean Park Press, Laguna Hills, CA. 1982.

[BP82] Beker, H., and Piper, F., “Cipher Systems, The Protection of Communications,” John Wiley and Sons, NY, 1982.

[CCF] Foster, C. C., “Cryptanalysis for Microcomputers,” Hayden Books, Rochelle Park, NJ, 1990.

[DOW] Dow, Don. L., “Crypto-Mania, Version 3.0”, Box 1111, Nashua, NH. 03061-1111, (603) 880-6472, Cost $15 for registered version and available as shareware under CRYPTM.zip on CIS or zipnet.

[ELCY] Gaines, Helen Fouche, “Cryptanalysis,” Dover, New York, 1956.

[GODD] Goddard, Eldridge and Thelma, “Cryptodyct,” Marion, Iowa, 1976.

[FR1] Friedman, William F. and Callimahos, Lambros D., Military Cryptanalytics Part I – Volume 1, Aegean Park Press, Laguna Hills, CA, 1985.

[FR2] Friedman, William F. and Callimahos, Lambros D., Military Cryptanalytics Part I – Volume 2, Aegean Park Press, Laguna Hills, CA, 1985.

[FRE] Friedman, William F. , “Elements of Cryptanalysis,” Aegean Park Press, Laguna Hills, CA, 1976.

[HA] Hahn, Karl, “Frequency of Letters,” English Letter Usage Statistics using as a sample “A Tale of Two Cities” by Charles Dickens, Usenet sci.crypt, 4 Aug 1994.

[INDE] PHOENIX, Index to the Cryptogram: 1932-1993, ACA, 1994.

[NIC1] Nichols, Randall K., “Xeno Data on 10 Different Languages,” ACA-L, August 18, 1995.

[NIC2] Nichols, Randall K., “Chinese Cryptography Part 1,” ACA-L, August 24, 1995.

[OP20] “Course in Cryptanalysis,” OP-20-G, Navy Department, Office of Chief of Naval Operations, Washington, 1941.

[ROBO] NYPHO, The Cryptogram, Dec 1940, Feb 1941.

[SINK] Sinkov, Abraham, “Elementary Cryptanalysis”, The Mathematical Assoc of America, NYU, 1966.

[STIN] Stinson, D. R., “Cryptography, Theory and Practice,” CRC Press, London, 1995.

[TUCK] Harris, Frances A., “Solving Simple Substitution Ciphers,” ACA, 1959.

Notes

Throughout my lectures, PT will be shown in lower case and CT in upper case. As a convention, plain text will generally be shown above its cipher text equivalent.

A = Aristocrats, P = Patristocrats, X = Xenocrypts

Any typos are my responsibility. I probably fell asleep at the keyboard. Please advise me of them and I will correct them, as well as put out an errata sheet at the end of the course. Students may want to start a 3″ permanent binder with separators for the various lectures and materials.

OUTLINE

  1. Intro – First Principles – Global Mathematical Nature
  2. Keyword Systems and Conventions Used
  3. Simple Substitution Cryptanalysis without/with Complexities
    1. Eyeball
    2. Frequency Distributions – General Nature of English Letters
    3. Friedman Techniques – Random vs Expected -Spaces and a Wealth of Tables: Digram, Trigram, and more
    4. C. C. Foster Techniques
    5. S-Tuck Techniques
    6. Pattern Words
    7. ELCY : Consonant Line Attack
    8. Sinkov Techniques
    9. Barker’s Vowel Separation and Position Table
    10. Non-Pattern Words: “Dooseys”
    11. SI SI Patterns
    12. CM References for Risties
    13. Relationship to XENOS: French and German Solutions
    14. Computer Program Aids – TEA Database, CDB, ABACUS, Computer Supplement
    15. References
  4. Homework Problems
  5. Variant Substitution Systems
    1. Friedman
    2. Waxton

Next lecture we will cover the balance of the outline material and jump into Patristocrats.

Tyro Tutorial

An Indexed Accumulation of Fifteen Years of

Cm Tyro Grams Columns

For

The Young at Heart

And

Cryptogram Cipher Tips For

Seasoned Solvers as well as Tyro Novices

LIONEL

2015

Table of Contents

Acknowledgements

Foreword

Introduction

  1. Caesar Cipher (The Beginning)
  2. Substitution Ciphers
  3. Steganography
  4. Cipher Keys (Keyboard Cipher)
  5. Keyword Alphabet
  6. Aristocrat Cipher
  7. Null Cipher
  8. Construction Principles
  9. Keyword Alphabet as a Solving Tool
  10. Patristocrat Cipher
  11. Baconian Cipher
  12. Xenocrypt Cipher
  13. Polybius Square
  14. Checkerboard Cipher
  15. Foursquare Cipher
  16. Railfence & Redefence Cipher
  17. Polyalphabetic Cipher (Quagmire)
  18. Period Determination
  19. Vigenere Cipher Type: Vigenere, Beaufort, Gronsfeld, Variant
  20. Cryptarithms
  21. Affine & Hill Ciphers
  22. Fractionated Ciphers: Fractionated Morse, Morbit, Pollux
  23. Ragbaby Cipher
  24. Route Transposition Cipher
  25. Monome-Dinome Cipher
  26. Porta Cipher
  27. Polyominoes Congruent Squares

Appendix

  1. Aristocrat Solving Tools
  2. Patristocrat Solving Techniques
  3. Baconian Concealment Cipher
  4. Railfence Template
  5. Null Variables
  6. Affine & Hill Ciphers
  7. Foursquare CT Frequency
  8. Algorithms
  9. Google As A Solving Tool

Index

Solutions

Acknowledgements

I extend a most grateful thank you to all of the ACA Krewe through the years who have provided my intellect with all of the crypto knowledge it has been capable of absorbing, and who have lent their wisdom, counsel, tutelage, review and editing in support of the material contained within the pages of this manuscript. Special thanks to AAJHU, BECASSE, BION, FIZZY, HONEYBEE, LEDGE, MSCREP, PHOTON, QUIPOGAM, REAL NEO and my *personal mentor, RISHU.

LIONEL                           

Foreword

The following crypto tutorial is an updated extraction from the American Cryptogram Association’s Cryptogram Tyro Grams column (initially titled Kiddee Korner) from 2000 to present. The column and these extractions have been inspired by one of the ACA’s foremost crypto mentors and educators, Gerhard Linz (LEDGE).

The American Cryptogram Association (ACA) is a non-profit organization, founded in 1929, devoted to the cultivation of cryptologic knowledge, with members all over the world. It publishes a bimonthly magazine, The Cryptogram, full of hundreds of ciphers of many types contributed by members for members’ solving pleasure. ACA cryptologist members (Krewe) mirror all walks of life, representing ages from five to ninety and all trades, professions and educational levels. Noms de plume (Noms) bring a degree of anonymity to all members. It is fun and cryptology that counts. The Kiddee Krewe/Young Tyros is a division of the ACA that provides a cryptology learning experience for cipher-solving aspirants and has no age limitations. More information and membership details can be found at the ACA’s Web site, www.cryptogram.org.

The Kiddee Korner had its Cryptogram (Cm) journal inception in January of 2000, changing its name to Tyro Grams in the Cm JF edition of 2003. Its intention was to provide an opening to those interested in pursuing the solving of codes and ciphers, and it was written to serve the Young at Heart Tyros of all ages.

Webster defines “tyro” as one who is in “the preliminary stage or rudiments of any study or occupation.” As we watched our Kiddee Krewe grow in number and skill, observed its work in solving, constructing and authoring articles in the Cm, and watched it take part in all phases of ACA conventions, we realized that these young achievers were far removed from Kiddee Land. (Two finished with scores in the top ten at our Chicago Cipher Contest.)

We felt that we had done an injustice by labeling these youngsters, and the “Young at Heart” adults eager to master the principles of cryptology, as “Kiddees.” The label discouraged young and mature solvers alike from peering at what lies beyond it, and it also dampened interest in recruiting youthful members to the ACA.

All of these reasons prompted us to elevate our Kiddee Krewe name to Young Tyros (tip of the hat to QUIPOGAM and Grandson, QUAZAR, for their suggestion) and the Kiddee Korner column to Tyro Grams. Our column objective will remain the same: reducing cryptology principles to their simplest terms through a most understandable format. Appendices, cipher solution pages and an Index follow the body of this material.

LIONEL – 2015 (Lee Melair)