Pigeon Code Cipher Identified ?


As mentioned in an earlier post, Nick Pelling of Cipher Mysteries and Stuart Rutter did some excellent research on the Pigeon Code, or as I'm now a bit more confident to call it, the Pigeon Cipher.

I've just been getting caught up on Stuart's research into the nature of the cipher used to encrypt the message, and while it's too early to be sure, I wouldn't be at all surprised to find out that he has pretty much nailed it as being the 'LINEX Low Grade Cipher' which he describes in some detail in a fascinating post here.

Some of the interesting characteristics of the LINEX cipher - at least from a very brief look - are that it appears to result in a ciphertext with the full complement of 26 letters, as seen in the Pigeon Cipher; that the procedure involved ends up as a very complicated polyalphabetic substitution, which would produce odd looking frequency, digram and trigram counts, as seen in the Pigeon Cipher; and indeed that it would give an absurdly low Index of Coincidence, as seen in the Pigeon Cipher.

Time will tell, but at first blush, this appears to be an excellent candidate!

Pigeon Code Almost Certainly Not Broken


Sometime on Sunday evening, as I was wending my way home on the Metro, I started to get messages from people asking if I’d heard that the Pigeon Code had been cracked. As the cell reception along that particular stretch of line is not great, I had to wait until I’d got home to find out what all the excitement was about.

Sadly, as it turns out, the excitement seems to have been about not very much. The story seems to have originated in the Dorset Echo (“It’s a real coo as ‘unbreakable’ war code found on pigeon in Portland is cracked”) on Saturday 15 December 2012, and was subsequently picked up by the Daily Mail (“‘Hit Jerry’s panzers here’… code on dead wartime pigeon is cracked”) and then by the BBC (“Has World War II carrier pigeon message been cracked?”) on Sunday 16 December.

Pigeon Code – Fantastic Scholarship from Cipher Mysteries


I've mentioned the Cipher Mysteries website before as an excellent source of historical research with regards to the Pigeon Code, and today there is a fantastic post up there detailing the research done by Nick Pelling and Stu Rutter who have been getting dug in to the National Archives at Kew to dig up as much historical information as they can. And boy have they come up trumps.

Nick and Stu appear to have been able to identify the pigeon, the army unit, the sender (one Lance Serjeant William Stout, of the Royal Engineers) as well as the operation the message likely relates to.

This provides much fertile material for cribs, always assuming of course that the code or cipher is amenable to such things. On which note, Nick and Stu have also come up with some possibilities for the type of cipher. Apparently the RE used a type of double transposition cipher and possibly a Syllabic Cipher. I'm not familiar with the second type, which makes it all the more interesting. Do go and have a read, it is historical research of a very high quality.

I'm going to be travelling the next few days but will try to keep up with the cracking pace Nick is setting.

Pigeon Code : Coincidence ? Or Something Far More Sinister ?


Computing a simple Index of Coincidence for the Pigeon Code ciphertext yields a very low result, but take heart! Our N is very small!
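For anyone who wants to try the sum themselves, here is a minimal sketch of the usual Index of Coincidence calculation: the chance that two letters picked at random from the text are the same. It is not the script behind the figures on this site, and the function name is made up for illustration. As a rough guide, uniformly random letters come out near 1/26 (about 0.038) and ordinary English nearer 0.066, but with only a hundred-odd letters the estimate wobbles a good deal.

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two letters drawn at random from text are equal."""
    # Keep only the letters A-Z; spaces, digits and punctuation are ignored.
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    # Sum of f * (f - 1) over all letters, divided by n * (n - 1).
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


# Two toy inputs: every letter once (no coincidences at all),
# and a tiny text made of two doubled letters.
print(index_of_coincidence("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
print(round(index_of_coincidence("AABB"), 4))
```

Feed it the ciphertext groups as a single string; the smaller the text, the less weight the answer deserves.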

Pigeon Code – More Speculation – Are Full Stops Structural ?


Pigeon Code – AOAKN! Are the full stops real ? Are they structural ? Will we find out in the next exciting instalment ?

Cryptanalyst’s Bookshelf – Books To Get You Started

First Class Travel

When I first started wittering about the Pigeon Code I said that I would post up some tutorials on basic cryptanalysis. This post is basically the preface to that forthcoming series. A teaser, if you will. Before we get into the theory and start looking at some ciphers, here are a few books that provide coverage of the subject of cryptanalysis at varying levels of technicality.

Ladybird Learnabout Codes and Ciphers


I include this only partly as a joke. I recently obtained a copy of this book that I owned and cherished as a child but that had gone the way of all things - probably via a jumble sale - and it is a surprisingly thorough introduction to a variety of ciphers, starting with the skytale and proceeding through simple substitution and transposition ciphers to polyalphabetic ones including the Porta and Vigenère ciphers. It also includes instructions for making a variety of cipher machines from paper, cardboard tubes and old cans. Excellent for younger readers and lots of fun for older ones as well. Now sadly out of print, but there seem to be a few knocking about via Amazon.

The Code Book - Simon Singh

Simon Singh's Code Book

A constant companion in the days when I first started to break ciphers, my tattered copy has been dragged around a lot and has even survived being accidentally left under a hotel bed in Milton Keynes where I was staying while making a pilgrimage to Bletchley Park - a trip I can't recommend highly enough if you are into computers or code breaking in any way, truly awesome.

Singh's account of the history of codes and ciphers and the arts of breaking them is accessible and very engaging. He also sets the reader a number of challenges, presenting ciphers to be broken, some of which are quite fiendish, but all of which are well worth the attempt.

If you're interested in cryptography and cryptanalysis, this is really the first book you should read as it provides an excellent background in both the history and the techniques.

Cryptanalysis - Helen Fouche Gaines

Cryptanalysis - A Study Of Ciphers And Their Solutions

If you want to break classical (i.e. non machine) ciphers, buy this book. First published in 1939 it remains an absolute classic in its field. Fouche Gaines covers practically every type of cipher you can make using a pen and paper in enough depth for the budding cryptanalyst to get their teeth into actually breaking them. I don't think there is a more complete reference on classical cryptanalysis, and if there is, I would very much like to have a copy!

Elementary Cryptanalysis - Abraham Sinkov


Another absolute classic. Written by legendary code breaker Abraham Sinkov and first published in 1966, this book provides a solid mathematical model for analysing classical ciphers as well as thorough explanations of the math involved and why it works. This isn't quite as much of a must have as Cryptanalysis, but is pretty essential if you want to write computer code to do cipher analysis. It even includes an appendix with some sample programs in BASIC for computing things like trigram frequency and index of coincidence which were written in 1979 and added as supplementary material to the 1980 reprint.


Applied Cryptanalysis - Mark Stamp and Richard M. Low


Classical cryptanalysis is a fun hobby, but sooner or later you'll want to know about more modern ciphers, and this book is an excellent place to get started. It covers classic ciphers and WW2 machine ciphers in a scant 70 pages before going on to describe cryptanalytic attacks on modern stream, block and public key ciphers. They aren't kidding about the applied part either, the text presents real attacks that break real cipher systems. Modern cryptanalysis is a huge hairy beast full of hard maths, but this book manages to give the reader a fairly gentle introduction to the field.

All of these books can be bagged from Amazon, find the listmania list here. Or you can have them ordered by your local bookshop. Be prepared for a long wait though!

More Pigeon Code Speculation – A Mistranscription and Other Issues – Some Better Numbers


Having been back through the various released images of the Pigeon Code, particularly the GCHQ image, I have tweaked my transcription a bit. Only by one letter to be sure, but this can make a huge difference.

So currently, the transcription I'm using for the cipher is this.


And given previous speculation with regards to AOAKN, the groups I'm using for any actual analysis are


That makes the frequency distribution look like this

Frequency Counts

I also have some better digram counts. When I hacked up the python scripts I used last time, I had my computer programmer head on, rather than my cryptanalyst head. The code does a good job, but it's not counting like a person, and is counting things it oughtn't. I hope to develop some better and more modular code to update my usual pencil and paper cryptanalysis workflow.

A better set of n-grams counts is

gringo2:pigeon_wip steve$ ./ngrams.py no_aoakn.txt
Matched : FN at 6 is FN at 29 stride 22
Matched : FN at 6 is FN at 111 stride 104
Matched : DJ at 21 is DJ at 86 stride 64
Matched : JR at 52 is JR at 106 stride 53
Matched : RZ at 53 is RZ at 107 stride 53
Matched : GH at 99 is GH at 103 stride 3
Matched : JRZ at 52 is JRZ at 106 stride 52

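Which contains some interesting numbers. The 'stride' is the gap in the ciphertext between two occurrences of the same sequence. As printed here it runs from the last letter of the first occurrence to the first letter of the second, so for a sequence of n letters it is n - 1 less than the start-to-start distance (FN at 6 and at 29 gives 22, not 23). Really, we should discount RZ and JR as digrams if we are to count them as a trigram, which leaves us with a count like this.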

Matched : FN at 6 is FN at 29 stride 22
Matched : FN at 6 is FN at 111 stride 104
Matched : DJ at 21 is DJ at 86 stride 64
Matched : GH at 99 is GH at 103 stride 3
Matched : JRZ at 52 is JRZ at 106 stride 52

Note that JRZ is the longest repeated sequence in the ciphertext, there are no larger ones.
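A listing like the one above can be produced with very little code. What follows is a minimal sketch under my own assumptions, not the ngrams.py used for this post: the function name is invented, positions are counted from zero, and every later occurrence is reported against the first one. It returns both the start-to-start distance and the end-to-start gap, since the strides printed above match the second of those.

```python
from collections import defaultdict


def repeated_ngrams(text, n):
    """Yield (ngram, first_pos, later_pos, start_to_start, end_to_start)."""
    letters = "".join(c for c in text.upper() if c.isalpha())
    seen = defaultdict(list)
    # Record every position at which each n-letter sequence starts.
    for i in range(len(letters) - n + 1):
        seen[letters[i:i + n]].append(i)
    for gram, positions in seen.items():
        first = positions[0]
        for later in positions[1:]:
            distance = later - first
            # end_to_start is the gap counted from the last letter of the
            # first occurrence, i.e. n - 1 less than the distance.
            yield gram, first, later, distance, distance - (n - 1)


# A made-up string, purely to show the shape of the output.
for row in repeated_ngrams("ABCXXABCYAB", 2):
    print("Matched : {0} at {1} is {0} at {2} distance {3} gap {4}".format(*row))
```

Running it for n = 2, then n = 3 and so on until nothing is reported gives the longest repeated sequence for free.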

Via the excellent Cipher Mysteries site, there has apparently been some speculation over at 'WW2 Talk' about the use of a Playfair type cipher on the grounds of the repeated digrams. Certainly, such ciphers were in use at the time, at least by Axis forces (this whole Pigeon Code affair has really driven home the fact that I know very little about Allied forces), and there is a fascinating account of how the German Doppelkastenschlüssel or 'Two Box' cipher was broken available from the NSA archives (PDF).

On the face of it, the cipher text lacks several features that we might expect to find in a Playfair, or any kind of cipher based on rectangular digram substitution. In particular, the most noteworthy feature of this particular cipher text is that it contains all 26 letters of the alphabet, where box digram ciphers tend to use an alphabet whose size is a perfect square. 25 is the number most usually encountered in the texts, with J, a very low frequency letter in English, missing from the cipher rectangle and being represented as I, though the rules are by no means fixed. Digrammatic ciphers also tend to result in ciphertexts of even length, where this text is either 125 or 135 letters long, depending on whether one counts the AOAKN groups.

As Helen Fouche Gaines wisely puts it in Cryptanalysis :

It is sometimes said of the Playfair that it can be distinguished from other ciphers by (1) the fact that cryptograms contain an even number of letters, (2) the fact that only 25 letters are represented in its general frequency count, (3) the fact that when the cryptogram is marked into pairs, no pairs will be a doubled letter, and (4) the presence of long repeated sequences at irregular intervals. As conclusive evidence, these are debatable points, but all are good supporting evidence.

Prima facie, the Pigeon Cipher - as it ought to be called - fails all of these tests. We can readily see if we attempt to simply mark out the pairs on the first line that we get a doubled letter right away.


If we ignore the leading H and pair off, we encounter a repeat by the third line


As Helen Fouche Gaines says though, this need not be conclusive. There are enough variations on the basic idea of a Playfair cipher that none of these have to be true. There could for instance be further elements of transposition, perhaps designed very specifically to disguise these characteristic features.
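The pairing-off test is easy to mechanise. Here is a minimal sketch, with an invented function name and a made-up test string rather than the real ciphertext: it marks the text into pairs from a chosen starting offset and reports any pair that is a doubled letter, which an unmodified Playfair should never produce.

```python
def doubled_pairs(text, offset=0):
    """Return (pair_index, pair) for each pair made of one repeated letter.

    offset=0 pairs off from the first letter; offset=1 skips the first
    letter before pairing, and so on. A trailing odd letter is ignored.
    """
    letters = "".join(c for c in text.upper() if c.isalpha())[offset:]
    pairs = [letters[i:i + 2] for i in range(0, len(letters) - 1, 2)]
    return [(k, p) for k, p in enumerate(pairs) if p[0] == p[1]]


sample = "ABBCDD"  # made-up text, not the pigeon message
print(doubled_pairs(sample))            # pairs are AB BC DD
print(doubled_pairs(sample, offset=1))  # pairs are BB CD
print(len(sample) % 2 == 0)             # even length, as a Playfair would give
```

Trying both offsets covers the case where a stray leading letter has thrown the pairing out of step.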

One thing that is noted by both Fouche Gaines in Cryptanalysis and Abe Sinkov in Elementary Cryptanalysis: A Mathematical Approach is that when a digrammatic cipher such as the Playfair is used, the common factors of the spaces between repetitions (what I have been calling the period) will mostly tend to be 2.

Interestingly, this seems to be very much the case with the ciphertext in question. And the repeats

Matched : FN at 6 is FN at 29 stride 22
Matched : FN at 6 is FN at 111 stride 104
Matched : DJ at 21 is DJ at 86 stride 64
Matched : GH at 99 is GH at 103 stride 3
Matched : JRZ at 52 is JRZ at 106 stride 52

Give us the factors

gringo2:pigeon_wip steve$ ./factors.py 3 22 52 64 104
3 []
22 [11, 2]
52 [26, 13, 4, 2]
64 [32, 16, 8, 4, 2]
104 [52, 26, 13, 8, 4, 2]
Common Factor : []
Most Common : [(2, 4), (4, 3), (8, 2), (13, 2), (26, 2)]

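In which we can see that 2 is by far the commonest factor. With the exception of the stride of 3 for the repeat of GH, all the strides are multiples of 2. One caution: the strides as printed run from the end of the first occurrence to the start of the second. Measured start to start, which is what the digram argument really needs, the distances are 4, 23, 54, 65 and 105, and only two of those are even.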
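The factor tally itself takes only a few lines. This is a minimal sketch and not the factors.py used above; the function names are invented. It lists the factors of each distance (leaving out 1 and the number itself) and counts how often each factor turns up. The same tally can be run on either way of measuring the distance between repeats, so both lists are included.

```python
from collections import Counter


def proper_factors(n):
    """Factors of n other than 1 and n itself, largest first."""
    return [d for d in range(n - 1, 1, -1) if n % d == 0]


def factor_tally(distances):
    """Count how many of the distances each factor divides."""
    tally = Counter()
    for d in distances:
        tally.update(proper_factors(d))
    return tally


# The strides exactly as printed in the listing above.
strides = [3, 22, 52, 64, 104]
# The same repeats measured start to start (later position minus first).
start_to_start = [4, 23, 54, 65, 105]

for label, values in (("strides", strides), ("start to start", start_to_start)):
    print(label)
    for v in values:
        print(v, proper_factors(v))
    print("Most Common :", factor_tally(values).most_common(5))
```

Ties in the most common list may come out in a different order from the original script, since that depends on how the counts are sorted.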

As has been the case with everything in the mysterious matter of the Pigeon Code, this is very far from conclusive, but it does suggest some interesting possibilities for further investigation and analysis. My gut feeling is still that we're looking at a one time pad enciphered code message, but I won't be happy to hang my hat on that until I've got around to throwing the whole bag of tricks at the ciphertext, and it's a fairly sizeable bag. Keep watching the skies.

Further Pigeon Code Scuttlebutt And Some Things I missed


The other day I wrote a post about the 'Pigeon Code', a WW2 era coded message discovered attached to a pigeon in a chimney. Fortunately, I labelled it as baseless speculation that was likely to be wrong, and as it turns out, at least some of it is.

After finally getting around to reading through some more of the press, I discovered a link to the actual GCHQ press release. The sort of thing I should really have looked up right away. Then again, who imagines GCHQ issuing press releases ?

In the comments on the last piece, Koala raised the interesting possibility that the two non cipher group strings at the bottom of the pigeon code document were identifiers for pigeons, and it seems GCHQ are of the same mind. In a press release that, much to my chagrin, was published four and a bit days before I got round to poking the 'publish' button, they wrote :

Each pigeon in service was given an identity number. Two such numbers, NURP.40.TW.194 and NURP.37.OK.76, have been identified on the Bletchingley message. Either of these could be the identity of the pigeon in the chimney. The Curator of the Pigeon Museum at Bletchley Park is trying to trace these numbers, and if they are identified and their wartime service established, it could help to decode the message ...

They also have a much better - and clearly enhanced - image of the original message.

There is one slightly odd thing about the press release though, which could of course be down to a misunderstanding between a cryptanalyst and a press officer - something not at all difficult to imagine. The press release states

During the war, the methods used to encode messages naturally needed to be as secure as possible and various methods were used. The senders would often have specialist codebooks in which each code group of four or five letters had a meaning relevant to a specific operation, allowing much information to be sent in a short message. For added security, the code groups could then themselves be encrypted using, for example, a one-time pad.

The message found at Bletchingley had 27 five-letter code groups, and the GCHQ experts believe its contents are consistent with this method. This means that without access to the relevant codebooks and details of any additional encryption used, it will remain impossible to decrypt.

The reason I say this is odd is that the message being in five letter groups is not necessarily relevant to the question of whether it uses a code based on four or five letter code groups. Five letter groups were standard for most morse code transmissions whether encoded, encrypted, or in the clear, and usually for cryptograms. Helen Fouche Gaines' classic 1939 treatise, 'Elementary Cryptanalysis' states (p. 9, new Dover edition)

The operations of writing in and taking off may be governed by any agreed ruling, though the second of these must result in five-letter groups if the cryptogram is to be transmitted by wire or radio.

Granted, this cryptogram was sent by pigeon(s). But the breaking up into five letter groups is done not just for wireless transmission (though this is no doubt where the convention arose) but because transcribing apparently random text is difficult, and breaking into chunks ensures far better accuracy.

It would be odd, to say the least, if the cryppies at GCHQ were unaware of this convention.

Lastly, and by no means leastly, there is some much more informed speculation than mine over at the rather wonderful Cipher Mysteries website (which I only became aware of today, and will certainly become a frequent visitor to) which suggests, among many other fascinating things, that this may be a message to Bomber Command, that being the identifier 'X02'.

I'm off there now, in fact, to join in the fun!

Pigeon Code : Some Idle Speculation – With Graphs


So, a chap in darkest Surrey pulled a mouldy old pigeon out of his chimney and found what would appear to be a WW2 era cipher message in a canister attached to its leg. Here's the bit of paper in question.


As of the now, GCHQ 'Britain's top code breakers' claim to be stumped by it. Not surprisingly, when you have a good look at it, because it's tricky. Everybody seems to have their theory.

So, without further ado, I shall engage in rampant speculation, with some graphs but very little science. If I get time later, I might add some science, but don't hold your breath.

Here's how I think the message breaks down, and why, with the caveat that this is no more likely to be accurate than anyone else's wild and largely baseless speculation.


Indicator of some kind. Possibly identifies the originator of the message or any code books in use. Possibly both. There has been some speculation that this indicates a poem code, but I have some doubts about that, of which more later. Inclusion as message preamble and postamble makes it seem to me more like a call sign. Though this one appears to have been attached to a pigeon, it was more usual to send cipher traffic by radio. Sign on, sign off. As such, I have left it out of the frequency analysis. Of which more in a moment.

27 1525/6
Routing information. Because, well, look at the 'TO' field. "XO2". Really ? In the midst of the chaos of world war two, sellotaping a message to a pigeon, throwing it towards old Blighty and then hoping that when it arrives "XO2" will be sufficient for it to navigate the labyrinthine bureaucracy of British military intelligence strikes me as unlikely. Just as data packets in a modern communications network must always contain routing information in the clear, so must a bit of paper strapped onto a bird. Only more so. Maybe 1525/6 is a directorate index, 27, a room number ?

NURP 40TW 194 / NURP 37DK 76
Coordinates. Coded eastings and northings on some grid system. They look like coordinates. They have numbers in them. They aren't five grouped. Coordinates are another thing you would expect to find in a military communication. Also, you don't especially want to stuff coordinates in your ciphertext because then the enemy cryptanalysts have a crib, a piece of actual plain text or a characteristic pattern that they can expect to find in your message. Then they break your codes. Then you die.

Frequency Count
Every language, English, French, German, Python, has a characteristic distribution of the frequency with which the letters of its alphabet occur. As such, frequency analysis has long been the foundational tool of cipher analysis.

This is the frequency histogram displayed on the Wikipedia page relating to letter frequency. This particular frequency histogram is based on the words in the Concise Oxford English Dictionary.

Here's a histogram of the text of the pigeon code story (stripped of javascript, CSS and HTML tags) found on the BBC news website on Sunday 25 November 2012 produced with the following command.

gringo2:crypto steve$ ./scrape.py http://www.bbc.co.uk/news/uk-20456782 | ./freq.py \
| ./histo.py -o bbcpigeonstory.count -s 20

(All the scripts I used to generate data for this post are available as Gists.)

Close, isn't it ? You may be thinking that message length is going to have an effect. And you'd be right. Here's a short sentence, only 100 characters long.

Even a short sentence is given away by frequency counting, this is how cryptogrpahers break messages

And here is its frequency histogram

gringo2:crypto steve$ ./freq.py -i short.txt | ./histo.py -o short.txt.count.png

Not quite so similar, but still, you can clearly see the characteristic shapes and groupings. It also has another characteristic, one common to short texts. It is missing some letters. The sentence contains no D, J, L, X or Z.
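The counting behind a histogram like this is only a few lines of Python. Below is a minimal sketch, not the freq.py and histo.py scripts used for the pictures: it tallies the letters of the short sentence, prints a rough text histogram and lists whichever letters never appear. The sentence is copied exactly as given above, typo included, so the counts match.

```python
from collections import Counter
import string

SHORT = ("Even a short sentence is given away by frequency counting, "
         "this is how cryptogrpahers break messages")


def letter_counts(text):
    """Count the letters A-Z in text, ignoring case and everything else."""
    return Counter(c for c in text.upper() if c in string.ascii_uppercase)


counts = letter_counts(SHORT)

# One row of '#' marks per letter, in alphabetical order.
for letter in string.ascii_uppercase:
    print(letter, "#" * counts[letter])

missing = [c for c in string.ascii_uppercase if counts[c] == 0]
print("characters:", len(SHORT), "letters:", sum(counts.values()))
print("missing:", " ".join(missing))
```

Swap SHORT for any other text to compare shapes; the longer the text, the closer the rows settle towards the familiar English profile.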

This characteristic distribution can remain even if we jumble up the alphabet, a fact which cryptographers have been using to unpick ciphers for the better part of a thousand years, and once it became widely known, much of the aim of code and cipher design was to prevent this. So we ought not to be too surprised by the histogram for the pigeon code.

gringo2:crypto steve$ ./freq.py -i pigeoncode/pigeon_code_no_aoakn.txt \
| ./histo.py -o pigeon_no_aoakn 

pigeon code frequency histogram with AOAKN group removed

This histogram shows none of the characteristics of English. In fact, it appears to have rather more of the characteristics of random text. This is to be expected if we are dealing with a one time pad or a machine cipher, or even a decent polyalphabetic cipher.

Update - Thu 29 Nov 2012 0851
Koala points out in the comments that the BBC have possibly mistranscribed some letters, I think that's right, as did many other sources. That made me realise that I hadn't published the actual transcription I was using, so here it is. It differs from some of the published transcriptions in that I have U where some have a W. Here is the transcription


NURP 40TW 194
MURP 37DK 76

End Of Update - Thu 29 Nov 2012 0851
Update - Thu 29 Nov 2012 0938
And the actual groups I used for the frequency count.


End Of Update - Thu 29 Nov 2012 0938

For a short message, it has a remarkably high frequency of all letters. Indeed all 26 letters appear at least twice. In an ordinary English message of this length, this would be unusual, however there are several factors that could skew even a fairly simple cipher away from the norm. In enciphering a message into blocks of five like this, X is often used to indicate a space between words and Q and Z are often used as 'nulls' to pad out a message to the required length, and if this is only a cipher (as opposed to also using a code), the underlying language won't be conversational English but WW2 era military, which is likely to contain abbreviations and jargon. Even so, if we were looking at something like a simple substitution or transposition, we'd expect to be able to see the basic pattern, and to firm things up when we count digraphs and trigraphs.
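To make the effect of those conventions concrete, here is a minimal sketch of how a plaintext might be prepared before enciphering. It is purely illustrative: the function name, the choice of Q and Z as the only nulls and the example phrase are my assumptions, and real procedures varied. Spaces become X, the text is padded with nulls to a multiple of five, and the result is split into groups.

```python
import random


def to_groups(plaintext, size=5, nulls="QZ", rng=None):
    """Replace word breaks with X, pad with nulls, split into groups."""
    rng = rng or random.Random()
    # Strip anything that is not a letter from each word.
    words = ["".join(c for c in w.upper() if c.isalpha())
             for w in plaintext.split()]
    stream = "X".join(w for w in words if w)
    # Number of nulls needed to reach a multiple of the group size.
    shortfall = -len(stream) % size
    stream += "".join(rng.choice(nulls) for _ in range(shortfall))
    return " ".join(stream[i:i + size] for i in range(0, len(stream), size))


print(to_groups("attack at dawn"))
```

Even in this tiny example X, Q and Z turn up far more often than they would in ordinary English, which is exactly the kind of skew to bear in mind when reading the histogram.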

Significant Trigraphs, Significant Digraphs, Repeats
Where 'significant' simply indicates that they appear more than once. We can't take word boundaries into account, because we don't know where they are.

gringo2:crypto steve$ ./countseq.py -i pigeoncode/pigeon_code_no_aokan.txt

FN 3
JR 2
RZ 2
GH 2
DJ 2
GG 1
EE 1
KK 1
DD 1
YY 1

Those are low, to put it mildly, with the exception of the doubled letters, which feel high-ish. Polyalphabetic high-ish. With only one repeated trigraph we're not going to get a standard Kasiski test out of it, and although there are other methods to try, there's not much to go on there.

By comparison, our short message from earlier has the following counts.

gringo2:crypto steve$ ./countseq.py -i short.txt

EN 5
IS 3
HO 2
NA 2
NC 2
RE 2
ES 2
NT 2
VE 2
SH 2
SS 1

Then again, perhaps our cryptic Tommy is "Sitting down to look at sheep, all effortlessly swimming" which only contains 47 letters and has 7 repeats. It seems fairly unlikely you'd want to cipher that and entrust it to the beasts of the air though, unless you were just trying to wind up a cryptographer. Not something we should rule out.
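Counts like these are straightforward to reproduce. The following is a minimal sketch, not the countseq.py used above, with invented function names: it strips everything but the letters (so word boundaries are ignored), counts each digraph, and picks out the doubled letters separately.

```python
from collections import Counter


def only_letters(text):
    """Upper-case the text and drop everything that is not a letter."""
    return "".join(c for c in text.upper() if c.isalpha())


def digraph_counts(text):
    """Count every overlapping pair of adjacent letters."""
    letters = only_letters(text)
    return Counter(letters[i:i + 2] for i in range(len(letters) - 1))


def doubled_letters(text):
    """List each place where a letter is immediately repeated."""
    letters = only_letters(text)
    return [a + b for a, b in zip(letters, letters[1:]) if a == b]


sheep = "Sitting down to look at sheep, all effortlessly swimming"
print(len(only_letters(sheep)), "letters")
print(doubled_letters(sheep))
# Digraphs that occur more than once, most frequent first.
print([(d, k) for d, k in digraph_counts(sheep).most_common() if k > 1])
```

Extending digraph_counts to trigraphs is a matter of changing the slice width from 2 to 3.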

Why I don't Think It Is a Poem Code
From the moment that the pigeon code hit the internet, people were getting terribly excited about it being an SOE message on the way to Bletchley. Even though that never happened. And even though it is unlikely that an SOE operative would be using UK official headed note paper, what with the whole being behind enemy lines thing. The problem is that the SOE poem code was essentially a transposition cipher. Transposition ciphers can be hard to break - unless you know the keying system, which is why Leo Marks at SOE spent some of the war writing original poetry for SOE agents to use. To prevent the German cryptanalysts from getting hold of a load of cribs from published sources.

Be that as it may, a transposition cipher, even given some padding and a few more Xs than you might expect, is very, very easy to spot. Let's take an extreme example. Let's take our short message and transpose its characters entirely at random.

When I ran that, I got this.
iibcnr tsran gnh n vi ocpqi amyrtfeEeha n tesoertss enc gsruee hecg,y e iaypurboeessaakg vwswtnhsyo

Or, if you prefer

gringo2:crypto steve$ ./grid.py -i short_shuffled.txt


Which I think you'll agree is pretty inscrutable. However, if we look at a frequency histogram of this text next to the original, it looks like this.

Even with a completely random transposition of the characters, the frequency is exactly the same because, well, because the letters didn't change, they just moved. That's the thing about transpositions, you just can't hide what you did. Even if I did allow myself to be carried away for a moment by the romantic notion that this might somehow be an SOE message, I'd have to point out that, based on the testimony of the man who wrote the poems SOE used for poem codes around the time this message was sent, the dual indicator suggests it was part of "Operation Gift Horse", a scheme to make the German code breakers think SOE was using a poem code and waste their time trying to decrypt messages, when in fact SOE were using a 'WOK', a Worked Out Key, basically a one time pad.
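The point about transpositions is easy to check for yourself. This is a minimal sketch rather than the script used for the shuffled text above, and the seed value is arbitrary: it shuffles the characters of the short sentence at random and confirms that the character counts before and after are identical.

```python
import random
from collections import Counter


def shuffle_text(text, seed=None):
    """Return the characters of text in a random order."""
    chars = list(text)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


original = ("Even a short sentence is given away by frequency counting, "
            "this is how cryptogrpahers break messages")
shuffled = shuffle_text(original, seed=1)

print(shuffled)
# Same characters, same counts: only the order has changed.
print(Counter(shuffled) == Counter(original))
```

Any transposition, however elaborate its key, is just a particular choice of reordering, so the same equality holds for all of them.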

And In Any Case It's Too Short
142 characters and some (possibly) coordinates. That's about the same as a geo-tagged tweet. Have you tried to convey complicated information in a tweet ? [ We should pause for a moment and consider whether this is, in fact, this year's quirky GCHQ hiring stunt. Considering how much time I burned on the last one, I'm going to disregard that possibility, much as it pains me.]

Quite possibly we're looking at both a cipher and a code. Though often used interchangeably, they are quite different. Here's a snippet of the 'Acme code', a commercial code consisting of 100,000 code words that was in use in the 1920s, largely to reduce the cost of transmitting telegrams.

As you can see, you can fit a whole lot of information into a five letter code group. The only limit is the imagination of the code maker.

Of course, if #pigeoncode was constructed in this way, it will be almost impossible to break. For very large values of 'almost'. I've broken some short messages myself, shorter than this, and with pencil and paper at that, but those were contrived examples taken from text books. If this is a coded message, then without some knowledge of the code groups in use, there's very little leverage even for a good attempt.

The one thing we can be certain of is that it isn't something simple. That much is reasonably obvious simply from the context, but it never hurts to measure the obvious. It is, of course, impossible to make any firm assessment from such a short message, but I will throw caution to the wind and guess, in the hope that someday I will be proved wrong. I guess : Code enciphered with one time pad, code enciphered with polyalphabetic cipher of some type, perhaps even a machine [It has been suggested that pigeon was used as a transmission medium due to lack of the requisite time to erect an aerial. If that was the case, I can't really see a field unit stopping for long enough to get a TypeX up and running either ] or a very short message encoded via one time pad.
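For readers who haven't met the idea, here is a minimal sketch of what 'code enciphered with one time pad' could look like in its simplest letter-by-letter form. Everything in it is an assumption for illustration: the code groups are invented, the pad is generated on the spot, and addition modulo 26 is only one of several ways pads were applied. It is not a claim about how the pigeon message was actually made.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase


def random_pad(length):
    """Generate a pad of random letters, to be used once and destroyed."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def otp_encipher(message, pad):
    """Add each pad letter to the matching message letter, modulo 26."""
    if len(pad) < len(message):
        raise ValueError("pad must be at least as long as the message")
    return "".join(ALPHABET[(ALPHABET.index(m) + ALPHABET.index(k)) % 26]
                   for m, k in zip(message, pad))


def otp_decipher(cipher, pad):
    """Subtract each pad letter from the matching cipher letter, modulo 26."""
    if len(pad) < len(cipher):
        raise ValueError("pad must be at least as long as the cipher text")
    return "".join(ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26]
                   for c, k in zip(cipher, pad))


# Two hypothetical five-letter code groups taken from an imaginary codebook.
code_groups = "KLMNOPQRST"
pad = random_pad(len(code_groups))
cipher = otp_encipher(code_groups, pad)
print(cipher, otp_decipher(cipher, pad) == code_groups)
```

With a truly random pad every letter of the output is equally likely, which is why the resulting frequency counts look flat and why, without the pad and the codebook, there is so little to get hold of.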

If you've read this far down, and haven't lost the will to live, come back soon. I'll be running an occasional series on cryptanalysis for the curious.
