## Pigeon Code Cipher Identified ?

As mentioned in an earlier post, Nick Pelling of Cipher Mysteries and Stuart Rutter did some excellent research on the Pigeon Code, or as I'm now a bit more confident to call it, the Pigeon Cipher.

I've just been getting caught up on Stuart's research into the nature of the cipher used to encrypt the message, and while it's too early to be sure, I wouldn't be at all surprised to find out that he has pretty much nailed it as being the 'LINEX Low Grade Cipher', which he describes in some detail in a fascinating post here.

Some of the interesting characteristics of the LINEX cipher, at least from a very brief look, are that it appears to result in a ciphertext with the full complement of 26 letters, as seen in the Pigeon Cipher; that the procedure involved ends up as a very complicated polyalphabetic substitution, which would produce odd-looking frequency, digram and trigram counts, as seen in the Pigeon Cipher; and indeed that it gives an absurdly low Index of Coincidence, as seen in the Pigeon Cipher.

Time will tell, but at first blush, this appears to be an excellent candidate!

## Pigeon Code Almost Certainly Not Broken

Sometime on Sunday evening, as I was wending my way home on the Metro, I started to get messages from people asking if I’d heard that the Pigeon Code had been cracked. As the cell reception along that particular stretch of line is not great, I had to wait until I’d got home to find out what all the excitement was about.

Sadly, as it turns out, the excitement seems to have been about not very much. The story seems to have originated in the Dorset Echo (“It’s a real coo as ‘unbreakable’ war code found on pigeon in Portland is cracked”) on Saturday 15 December 2012, and was subsequently picked up by the Daily Mail (“‘Hit Jerry’s panzers here’… code on dead wartime pigeon is cracked”) and then by the BBC (“Has World War II carrier pigeon message been cracked?”) on Sunday 16 December.

## Pigeon Code – Fantastic Scholarship from Cipher Mysteries

I've mentioned the Cipher Mysteries website before as an excellent source of historical research with regard to the Pigeon Code, and today there is a fantastic post up there detailing the research done by Nick Pelling and Stu Rutter, who have been getting dug in to the National Archives at Kew to unearth as much historical information as they can. And boy have they come up trumps.

Nick and Stu appear to have been able to identify the pigeon, the army unit, the sender (one Lance Serjeant William Stout, of the Royal Engineers) as well as the operation the message likely relates to.

This provides much fertile material for cribs, always assuming of course that the code or cipher is amenable to such things. On which note, Nick and Stu have also come up with some possibilities for the type of cipher. Apparently the RE used a type of double transposition cipher and possibly a Syllabic Cipher. I'm not familiar with the second type, which makes it all the more interesting. Do go and have a read, it is historical research of a very high quality.

I'm going to be travelling the next few days but will try to keep up with the cracking pace Nick is setting.

## Pigeon Code : Coincidence ? Or Something Far More Sinister ?

Computing a simple Index of Coincidence for the Pigeon Code ciphertext yields a very low result, but take heart! Our N is very small!
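For reference, the Index of Coincidence is usually computed as the sum of f(f-1) over the letter counts f, divided by N(N-1), where N is the message length. A minimal Python sketch, assuming the 125-letter transcription used elsewhere on this site (with the AOAKN groups dropped); the function and variable names are illustrative only:

```python
from collections import Counter

# Assumed transcription: the ciphertext with both AOAKN groups removed.
CIPHERTEXT = """
HVPKD FNFJU YIDDC RQXSR
DJHFP GOVFN MIAPX PABUZ
WYYNP CMPNW HJRZH NLXKG
MEMKK ONOIB AKEEQ UAOTA
RBQRH DJOFM TPZEH LKXGH
RGGHT JRZCQ FNKTQ KLDTS
GQIRU
"""


def index_of_coincidence(text: str) -> float:
    """Return sum(f * (f - 1)) / (N * (N - 1)) over the letters in text."""
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    ic = index_of_coincidence(CIPHERTEXT)
    print(f"IC = {ic:.4f} (x26 = {ic * 26:.2f})")
```

Uniformly random letters give about 1/26, roughly 0.038, and typical English plaintext is usually quoted at around 0.067, so with N this small a single figure should be read with some caution.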

## Pigeon Code – More Speculation – Are Full Stops Structural ?

Pigeon Code – AOAKN! Are the full stops real ? Are they structural ? Will we find out in the next exciting instalment ?

## Cryptanalyst’s Bookshelf – Books To Get You Started

When I first started wittering about the Pigeon Code I said that I would post up some tutorials on basic cryptanalysis. This post is basically the preface to that forthcoming series. A teaser, if you will. Before we get into the theory and start looking at some ciphers, here are a few books that provide coverage of the subject of cryptanalysis at varying levels of technicality.

I include this only partly as a joke. I recently obtained a copy of this book that I owned and cherished as a child but that had gone the way of all things - probably via a jumble sale - and it is a surprisingly thorough introduction to a variety of ciphers, starting with the skytale and proceeding through simple substitution and transposition ciphers to polyalphabetic ones including the Porta and Vigenère ciphers. It also includes instructions for making a variety of cipher machines from paper, cardboard tubes and old cans. Excellent for younger readers and lots of fun for older ones as well. Now sadly out of print, but there seem to be a few knocking about via Amazon.

The Code Book - Simon Singh

A constant companion in the days when I first started to break ciphers, my tattered copy has been dragged around a lot and has even survived being accidentally left under a hotel bed in Milton Keynes where I was staying while making a pilgrimage to Bletchley Park - a trip I can't recommend highly enough if you are into computers or code breaking in any way, truly awesome.

Singh's account of the history of codes and ciphers and the arts of breaking them is accessible and very engaging. He also sets the reader a number of challenges, presenting ciphers to be broken, some of which are quite fiendish, but all of which are well worth the attempt.

If you're interested in cryptography and cryptanalysis, this is really the first book you should read as it provides an excellent background in both the history and the techniques.

Cryptanalysis - Helen Fouche Gaines

If you want to break classical (i.e. non machine) ciphers, buy this book. First published in 1939 it remains an absolute classic in its field. Fouche Gaines covers practically every type of cipher you can make using a pen and paper in enough depth for the budding cryptanalyst to get their teeth into actually breaking them. I don't think there is a more complete reference on classical cryptanalysis, and if there is, I would very much like to have a copy!

Elementary Cryptanalysis - Abraham Sinkov

Another absolute classic. Written by legendary code breaker Abraham Sinkov and first published in 1966, this book provides a solid mathematical model for analysing classical ciphers as well as thorough explanations of the math involved and why it works. This isn't quite as much of a must have as Cryptanalysis, but is pretty essential if you want to write computer code to do cipher analysis. It even includes an appendix with some sample programs in BASIC for computing things like trigram frequency and index of coincidence which were written in 1979 and added as supplementary material to the 1980 reprint.

Applied Cryptanalysis - Mark Stamp and Richard M. Low

Classical cryptanalysis is a fun hobby, but sooner or later you'll want to know about more modern ciphers, and this book is an excellent place to get started. It covers classic ciphers and WW2 machine ciphers in a scant 70 pages before going on to describe cryptanalytic attacks on modern stream, block and public key ciphers. They aren't kidding about the applied part either; the text presents real attacks that break real cipher systems. Modern cryptanalysis is a huge hairy beast full of hard maths, but this book manages to give the reader a fairly gentle introduction to the field.

All of these books can be bagged from Amazon, find the listmania list here. Or you can have them ordered by your local bookshop. Be prepared for a long wait though!

## More Pigeon Code Speculation – A Mistranscription and Other Issues – Some Better Numbers

Having been back through the various released images of the Pigeon Code, particularly the GCHQ image, I have tweaked my transcription a bit. Only by one letter to be sure, but this can make a huge difference.

So currently, the transcription I'm using for the cipher is this.

AOAKN HVPKD FNFJU YIDDC
RQXSR DJHFP GOVFN MIAPX
PABUZ WYYNP CMPNW HJRZH
NLXKG MEMKK ONOIB AKEEQ
UAOTA RBQRH DJOFM TPZEH
LKXGH RGGHT JRZCQ FNKTQ
KLDTS GQIRU AOAKN

And given previous speculation with regards to AOAKN, the groups I'm using for any actual analysis are

HVPKD FNFJU YIDDC RQXSR
DJHFP GOVFN MIAPX PABUZ
WYYNP CMPNW HJRZH NLXKG
MEMKK ONOIB AKEEQ UAOTA
RBQRH DJOFM TPZEH LKXGH
RGGHT JRZCQ FNKTQ KLDTS
GQIRU

That makes the frequency distribution look like this
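The distribution can be recomputed from the groups above. A minimal sketch, assuming that transcription is accurate; `letter_frequencies` is just an illustrative name:

```python
from collections import Counter

# Assumed transcription: the analysis groups listed above (AOAKN removed).
CIPHERTEXT = """
HVPKD FNFJU YIDDC RQXSR
DJHFP GOVFN MIAPX PABUZ
WYYNP CMPNW HJRZH NLXKG
MEMKK ONOIB AKEEQ UAOTA
RBQRH DJOFM TPZEH LKXGH
RGGHT JRZCQ FNKTQ KLDTS
GQIRU
"""


def letter_frequencies(text: str) -> Counter:
    """Count each letter, ignoring spaces and line breaks."""
    return Counter(c for c in text.upper() if c.isalpha())


if __name__ == "__main__":
    freq = letter_frequencies(CIPHERTEXT)
    # Print a crude text histogram, most frequent letter first.
    for letter, count in freq.most_common():
        print(f"{letter} {count:2d} {'#' * count}")
```

Sorting by count rather than alphabetically makes it easier to compare the shape of the curve with the steep drop-off expected from a simple substitution.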

I also have some better digram counts. When I hacked up the Python scripts I used last time, I had my computer programmer head on, rather than my cryptanalyst head. The code does a good job, but it's not counting like a person, and is counting things it oughtn't. I hope to develop some better and more modular code to update my usual pencil and paper cryptanalysis workflow.

A better set of n-gram counts is
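A repeat finder of this general kind can be sketched in a few lines. This is not the script used for these posts: it is an illustration that records the 1-based start position of every n-gram and keeps those seen more than once. Distances are measured from the start of one occurrence to the start of the next, which is an assumption; a script may well count its 'stride' differently, so the two sets of figures need not agree.

```python
from collections import defaultdict

# Assumed transcription: the analysis groups above, run together.
TEXT = "".join("""
HVPKD FNFJU YIDDC RQXSR DJHFP GOVFN MIAPX PABUZ
WYYNP CMPNW HJRZH NLXKG MEMKK ONOIB AKEEQ UAOTA
RBQRH DJOFM TPZEH LKXGH RGGHT JRZCQ FNKTQ KLDTS
GQIRU
""".split())


def find_repeats(text: str, n: int) -> dict:
    """Map each n-gram occurring more than once to its 1-based start positions."""
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i + 1)
    return {gram: where for gram, where in positions.items() if len(where) > 1}


if __name__ == "__main__":
    for n in (2, 3):
        for gram, where in sorted(find_repeats(TEXT, n).items()):
            first = where[0]
            for later in where[1:]:
                print(f"{gram} at {first} and {later}, distance {later - first}")
```

On this transcription the sketch finds FN starting at positions 6, 29 and 111, and JRZ at 52 and 106.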

gringo2:pigeon_wip steve$ ./ngrams.py no_aoakn.txt
Matched : FN at 6 is FN at 29 stride 22
Matched : FN at 6 is FN at 111 stride 104
Matched : DJ at 21 is DJ at 86 stride 64
Matched : JR at 52 is JR at 106 stride 53
Matched : RZ at 53 is RZ at 107 stride 53
Matched : GH at 99 is GH at 103 stride 3
Matched : JRZ at 52 is JRZ at 106 stride 52

Which contains some interesting numbers. The 'stride' is the number of positions in the ciphertext after which the sequence is repeated (a.k.a. the period). Really, we should discount RZ and JR as digrams if we are to count them as a trigram, which leaves us with a count like this.

Matched : FN at 6 is FN at 29 stride 22
Matched : FN at 6 is FN at 111 stride 104
Matched : DJ at 21 is DJ at 86 stride 64
Matched : GH at 99 is GH at 103 stride 3
Matched : JRZ at 52 is JRZ at 106 stride 52

Note that JRZ is the longest repeated sequence in the ciphertext; there are no longer ones.

Via the excellent Cipher Mysteries site, there has apparently been some speculation over at 'WW2 Talk' about the use of a Playfair type cipher on the grounds of the repeated digrams. Certainly, such ciphers were in use at the time, at least by Axis forces (this whole Pigeon Code affair has really driven home the fact that I know very little about Allied forces), and there is a fascinating account of how the German Doppelkastenschlüssel or 'Two Box' cipher was broken available from the NSA archives (PDF).

On the face of it, the cipher text lacks several features that we might expect to find in a Playfair, or any kind of cipher based on rectangular digram substitution. In particular, the most noteworthy feature of this particular cipher text is that it contains all 26 letters of the alphabet, where box digram types tend to use alphabets whose size is a perfect square. 25 is the number most usually encountered in the texts, with J, a very low frequency letter in English, missing from the cipher rectangle and being represented as I, though the rules are by no means fixed. Digrammatic ciphers also tend to result in ciphertexts of even length, where this text is either 125 or 135 letters long, depending on whether one counts the AOAKN groups.

As Helen Fouche Gaines wisely puts it in Cryptanalysis:

It is sometimes said of the Playfair that it can be distinguished from other ciphers by (1) the fact that cryptograms contain an even number of letters, (2) the fact that only 25 letters are represented in its general frequency count, (3) the fact that when the cryptogram is marked into pairs, no pairs will be a doubled letter, and (4) the presence of long repeated sequences at irregular intervals. As conclusive evidence, these are debatable points, but all are good supporting evidence.

Prima facie, the Pigeon Cipher - as it ought to be called - fails all of these tests. We can readily see if we attempt to simply mark out the pairs on the first line that we get a doubled letter right away.

HV PK DF NF JU YI DD C

If we ignore the leading H and pair off, we encounter a doubled letter by the third line.

H VP KD FN FJ UY ID DC RQ XS RD JH FP GO VF NM IA PX PA BU ZW YY NP CM PN WH JR ZH

As Helen Fouche Gaines says though, this need not be conclusive. There are enough variations on the basic idea of a Playfair cipher that none of these have to be true. There could for instance be further elements of transposition, perhaps designed very specifically to disguise these characteristic features.

One thing that is noted by both Fouche Gaines in Cryptanalysis and Abe Sinkov in Elementary Cryptanalysis: A Mathematical Approach is that when a digrammatic cipher such as the Playfair is used, the common factors of the spaces between repetitions (known as the period) will mostly tend to be 2.

Interestingly, this seems to be very much the case with the ciphertext in question. The repeats

Matched : FN at 6 is FN at 29 stride 22
Matched : FN at 6 is FN at 111 stride 104
Matched : DJ at 21 is DJ at 86 stride 64
Matched : GH at 99 is GH at 103 stride 3
Matched : JRZ at 52 is JRZ at 106 stride 52

give us the factors

gringo2:pigeon_wip steve$ ./factors.py 3 22 52 64 104
3 []
22 [11, 2]
52 [26, 13, 4, 2]
64 [32, 16, 8, 4, 2]
104 [52, 26, 13, 8, 4, 2]
Common Factor : []
Most Common : [(2, 4), (4, 3), (8, 2), (13, 2), (26, 2)]

In which we can see that 2 is by far the commonest factor. With the exception of the period 3 for the repeat of GH all the periods are multiples of 2.
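The factor tally above can be reproduced with a short sketch. This is an illustrative reconstruction rather than the original `factors.py`: it assumes, from the output shown, that 1 and the number itself are left out of each factor list, and the function names are hypothetical.

```python
from collections import Counter


def proper_factors(n: int) -> list:
    """Divisors of n, excluding 1 and n itself, in ascending order."""
    return [d for d in range(2, n) if n % d == 0]


def tally_factors(distances) -> Counter:
    """Count how many of the given distances each factor divides."""
    tally = Counter()
    for distance in distances:
        tally.update(proper_factors(distance))
    return tally


if __name__ == "__main__":
    strides = [3, 22, 52, 64, 104]  # the figures quoted above
    for s in strides:
        print(s, proper_factors(s))
    print("Most common:", tally_factors(strides).most_common(5))
```

The result depends entirely on how the spacing between repeats is measured, so it is worth running the same tally over start-to-start distances as a cross-check before drawing conclusions about the period.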

As has been the case with everything in the mysterious matter of the Pigeon Code, this is very far from conclusive, but it does suggest some interesting possibilities for further investigation and analysis. My gut feeling is still that we're looking at a one time pad enciphered code message, but I won't be happy to hang my hat on that until I've got around to throwing the whole bag of tricks at the ciphertext, and it's a fairly sizeable bag. Keep watching the skies.

## Enigmatic Ape Goes Social Bananas

It’s been a busy week here in the Enigmatic Ape basement command bunker, and let me tell you, Office Cat is a harsh taskmaster. This week we’ve been starting to lay the foundations for our ‘social media marketing strategy’, a potent blend of secret techniques contributed by an elite team of social media and marketing mercenaries, and this one bloke I met down the pub who says he runs a record label.

## WordPress Shortcode <p> tags (and HTML escape plugins)

So, you’ve decided that you want to add a shortcode, perhaps as a plugin, to your WordPress setup that will escape HTML so you can quote some HTML code snippet or other in a post. “Why, this will be easy!” you’ve cried, knowing that PHP provides some very useful functions specifically for dealing with HTML escaping issues.

And you looked at the shortcode API, and you wrote, almost exactly, this very short piece of code to simply call htmlentities() on an enclosing shortcode’s content.

And it didn’t work. It spewed unwanted and untyped <br /> and <p> tags all over the place, especially if you used any <pre> tags in or near the enclosed content. And it broke your page HTML. And you were cross.

## Further Pigeon Code Scuttlebutt And Some Things I missed

The other day I wrote a post about the 'Pigeon Code', a WW2 era coded message discovered attached to a pigeon in a chimney. Fortunately, I labelled it as baseless speculation that was likely to be wrong, and as it turns out, at least some of it is.

After finally getting around to reading through some more of the press, I discovered a link to the actual GCHQ press release. The sort of thing I should really have looked up right away. Then again, who imagines GCHQ issuing press releases ?

In the comments on the last piece, Koala raised the interesting possibility that the two non cipher group strings at the bottom of the pigeon code document were identifiers for pigeons, and it seems GCHQ are of the same mind. In a press release that, much to my chagrin, was published four and a bit days before I got round to poking the 'publish' button, they wrote :

Each pigeon in service was given an identity number. Two such numbers, NURP.40.TW.194 and NURP.37.OK.76, have been identified on the Bletchingley message. Either of these could be the identity of the pigeon in the chimney. The Curator of the Pigeon Museum at Bletchley Park is trying to trace these numbers, and if they are identified and their wartime service established, it could help to decode the message ...

They also have a much better - and clearly enhanced - image of the original message.

There is one slightly odd thing about the press release though, which could of course be down to a misunderstanding between a cryptanalyst and a press officer - something not at all difficult to imagine. The press release states

During the war, the methods used to encode messages naturally needed to be as secure as possible and various methods were used. The senders would often have specialist codebooks in which each code group of four or five letters had a meaning relevant to a specific operation, allowing much information to be sent in a short message. For added security, the code groups could then themselves be encrypted using, for example, a one-time pad.

The message found at Bletchingley had 27 five-letter code groups, and the GCHQ experts believe its contents are consistent with this method. This means that without access to the relevant codebooks and details of any additional encryption used, it will remain impossible to decrypt.

The reason I say this is odd is that the message being in five letter groups is not necessarily relevant to the question of whether it uses a code based on four or five letter code groups. Five letter groups were standard for most Morse code transmissions whether encoded, encrypted, or in the clear, and usually for cryptograms. Helen Fouche Gaines' classic 1939 treatise, 'Elementary Cryptanalysis' states (p. 9, new Dover edition)

The operations of writing in and taking off may be governed by any agreed ruling, though the second of these must result in five-letter groups if the cryptogram is to be transmitted by wire or radio.

Granted, this cryptogram was sent by pigeon(s). But the breaking up into five letter groups is done not just for wireless transmission (though this is no doubt where the convention arose) but because transcribing apparently random text is difficult, and breaking into chunks ensures far better accuracy.
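To make the convention concrete, here is a minimal sketch of regrouping any text into five-letter blocks. The function name and the choice to leave a short final group unpadded are assumptions for illustration, not a description of any historical procedure.

```python
def group_text(text: str, size: int = 5) -> str:
    """Strip non-letters, upper-case, and split into space-separated groups."""
    letters = "".join(c for c in text.upper() if c.isalpha())
    return " ".join(letters[i:i + size] for i in range(0, len(letters), size))


if __name__ == "__main__":
    print(group_text("attack at dawn"))
```

Because the grouping is applied after encipherment, the group length by itself says nothing about the length of any underlying code groups.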

It would be odd, to say the least, if the cryppies at GCHQ were unaware of this convention.

Lastly, and by no means leastly, there is some much more informed speculation than mine over at the rather wonderful Cipher Mysteries website (which I only became aware of today, and will certainly become a frequent visitor to) which suggests, among many other fascinating things, that this may be a message to Bomber Command, that being the identifier 'X02'.

I'm off there now, in fact, to join in the fun!

## Enigmatic Ape To Get Designer Makeover

This week, the Enigmatic Ape website will be a whole month old. Really, we're still getting the bugs out of the CMS code, getting a feel for what's required to take the site forward as a key part of the business, and generally fiddling with things.

As part of the site's evolution, the Ape has recruited a crack team of designers to engage in a branding and graphic design exercise that will lift the site out of its current 'under construction' minimalist programmer art doldrums and up to the heady heights of, well, height.

As I write this, an elite cadre of design ninjas at a secret location somewhere in the UK are sharpening their cyber-crayons, lighting up the joss sticks and spinning up the whale song CDs and preparing to, er, well, do whatever it is that designers do. Given it's Monday morning, this probably involves grunting and caffeine. Certainly that's what office cat and I have spent our time on.

Office cat eats crayons, and is not very good at design. Moar coffee!

## The little tokenizer that could.

I wanted to write a fairly minimal text parser, little more than a tokenizer, in fact. This happens, when you're a programmer. Depending on what you're doing it can happen a lot. In fact the problem is so common that there are a huge number of tools and libraries to help you.

Sometimes though, such a solution is too heavyweight, or doesn't fit well with what you're trying to do, or the set up costs are high. Setting up a grammar for a table driven lexer or an RDP parser can be quite a chore.

The task in question, my particular task, is to create a little plugin for WordPress that I can use to do syntax highlighting on a variety of different kinds of structured text on this site. There are quite a number of these already, many of which are really very good. For what I wanted to do however, most of the ones I've tried so far seemed to me to be attempting to solve the wrong problem at least some of the time.

I hope what I mean by that will become clearer later as I build up the rest of the plugin and demo it. I have some very picky requirements for flexibility and usage that are a bit, well, odd.

So, anyway, the first thing one must do when processing text for such purposes is to tokenise the input, that is to say that it must be broken up into pieces somehow that we can later classify and tag. I chose to do the first run like this.

PHP's preg_split() function makes this particularly easy as it allows us to split up a string using a regex as the delimiter, and most importantly, will also return the delimiters, if we ask it nicely with the PREG_SPLIT_DELIM_CAPTURE flag.

The regex I used does pretty much exactly what it says on the tin. It splits on one or more spaces, or a member of PHP's punct regex character class, which matches "printing characters, excluding letters and digits". This includes, e.g. all the math and logic operators, and the various braces and quotes as well as commas, full stops, etc.

Later passes of the tokeniser now have three basic classes to deal with: spaces, punctuation and 'everything else'. This is a very lightweight approach to tokenisation, which typically tries to make much finer distinctions, but it has some benefits in terms of flexibility, which I'll demonstrate later.
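The same splitting idea can be sketched in Python for comparison. This is an analog rather than the PHP used for the plugin: `re.split()` keeps the delimiters when the pattern contains a capturing group, and because Python's `re` has no POSIX `punct` class, `string.punctuation` stands in for it (an assumption, intended to cover the same ASCII characters). The names `tokenize` and `classify` are hypothetical.

```python
import re
import string

# One or more spaces, or a single ASCII punctuation character.
# The capturing group makes re.split() return the delimiters too.
_DELIM = re.compile(r"( +|[%s])" % re.escape(string.punctuation))


def tokenize(text: str) -> list:
    """Split text into spaces, punctuation and everything else."""
    # Adjacent delimiters leave empty strings between them; drop those.
    return [tok for tok in _DELIM.split(text) if tok]


def classify(token: str) -> str:
    """Assign a token to one of the three basic classes."""
    if token.isspace():
        return "space"
    if len(token) == 1 and token in string.punctuation:
        return "punct"
    return "other"


if __name__ == "__main__":
    for tok in tokenize("[various words!] 2 + (thirty"):
        print(repr(tok), classify(tok))
```

Dropping the empty strings plays the same role as PHP's PREG_SPLIT_NO_EMPTY flag; whether the plugin sets that flag is not stated here.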

Sample output follows.

array(34) {
[0]=>
string(1) "["
[1]=>
string(7) "various"
[2]=>
string(1) " "
[3]=>
string(5) "words"
[4]=>
string(1) "!"
[5]=>
string(1) "]"
[6]=>
string(1) " "
[7]=>
string(1) "2"
[8]=>
string(1) " "
[9]=>
string(1) "+"
[10]=>
string(1) " "
[11]=>
string(1) "("
[12]=>
string(6) "thirty"
[13]=>
string(1) " "
[14]=>
string(7) "thirsty"
[15]=>
string(1) "_"
[16]=>
[17]=>
string(1) ")"
[18]=>
string(1) ","
[19]=>
string(1) " "
[20]=>
string(1) "'"
[21]=>
string(1) "9"
[22]=>
string(1) "^"
[23]=>
string(1) "7"
[24]=>
string(1) "'"
[25]=>
string(1) " "
[26]=>
string(1) "|"
[27]=>
string(1) " "
[28]=>
string(1) "6"
[29]=>
string(1) " "
[30]=>
string(1) ">"
[31]=>
string(1) "="
[32]=>
string(1) "2"
[33]=>
string(1) " "
}
