## Pigeon Code : Some Idle Speculation – With Graphs

So, a chap in darkest Surrey pulled a mouldy old pigeon out of his chimney and found what would appear to be a WW2 era cipher message in a canister attached to its leg. Here's the bit of paper in question.

As of the now, GCHQ 'Britain's top code breakers' claim to be stumped by it. Not surprisingly, when you have a good look at it, because its tricky. Everybody seems to have their theory.

So, without further ado, I shall engage in rampant speculation, with some graphs but very little science. If I get time later, I might add some science, but don't hold your breath.

Here's how I think the message breaks down, and why, with the caveat that this is no more likely to be accurate than anyone else's wild and largely baseless speculation.

AOAKN
Indicator of some kind. Possibly identifies the originator of the message or any code books in use. Possibly both. There has been some speculation that this indicates a poem code, but I have some doubts about that, of which more later. Inclusion as message preamble and postamble makes it seem to me more like a call sign. Though this one appears to have been attached to a pigeon, it was more usual to send cipher traffic by radio. Sign on, sign off. As such, I have left it out of the frequency analysis. Of which more in a moment.

27 1525/6
Routing information. Because, well, look at the 'TO' field. "XO2". Really ? In the midst of the chaos of world war two, sellotaping a message to a pigeon, throwing it towards old Blightly and then hoping that when it arrives "XO2" will be sufficient for it to navigate the labyrinthine bureaucracy of British military intelligence strikes me as unlikely. Just as data packets in a modern communications network must always contain routing information in the clear, so must a bit of paper strapped onto a bird. Only more so. Maybe 1525/6 is a directorate index, 27, a room number ?

NURP 40TW 194 / NURP 37DK 76
Coordinates. Coded eastings and northings on some grid system. They look like coordinates. They have numbers in them. They aren't five grouped. Coordinates are another thing you would expect to find in a military communication. Also, you don't especially want to stuff coordinates in your ciphertext because then the enemy cryptanalysts have a crib, a piece of actual plain text or a characteristic pattern that they can expect to find in your message. Then they break your codes. Then you die.

Frequency Count
Every language, English, French, German, Python, has a characteristic distribution of the frequency with which the letters of its alphabet occur. As such, frequency analysis has long been the foundational tool of cipher analysis.

This is the frequency histogram displayed on the Wikipedia page relating to letter frequency. This particular frequency histogram is based on the words in the Concise Oxford English Dictionary.

Here's a histogram of the text of the pigeon code story (stripped of javascript, CSS and HTML tags) found on the BBC news website on Sunday 25 November 2012 produced with the following command.

gringo2:crypto steve$./scrape.py http://www.bbc.co.uk/news/uk-20456782 | ./freq.py \ | ./histo.py -o bbcpigeonstory.count -s 20 (All the scripts I used to generate data for this post are available as Gists.) Close, isn't it ? you may be thinking that message length is going to have an effect. And you'd be right. Here's a short sentence, only 100 characters long. Even a short sentence is given away by frequency counting, this is how cryptogrpahers break messages And here is its frequency histogram gringo2:crypto steve$ ./freq.py -i short.txt | ./histo.py -o short.txt.count.png

Not quite so similar, but still, you can clearly see the characteristic shapes and groupings. It also has another characteristic, one common to short texts. It is missing some letters. The sentence contains no J, L, X or Z

This characteristic distribution can remain even if we jumble up the alphabet, a fact which cryptographers have been using to unpick ciphers for the better part of a thousand years, and once it became widely known, much of the aim of code and cipher design was to prevent this. So we ought not to be too surprised by the histogram for the pigeon code.

gringo2:crypto steve$./freq.py -i pigeoncode/pigeon_code_no_aoakn.txt \ | ./histo.py -o pigeon_no_aoakn  This histogram shows none of the characteristics of English. In fact, it appears to have rather more of the characteristics of random text. This is to be expected if we are dealing with a one time pad or a machine cipher, or even a decent polyalphabetic cipher. Update - Thu 29 Nov 2012 0851 Koala points out in the comments that the BBC have possibly mistranscribed some letters, I think that's right, as did many other sources. That made me realise that I hadn't published the actual transcription I was using, so here it is. It differs from some of the published transcriptions in that I have U where some have a W. Here is the transcription  AOAKN HVPKD FNFJU YIDDC RQXSR DJHFP GOVFN MIAPX PABUZ WYYNP CMPNW HJRZH NLXKG MEMKK ONOIB AKEEQ UAOTA RBQRH DJOFM TPZEH LKXGH RGGHT JRZCQ FNKTQ KLDTS FQIRU AOAKN 27 1525/6   NURP 40TW 194 MURP 37DK 76  End Of Update - Thu 29 Nov 2012 0851 Update - Thu 29 Nov 2012 0938 And the actual groups I used for the frequency count.  HVPKD FNFJU YIDDC RQXSR DJHFP GOVFN MIAPX PABUZ WYYNP CMPNW HJRZH NLXKG MEMKK ONOIB AKEEQ UAOTA RBQRH DJOFM TPZEH LKXGH RGGHT JRZCQ FNKTQ KLDTS FQIRU  End Of Update - Thu 29 Nov 2012 0938 For a short message, it has remarkably high frequency of all letters. Indeed all 26 letters appear at least twice. In an ordinary English message of this length, this would be unusual, however there are several factors that could skew even a fairly simple cipher away from the norm. In enciphering a message into blocks of five like this, X is often used to indicate a space between words and Q and Z are often used as 'nulls' to pad out a message to the required length, and if this is only a cipher (as opposed to also using a code), the underlying language won't be conversational English but WW2 era military, which is likely to contain abbreviations and jargon. Even so, if we were looking at something like a simple substitution or transposition, we'd expect to be able to see the basic pattern, and to firm things up when we count digraphs and trigraphs. Significant Trigraphs, Significant Digraphs, Repeats Where 'significant' simply indicates that they appear more than once. We can't take word boundaries into account, because we don't know where they are. gringo2:crypto steve$ ./countseq.py -i pigeoncode/pigeon_code_no_aokan.txt
 Trigraphs JRZ 2 Digraphs FN 3 JR 2 RZ 2 GH 2 DJ 2 Repeats GG 1 EE 1 KK 1 DD 1 YY 1 

Those are low. To put it mildly. With the exception of repeats, which feels high-ish. Polyalpha high-ish, although with only one repeated trigraph we're not going to get a standard Kasiski test, although there are other methods to try, there's not much to go on there.

By comparison, our short message from earlier has the following counts.

gringo2:crypto steve$./countseq.py -i short.txt  Trigraphs ENA 2 ENC 2 VEN 2 SHO 2 Digraphs EN 5 IS 3 HO 2 NA 2 NC 2 RE 2 ES 2 NT 2 VE 2 SH 2 Repeats SS 1  Then again, perhaps our cryptic Tommy is "Sitting down to look at sheep, all effortlessly swimming" which only contains 49 letters and has 7 repeats. It seems fairly unlikely you'd want to cipher that and entrust to it to the beasts of the air though, unless you were just trying to wind up a a cryptographer. Not something we should rule out. Why I don't Think It Is a Poem Code From the moment that the pigeon code hit the internet, people were getting terribly excited about it being an SOE message on the way to Bletchley. Even though that never happened. And even though it is unlikely that an SOE operative would be using UK official headed note paper, what with the whole being behind enemy lines thing. The problem is that the SOE poem code was essentially a transposition cipher. Transposition ciphers can be hard to break - unless you know the keying system, which is why Leo Marks at SOE spent some of the war writing original poetry for SOE agents to use. To prevent the German cryptanalysts from getting hold of a load of cribs from published sources. Be that as it may, a transposition cipher, even given some padding and a few more Xs than you might expect, is very, very easy to spot. Lets take an extreme example. Let's take our short message and transpose its characters entirely at random. When I ran that, I got this. iibcnr tsran gnh n vi ocpqi amyrtfeEeha n tesoertss enc gsruee hecg,y e iaypurboeessaakg vwswtnhsyo Or, if your prefer gringo2:crypto steve$ ./grid.py -i short_shuffled.txt

 IIBCN RTSRA NGNHN VIOCP QIAMY RTFEE EHANT ESOER TSSEN CGSRU EEHEC GYEIA YPURB OEESS AAKGV WSWTN HSYO 

Which I think you'll agree is pretty inscrutable. However, if we look at a frequency histogram of this text next to the original, it looks like this.

Even with a completely random transposition of the characters, the frequency is exactly the same because, well, because the letters didn't change, they just moved. That's the thing about transpositions, you just can't hide what you did. Even I did allow myself to be carried away for a moment by the romantic notion that this might somehow be an SOE message, I'd have to point out that based on the testimony of the man who wrote the poems SOE used for poem codes around the time this message was sent, the dual indicator suggests it was part of "Operation Gift Horse", a scheme to make the German code breakers think SOE was using a poem code and waste their time trying to decrypt messages, when in fact SOE were using a 'WOK', a Worked Out Key, basically a one time pad.

And In Any Case It's Too Short
142 characters and some (possibly) coordinates. That's about the same as a geo-tagged tweet. Have you tried to convey complicated information in a tweet ? [ We should pause for a moment and consider whether this is, in fact, this years quirky GCHQ hiring stunt. Considering how much time I burned on the last one, I'm going to disregard that possibility, much as it pains me.]

Quite possibly we're looking at both a cipher and a code. Though often used interchangeably, they are quite different. Here's a snippet of the 'Acme code', a commercial code consisting of 100,000 code words that was in use in the 1920s, largely to reduce the cost of transmitting telegrams.

As you can see, you can fit a whole lot of information into a five letter code group. The only limit is the imagination of the code maker.

Of course, if #pigeoncode was constructed in this way, it will be almost impossible to break. For very large values of 'almost'. I've broken some short messages myself, shorter than this, and with pencil and paper at that, but those were contrived examples taken from text books. If this is a coded message, then without some knowledge of the code groups in use, there's very little leverage even for a good attempt.

The one thing we can be certain of is that it isn't something simple. That much is reasonably obvious simply from the context, but it never hurts to measure obvious. It is, of course, impossible to make any firm asessment from such a short message, but I will throw caution to the wind and guess, in the hope that someday I will proved wrong. I guess : Code enciphered with one time pad, code enciphered with polyalphabetic cipher of some type, perhaps even a machine [It has been suggested that pigeon was used as a transmission medium due to lack of the requisite time to errect an aerial. If that was the case, I can't really see a field unit stopping for long enough to get a TypeX up and running either ] or a very short message encoded via one time pad.

If you've read this far down, and haven't lost the will to live, come back soon. I'll be running an occasional series on cryptanalysis for the curious.

## ‘Better’ performSelector – Evil Fun With va_args

In the second part of this now dragging on a bit mini-series on building a better performSelector, we saw how we could leverage variadic functions, provided we were happy to wrap everything non objecty in NSValue type objects. Certainly this approach works, and is really quite flexible if you can live with all the boxing and unboxing of values. But what if you can't, or, more likely, just don't want to ? Is there a flexible way to do it without wrapping every non object ?

To answer that question, we're going to need to take a slightly deeper look at how variadic functions do their stuff. Here's an example of a simple (practically canonical, in fact) variadic function to jog your memory.

As we can see from that, variadic functions in C, and thus ObjC, are all about those three macros : va_start, va_arg and va_end. Their rough definitions are basically : 'Do whatever set up is necessary to make va_arg work', 'return the value of an argument and get ready to do the next one' and 'do any clearing up that may be necessary afterward', respectively. Their precise definitions vary by platform and compiler, they are 'implementation defined' meaning that the C standard specifies their existence and what they should do, but not how they should go about it. More specific guidelines are generally laid down in a platform's Application Binary Interface documentation, on which more later.

Let's call that function and get a look at how ObjC - or rather the llvm compiler - does go about it. There is a big screaming caveat coming up. This is from an iPhone 3GS running iOS 5.1.

 int sum = sumSomeInts( 3, 1,2,3 );

As we can clearly see, there are our 3 ints all neatly lined up on the stack.

At this point, you may be thinking "Yay! We can determine the number and the type and size of the arguments that a method takes, so we can just do what va_arg does and move a pointer along the stack. Something like this, in fact.

And you'd be right, but only sort of, and only some of the time.

And here is the big screaming caveat : THIS DOES NOT HAPPEN ON OS X 64 BIT. DO NOT ATTEMPT TO USE THIS CODE, OR THIS TECHNIQUE ON OS X 64 BIT, OR INDEED ON ANY OTHER 64 BIT PLATFORM OR IT WILL END IN TEARS. BITTER, SALTY, CRASH BUG TEARS.

As it turns out, on OS X 32 bit and indeed on iOS - even though iOS is running on an ARM core - the way that this works is simply to follow the standard x86 calling convention of pushing all the arguments onto the stack. va_start then sets a pointer to the first variadic argument, and va_arg simply derefernces and increments the pointer. va_end is effectively a no op. As a result of this, va_list is a char * on OS X 32 bit and the iOS simulator, and a void * on iOS devices.

This is emphatically not the case on 64 bit intel variety chips, because the calling convention on x86 64 bit is very different and tends to use registers, especially for floating point values. On x86 64, va_list is not a simple pointer to to some stack data, it is a struct containing pointers to several different memory areas, and va_arg is a slightly more complicated affair. The algorithm given in the x86 64 ABI is eleven steps. More than the two of x86 32's increment the pointer and return the value. Incidentally, you can find the 32 bit ABI here. Yes, that's a SCO link, don't shoot the messenger, it is the doc linked by Apple's OSX ABI reference.

Here's what NSLog(@"%s", @encode( typeof( va_list ) )); looks like on a few build combinations.


iOS 5.1 (device)        : ^v
iOS 5.1 (simulator)     : *
iOS 6   (simulator)     : *

OS X 10.7.5 (32 bit)    : *
OS X 10.7.5 (32/64 bit) : *
OS X 10.7.5 (64 bit)    : [1{__va_list_tag=II^v^v}]



That last one is an array containing one of these

So in summary, in the form which follows, this only works on iOS and OS X binaries compiled with a 32 bit target. Implementing a 64 bit version is left as an exercise for the intrepid reader. Or I might get around to it one day. Keep watching the skies. Or the Ape, whatever.

With that out of the way, let's have a look at some slightly more complicated examples. We define a variadic function

We'll call it with, amongst other things, a float value. Recall that in the first post of this series, we had difficulty with floats being passed to method IMPs ? This is why. Let's take a float value, say 123.00. We'll also define some other values that will come in handy later.

If we have a look at that float in the llvm debugger console, we can see its hex value

## Simple Python script for simple CSV to HTML table rows

'Simple' as in it's a simple script, and also 'simple' in the sense of it won't work for anything that isn't simple. I have encountered a gloriously diverse range of weird and wonderous things in CSV like format over the years that would confuse this code.

Python has a pretty decent CSV module for when you encounter a big data set dumped by a careless MS Access programmer, but I used this very simple script to generate tables from a very simple ASCII CSV spreadsheet dump that had no issues with quoting, etc.

Published in the hope that someone may find it helpful.

## Some notes gathered on PHP session security

Often, on the various codemonkey forums and websites you'll come accross when googling around PHP session security, you see people doing some really weird ass things in php session auth code that everyone seems to accept is useful, but often possibly isn't.

In particular, 'fingerprinting' a session by checking against a browser's User Agent String. This seems to be fairly pointless. Also the checking of the user's IP address, this also seems to be somewhat questionable.

There appear to be, broadly speaking, two types of attacks that can compromise PHP session security. Well, two that people get hot under the collar about, anyway. Session fixation, where the attacker is able to supply, force or guess a user's session ID, and session hijacking where an attacker is able to compromise the session ID by some other means.

Session fixation attacks, as a class, can be foiled by regenerating the session ID when a user authenticates, so in these situations, checking the User Agent/IP is redundant, as the attacker's supplied session is no longer valid.

Then there's session hijacking. This is trickier. For a network based hijack, the attacker has access to the User Agent string, and is, in all likelyhood either presenting as from the same IP address (behind the same router) or is able to pretend to be using TCP/IP spoofing, redirection, packet rewriting, or some other nifty trick evil hack. This is inherent in the access required to peform network based hijacking, be it sniffing or MITM attacks. The only mitigation for this is to never send a cookie or other form of session identifier over an unsecured link.

So, for either of the above scenarios, checking the User Agent/IP is either entirely redundant, or almost entirely worthless. I say 'almost', because it is just barely possible that some class of very amateur attacker executing a sniffing based attack will hit a speed bump while they figure out the User Agent might be being checked, but about ten seconds worth of googling will get them past it.

There is, however, a further scenario which is likely to be of concern to many PHP developers, which is having session IDs stolen by malicious actors on shared hosting platforms.

By default, PHP writes its session data to whatever is the local equivalent of /tmp. My current OS X web development box is using /var/tmp and my web host is using /tmp.

PHP ession data is stored in files with long hard to read names, something like sess_g23hhjh5rrce3laeuib8t5utl3 where 'g23hhjh5rrce3laeuib8t5utl3' is the session ID as stored in the cookie on the user's browser.

It is therefore entirely conceivable that either a malicious or compromised account on the same server can obtain your session info.

If you're on a shared host, you can clearly see this situation by running the following PHP code on your server. You probably shouldn't, but you can.

In this scenario, a session ID can be stolen by someone who doesn't have access to the user's User Agent string, unless it is stored in the session. If it is stored in the session, but is hashed, it is computationally relatively trivial to run a dictionary attack since User Agents are finite and predictable.

There seems to be a folk belief among programmers that salting the stored hash will prevent this attack, this is unlikely, even if you are doing it right because salted hashes are a defence against rainbow table based attacks, not dictionary ones. No amount of salt will help you cope with a trivial dictionary space.

Checking the UA in this scenario might be helpful, because if you check every GET/POST and deauthorise the session on a mismatch, the dictionary attack is thwarted. Unless it gets lucky the first time. The probability of this is going to be a function of the number of user sessions and the distribution of UA strings, I haven't done the math, but it should be fairly obvious that the more users you have, and the fewer browsers they are using, the higher the probability is of scoring a hit.

It should also be fairly obvious that deauthing your users based on a single incorrect user agent string is a pretty good way of allowing an attacker to DoS your entire user base. The relative risk and cost of this is going to depend what you're doing, and what your options for mitigation are (we'll get to that in a moment).

This is akin to the classic asymetrical warfare scenario, in that the attacker only has to get lucky once to compromise your site, and has multiple opportunities to be so, but your defensive strategy relies on the attacker never being lucky at all. And while the attacker is not being lucky, they are still able to cause widespread disruption.

That leaves us with the IP address. In this scenario, checking the IP address presents a barrier to the attacker provided that she is located behind a different router to the intended victim(s). While this is certainly possible, perhaps even likely, it is certainly possible to imagine scenarios where it isn't; large university and corporate campuses, and mobile networks come to mind.

Checking the IP also has several drawbacks. Firstly, it breaks the stateless model of http, and in so doing, creates an abstraction leak between protocol layers. HTTP is an application layer protocol and it isn't supposed care about the transport layer at all, and it certainly isn't supposed to care about any state therein. This may seem trivial, and once upon a time it may very well have been so, but it isn't, and it increasinlgy won't be.

Expecting a user's IP address to remain constant over the course of a session is somewhat wishful thinking. Mobile users, for instance, may be skipping from WiFi hotspot to wifi hotspot via brief sojurns on 3G mobile networks using load balancing proxy servers, for instance. That's a whole heap of possible stuff that your web app needs not care about if you don't want to keep reauthenticating your users every time the state of the underlying network changes.

Fortunately, this entire class of attacks can be mitigated by storing your session information somewhere else. Somewhere that isn't world readable. Whether or not you have some space that is not world readable is dependent on the configuration of the host you're working with, it should be fairly obvious that having such is very much the lower bound for providing any kind of session security.

## SICP Exercise 1.13 – Fun with fibonacci

Exercise 1.13 is really interesting, for 'break out the algebra, oooh Fibonacci!' values of interesting, anyway. I had loads of fun with this, having only a passing awareness of the Fibonacci numbers. By the time I had satisfied myself that I understood the question and could venture an answer of my own, I had been on a fascinating journey through the mathematics of Fibonacci numbers and the Golden Ratio. Anyway, on with the exercise.

Prove that $$Fib(n)$$ is the closest integer to $$\phi^n / \sqrt{5}$$, where $$\phi = (1+\sqrt{5})/2$$.

Hint: Let $$\psi = (1-\sqrt{5})/2$$. Use induction and the definition of the Fibbonaci numbers (see section 1.22) to prove that $$Fib(n) = (\phi^n - \psi^n)/\sqrt{5}$$.

Here's the given definition of the Fib function :

2, 3, 5, 8. Who do do we appreciate ?

$$Fib(n) = \begin{cases} 0 & n = 0\\ 1 & n = 1\\ Fib(n-1) + Fib(n-2) & \text{otherwise} \end{cases}$$

We're also given $$\phi = \frac{\left(1 + \sqrt{5} \right)}{ 2 } \approx 1.6180$$ and $$\phi^2 = \phi + 1$$.

We'll start by proving out the theorem from the hint, and see where we get.

Theorem 1.
$$Fib(n) = \frac{ \phi^n - \psi^n }{\sqrt{5}}$$

Base Cases.

\begin{align*} Fib(0) = 0 = \frac{ \phi^0 - \psi^0 }{ \sqrt{ 5 } } = \frac{ 1 - 1}{ \sqrt{5} } = 0 \\ Fib(1) = 1 = \frac{ \phi^1 - \psi^1 }{ \sqrt{ 5 } } = \frac{ \phi - \psi }{ \sqrt { 5 } } = \frac{ \sqrt{5} }{ \sqrt{ 5 } } = 1 \end{align*}

Induction Step.

There are a couple of ways to approach this step, probably the single simplest is to note that both $$\phi$$ and $$\psi$$ have powers that satisfy the Fibonacci recurrence, which is to say that :

$$\phi^n = \phi^{n-1} + \phi^{n-2}$$ $$\psi^n = \psi^{n-1} + \psi^{n-2}$$

Taking that as a given we can simply do this :

\begin{align*} Fib(n-1) + Fib(n-2) = \frac{ \left( \phi^{n-1} - \psi^{n-1} \right) + \left( \phi^{n-2} - \psi^{n-2} \right)}{\sqrt{5}} \\ = \frac{ \left( \phi^{n-1} + \phi^{n-2} \right) - \left(\psi^{n-1} + \psi^{n-2} \right) }{\sqrt{5}} \\ = \frac{\phi^n - \psi^n}{\sqrt{5}} \quad \quad \square \end{align*}

Neat, huh ? "But wait!", I hear you cry, "That's a very short proof and everything, especially given the lengths of some answers to this exercise, but it rests on an assertion you got off Wikipedia and haven't proved! That's cheating!"

And you've got me there. To get to Q.E.D without relying on the assertion, we note that the Golden Ratio has the property $$\phi^2 = \phi + 1$$ given in section 1.22 and we note that $$\psi^2 = \psi + 1$$ is also the case because they're the postive and negative roots of the polynomial $$\phi^2 - \phi - 1 = 0$$. Given which, we can do an induction thusly :

\begin{align*} Fib(n-1) + Fib(n-2) = \frac{ \left( \phi^{n-1} - \psi^{n-1} \right) + \left( \phi^{n-2} - \psi^{n-2} \right)}{\sqrt{5}} \\ = \frac{ \left( \phi^{n-1} + \phi^{n-2} \right) - \left(\psi^{n-1} + \psi^{n-2} \right) }{\sqrt{5}} \\ = \frac{ \left[\left(\phi+1\right)\phi^{n-2}\right] - \left[ \left(\psi+1\right)\psi^{n-2} \right]}{\sqrt{5}} \\ = \frac{ \left[\left(\phi^2\right)\phi^{n-2}\right] - \left[ \left(\psi^2\right)\psi^{n-2} \right]}{\sqrt{5}}\\ = \frac{\phi^n - \psi^n}{\sqrt{5}} \quad \quad \square \end{align*}

Not only does that get us to Q.E.D without relying on an unproven assertion about reccurrence raltions, it also actually proves the relation. Well, at any rate, the proofs drop out of it :

$$\phi^n = \phi^{n-1} + \phi^{n-2} = \left(\phi+1\right )\phi^{n-2} = \left(\phi^2\right)\phi^{n-2} = \phi^n\\ \psi^n = \psi^{n-1} + \psi^{n-2} = \left(\psi+1\right )\psi^{n-2} = \left(\psi^2\right)\psi^{n-2} = \psi^n$$

Smashing. We've proved three things before breakfast and we're not even done yet.

Since we now know that is the case that

$$Fib(n) = \frac{ \phi^n - \psi^n }{\sqrt{5}}$$

We can state that the relationship between $$Fib(n)$$ and $$\phi^n / \sqrt{5}$$ looks like this :

$$\left| Fib(n) \quad - \quad \frac{\phi^n}{\sqrt{5}} \right| \quad = \quad \left| \frac{ \psi^n }{ \sqrt{5}} \right|$$

Clearly as $$n$$ gets really big, $$\psi^n$$ is going to get really small, so $$\phi^n/\sqrt{5}$$ is definitely a good approximation to $$Fib(n)$$, but what about the "closest integer" ? How are we to define that ? Let's throw caution to the wind and decide that it means

Theorem 2.

$$\left| Fib(n) \quad - \quad \frac{\phi^n}{\sqrt{5}} \right| \quad = \quad \left| \frac{ \psi^n }{ \sqrt{5}} \right| \quad \lt \quad \frac{1}{2}$$

Because if we used $$\leq$$ we could have two equidistant integers, and that doesn't fulfil "closest". Can we stand that up ? Let's look at some values :

 $$n$$ $$\psi^n$$ $$\left|\frac{\psi^n }{\sqrt{5}} \right|$$ $$0$$ $$1$$ $$0.44721...$$ $$1$$ $$\psi$$ $$0.27639...$$ $$...$$ $$...$$ $$...$$

Excellent, $$\left|\frac{\psi^n }{\sqrt{5}} \right|$$ starts out $$\lt \frac{1}{2}$$, and it isn't going to get any bigger.

So yes, since $$\left|\frac{\psi^n }{\sqrt{5}} \right| \lt \frac{1}{2}$$ for $$n \geq 0$$ it follows that $$Fib(n)$$ is indeed the closest integer to $$\frac{ \phi^n}{\sqrt{5}} \quad \square$$.

However, since we've proved that, but not tested it, how about some code to look at the numbers ?

And some results* :

 $$n$$ (fib n) (golden-fib n) (closed-fib n) (psi-n-root-5 n) $$0$$ $$0$$ $$0.4472135955$$ $$0.0$$ $$0.4472135955$$ $$1$$ $$1$$ $$0.72360679775$$ $$1.0$$ $$0.27639320225$$ $$2$$ $$1$$ $$1.17082039325$$ $$1.0$$ $$0.17082039325$$ $$3$$ $$2$$ $$1.894427191$$ $$2.0$$ $$0.105572809$$ $$4$$ $$3$$ $$3.06524758425$$ $$3.0$$ $$0.0652475842499$$ $$5$$ $$5$$ $$4.95967477525$$ $$5.0$$ $$0.0403252247502$$ $$6$$ $$8$$ $$8.0249223595$$ $$8.0$$ $$0.0249223594996$$ $$7$$ $$13$$ $$12.9845971347$$ $$13.0$$ $$0.0154028652506$$ $$8$$ $$21$$ $$21.0095194942$$ $$21.0$$ $$0.00951949424901$$ $$9$$ $$34$$ $$33.994116629$$ $$34.0$$ $$0.0058833710016$$ $$10$$ $$55$$ $$55.0036361232$$ $$55.0$$ $$0.00363612324741$$ $$11$$ $$89$$ $$88.9977527522$$ $$89.0$$ $$0.00224724775419$$ $$12$$ $$144$$ $$144.001388875$$ $$144.0$$ $$0.00138887549323$$ $$13$$ $$233$$ $$232.999141628$$ $$233.0$$ $$0.000858372260957$$ $$14$$ $$377$$ $$377.000530503$$ $$377.0$$ $$0.000530503232271$$ $$15$$ $$610$$ $$609.999672131$$ $$610.0$$ $$0.000327869028685$$ $$16$$ $$987$$ $$987.000202634$$ $$987.0$$ $$0.000202634203586$$ $$17$$ $$1597$$ $$1596.99987477$$ $$1597.0$$ $$0.000125234825099$$ $$18$$ $$2584$$ $$2584.0000774$$ $$2584.0$$ $$7.73993784866e-05$$ $$19$$ $$4181$$ $$4180.99995216$$ $$4181.0$$ $$4.78354466128e-05$$ * I actually used a python script to generate the above table. as I haven't got quite so handy with Scheme yet and I was too lazy to type all the table HTML, but the results are within a seriously insignificant distance of one another, on the order of .00000000002 or thereabouts.

## SICP Exercise 1.12 – Pascal’s triangle recursively

Exercise 1.12. The following pattern of numbers is called Pascal's triangle.

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1


The numbers at the edge of the triangle are all 1, and each number inside the triangle is the sum of the two numbers above it. Write a procedure that computes elements of Pascal's triangle by means of a recursive process.

It took me a few minutes to grok this one, it helped to draw the triangle like this :

1 1 1 1 2 1 1 3 3 1 1 4 6 4 1 

Which makes the "two numbers above" a bit clearer in terms of their actual relative positions. Once we've done that it is easy to see that :

$$P(col,row) = \begin{cases} 1 & col = 0 \\ 1 & col = row \\ P(row - 1, col - 1) + P(row - 1, col) & \text{otherwise} \end{cases}$$

Which translates nice and easy into the Scheme function :

N.B though that since we do absolutely no sanity checking on the input, it is easy to get garbage output, or even to end up with an infinite recursion. So while this is technically a correct answer, it lacks a certain rigour.

## SICP Exercise 1.11 – Iterative function f(n)

Exercise 1.11. A function f is defined by the rule that f(n) = n if $$n \lt 3$$ and $$f(n) = f(n - 1) + 2f(n - 2) + 3f(n - 3)$$ if $$n \geq 3$$. Write a procedure that computes f by means of a recursive process. Write a procedure that computes f by means of an iterative process.

OK, let's unwrap that a little so we can see what we're doing, grab your pencil (or throw in some Mathjax) and we can write that out a wee bit nicer :

$$F(n) = \begin{cases} \\ n, & n \lt 3 \\ F( n - 1) + 2F( n - 2) + 3F( n - 3 ), & n \geq 3 \end{cases}$$

That's a bit better. That looks very similar to the Fibonacci recurrence we were looking at earlier in the section, which is :

$$Fib(n) = \begin{cases} 0 & n = 0\\ 1 & n = 1\\ Fib(n-1) + Fib(n-2) & \text{otherwise} \end{cases}$$

If we recall that the authors gave two versions of the Fib function in Scheme, a recursive one, and an iterative one, that should give us the syntactical structure we need to solve this problem, so let's have a look at them. This is the recursive version :

And this is the iterative (by way of tail recursion) version :

Let's do the easy thing first, and write it out directly as a tree recursive function so that we can see the similarity with Fib.

If you're using Racket or a similar Scheme environment, you can confirm this is recursive by using a trace directive, (trace F), yielding the following massive trace, where the indentation level indicates the depth of recursion :

> (F 7)
>(F 7)
> (F 6)
> >(F 5)
> > (F 4)
> > >(F 3)
> > > (F 2)
< < < 2
> > > (F 1)
< < < 1
> > > (F 0)
< < < 0
< < <4
> > >(F 2)
< < <2
> > >(F 1)
< < <1
< < 11
> > (F 3)
> > >(F 2)
< < <2
> > >(F 1)
< < <1
> > >(F 0)
< < <0
< < 4
> > (F 2)
< < 2
< <25
> >(F 4)
> > (F 3)
> > >(F 2)
< < <2
> > >(F 1)
< < <1
> > >(F 0)
< < <0
< < 4
> > (F 2)
< < 2
> > (F 1)
< < 1
< <11
> >(F 3)
> > (F 2)
< < 2
> > (F 1)
< < 1
> > (F 0)
< < 0
< <4
< 59
> (F 5)
> >(F 4)
> > (F 3)
> > >(F 2)
< < <2
> > >(F 1)
< < <1
> > >(F 0)
< < <0
< < 4
> > (F 2)
< < 2
> > (F 1)
< < 1
< <11
> >(F 3)
> > (F 2)
< < 2
> > (F 1)
< < 1
> > (F 0)
< < 0
< <4
> >(F 2)
< <2
< 25
> (F 4)
> >(F 3)
> > (F 2)
< < 2
> > (F 1)
< < 1
> > (F 0)
< < 0
< <4
> >(F 2)
< <2
> >(F 1)
< <1
< 11
<142
142
>


Ouch. See how many times (F 0), (F 1) and (F 2) are getting called there ? That's a whole bucket load of redundant computation.

With our new function, we have three state variables instead of two, but we're still only accumulating into one and discarding the one at the other end of our input list. So to speak. In the iterative solution to the Fib function we performed some work on a and b and then we discarded b. With our new function the solution is exactly the same, onlythis time, we perform some computation on a,b, and c and then we discard c. The counter mechanism remains exactly the same.

We can confirm that this procedure is iterative by using a trace directive, (trace F-iter), yielding :

> (F 7)
>(F-iter 2 1 0 7)
>(F-iter 4 2 1 6)
>(F-iter 11 4 2 5)
>(F-iter 25 11 4 4)
>(F-iter 59 25 11 3)
>(F-iter 142 59 25 2)
>(F-iter 335 142 59 1)
>(F-iter 796 335 142 0)
<142
142
>


So we can see that even though (F-iter) calls itself repeatedly, the tail call is optimised and the depth of the call stack remains constant.

Built With Bootstrap