Sunday, January 27, 2013

On Tail Recursion and F#

"To iterate is human, to recurse divine."
-- L Peter Deutsch

Once a non-functional programmer starts really using declarative functional program one of the first questions they ask is "How do I iterate?".  This is very understandable, I mean most programmers spend a lot of time coming up with a list of steps that an automaton should do in a loop.  It is one of the very reason that we create programs.

Prepare to have your mind blown, in declarative programming you do not have iterators.  Instead what you have is recursion.  That right, that topic way back in introduction to college level mathematics is one of the corner stones of declarative functional programming, but not just any old recursion, no no, tail recursion!

What is tail recursion?  Why do we need it?  (Insert other questions here.)

Lets start with an example of recursion.  We'll create a recursion function in F# which sums up a list of numbers or fold the numbers if you prefer.  The function will simply read the head of the list and add the tail to it, passing the tail as the new list in a recursive call.

function (list of numbers)
  1. x <- head="" li="" list="" nbsp="" of="">
  2. return x + call function with tail of list


In F#

> let rec mySum x =
-   match x with
-   | [] -> 0
-   | head :: tail -> head + mySum tail
- ;;

val mySum : int list -> int

A very simple function which does the same thing as List.sum, lets test it with some sample list of numbers.

> mySum [1 .. 10000];;
val it : int = 50005000
> mySum [1 .. 10000000];;
Stacktrace:

  at (wrapper managed-to-native) System.Reflection.MonoMethod.InternalInvoke (System.Reflection.MonoMethod,object,object[],System.Exception&) < 0 x ffffffff >
  at System.Reflection.MonoMethod.Invoke (object,System.Reflection.BindingFlags,System.Reflection.Binder,object[],System.Globalization.CultureInfo) < 0 x 0012b >
  at System.MonoType.InvokeMember (string,System.Reflection.BindingFlags,System.Reflection.Binder,object,object[],System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,string[]) < 0 x 00454 >
  at System.Reflection.Emit.TypeBuilder.InvokeMember (string,System.Reflection.BindingFlags,System.Reflection.Binder,object,object[],System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,string[]) < 0 x 00040 >
  at System.Type.InvokeMember (string,System.Reflection.BindingFlags,System.Reflection.Binder,object,object[],System.Globalization.CultureInfo) < 0 x 0002c >

... remove details to save space  

  at Microsoft.FSharp.Compiler.Interactive.Shell.main () < 0 x 0198b >
  at Microsoft.FSharp.Compiler.Interactive.Shell.MainMain (string[]) < 0 x 0000b >
  at (wrapper runtime-invoke) .runtime_invoke_int_object (object,intptr,intptr,intptr) < 0 x ffffffff >

Native stacktrace:

/usr/bin/cli() [0x80e16fc]
/usr/bin/cli() [0x81209fc]
/usr/bin/cli() [0x806094d]
[0xd1f40c]
/usr/bin/cli() [0x812099b]
/usr/bin/cli() [0x806094d]
[0xd1f40c]
[0x64c38bb]
[0x64c38c4]
... remove duplicates

[0x64c38c4]
[0x64c38c4]
[0x64c38c4]
[0x64c38c4]

Debug info from gdb:

Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No threads.

=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

Aborted (core dumped)

Yep, something really bad happen, but what?  We'll looking at the stack trace.  It seems that we were in something like an endless loop.  When we have an endless loop what normally stops the program is a stack overflow and in fact that is the case this time.

How did this happen?  Lets see what the call stack would have looked like for the return part of the function.
  • 1 + mySum [2 .. 10000000]
  • 1 + 2 + mySum[3 .. 10000000]
  • 1+ 2 + 3 + mySum[4 .. 10000000]
  • 1+ 2 + 3 + 4 + mySum[5 .. 10000000]
  • ...
It seems that the call stack is being filled with the unresolved values of the next call to mySum.  What we would have liked is if the call stack could just hold on to the sum of all the previous calls.  To do that we would have to pass a working total along with each call then we could just add to the working total and remove that call from the call stack.  That is what is called a tail recursive function.

A tail recursive sum in F# would look like this:

> let tailSum x =
-   let rec aux total n = 
-     match n with
-     | [] -> total
-     | head :: tail -> aux (total + head) tail
-   aux 0 x
- ;;

val tailSum : int list -> int

Lets test this version.

> tailSum [1 .. 10000000];;
val it : int = -2004260032

Hmmm, well this time the issue is not a stack overflow, but the limitation of the data type int.  We can change the data to use random numbers that should even out close to 0 to test this out.  We'll do so by mapping a function which gives random values against a list of 10,000,000 values.

> let rnd = new System.Random()
- let test = List.init 10000000 (fun _ -> rnd.Next (-50, 51))
- ;;

val rnd : System.Random
val test : int list =
  [-28; 40; -4; -30; 6; -3; -10; 1; 4; -29; -6; 18; -47; 47; 45; 29; -14; 33;
   -20; -45; 37; 27; -50; 38; -37; 28; -30; 17; -39; 9; 23; 21; -48; 19; 4;
   -44; 33; -7; -1; 46; 37; -16; 15; 36; -1; 18; -41; 20; -37; -25; 26; -3; 3;
   -22; -1; 2; 38; 28; 17; -1; 14; 47; -48; 8; -16; -40; -48; -32; -3; -24; 19;
   16; 19; -45; -21; -11; -27; 22; -12; 12; 40; -7; -50; 12; 45; 26; 19; -6;
   -16; -7; -47; 15; 47; -27; -24; -29; 15; 10; 6; -41; ...]

Now we have our test data, lets test our function.

> tailSum [1 .. 10000];;
val it : int = 50005000
> tailSum test;;      
val it : int = -55789

For the price of passing along a working total, we now have the ability to process really large sets of data.