Sunday, January 27, 2013

On Tail Recursion and F#

"To iterate is human, to recurse divine."
-- L Peter Deutsch

Once a non-functional programmer starts really using declarative functional program one of the first questions they ask is "How do I iterate?".  This is very understandable, I mean most programmers spend a lot of time coming up with a list of steps that an automaton should do in a loop.  It is one of the very reason that we create programs.

Prepare to have your mind blown, in declarative programming you do not have iterators.  Instead what you have is recursion.  That right, that topic way back in introduction to college level mathematics is one of the corner stones of declarative functional programming, but not just any old recursion, no no, tail recursion!

What is tail recursion?  Why do we need it?  (Insert other questions here.)

Lets start with an example of recursion.  We'll create a recursion function in F# which sums up a list of numbers or fold the numbers if you prefer.  The function will simply read the head of the list and add the tail to it, passing the tail as the new list in a recursive call.

function (list of numbers)
  1. x <- head="" li="" list="" nbsp="" of="">
  2. return x + call function with tail of list


In F#

> let rec mySum x =
-   match x with
-   | [] -> 0
-   | head :: tail -> head + mySum tail
- ;;

val mySum : int list -> int

A very simple function which does the same thing as List.sum, lets test it with some sample list of numbers.

> mySum [1 .. 10000];;
val it : int = 50005000
> mySum [1 .. 10000000];;
Stacktrace:

  at (wrapper managed-to-native) System.Reflection.MonoMethod.InternalInvoke (System.Reflection.MonoMethod,object,object[],System.Exception&) < 0 x ffffffff >
  at System.Reflection.MonoMethod.Invoke (object,System.Reflection.BindingFlags,System.Reflection.Binder,object[],System.Globalization.CultureInfo) < 0 x 0012b >
  at System.MonoType.InvokeMember (string,System.Reflection.BindingFlags,System.Reflection.Binder,object,object[],System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,string[]) < 0 x 00454 >
  at System.Reflection.Emit.TypeBuilder.InvokeMember (string,System.Reflection.BindingFlags,System.Reflection.Binder,object,object[],System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,string[]) < 0 x 00040 >
  at System.Type.InvokeMember (string,System.Reflection.BindingFlags,System.Reflection.Binder,object,object[],System.Globalization.CultureInfo) < 0 x 0002c >

... remove details to save space  

  at Microsoft.FSharp.Compiler.Interactive.Shell.main () < 0 x 0198b >
  at Microsoft.FSharp.Compiler.Interactive.Shell.MainMain (string[]) < 0 x 0000b >
  at (wrapper runtime-invoke) .runtime_invoke_int_object (object,intptr,intptr,intptr) < 0 x ffffffff >

Native stacktrace:

/usr/bin/cli() [0x80e16fc]
/usr/bin/cli() [0x81209fc]
/usr/bin/cli() [0x806094d]
[0xd1f40c]
/usr/bin/cli() [0x812099b]
/usr/bin/cli() [0x806094d]
[0xd1f40c]
[0x64c38bb]
[0x64c38c4]
... remove duplicates

[0x64c38c4]
[0x64c38c4]
[0x64c38c4]
[0x64c38c4]

Debug info from gdb:

Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No threads.

=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

Aborted (core dumped)

Yep, something really bad happen, but what?  We'll looking at the stack trace.  It seems that we were in something like an endless loop.  When we have an endless loop what normally stops the program is a stack overflow and in fact that is the case this time.

How did this happen?  Lets see what the call stack would have looked like for the return part of the function.
  • 1 + mySum [2 .. 10000000]
  • 1 + 2 + mySum[3 .. 10000000]
  • 1+ 2 + 3 + mySum[4 .. 10000000]
  • 1+ 2 + 3 + 4 + mySum[5 .. 10000000]
  • ...
It seems that the call stack is being filled with the unresolved values of the next call to mySum.  What we would have liked is if the call stack could just hold on to the sum of all the previous calls.  To do that we would have to pass a working total along with each call then we could just add to the working total and remove that call from the call stack.  That is what is called a tail recursive function.

A tail recursive sum in F# would look like this:

> let tailSum x =
-   let rec aux total n = 
-     match n with
-     | [] -> total
-     | head :: tail -> aux (total + head) tail
-   aux 0 x
- ;;

val tailSum : int list -> int

Lets test this version.

> tailSum [1 .. 10000000];;
val it : int = -2004260032

Hmmm, well this time the issue is not a stack overflow, but the limitation of the data type int.  We can change the data to use random numbers that should even out close to 0 to test this out.  We'll do so by mapping a function which gives random values against a list of 10,000,000 values.

> let rnd = new System.Random()
- let test = List.init 10000000 (fun _ -> rnd.Next (-50, 51))
- ;;

val rnd : System.Random
val test : int list =
  [-28; 40; -4; -30; 6; -3; -10; 1; 4; -29; -6; 18; -47; 47; 45; 29; -14; 33;
   -20; -45; 37; 27; -50; 38; -37; 28; -30; 17; -39; 9; 23; 21; -48; 19; 4;
   -44; 33; -7; -1; 46; 37; -16; 15; 36; -1; 18; -41; 20; -37; -25; 26; -3; 3;
   -22; -1; 2; 38; 28; 17; -1; 14; 47; -48; 8; -16; -40; -48; -32; -3; -24; 19;
   16; 19; -45; -21; -11; -27; 22; -12; 12; 40; -7; -50; 12; 45; 26; 19; -6;
   -16; -7; -47; 15; 47; -27; -24; -29; 15; 10; 6; -41; ...]

Now we have our test data, lets test our function.

> tailSum [1 .. 10000];;
val it : int = 50005000
> tailSum test;;      
val it : int = -55789

For the price of passing along a working total, we now have the ability to process really large sets of data.

Monday, January 21, 2013

Finding Islands of Data

"But the universe, as a collection of finite things, presents itself as a kind of island situated in a pure vacuity to which time, regarded as a series of mutually exclusive moments, is nothing and does nothing."

When I first started using SQL Server 2008 one of the most interesting things I stumbled across was the OVER clause.  At first when I started using I enjoyed that I could use aggregate functions without the need for a GROUP BY clause, in fact I aggregate over different sets of data in the same SELECT statement!

Having grown more use to the idea of thinking in sets, I now see OVER clause and windowing functions in general are very valuable tools in the SQL tool belt.

Using the following:

CREATE TABLE #t1 (
  col1 int NOT NULL
);
GO

INSERT INTO #t1
  VALUES (2),(3),(4),(9),(11),(12),(13),(14),(22),(23),(45),(46);
GO

We create a table with one column and some integer values.

With just this simple data set we can demonstrate how islands of day can be found.  What is an island of data?  An island of data is simply a gap in the data.  Say you get a data feed of security prices, but some of the data is incomplete for some days (maybe a security is thinly traded or maybe whomever you are getting the feed from has some issues).  These gaps in the data form what is called data islands.

Looking at our simple data we can see a few data islands.

col1
2
3
4
9
11
12
13
14
22
23
45
46

The different data islands have been highlighted in different colors.  It would be nice if the data was a way that the different colors had different group identifiers so that we do a GROUP BY (or something similar) on the identifier.  As Polya shows us in How to Solve It, if we can solve an auxiliary problem perhaps we can solve this one.

If we use the ROW_NUMBER window function and ORDER BY the value in col1 we would get the following.

SELECT col1, ROW_NUMBER() OVER(ORDER BY col1) AS rowid
  FROM #t1;
GO

col1 rowid
2 1
3 2
4 3
9 4
11 5
12 6
13 7
14 8
22 9
23 10
45 11
46 12

All we did was simply return the ROW_NUMBER assigning the number based on the values in col1.  Thus 2 is assigned 1 since it is the lowest value, 9 is assigned 4 since there are three values lower than it, 46 is assigned 12 since all eleven other values are lower than it, and so on.  Each ROW_NUMBER is assigned in relation to all the other values it is based on, in this case the number is assigned based on the value in col1.

If we look closer we will find that the difference between the rowid and col1 holds constant for the whole group meaning that we can simple subtract the rowid and col1 to get a value that we can GROUP BY as an identifier.

SELECT col1, col1 - ROW_NUMBER() OVER(ORDER BY col1) AS groupid
  FROM #t1;
GO

col1 groupid
2 1
3 1
4 1
9 5
11 6
12 6
13 6
14 6
22 13
23 13
45 34
46 34

Now we have our identifier to GROUP BY we can do the following to get the range of each island of data.

SELECT DISTINCT
   MIN(col1) OVER(PARTITION BY groupid) AS start_range
  ,MAX(col1) OVER(PARTITION BY groupid) AS end_range
  FROM (
    SELECT 
       col1
      ,col1 - ROW_NUMBER() OVER(ORDER BY col1) AS groupid
      FROM #t1
  ) AS x;
GO

start_rangeend_range
24
99
1114
2223
4546

The PARTITION BY in the OVER clause is very similar to a GROUP BY, the difference being that the PARTITION BY takes the value in the current row and looks at all the related values in the current window, or simply in the case where col1 is equal to 3 the MIN ans MAX are found for all of the col1's with the related groupid of 1, which would be the value 2 ans 4.  Since the OVER clause is looked at for each row of data we need to use a DISTINCT to remove the duplicates.

There you have it, you can find islands of data by using simple windowing functions.

I've placed this code on SQL Fiddle here.


Saturday, January 12, 2013

On Immutablity

"And yet in our world everybody thinks of changing humanity, but nobody thinks of changing himself."
Three Methods of Reform by Leo Tolstoy

"When Moses was alive, these pyramids were a thousand years old. Here began the history of architecture. Here people learned to measure time by a calendar, to plot the stars by astronomy and chart the earth by geometry. And here they developed that most awesome of all ideas - the idea of eternity."
Walter Cronkite

In my best Yogi Berra, eternity is a rather long time, even longer than now.  One finds very quickly when learning about functional programming the idea of immutability.  In functional programming one does not find variables in the true since of the word, instead one hears of binding to a value or value functions.  Why is this?  What does this have to do with immutability?

Much like how Immanuel Kant's categorical imperative forbids lying, functional programming forbids mutating values.  This means you cannot say that foo is equal to 5 at one point in time and later on say that now foo is equal to 6.  You have just lied, which one is it, 5 or 6?  Even worst is it now 7?  Immutability solves this be not allowing you to change the value of foo once you say that foo is equal to 5.

One may at this time being saying, "How is immutability useful?  I have been programming for X number of years and that whole time I have been using variables and nothing bad as been happen!"  If we travel back in time a few decades, we can hear a similar argument about GOTO statements.

Why immutability now?  Well, as Uncle Bob points out in his first post about functional programming there is a tide wave coming.  Why?  Hardware designers have chosen to increase the number of processors instead of increasing the speed of a single processor.  What does this mean?  If us programmers want to be able to make use of these ever increasing number of processors, we'll need to be able to make our programs work in multi-core environments!

The easiest way to work in a multi-core environment would be to have functionality that does not have any side effects and thus could be reordered.  Hmmm, how does one do that?  Well if I had data structures that could not be mutated then those untrustworthy functions would not be able to produce side effects.  Hmmm, this sounds like immutability.

So what does immutability look like in F#?


> let foo = 5;;

val foo : int = 5

> foo <- 6="6" font="font">

  foo <- 6="6" font="font">
  ^^^^^^^^

/home/mike/stdin(2,1): error FS0027: This value is not mutable


We cannot even reassign foo to the value of 6!  That my friends is immutability.

Saturday, January 5, 2013

Sieve of Eratosthenes in F#

"God may have not play dice with the universe, but something strange is going on with the prime numbers."
Paul ErdÅ‘s

"Mathematicians  have tried in vain to this day to discover some order in the sequence of prime numbers, and we have reason to believe that it is a mystery into which the human mind will never penetrate."
Leonhard Euler

Before Caesar, before one anno domini, before the internet, there was the library of Alexander and  Eratosthenes.  Eratosthenes of Cyrene was the third librarian of the library of Alexander.  Among other things, he was the first person to calculate the circumference and tilt of the Earth!

If that was not enough, Eratosthenes summoned a simple prime sieve to find prime numbers.  The Sieve of Eratosthenes works in the following way:
  1. list integers from 2 to N
  2. initially mark 2 as prime (it is prime)
  3. cross off all of the integers which are increments of the marked number apart
  4. take the next non-crossed off number and mark it as prime
  5. repeat step 3 until all numbers are either crossed off or marked as prime

Visually here is what the sieve looks like:





skip to the end



We mark the prime numbers with green and those that have been crossed off with red.

What would the Sieve of Eratosthenes look like in F#?  Well we would want to have a list created from 2 to N, then a function which would take the head of the list and filter out the numbers from the list that are divisible by the head (I know this is not exactly the same as what is above) and loop back around to the top of the function with the new filtered list.


> let sieve n =
-   let rec aux list =      
-     match list with
-     | [] -> []
-     | hd :: tl -> hd :: aux (list |> List.filter (fun x -> x % hd <> 0) )
-   aux [2 .. n]
- ;;

val sieve : int -> int list


Let's test it (I'll blog about using FsUnit at a latter date, so we'll just use poking around testing for now).


> sieve 3;;
val it : int list = [2; 3]
> sieve 11;;
val it : int list = [2; 3; 5; 7; 11]
> sieve 100;;
val it : int list =
  [2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
   73; 79; 83; 89; 97]
> sieve 2;;  
val it : int list = [2]
> sieve 1;;
val it : int list = []

Looks like it works.

Friday, January 4, 2013

Using SQL Server's sp_MSforeachtable

If you just want the answer, here it is:

use MyDatabase

exec sp_MSforeachtable
  @command1 = N'PRINT ''?''',
  @whereand = N'AND o.name LIKE ''T_TablesWeWant_%'''

Another example (WHICH WILL DROP TABLES!!!!)

use MyDatabase

exec sp_MSforeachtable
  @command1 = N'DROP TABLE ?',
  @whereand = N'AND o.name LIKE ''T_TablesWeWantToDrop_%'''

Actual context

The most well documented undocumented stored procedure in SQL Server would have to be the two foreach stored procedures  sp_MSforeachdb and sp_MSforeachtable.

Being a lazy programmer who is currently working with a vended product that highly leverages SQL Server for it functionality, I did not want to physically go into each table in the tool to add the "hook" columns the tool needs to add to each table that it uses.  Instead I would much rather script out adding in the "hook" columns.  As one would assume the vendor did not have a script for this, much like learning that people in Columbus' time did not believe that the Earth was flat.

Luckily knowing that SQL Server had a few foreach stored procs for just this kind of scripting, Similar to von Bellingshausen, I hit the Google searching high and low.  I came across a very good write up at Database Journal on how sp_MSforeachtable sings.  It even included examples doing things similar to what I was looking to do.

I soon found that sp_MSforeachtable has an argument for commands (@command1) and another for filtering the tables that the commands would run against (@whereand).

I did the following quick test to see if I understood stored proc correctly (with some names changed and details left out):

use MyDatabase

exec sp_MSforeachtable
  @command1 = N'PRINT ''?''',
  @whereand = N'AND name LIKE ''T_TablesWeWant_%'''

This gave the following output:

Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'name'.

Seeing that the write up was from 2004, I assumed that something most likely changed in SQL Server 2012.  One of the difficulties related to using undocumented features, in fact that term is used as an euphemism for software bugs.

I figured I would use the force and read the source (note ... is code I do not think is important for the issue at hand, please who knows I maybe violating some term of contract if I show the whole thing).

exec sp_helptext 'sp_MSforeachtable'


create proc sys.sp_MSforeachtable

   @command1 nvarchar(2000), 
   @replacechar nchar(1) = N'?',
   @command2 nvarchar(2000) = null,
   @command3 nvarchar(2000) = null,
   @whereand nvarchar(2000) = null,
   @precommand nvarchar(2000) = null,
   @postcommand nvarchar(2000) = null

as

         ...



          /* Create the select */

   exec(N'declare hCForEachTable cursor global for select ''['' + REPLACE(schema_name(syso.schema_id), N'']'', N'']]'') + '']'' + ''.'' + ''['' + REPLACE(object_name(o.id), N'']'', N'']]'') + '']'' from dbo.sysobjects o join sys.all_objects syso on o.id = syso.object_id '

         ...


          return @retval

Beholding the FROM clause I knew exactly what I need to do for the @whereand to work.

use MyDatabase

exec sp_MSforeachtable
  @command1 = N'PRINT ''?''',
  @whereand = N'AND o.name LIKE ''T_TablesWeWant_%'''


Giving the output I was looking for (with some names changed and details left out again):

[dbo].[T_TablesWeWant_ACCOUNTING_ACCOUNT]
[dbo].[T_TablesWeWant_ACCOUNTING_TRANSACTION]
[dbo].[T_TablesWeWant_ACCOUNTING_TRANSACTION_TYPE]
...

Yep, using the alias o for the name column is all that I need to do to get sp_MSforeachtable to work in SQL Server 2012.