Sunday, April 28, 2013

Primality by Trial Division OR TDD Example in F# using FsUnit

"Youth, beauty, wisdom, courage – all
That happiness and prime can happy call."

-- Shakespeare All's Well That Ends Well
Act II, Scene I, Lines 182-183

I find RosettaCode to be a valuable resource of ideas for my daily Code Kata.  Recently I came across an entry in RosettaCode that did not have a F# example.  Once I finished my kata I mended that issue.

Problem statement (from RosettaCode):

Write a boolean function that tells whether a given integer is prime. Remember that 1 and all non-positive numbers are not prime.  Use trial division. Even numbers over two may be eliminated right away. A loop from 3 to √n will suffice, but other loops are allowed.



Step 1 (2 isPrime = true):


First step in solving the prime, if 2 is given in the input isPrime should return true.


Test Case:

[<Test>]
let ``Validate that 2 is prime`` () =
  isPrime 2 |> should equal true

Code (to pass):

let isPrime x = true

It may look dumb, but this is all that is needed to pass the test.

Step 2 (4 isPrime = false):

Test Case:

[<Test>]
let ``Validate that 4 is not prime`` () =
  isPrime 4 |> should equal false

Code (to pass):

let isPrime x =
  match x with
  | 2 -> true
  | x when x % 2 = 0 -> false

The mod 2, might be jumping the gun a little bit, but it follows the rules is just enough code to pass the tests.

Note that the match in this state will give a warning (warning FS0025: Incomplete pattern matches on this expression. For example, the value '0' may indicate a case not covered by the pattern(s). However, a pattern rule with a 'when' clause might successfully match this value.), we'll address that later.

Step 3 (3 isPrime = true):

Test Case:

[<Test>]
let ``Validate that 3 is prime`` () =
  isPrime 3 |> should equal true

Code (to pass):

let isPrime x =
  match x with
  | 2 -> true
  | x when x % 2 = 0 -> false
  | _ -> true

Again, this might seem dumb, but it is all we need to pass the test and we've fixed the warning from before.

Step 4 (9 isPrime = false):


Test Case:

[<Test>]
let ``Validate that 9 is not prime`` () =
  isPrime 9 |> should equal false

Code (to pass):

let isPrime x =
  match x with
  | 2 | 3 -> true
  | x when x % 2 = 0 -> false
  | _ -> false

Again, a bit dumb, but this was all that is needed to pass the tests.

Step 5 (5 isPrime = true):


Test Case:

[<Test>]
let ``Validate that 5 is prime`` () =
  isPrime 5 |> should equal true

Code:

let isPrime x =
  match x with
  | 2 | 3 -> true
  | x when x % 2 = 0 -> false
  | _ ->
     let rec aux i =
        match i with
        | i when x % i = 0 -> false
        | i when x < i*i -> true
        | _ -> aux (i+2)
     aux 3

This does not follow TPP, but RosettaCode gave us the algorithm they wanted us to use.  Since we followed TDD, we know that we did not break anything by putting a larger block of code in this iteration.

Last Step (277 isPrime = true):


Test Case:

[<Test>]
let ``Validate that 277 is prime`` () =
  isPrime 277 |> should equal true

Code:

The code is exactly the same as above, this test case was just a double check to make sure the solution scaled well.

Full Example:

Entry in RosettaCode (as of 28 April 2013):

http://rosettacode.org/wiki/Primality_by_trial_division#F.23

Code on TryFSharp.org:

Code:

open NUnit.Framework
open FsUnit

let isPrime x =
  match x with
  | 2 | 3 -> true
  | x when x % 2 = 0 -> false
  | _ ->
     let rec aux i =
        match i with
        | i when x % i = 0 -> false
        | i when x < i*i -> true
        | _ -> aux (i+2)
     aux 3

[<Test>]
let ``Validate that 2 is prime`` () =
  isPrime 2 |> should equal true

[<Test>]
let ``Validate that 4 is not prime`` () =
  isPrime 4 |> should equal false

[<Test>]
let ``Validate that 3 is prime`` () =
  isPrime 3 |> should equal true

[<Test>]
let ``Validate that 9 is not prime`` () =
  isPrime 9 |> should equal false

[<Test>]
let ``Validate that 5 is prime`` () =
  isPrime 5 |> should equal true

[<Test>]
let ``Validate that 277 is prime`` () =
  isPrime 277 |> should equal true

Thank you, F# Weekly for all the links to this blog!

Sunday, April 21, 2013

Finding Perfect Numbers

"Behoves me keep at utterance. I am perfect"
-- Shakespeare Cymbeline
Act III, Scene I, Line 73

Perfection is rare, one can even say perfection is anti-science.  For with perfection there is nothing more to investigate, nothing to ponder, nothing to study.

Perfect Numbers

In mathematics we have a class of numbers which are called perfect.  These numbers have the property that the some of their proper divisors is equal to them.


We can break this algorithm down into three steps.

Step 1: Find divisors




Given the example of n = 6

In Haskell

filter (\x -> mod 6 x == 0) [1..5]

In F#

[1..5] |> List.filter (fun x -> 6 % x = 0)

Step 2: Sum divisors


Still working with n = 6, we have [1, 2, 3] to sum.

In Haskell

sum [1, 2, 3]

In F#

[1; 2; 3] |> List.sum

Step 3: Compare


In Haskell

6 == 6

In F#

6 = 6

Putting it together

In Haskell (on codepad)

isPerfect :: Int -> Bool
isPerfect n =
  sum (
    filter (\x -> mod n x == 0) [1..(n-1)]) == n

main = print $ isPerfect 496

> true

In F# (on tryfsharp)

let isPerfect n =
  let d = [1..(n-1)] 
            |> List.filter (fun x -> n % x = 0)
            |> List.sum
  d = n
;;

isPerfect 496;;

> true


Sunday, April 14, 2013

Don't Mind the Gap

"That I might sleep out this great gap of time
My Antony is away."
-- Shakespeare's Antony and Cleopatra
Act I, Scene V, Lines 5-6


A very common problem with data sets is missing data.  It is such a common problem that we have well know patterns to identify missing data.

How to Find Data Gaps

A data gap is when data should be found either in some type of order or range or ... but is not.  Such as trade data for a security, the security should have some activity everyday, so if there are dates missing and open, close, high, and low then there is a data gap.



In the data set above, we see that 7, 8, 13, and 14 are missing.  In order to find what is missing we need to sort, remove duplicate records, group the data by a range id, and then find the min and max for the range id.



Given the algorithm above we can solve it with the following steps.

Step 1: sort



[1..5] @ [11..13] @ [33..40]
  |> List.sort

> val it : int list =
  [1; 2; 3; 4; 5; 11; 12; 13; 33; 34; 35; 36; 37; 38; 39; 40]

Step 2: dedup


[1..5] @ [11..13] @ [33..40]
  |> Seq.distinct

> val it : seq = seq [1; 2; 3; 4; ...]

Step 3: assign group ids


[1..5] @ [11..13] @ [33..40]
  |> Seq.mapi(fun i x -> (x, x - i))

> val it : seq = seq [(1, 1); (2, 1); (3, 1); (4, 1); ...]

Step 4: group



seq [(1, 1); (2, 1); (3, 1); (4, 1)]
  |> Seq.groupBy snd

> val it : seq> =
  seq [(1, seq [(1, 1); (2, 1); (3, 1); (4, 1)])]

Step 5: find group's min and max




seq [(1, seq [(1, 1); (2, 1); (3, 1); (4, 1)])]
  |> Seq.map (fun (k, v) -> (fst (Seq.min v), fst (Seq.max v)))

> val it : seq = seq [(1, 4)]

Finding Data Gaps with F# (tryfsharp)

let gapper x =
  x
    |> List.sort
    |> Seq.distinct
    |> Seq.mapi
      (fun i x -> (x, x - i))
    |> Seq.groupBy snd
    |> Seq.map
      (fun (k, v) -> 
        (fst (Seq.min v),
         fst (Seq.max v)))
    |> List.ofSeq

Examples


gapper ([1..5] @ [11..13] @ [33..40] @ [20] @ [2..5] @ [2]);;

> val it : (int * int) list = [(1, 5); (11, 13); (20, 20); (33, 40)]

gapper ([1..5] @ [2..9] @ [2..5] @ [2] @ [10..13] @ [102..110]);;

> val it : (int * int) list = [(1, 13); (102, 110)]

I blogged about this problem before in how to Find Islands of Data using T-SQL after reading about the problem in Itzik Ben-Gan's book High-Performance T-SQL Using Windowing Functions (which is really good).  

Finding Data Gaps with T-SQL (SQL Fiddle)

SELECT DISTINCT
   MIN(col1) OVER(PARTITION BY groupid) AS start_range
  ,MAX(col1) OVER(PARTITION BY groupid) AS end_range
  FROM (
    SELECT 
       col1
      ,col1 - ROW_NUMBER() OVER(ORDER BY col1) AS groupid
      FROM dbo.t1
  ) AS x
;

Sunday, April 7, 2013

On CPS - Part 2

"I was employed in passing to and fro
About relieving of the sentinels."
-- Shakespeare, Henry VI Part I
Act II Scene I Lines 69-70


Continuous passing style (CPS) is way of  programming in which you take direct control of the control flow by passing where the called function will return.


In the example above, the "function" is passed "values" to operate on along with the "continue function" it will pass control to after it is done (hence it does not return directly to the caller).  Using this style of programming one can construct the call stack.

Why would anyone want to do this?  Say you want to execute a recursive function obtaining some value really deep in the recursion.  The Fibonacci sequence is a great example of this type of problem.  Say you want to find the value of the 100,000th number in the Fibonacci sequence (well maybe this is pushing it for what I am going to show in this blog post, but you get the idea).

Non-CPS example (on tryfsharp.org):

let rec fib =
  function
  | 0 | 1 -> 1
  | n -> fib (n-2) + fib (n-1);;


If you were going to look for the 100,000th Fibonacci number using the standard recursive solution, you would get a stack overflow.  What you would like to do is use the heap instead of the stack, but how does one do that?  This is where CPS comes into play.  If you are working on a system which supports it, your continuations will go into the heap instead of being placed on the stack.

CPS example (on tryfsharp.org):


let rec fib_cps n k =
  match n with
  | 0 | 1 -> k 1
  | n -> fib_cps (n-1)
         (fun x -> fib_cps (n-2) (fun y -> k (x+y)));;


What you are doing in the example above is creating a closure with the next two calls which will placed in the heap.  Once fib_cps(0) and fib_cps(1) are hit then the whole calculation is solved for and the value is returned to the caller of the function.

For the Fibonacci sequence this is still wasteful since same values are calculated many times.



In the figure above, everything in salmon color is wasted effort.  What we would rather do is reuse the values we have already calculated (which is way you would not want to use this method to calculate the 100,000th Fibonacci number).

Along the lines of reducing waste we can look at using a data structure instead of using a closure.

Defunctionalization example (on tryfsharp.org):

type fib_cont =
  | Fib_zero
  | Fib_minus1 of int * fib_cont
  | Fib_minus2 of fib_cont * int;;

let rec fib_cps_defun n k =
  match n with
  | 0 | 1 -> fib_cont_eval 1 k
  | n -> fib_cps_defun (n-1) (Fib_minus1 (n, k))
and fib_cont_eval acc =
  function
  | Fib_zero -> acc
  | Fib_minus1 (n, k) -> 
      fib_cps_defun (n-2) (Fib_minus2 (k, acc))
  | Fib_minus2 (k, acc') -> fib_cont_eval (acc+acc') k;;


I first came across this idea, while reading what I will say is one of the greatest responses on StackOverflow.    What we are doing is constructing the heap with the Fib_minus1 type and then evaluating this to it function fib(n-1) + fib(n-2).  This can be more efficient on systems where data structures are easier to allocate than allocation of a closure call.

We still have not solved the problem of finding the 100,000th Fibonacci number, but we are along the correct path.

Full example (using FsUnit):

open NUnit.Framework
open FsUnit

type fib_cont =
  | Fib_zero
  | Fib_minus1 of int * fib_cont
  | Fib_minus2 of fib_cont * int

let rec fib_cps_defun n k =
  match n with
  | 0 | 1 -> fib_cont_eval 1 k
  | n -> fib_cps_defun (n-1) (Fib_minus1 (n, k))
and fib_cont_eval acc =
  function
  | Fib_zero -> acc
  | Fib_minus1 (n, k) -> fib_cps_defun (n-2) (Fib_minus2 (k, acc))
  | Fib_minus2 (k, acc') -> fib_cont_eval (acc+acc') k

let fib n = fib_cps_defun n Fib_zero

[<Test>]
let ``Validate that fib 0 is 1`` () =
  fib 0 |> should equal 1

[<Test>]
let ``Validate that fib 1 is 1`` () =
  fib 1 |> should equal 1

[<Test>]
let ``Validate that fib 2 is 2`` () =
  fib 2 |> should equal 2

[<Test>]
let ``Validate that fib 3 is 3`` () =
  fib 3 |> should equal 3

[<Test>]
let ``Validate that fib 4 is 5`` () =
  fib 4 |> should equal 5

[<Test>]
let ``Validate that fib 17 is 2584`` () =
  fib 17 |> should equal 2584